Photon randomly disconnects from the Particle Cloud

I don't know if this is a hardware, firmware, Internet or cloud issue, but I have a Photon 2 running pre-production firmware in a development environment that randomly - about once or twice per day - stops communicating with the cloud.

Between such outages, the application runs perfectly for hours on end. When it disconnects, the LED glows a continuous cyan color - no "breathing" - indicating that it is solidly connected to WiFi but not to the Particle cloud. A reboot via a power cycle restores everything to normal operation. Much less frequently, the Photon will completely disconnect from WiFi (flashing green), but it then reconnects quite quickly.

Note that the Photon is running Device OS 5.9.0. Note, also, that the WiFi signal strength is very strong (three mesh WiFi routers). I have another in-production Photon at another site, running the same OS version and similar - but not identical - code, that never exhibits this behavior even though it reports a signal strength of only "fair".

The symptoms suggest that the problem might be somewhere between my WiFi network and the Internet, but I have about a dozen other devices that connect to the Internet via the same network, including 3 computers, TVs and several IoT devices (Ring doorbell, thermostat, garage door openers, security cameras, etc.), and I have never seen this issue with any of them. Nonetheless, I rebooted my ISP's fiber modem and, subsequently, did the same with the closest router - about 20 feet away - while the Photon was offline. No reconnect. That suggests the code may have hung, but I can find nothing that would make that possible; remember that the code runs for many hours without interruption and the outages appear to be completely random.

Any ideas as to why this might be happening?

The firmware:

// This #include statement was automatically added by the Particle IDE.
#include <NCD2Relay.h>

NCD2Relay controller;

SYSTEM_THREAD(ENABLED);

String data = String(10);
int status;
int GPIOStatus3;
int setZone (String zone);
int PhotonOff = 0;
long int publishInterval = 600; // 10 minutes
long int lastPublish = Time.now();
long int StartWaitTime = Time.now();
long int StartOnesTime;
unsigned long lastReset = System.millis();
bool Restart = true;
bool Disable = false;
String myID = System.deviceID();


//STARTUP(WiFi.selectAntenna(ANT_AUTO)); 

void setup(){
 
//WiFi.setCredentials("OTHER_SSID", "OTHER_PASSWORD");
WiFi.setCredentials("YOUR_SSID", "YOUR_PASSWORD");

waitUntil(Particle.connected);
controller.setAddress(0,0,0);
lastPublish = Time.now();
lastReset = System.millis();
Particle.function("Disable", DisableAuto);  
Particle.variable("GPIO3", GPIOStatus3);
Particle.variable("Is_Disabled", PhotonOff);
Time.zone(-5);
StartOnesTime = Time.now();
myID = myID.substring(21);
}

// Function to disable or enable transmissions
int DisableAuto(String command){ 
String Command = command;
if (Command.equalsIgnoreCase("yes")){
    Disable = true;
    PhotonOff = 1;
    StartWaitTime = Time.now(); // Start disable interval timer
    Particle.publish ("Disabled", "Photon " + myID + " disabled manually");
    return 1;
}

if (Command.equalsIgnoreCase("no")){
    Disable = false;
    PhotonOff = 0;
    Particle.publish ("Disabled", "Photon " + myID + " re-enabled manually");
return 1;
}
return -1;
}


void loop(){
  
GPIOStatus3 = controller.readInputStatus(3); // Read input terminal 3 for the GPIO3 cloud variable


if ( Particle.connected() ){
    if ( Restart == true){
        Particle.publish("Restart", "Photon " + myID + " restarted");  // Send status to log
        Restart = false;
    }
}


if ( millis() - lastReset >= 3600000 ){ // Reinitialize the MCP23008 chip every hour (subtraction stays correct across millis() rollover)
    controller.setAddress(0,0,0); // Reinitialize the MCP23008 chip on the NCD relay board
    lastReset = System.millis(); // Reset last initialization time to current program time
}
   
   
if ( Time.now() >= lastPublish + publishInterval){ // If it's been 10 minutes since last publish
    int status = controller.readInputStatus(3); // Read input terminal 3
    if ( status == 1){ // Need water
        if ( Time.now() - StartOnesTime >= 9000 && Disable == false){ // If 1's have been sent for more than 2-1/2 hours
             Particle.publish ("Test",myID + " sent ones for " + String((Time.now() - StartOnesTime)/60) + " minutes");
             status = 0;
             Particle.publish("Test", "Photon " + myID + " sent " + String(status));
           //Particle.publish("Wait", "Tank too long"); // Send an email alert
             Disable = true; // Disable all transmissions
             PhotonOff = 1;
             StartWaitTime = Time.now(); // Start disable interval timer
             Particle.publish ("Disabled", "Photon " + myID + " disabled automatically");
             StartOnesTime = Time.now();
             lastPublish = Time.now();
        }
    }else{
        if ( Disable == true){
            Particle.publish ("Disabled", "Photon " + myID + " re-enabled automatically");
        }
        Disable = false;
        StartWaitTime = Time.now(); // Reset wait time
        StartOnesTime = Time.now(); // Reset StartOnesTime
    }

    if ( Disable == false){
       // Particle.publish("Tank", String(status));
        Particle.publish("Test", "Photon " + myID + " sent " + String(status));
        PhotonOff = 0;
        lastPublish = Time.now();
        delay(20);
    }    
}

if ( Time.now() - StartWaitTime >= 43200 && Disable == true){ // If the Photon has been disabled for 12 hours
    Disable = false; // Re-enable transmissions
    PhotonOff = 0;
    StartWaitTime = Time.now();
    StartOnesTime = Time.now(); // If the next transmit interval is 1 rather than a 0
    Particle.publish ("Disabled", "Photon " + myID + " re-enabled automatically");
}

}
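One thing that stands out in the firmware above is that a single pass through loop() can fire several Particle.publish() calls back-to-back (the two "Test" events and the "Disabled" event when the 2-1/2 hour limit trips), and DisableAuto() publishes as well. Particle documents a publish rate limit of roughly one event per second on average, so spacing events out may be worthwhile. This is a minimal sketch of that idea, not a known fix for the cyan lockup; PublishSpacer and its 1000 ms default spacing are hypothetical choices, and it is written as plain C++ so the logic can be tested off-device.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>

// Hypothetical helper: queues events and releases at most one every
// minSpacingMs, so loop() never fires several publishes back-to-back.
// The 1000 ms default is an assumption based on Particle's documented
// publish rate limit; check the current docs.
class PublishSpacer {
public:
    explicit PublishSpacer(uint32_t minSpacingMs = 1000) : spacingMs_(minSpacingMs) {}

    // Queue an event (name, data) to be sent later from loop().
    void push(const std::string& name, const std::string& data) {
        queue_.emplace_back(name, data);
    }

    // Returns true and fills `out` if an event may be sent at time nowMs.
    bool next(uint32_t nowMs, std::pair<std::string, std::string>& out) {
        if (queue_.empty()) return false;
        // Unsigned subtraction stays correct across millis() rollover.
        if (sentAny_ && nowMs - lastSentMs_ < spacingMs_) return false;
        out = queue_.front();
        queue_.pop_front();
        lastSentMs_ = nowMs;
        sentAny_ = true;
        return true;
    }

    std::size_t pending() const { return queue_.size(); }

private:
    uint32_t spacingMs_;
    uint32_t lastSentMs_ = 0;
    bool sentAny_ = false;
    std::deque<std::pair<std::string, std::string>> queue_;
};
```

In the firmware, each Particle.publish(name, data) call would become spacer.push(name, data) (a Particle String can be passed via its c_str()), and loop() would drain the queue with something like `if (Particle.connected() && spacer.next(millis(), ev)) Particle.publish(ev.first.c_str(), ev.second.c_str());`, where ev is a std::pair<std::string, std::string>.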

Typical Google Sheet log entries:

Hi @blshaw45 -

I am no programmer (not even close), but have you thought of using the watchdog timer to reset the device when its connection to the Particle Cloud times out?

Something along these lines (UNTESTED)

#include "Particle.h"

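SYSTEM_THREAD(ENABLED);

// Send Log output over USB serial (needed to see the Log.warn() below)
SerialLogHandler logHandler(LOG_LEVEL_INFO);

// Set the watchdog timeout period (e.g., 10 seconds)
ApplicationWatchdog watchdog(10000, System.reset);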

void setup() {
    // Initialize your application
    Particle.connect();
}

void loop() {
    // Check if the device is connected to the Particle Cloud
    if (Particle.connected()) {
        // Feed the watchdog to avoid a reset
        watchdog.checkin();
    } else {
        // Optionally log or handle the timeout
        Log.warn("Particle Cloud connection timed out!");
        // If the cloud is not connected within the watchdog period, the device will reset
    }
}

Regards, Friedl.
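A hedge on the software watchdog code above: as I read the Device OS ApplicationWatchdog documentation, the watchdog is also checked in automatically each time loop() returns, so it would fire when loop() blocks for longer than the timeout rather than specifically when the cloud connection drops (worth confirming in the docs). If the goal is to reset after a prolonged cloud outage, a plain timer in loop() expresses that directly. A minimal sketch follows; CloudOutageTimer and the 5-minute threshold in the usage note are hypothetical choices, and the class is plain C++ so it can be tested off-device.

```cpp
#include <cstdint>

// Hypothetical helper: reports when the cloud has been unreachable for at
// least timeoutMs. Feed it Particle.connected() and millis() from loop().
class CloudOutageTimer {
public:
    explicit CloudOutageTimer(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Returns true once the current outage has lasted at least timeoutMs.
    bool update(bool cloudConnected, uint32_t nowMs) {
        if (cloudConnected) {
            outage_ = false;          // connected: clear any outage in progress
            return false;
        }
        if (!outage_) {
            outage_ = true;           // first check without the cloud
            outageStartMs_ = nowMs;   // remember when the outage began
            return false;
        }
        // Unsigned subtraction stays correct across millis() rollover.
        return nowMs - outageStartMs_ >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t outageStartMs_ = 0;
    bool outage_ = false;
};
```

On the device, a global `CloudOutageTimer cloudTimer(5 * 60 * 1000);` plus `if (cloudTimer.update(Particle.connected(), millis())) System.reset();` in loop() would reset the Photon after five minutes without the cloud. With SYSTEM_THREAD(ENABLED) the system keeps retrying the connection in the background, so the threshold should be long enough to ride out normal reconnects.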

Very interesting Friedl! I had never heard of the watchdog timer. I am not much of a programmer either, but this surely sounds like a viable option to at least determine when the disconnect occurs (I can add a time stamp), even if not an explanation as to why. The only downside is that the code forces a reset, which may have an effect on how my application works. But even that is better than having the app simply stop. I'm going to give it a try right now and report back.

UPDATE: I added a Particle.publish() command to log the reset time in the Google sheet that I use to log all events. But then I read Particle's docs which say not to do that. Luckily, my existing code automatically logs all restarts anyway, so the objective will be achieved that way.

I added your suggested code verbatim. Now I'll just wait to see what happens.

Hey there, another alternative is the hardware watchdog, with the same logic as proposed by Friedl.

https://docs.particle.io/reference/device-os/api/watchdog-hardware/#watchdog-hardware

The software watchdog has more risk of getting entangled with user firmware than the hardware one does, so the hardware watchdog might be a better solution.

Also, when your device blocks, are you able to physically connect a USB cable to it to capture any logs?


Oops; I just saw your suggestion after I installed Friedl's code. I will try to teach myself how to use the hardware watchdog.

Yes, I can connect a USB cable (in my test environment), but what would I expect to see in a log? And what would I connect the cable to in order to read a log?

I would connect the device to your computer and run:

particle serial monitor --follow

You should be able to observe the same logs as when the device is running normally.

Or you can add something like this to loop() to have a log coming out every 10 seconds:

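    // Heartbeat: log a line every 10 seconds. Named lastHeartbeat so it
    // does not shadow the global lastPublish used elsewhere in loop().
    static unsigned long lastHeartbeat = 0;
    if (millis() - lastHeartbeat >= 10000)
    {
        lastHeartbeat = millis();
        Log.info("my program is running");
    }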

@gusgonnet Oops sorry... was the code for the software WTD? :face_with_hand_over_mouth: Goes to show I guess :laughing:

I recently ran into the same issue with a P2 board design on which I ran very simple code to cycle through some RGB colors. After a couple of hours it would get stuck. There I did in fact implement the hardware WTD (I think, haha), which solved the problem.

@blshaw45 For sure the hardware WTD would be the better option, my apologies, it was my intention to point you in that direction :smile:

Interesting Friedl. I left your software watchdog code in place overnight just to see what, if anything, would happen. My "restarts" log this morning showed that, indeed, the Photon restarted in the middle of the night. There is no other aspect to the code or to the application environment that would cause a restart other than flashing a revised firmware file which, of course, I did not do.

The fact that you, too, have seen a Photon get stuck, even with very simple code, lends credence to the suspicion that the issue is not unique to me, my environment or my code but may be specific to the Photon in general. If so, it'd be nice if Particle could introduce a fix. Periodic, random resets are not conducive to a reliable application.

But as I mentioned in my initial post, I have another Photon 2 (purchased at the same time as the one I am testing with, so likely from the same production run) in production for well over two months that has never (yet, anyway) exhibited the symptom.

Might you or Gustavo offer sample code for the hardware watchdog?

Damn, Gus, I wish I was a more experienced programmer so I wouldn't have to bother folks here on the forum always asking for help. But then, you all are improving my abilities in that regard with your advice and counsel.

I just tried your suggested sample code to populate a log via USB, but it doesn't work. Here's what my MacBook saw, i.e. nothing:

Polling for available serial device...
Opening serial monitor for com port: "/dev/tty.usbmodem1102"
Serial monitor opened successfully:

Nothing thereafter.

Thoughts?

Oh, it might be that the code is missing the logger definition at the top.
Add this right after SYSTEM_MODE() and before setup(), like so:

SYSTEM_MODE(AUTOMATIC);

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup()

Let me know if you see logs printing now.


Hi @blshaw45

Here is some code I used which I THINK was for the hardware watchdog, but please, let me reiterate, I am by no means a skilled programmer, so if @gusgonnet advises differently, please follow his advice :slight_smile:

/* 
 * Project myProject
 * Author: Your Name
 * Date: 
 * For comprehensive documentation and examples, please visit:
 * https://docs.particle.io/firmware/best-practices/firmware-template/
 */

// Include Particle Device OS APIs
#include "Particle.h"

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(AUTOMATIC);          

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup() {

  Serial.begin(115200);
  System.enableFeature(FEATURE_RESET_INFO);


  if (System.resetReason() == RESET_REASON_WATCHDOG) {
      Log.info("RESET by hardware WATCHDOG");
      }

  Watchdog.init(WatchdogConfiguration().timeout(5s));
  Watchdog.start();

}

void loop() {

//  Some application code runs here

  Watchdog.refresh(); 
  delay(100);
}

This code restarted the device, and logged the restart, whenever loop() took longer than 5 s to run. In my case that was due to the LED blinking in the loop, which used blocking delays. As it was only test code, I did not bother using non-blocking delays.
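The non-blocking alternative can be sketched as follows: instead of delay(), loop() checks whether enough time has passed and returns immediately otherwise, so Watchdog.refresh() keeps getting called well inside the 5 s timeout. Interval is a hypothetical helper, written as plain C++ so it can be tested off-device. The comparison uses the `now - last >= period` form, which stays correct when millis() rolls over after roughly 49.7 days, unlike `now > last + period`; that makes it a safer pattern for hourly checks such as the MCP23008 re-initialization as well.

```cpp
#include <cstdint>

// Hypothetical non-blocking replacement for delay(): due() returns true
// when periodMs has elapsed since it last returned true.
class Interval {
public:
    explicit Interval(uint32_t periodMs) : periodMs_(periodMs) {}

    bool due(uint32_t nowMs) {
        // Unsigned subtraction handles millis() rollover correctly.
        if (nowMs - lastMs_ >= periodMs_) {
            lastMs_ = nowMs;  // start the next period from now
            return true;
        }
        return false;         // not yet: return so loop() can carry on
    }

private:
    uint32_t periodMs_;
    uint32_t lastMs_ = 0;
};
```

On the device this would be a global `Interval blink(500);`, then in loop() `if (blink.due(millis())) { /* toggle the LED */ }` followed by `Watchdog.refresh();`, with no delay() in between.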

Hope this helps.

Friedl provided code above (thanks @friedl_1977 !), but to close the loop I wanted to say this: let the docs guide you.
It's ok that they look cryptic at first (it took me a while until they started "talking" to me), but this is where I would start looking:

and then:

Cheers,


A continuous cyan LED on the Photon 2 results in a "lockup", and only a power cycle resolves the issue. I have this issue too. I gave up and switched back to the Argon, and have had no issues.

Yes Zac, that's my experience. But I hope someone can explain why. I have one Photon 2 that has been running flawlessly for over two months. But my test unit can't make it through 24 hours without a lockup.

Hey Brian,
a few Photon 2 units got out the door with an issue in the external antenna configuration set at manufacturing. You might have this info already (I can't see a reference to it in this topic), so please ignore this if you do.
Can you configure the antenna yourself?
example: add to setup() one of these lines:

WiFi.selectAntenna(ANT_INTERNAL); // selects the CHIP antenna
WiFi.selectAntenna(ANT_EXTERNAL); // selects the u.FL antenna

thanks

Hmmm. Both the Photon 2 that is in production reliably and the one in my test environment that locks up randomly have the following command commented out:

//STARTUP(WiFi.selectAntenna(ANT_AUTO));

Should I re-enable that or try ANT_EXTERNAL instead? Note, again, that the production Photon 2 does not exhibit the problem. Only my test unit does. And the LED on the test unit shows a steady cyan after lock-up, which implies that it is solidly connected to WiFi, just not to the Particle Cloud.

FWIW, the "happy" Photon 2 has serial number P051AF3210433A4. The unhappy one has serial number P051AF323043C78. Fairly close it would seem.

Do you have an external antenna on your TEST Photon 2?
if so, I would add to setup() only on that device for now:

WiFi.selectAntenna(ANT_EXTERNAL);

Yes, it has an external antenna. I'll add that command.

OK Gus, I enabled the ANT_EXTERNAL command and have let it run for over 24 hours. At one point yesterday afternoon the Photon disconnected from WiFi for a couple of hours and would not reconnect until I cycled the power. That has never happened before, i.e. on the prior rare disconnect occasions, it would reconnect on its own relatively quickly. It has (so far) been stable since then.

But here is another mystery: since, as mentioned, I have another Photon 2 running perfectly in a production environment, I decided to flash the problem (test) Photon with almost exactly the same code running on the production device. The only difference between the production and test versions is a change in the Particle.publish event name (so as not to interfere with the production application). That one-word change required a recompile, but it would not compile. Below is the list of errors generated. Any idea what that is all about? Note that I did include the NCD2Relay.h library file.

Hey, I'm not able to read that; is this the line you edited?