Cyan flash of death

I have two Photons running in the same location, and both regularly suffer the cyan flash of death after about two to three days of running.

Both units lock up, flashing cyan.

I’m running network connections to Particle (publishing one event), to data.sparkfun.com over HTTP, and to Blynk.
Lucky for me, I have a laptop recording diagnostics - this setup is an hour’s plane ride away.

System threading is enabled and the Particle connection is in AUTOMATIC mode.

I have wifi fail detection code that looks like this:

if (!Particle.connected())
  {
      Serial.print("Reconnecting to particle ");
      Serial.println(retry);
      if((!WiFi.ready()) or (retry>=5))
      {
        // wifi not connected so lets restart the wifi
        Particle.disconnect(); // disconnect gracefully
        WiFi.off();  // power off wifi chip
        delay(1000); // wait 1 sec
        load_ssids(); // restart wifi
        
      }
      Particle.connect();  // if the cloud is not connected then try to connect now. Send_Any_Data will check later whether it is connected and send all it can
      if (Particle.connected)
      {
          Serial.println("Success!");
          retry=0;
      }
      else
      {
          retry++;
          Serial.println("failed");
      }
  }

My debug stream on fail looks like this:

[100242687] Heartbeat timeout
[100247356] Connecting to blynk-cloud.com:8442
Membrane Pressure: 0.78
O2 pressure: -0.236508
Reconnecting to particle 0
Success!
Stop Pump
[100253141] Connecting to blynk-cloud.com:8442
Make_Float_Go_Up
Stop Pump
[100270731] Connecting to blynk-cloud.com:8442
Membrane Pressure: 0.78
O2 pressure: -0.354762
Turn on Pump Fill
[100276518] Connecting to blynk-cloud.com:8442

This suggests an attempt to reconnect to Particle that the Particle.connected check reports as successful, even though it clearly isn’t, since Blynk keeps trying to connect. Shortly after this the device locks up, and it takes a power cycle to recover. I have installed WiFi-controlled power points for this purpose.
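
One possible reason for the "Success!" line: the snippet above tests `if (Particle.connected)` without call parentheses. Assuming `connected` is a static member function (which the `waitFor(Particle.connected, ...)` idiom suggests), the bare name converts to a function pointer, and that pointer is never null, so the test passes regardless of the real connection state. The sketch below reproduces the effect on a desktop compiler; `FakeCloud`, `Cloud` and both helper functions are hypothetical stand-ins, not part of the Particle API.

```cpp
// Hypothetical stand-in for the cloud object; NOT the real Particle class.
struct FakeCloud {
    // Always reports "not connected" so the two checks visibly disagree.
    static bool connected() { return false; }
};

FakeCloud Cloud;

// Mirrors `if (Particle.connected)`: the bare name becomes a function
// pointer, and a pointer to an existing function is never null.
bool check_without_parens() {
    bool (*fn)() = Cloud.connected;
    return fn != nullptr;
}

// Mirrors `if (Particle.connected())`: the function is actually called.
bool check_with_parens() {
    return Cloud.connected();
}
```

Here `check_without_parens()` is true while `check_with_parens()` is false, even though the fake cloud never connects. If the same thing applies on the Photon, the log line only shows that the branch was taken, not that the cloud connection came back.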

I’ve been trying to track down the issue for weeks. Can anyone shed any light on why these things lock up like this?
I know this has been an issue in the past.

I’m desperate for assistance on this one.

That’s the tricky part with incomplete code:
Where do you call WiFi.on() after you have called WiFi.off()?
How frequently might this snippet be executed in case of connection loss?
What’s happening in load_ssids()?
Is your device accessible otherwise (e.g. Particle.function() or Particle.variable()) or can it Particle.publish() while in this state?
Do you see a “device came online” message in the console?

You may want to waitFor(Particle.connected, 30000) after your Particle.connect() call.
Also, AUTOMATIC reconnect might not play well with your own code, so you might rather try SEMI_AUTOMATIC.
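
As a rough illustration of what a bounded wait such as `waitFor(Particle.connected, 30000)` is expected to do, here is a generic polling loop with a deadline. It is plain desktop C++, not the Device OS implementation; `wait_for`, `never_ready` and `always_ready` are hypothetical names made up for this sketch.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` until it returns true or `timeout` has elapsed.
// Returns the last observed value of the condition.
bool wait_for(const std::function<bool()>& condition,
              std::chrono::milliseconds timeout,
              std::chrono::milliseconds poll = std::chrono::milliseconds(1)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (condition()) {
            return true;               // condition met before the deadline
        }
        std::this_thread::sleep_for(poll);
    }
    return condition();                // one final check at the deadline
}

// Hypothetical conditions used to exercise both outcomes.
bool never_ready()  { return false; }
bool always_ready() { return true; }
```

A call like `wait_for(never_ready, std::chrono::milliseconds(20))` returns false once the deadline passes, so the caller still has to check the result before relying on the connection. Note that a loop of this shape can only time out if the condition function itself returns; if that call blocks, the deadline is never evaluated.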

Sorry for the incomplete code, but it’s (1) too big and (2) under NDA, so I have to be a bit careful.
load_ssids() is responsible for initialising the wifi module:

void load_ssids(void)
{
    WiFi.on();
    WiFi.disconnect();

    if (!WiFi.clearCredentials()) {
        Serial.println("SSID clear failed");
        return;
    }
    for (unsigned int i = 0; i < sizeof(ssid_creds)/sizeof(ssid_creds[0]); i++) {
        SSIDCredential *p_ssid = &ssid_creds[i];
        WiFi.setCredentials(p_ssid->ssid, p_ssid->password, p_ssid->auth, p_ssid->enc);
    }
    WiFi.connect();
    waitFor(WiFi.ready,60000);  // changed to timeout after 60 seconds if no wifi PM 18-8-16
    delay(1000);
    Serial.println(WiFi.SSID());
    Particle.publish("SSID",WiFi.SSID());
}

There’s no ‘device came online’ message in the console when the recovery code executes, and I can’t reflash either.
The check for the Particle connection occurs every 30 seconds. Based on the debug messages, it only executes once when the lockup happens. I know the WiFi access point is still functioning, as I can still access my remote PC on the same network.
Nothing else in loop() runs in the lockup condition - I’d be getting messages on my debug port otherwise.
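
For reference, the periodic check described above, combined with the retry threshold from the first snippet, can be written as a small decision function that is easy to test off-device. This is only a sketch under assumptions: `Action`, `RetryState` and `next_action` are hypothetical names, and resetting the counter after a WiFi restart is a design choice of this sketch that the original snippet does not make.

```cpp
// Hypothetical outcome of one periodic connectivity check.
enum class Action { None, ReconnectCloud, RestartWifi };

// Counts consecutive failed cloud reconnect attempts.
struct RetryState {
    int retry = 0;
};

// Decides what to do on each check (every 30 seconds in the thread).
// Assumption: the counter is cleared after a WiFi restart so the next
// failures start a fresh series of reconnect attempts.
Action next_action(bool cloud_connected, bool wifi_ready,
                   RetryState& state, int max_retries = 5) {
    if (cloud_connected) {
        state.retry = 0;               // healthy: nothing to do
        return Action::None;
    }
    if (!wifi_ready || state.retry >= max_retries) {
        state.retry = 0;               // escalate to a full WiFi restart
        return Action::RestartWifi;
    }
    ++state.retry;                     // WiFi is up: retry the cloud only
    return Action::ReconnectCloud;
}
```

Keeping the decision separate from the `WiFi` and `Particle` calls means the escalation path can be exercised on a desktop without waiting days for a real dropout.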

Will try your code adjustments tomorrow. Thanks.

@twospoons, resetting and setting wifi credentials is a sure way to wear down the part so you may want to do that only once (or on command) since up to 5 sets of credentials can be stored in the Photon. How many are you trying to set?

As @ScruffR pointed out, you need to have the correct SYSTEM_MODE() configured and I noticed that you execute a Particle.publish() in load_ssids() but it is not preceded by a Particle.connect(). You can replace:

    WiFi.connect();
    waitFor(WiFi.ready,60000);  // changed to timeout after 60 seconds if no wifi PM 18-8-16

with the following which will connect the WiFi and then the Cloud:

    waitFor(Particle.connected,60000);

Of course this assumes that WiFi is on. You may want to disable the credentials settings portion of the code and manually set them once to see if this is part of the problem. Do you have SYSTEM_THREAD() enabled?

So I have made changes as suggested. I have a new wifi restart routine that looks like this:

void restart_wifi(void)
{
    WiFi.off();
    Serial.print("1");
    delay(2000);
    Serial.print("2");
    WiFi.on();
    Serial.print("3");
    WiFi.connect();
    Serial.print("4");
    waitFor(WiFi.ready,60000); // wait for wifi connection (timeout 60seconds)
    if (!WiFi.ready()) {
        Serial.print("timeout");
        return;
    }
    Serial.print("5");
    Particle.connect(); // need this in semi-auto mode
    Serial.print("6");
    waitFor(Particle.connected,60000);  // changed to timeout after 60 seconds if no wifi PM 18-8-16
    if (!Particle.connected()) {  // call the function; the bare name is always true
        Serial.print("timeout");
    }
    Serial.println("7");
    Serial.println(WiFi.SSID());
    Particle.publish("SSID",WiFi.SSID());
    Serial.print("8");
}

One of the two test units has locked up after 5 days running, the other is still going.
I’ve pinpointed the lockup to this line:

waitFor(WiFi.ready,60000); // wait for wifi connection (timeout 60seconds)

That’s supposed to have a 60 second timeout, right? But that’s where it’s got stuck.

I have

SYSTEM_THREAD(ENABLED);

SYSTEM_MODE(SEMI_AUTOMATIC);
as suggested, and the WiFi restart no longer rewrites the SSIDs.
WiFi coverage is poor where these units are, but I know this code works, as I’ve seen it operate correctly without locking up when the coverage was worse than usual.
Damn, these intermittent failures are hard to debug! Especially when it takes days to check.