Strange lock up observation [v0.4.5/v0.4.4]

Hi all - I haven’t managed to work out a reproduction for this yet, but I am posting anyway to check whether anyone else has observed the behaviour I am seeing.

We have three Photons running in our development product: as of last week, one of them runs 0.4.4 firmware and two of them run 0.4.5. We pulled in 0.4.5 to get some of the bug fixes, hence our mixed environment.

The behaviour we see is a seemingly random lock up of the main application loop - we can determine this because the device becomes unresponsive and we no longer see a small amount of serial output that the main loop prints. The onboard LED flashes cyan when this happens.

We had previously seen this flashing cyan LED with the ‘zombie photon’ issue (I think this was fixed in 0.4.4), but in that case the main application loop was still running; we just couldn’t connect.

We run in MANUAL mode and handle cloud processing and connections in a function called from the main loop:

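// oldTime and retryCount are globals declared elsewhere in the sketch.
void checkConnectionStatus() {
  // In MANUAL mode the application has to service the cloud connection itself.
  if (Spark.connected()) {
    Spark.process();
  }

  // Re-check the connection state every 2 seconds.
  if (millis() - oldTime >= 2000) {
    if (retryCount < 10) {
      if (!WiFi.ready()) {
        // No WiFi yet: (re)start a WiFi connection attempt.
        setDeviceState(DEVICE_WIFI_CONNECTING);

        WiFi.connect();
        retryCount++;
      }
      else if (!Spark.connected()) {
        // WiFi is up but the cloud is not: (re)start a cloud connection attempt.
        Spark.connect();
        retryCount++;
      }
    }
    else {
      // Ten attempts without success: power-cycle the WiFi module and start over.
      WiFi.off();
      retryCount = 0;
      WiFi.on();
    }
    oldTime = millis();
  }
}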

A note on our environment, since it may be relevant if this is connection related: we have about 30-40 clients on the access point (including the Photons), and the signal strength where the Photons operate is about 35-45%.

This doesn’t quite fit your symptom description, but having read your code snippet I thought I’d bring it up anyway.

One thing that used to help is to not retrigger a WiFi.connect() or Particle.connect() while a previous attempt is still running.
I usually do this with a oneShot flag that gets set after the first try and gets reset once the try has succeeded or timed out.
I’m not sure if this was already incorporated into FW, but it used to be an issue.
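
A rough sketch of the oneShot idea, kept free of any firmware calls so it can be tried off-device. ConnectGuard, shouldStart() and the timeout value are hypothetical names for illustration and not part of the firmware API; the connected state and the current time are passed in by the caller.

```cpp
#include <cstdint>

// Hypothetical helper: remembers whether a connection attempt is in flight.
struct ConnectGuard {
    bool attemptRunning = false;  // the "oneShot" flag
    uint32_t startedAt = 0;       // time the current attempt began (ms)
    uint32_t timeoutMs;           // assumed give-up time, pick your own value

    explicit ConnectGuard(uint32_t timeout) : timeoutMs(timeout) {}

    // Returns true only when the caller should trigger a new connect() now.
    bool shouldStart(bool connected, uint32_t now) {
        if (connected) {
            attemptRunning = false;   // attempt succeeded: reset the flag
            return false;
        }
        // Unsigned subtraction keeps this correct across millis() rollover.
        if (attemptRunning && (now - startedAt) < timeoutMs) {
            return false;             // previous attempt still running
        }
        attemptRunning = true;        // first try, or the last one timed out
        startedAt = now;
        return true;
    }
};
```

In the function from the first post this would wrap the existing calls, for example `if (wifiGuard.shouldStart(WiFi.ready(), millis())) WiFi.connect();`, with a second guard for the cloud connection.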

Another minor thing - you could just return from your function immediately after Particle.process().

But without seeing more of your loop(), I can’t rule out other causes of your symptoms that are unrelated to this function.

Could you add some debug/assert statements to narrow down the place where it locks up?

Sorry @ScruffR - been travelling for a few days so only getting back to this now - thanks for your suggestions.

One thing that used to help is to not retrigger a WiFi.connect() or Particle.connect() while a previous attempt is still running.

I've implemented this, so I will see how it pans out during today's testing.