Particle.connect() blocking main loop permanently, even with SYSTEM_THREAD(ENABLED)

I have a Particle Photon that eventually goes offline after a few days, and the only way to recover is to reboot. The main loop is blocked when this occurs, although software timers still seem to work.

I added some logic to try to reconnect to wifi and/or the particle cloud, along with some verbose serial logging to figure out exactly where it was blocking. Here are the relevant bits of my code:

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

void setup() {
  WiFi.selectAntenna(ANT_AUTO);
  Particle.connect();
}

void loop() {
  Serial.println("Loop-de-loop");

  if (!Particle.connected()) {
    Serial.println("Oh Noes, Detached Connection!");

    if (!WiFi.ready()) {
      Serial.println("No wifi");

      if (WiFi.connecting()) {
        Serial.println("Wifi is connecting");
      } else {
        Serial.println("Connecting to wifi, please stand by");
        WiFi.connect();
      }
    } else {
      Serial.println("We gots wifi");

      Serial.println("Attempting connection");
      Particle.connect();
      Serial.println("Waiting for connection");
      waitFor(Particle.connected, 10000);
      Serial.println("After connect");
    }
  } else {
    Serial.println("We're connected. Life is good.");
  }
}

And here is the serial logging I captured at the point when it disconnected/blocked:

Loop-de-loop
We're connected. Life is good.
Loop-de-loop
We're connected. Life is good.
Loop-de-loop
We're connected. Life is good.
Loop-de-loop
We're connected. Life is good.
Loop-de-loop
We're connected. Life is good.
Loop-de-loop
Oh Noes, Detached Connection!
We gots wifi
Attempting connection

At this point, the photon is blinking green, and never recovers, no matter how long I wait. After I reboot, it immediately connects to the particle cloud with no problem, and stays connected for another few days, until it disconnects and gets blocked again.

Should Particle.connect() be blocking when the system thread is enabled? Or what is the best way to make sure this doesn’t happen?

I’m now considering adding my own watchdog implemented as a software timer that just reboots the device if it detects the main loop getting blocked.

Also, I should probably mention this is on 0.6.3 firmware (and it happened on many previous versions as well).
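
For reference, the software-timer watchdog idea can be sketched independently of the hardware. This is a minimal host-side C++ model that has not been tested on a Photon: LoopWatchdog, feed() and expired() are hypothetical names, and on a device the timestamps would presumably come from millis(), with feed() called from loop() and expired() polled from a software Timer callback.

```cpp
#include <atomic>
#include <cstdint>

// Host-side model of a loop watchdog (hypothetical names, not a Particle API).
// loop() calls feed(); a periodic timer callback calls expired().
class LoopWatchdog {
public:
    explicit LoopWatchdog(uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), lastFedMs_(0) {}

    // Record that loop() is still running.
    void feed(uint32_t nowMs) { lastFedMs_.store(nowMs); }

    // True when loop() has not fed the watchdog for longer than the timeout.
    bool expired(uint32_t nowMs) const {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        uint32_t elapsed = nowMs - lastFedMs_.load();
        return elapsed > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    // Atomic because feed() and expired() run in different threads.
    std::atomic<uint32_t> lastFedMs_;
};
```

On a device, the timer callback would call System.reset() when expired() returns true. The timeout has to be longer than the slowest legitimate loop() iteration, including any intentional waitFor() calls.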

I have encountered something similar to this as well. My product stays disconnected 99% of the time to conserve battery life, but every once in a while a connect (green fast blink) will go awry and block user code (which it’s not supposed to do) and lock up the device.

I have noticed the watchdog not catching the hang either.

I wrote up my own little watchdog-esque thing using software timers… but now I’ll have to wait a few days until it happens again to see if it works.

Yes, Particle.connect() was made blocking in SEMI_AUTOMATIC mode, as it was always meant to be (and was documented as such from the beginning), although in some versions it slipped through as non-blocking. This is irrespective of SYSTEM_THREAD(ENABLED): since you are calling the function from the application thread, the application thread will be blocked until the function succeeds or times out.
One way to prematurely interrupt the ongoing connection attempt is to have a timer (SW or HW) which will issue a Particle.disconnect().

You can read the discussion with Particle regarding this change here:
https://github.com/particle-iot/firmware/issues/1399
https://github.com/particle-iot/firmware/issues/1449

The reason the watchdog doesn’t catch the hang is that the system internally tickles the Application Watchdog, which keeps it from timing out. A discussion about that can be found here:
https://github.com/particle-iot/firmware/issues/1382

Unfortunately, Particle.connect() never times out, though.

Also, it was my understanding from one of your other comments (Particle.disconnect does not interrupt Particle.connect) that doing Particle.disconnect() from a software timer wouldn’t work either. Is that no longer the case?

Thanks for the comments!

In that statement I didn’t say whether it would or wouldn’t work. I merely pointed out that the docs mention calling Particle.disconnect() from an interrupt, and a test case that calls it from a software timer isn’t the same thing, so the docs’ statement may still be true as written until proven wrong.

From that statement one can’t deduce whether the OP’s primary assertion, that it doesn’t work from software timers, is true or false. And I must admit, I haven’t tested it either.

However, if Particle.disconnect() actually didn’t do what it’s meant to do, you could still pull the plug by issuing a WiFi.disconnect() or even a WiFi.off(). That should work in any case.

Gotcha, thanks. That’s actually how I have it implemented now, with a software timer and a few system event monitors to move it back through WiFi.off() -> WiFi.on() -> WiFi.connect() -> Particle.connect().
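
The recovery sequence described above can be written down as a small state machine. This is only a hedged, host-side sketch: RecoveryStep, RecoveryEvent and nextStep() are hypothetical names, and on a device the events would presumably come from the connect-timeout timer and from system event handlers. Each step names the call to issue on entering it.

```cpp
// Steps of the recovery sequence; each one names the call to issue
// when the step is entered (Idle means nothing is in progress).
enum class RecoveryStep { Idle, WifiOff, WifiOn, WifiConnect, CloudConnect };

// Events that move the sequence forward (hypothetical names).
enum class RecoveryEvent {
    ConnectTimedOut,  // the connect-timeout timer fired
    WifiPoweredOff,   // the Wi-Fi module reported it is off
    WifiPoweredOn,    // the Wi-Fi module reported it is on
    WifiConnected,    // Wi-Fi joined the network
    CloudConnected    // the cloud connection is up
};

// Returns the step to enter next; unexpected events leave the step unchanged.
RecoveryStep nextStep(RecoveryStep current, RecoveryEvent event) {
    switch (current) {
        case RecoveryStep::Idle:
            if (event == RecoveryEvent::ConnectTimedOut) return RecoveryStep::WifiOff;
            break;
        case RecoveryStep::WifiOff:
            if (event == RecoveryEvent::WifiPoweredOff) return RecoveryStep::WifiOn;
            break;
        case RecoveryStep::WifiOn:
            if (event == RecoveryEvent::WifiPoweredOn) return RecoveryStep::WifiConnect;
            break;
        case RecoveryStep::WifiConnect:
            if (event == RecoveryEvent::WifiConnected) return RecoveryStep::CloudConnect;
            break;
        case RecoveryStep::CloudConnect:
            if (event == RecoveryEvent::CloudConnected) return RecoveryStep::Idle;
            // A second timeout restarts the whole sequence.
            if (event == RecoveryEvent::ConnectTimedOut) return RecoveryStep::WifiOff;
            break;
    }
    return current;
}
```

Keeping the transitions in one pure function makes it easy to check that an out-of-order event cannot skip a step.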

I’ll keep watch on the serial logs to see if/when it gets disconnected, and see if my reconnection logic will interrupt Particle.connect(), and hopefully eventually get reconnected.

I still think there’s some bug here somewhere. I shouldn’t have to jump through so many hoops just to keep this thing connected :). Ideally I’d just set it to automatic mode and let 'er rip.

Wait... is that also true of MANUAL??

Nope, not for MANUAL
But WiFi.connect() will be blocking for the most part in any mode.

@BDub

Please add this to the list of things that need to be clearly and unambiguously documented.

Right–that doc is here:

https://docs.particle.io/reference/firmware/photon/#system-modes

Some excerpts:

Semi-automatic mode

The semi-automatic mode will not attempt to connect the device to the Cloud automatically. However once the device is connected to the Cloud (through some user intervention), messages will be processed automatically, as in the automatic mode above.
…
Once the user calls Particle.connect(), the user code will be blocked while the device attempts to negotiate a connection. This connection will block execution of loop() or setup() until either the device connects to the Cloud or an interrupt is fired that calls Particle.disconnect().

This document does not hint at the behavior that @ScruffR has described about WiFi.connect():

connect()

Attempts to connect to the Wi-Fi network. If there are no credentials stored, this will enter listening mode (see below for how to avoid this.). If there are credentials stored, this will try the available credentials until connection is successful. When this function returns, the device may not have an IP address on the LAN; use WiFi.ready() to determine the connection status.

// SYNTAX
WiFi.connect();

Since 0.4.5 It's possible to call WiFi.connect() without entering listening mode in the case where no credentials are stored:

// SYNTAX
WiFi.connect(WIFI_CONNECT_SKIP_LISTEN);

If there are no credentials then the call does nothing other than turn on the Wi-Fi module.

This document never says explicitly that Particle.connect() does not block while in MANUAL mode:

Particle.connect()

Particle.connect() connects the device to the Cloud. This will automatically activate the Wi-Fi connection and attempt to connect to the Particle cloud if the device is not already connected to the cloud.

void setup() {}

void loop() {
  if (Particle.connected() == false) {
    Particle.connect();
  }
}

After you call Particle.connect(), your loop will not be called again until the device finishes connecting to the Cloud. Typically, you can expect a delay of approximately one second.

In most cases, you do not need to call Particle.connect(); it is called automatically when the device turns on. Typically you only need to call Particle.connect() after disconnecting with Particle.disconnect() or when you change the system mode.

Manual mode

The "manual" mode puts the device's connectivity completely in the user's control. This means that the user is responsible for both establishing a connection to the Particle Cloud and handling communications with the Cloud by calling Particle.process() on a regular basis.

SYSTEM_MODE(MANUAL);

void setup() {
  // This will run automatically
}

void loop() {
  if (buttonIsPressed()) {
    Particle.connect();
  }
  if (Particle.connected()) {
    Particle.process();
    doOtherStuff();
  }
}

When using manual mode:

  • The user code will run immediately when the device is powered on.
  • Once the user calls Particle.connect(), the device will attempt to begin the connection process.
  • Once the device is connected to the Cloud (Particle.connected() == true), the user must call Particle.process() regularly to handle incoming messages and keep the connection alive. The more frequently Particle.process() is called, the more responsive the device will be to incoming messages.
  • If Particle.process() is called less frequently than every 20 seconds, the connection with the Cloud will die. It may take a couple of additional calls of Particle.process() for the device to recognize that the connection has been lost.

This is heavily misleading:

Under System Threading Behavior,

System modes SEMI_AUTOMATIC and MANUAL behave identically

which is not true, except in this narrow aspect:

both of these modes do not start the Networking or a Cloud connection automatically

There is nothing that brings the material together and presents it in a cohesive way. There are bits and pieces of information scattered throughout the API reference, but even if you collect all the information in there together, there are enough gaps that you will be led astray even if you're reading carefully.

This is an API reference, but the "Reference Manual" is missing. There is no Theory of Operation discussion or examples showing recommended ways to handle common scenarios-- leaving new users (including experienced engineers) to trial-and-error their way through building an application.

I’m experiencing similar kinds of things with the Electron in areas of poor signal reception. I am, however, in AUTOMATIC mode. I do call Particle.connect() in code called by loop() if Particle.connected() returns false. I am also running with SYSTEM_THREAD(ENABLED).

I understand that Particle.connect() is blocking in SEMI_AUTOMATIC, but is the same true for AUTOMATIC, or does it just set the flag for reconnection later?

I have reset timeouts set for when cloud connectivity has been off for 20 minutes, but they are only checked in loop() so that my code has a chance to finish what it’s doing. If Particle.connect() were blocking, I could see how that wouldn’t reset anything, though I’d tentatively expect the ApplicationWatchdog to trigger.
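
A reset timeout of this kind might look roughly like the following host-side sketch. OfflineTimeout and update() are hypothetical names, not part of any Particle API, and the 20-minute limit is just the figure mentioned above. On a device the arguments would presumably be Particle.connected() and millis().

```cpp
#include <cstdint>

// Tracks how long the cloud connection has been down (hypothetical helper).
class OfflineTimeout {
public:
    explicit OfflineTimeout(uint32_t limitMs) : limitMs_(limitMs) {}

    // Feed the current connection state and time.
    // Returns true once the device has been offline for longer than the limit.
    bool update(bool connected, uint32_t nowMs) {
        if (connected) {
            offline_ = false;  // back online, forget the previous outage
            return false;
        }
        if (!offline_) {
            offline_ = true;   // first observation of this outage
            offlineSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        uint32_t elapsed = nowMs - offlineSinceMs_;
        return elapsed > limitMs_;
    }

private:
    uint32_t limitMs_;
    bool offline_ = false;
    uint32_t offlineSinceMs_ = 0;
};
```

As noted above, a check like this cannot fire while loop() itself is blocked, so if it only runs from loop() a blocking Particle.connect() would defeat it. Calling update() from a software timer instead is one option, at the cost of not letting the application finish its current work first.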

@JesusFreke - have you discovered anything further in this?

I haven’t discovered anything further, but after I added my software timer-based timeout for Particle.connect(), which calls WiFi.off(), I haven’t experienced a hang yet. Although, I’m not sure if the timeout has actually been triggered yet. I had to disconnect the serial monitor due to some other development I’m doing.

Interesting thread…

Like others here, my code runs with SYSTEM_THREAD(ENABLED), on both Electron and Photon. This mode means the product will function regardless of whether WiFi credentials exist or a network connection is present. That still leaves the fringe case of a poor signal causing multiple disconnects/reconnects, which interfere with the operation of loop() and, if I understand correctly, by inference with anything in loop() that uses millis().

This potentially means (for example) a pump stays running and floods a greenhouse, the keypad/display stops working, etc. It would be nice if, when operating in threaded automatic/semi-automatic mode, a callback were made to the application thread before such a reconnect attempt. This would give the application a chance to notify the user, set that pump to a safe state, or other such niceties.

I’ve read quite a few bits and pieces on the forum today, but with the changes made over time it’s not always easy to establish what’s current and what’s not. It would appear that there are several functions (e.g. Cellular.connected) that currently do not quite behave as one might expect, making crafting solutions to problems like this … interesting. It would be nice to have an official blog post/example or just some more detail in the reference.

The partial solution to the problem as you’ve stated it is to leave all networking-related calls in the context of loop(), and then to make a separate thread or two to manage any tasks that require realtime responsiveness. I wouldn’t necessarily expect your above problems to be addressed in the System firmware anytime soon, since there are some cases where that is probably the preferred behavior (it is generally simpler to use if you are OK with blocking your primary thread).

While I personally have had a number of issues with connectivity and freezeups, my IO thread for Serial1 or CAN input has worked flawlessly up until the very moment I trigger a restart on the device. Same goes for my watchdog / reset management thread. You can pretty easily have the main thread be networking only, and move all other tasks to a secondary thread. Obviously, make sure you are using libraries and such that are threadsafe. As an example, the MQTT libraries or anything that uses TCP are NOT fully threadsafe, and must be in the main loop() thread. Anything that smells like networking probably should stay in the main thread, but for example I have my IO and soon my SD card operations running on independent threads.

See rickkas7’s tutorial for more details on threads, which aren’t otherwise particularly well documented yet.

fwiw, for my specific issue, I haven’t had any problems after I implemented a timeout for Particle.connect(), which resets wifi. e.g.

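// CONNECT_TIMEOUT (in milliseconds) is not defined in this excerpt.
// One-shot software timer: calls doTimeout() if it is not stopped in time.
Timer watchdogTimer(CONNECT_TIMEOUT, doTimeout, true);
// Set by doTimeout() and cleared once Wi-Fi is connected again.
bool reconnecting = false;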

void loop() {
  if (!Particle.connected()) {
    if (!WiFi.ready()) {
      if (!WiFi.connecting()) {
        WiFi.connect();
      }
    } else {
      watchdogTimer.start();  // arm the timeout before the blocking call
      Particle.connect();
      watchdogTimer.stop();   // connect() returned, so cancel the timeout
      waitFor(Particle.connected, 10000);
    }
  }
}

void handleWifiOff(system_event_t event, int param, void *blah) {
  if (reconnecting) {
    WiFi.on();
  }
}

void handleWifiOn(system_event_t event, int param, void *blah) {
  if (reconnecting) {
    WiFi.connect();
  }
}

void handleWifiConnected(system_event_t event, int param, void *blah) {
  if (reconnecting) {
    reconnecting = false;
    System.off(handleWifiOff);
    System.off(handleWifiOn);
    System.off(handleWifiConnected);
  }
}

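// Called by the timer when Particle.connect() has not returned within CONNECT_TIMEOUT.
// Powering Wi-Fi off is intended to start the off -> on -> connect sequence in the handlers above.
void doTimeout() {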
  System.on(network_status_off, handleWifiOff);
  System.on(network_status_on, handleWifiOn);
  System.on(network_status_connected, handleWifiConnected);
  reconnecting = true;
  WiFi.off();
}

Hi there,

I never used watchdogs or interrupts, but am right now facing a similar problem to the one you were having.

My Photon works great until there is some problem with WiFi. Then it tries to reconnect, but if that doesn’t work, it looks like it is stuck. After a reboot everything is great again.

Could you please give some more information about how your solution works, for noobs like me? I didn’t find any "tutorial" for this issue, and it would be great if you could elaborate on this.

Do you use SparkIntervalLibrary?

Where in your code do these functions (handleWifiOff etc.) get executed? I don’t see them being called anywhere in setup or loop.

How long is CONNECT_TIMEOUT, or how long should it be? I understand that it should be long enough to establish a connection?

Is your solution suitable if I want my device to work even if there is no internet connection? I don’t see any System.reset being called.

If providing a more detailed solution or a "working example" instead of pseudo-code is too much work… please point me in the right direction, since my own research didn’t turn up the right info.

As a first step, try SYSTEM_MODE(MANUAL).

Also make sure that your device does not run out of big enough chunks of heap space (e.g. by avoiding the use of String).

The System.on() instructions (probably also executed during setup()) hook up the respective functions to system events; whenever one of these events occurs, the system will call the hooked function(s).
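
To illustrate the idea of hooking functions to events, here is a minimal host-side model of such a registry. It is a sketch only: EventHooks, on() and dispatch() are hypothetical names and do not reflect how the Particle system firmware implements System.on(). The point is that the application never calls the handlers itself; it registers them, and the system side invokes them when a matching event occurs.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Minimal model of an event hook registry (illustrative, not the Particle implementation).
class EventHooks {
public:
    using Handler = std::function<void(uint32_t event, int param)>;

    // Register a handler for every event whose bit is set in mask.
    void on(uint32_t mask, Handler handler) {
        hooks_.push_back({mask, std::move(handler)});
    }

    // Called by the "system" side when an event occurs:
    // every handler whose mask matches the event is invoked.
    void dispatch(uint32_t event, int param) const {
        for (const auto& hook : hooks_) {
            if (hook.mask & event) {
                hook.handler(event, param);
            }
        }
    }

private:
    struct Hook {
        uint32_t mask;
        Handler handler;
    };
    std::vector<Hook> hooks_;
};
```

In the code posted earlier, the handleWifiOff, handleWifiOn and handleWifiConnected functions play the role of the registered handlers, which is why they are never called directly from setup() or loop().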
