Is there any way to completely avoid blocking when attempting to connect to the Particle cloud?

I'm having trouble completely eliminating blocking in my application. There are some instances where my board loses its connection to the cloud and then tries to reconnect. While it attempts to reconnect, I see my loop() code block for up to 20 seconds. For my application, I cannot have my loop code blocked at all.

I have done some research and found that I should avoid calling Particle.publish() when I am not connected to the cloud. I now avoid that, but the blocking persists (although before I stopped publishing while disconnected, I saw blocking times of 100+ seconds).

I am using
SYSTEM_THREAD(ENABLED); SYSTEM_MODE(SEMI_AUTOMATIC);

I do not want to include all my code because it's a large file but will include some snippets.
I am calling this function in my setup():

void initParticle(){

  // Synchronize time with Particle cloud
  Particle.syncTime(); 
  
  //Particle function declarations that can be accessed in Particle Console 
  Particle.function("startSleep", startSleep);
  Particle.function("reboot", rebootBoard);
  Particle.function("hourPressed", hourPressed);
  Particle.function("minutePressed", minutePressed);
  Particle.function("secondPressed", secondPressed);

  //Connect to allow webhook integration response
  Particle.subscribe("hook-response/button_pushed", myHandler, MY_DEVICES);
  Particle.subscribe("hook-response/online_check", myHandler, MY_DEVICES);

  // Connect to the Particle cloud
  Particle.connect();
  delay(1000);
}

My code where I am checking if I am connected and publishing if so, and trying to connect if not:

if (Particle.connected()){
        bool success;
        success = Particle.publish("online_check", data);
        if (!success){
          //If we fail here, we know signalflag should be set high
          Serial.println("Failure to send");
          signalFlag = true;
        }
        else {
          Serial.println("Message sent successfully!");
          //Set an alarm to timeout response
          pingResponseAlarm.setAlarm(timeoutTime);
        }
      }
      else {
        Serial.println("We are not connected to the Particle cloud! The Particle board should reconnect automatically..");
        signalFlag = true; 
        Particle.connect();
      }

If this message times out (10 seconds), this function gets called which disconnects me from the cloud:

void pingResponseTimeout(){
  Serial.println("Here in pingResponseTimeout");
  signalFlag = true; 
  Particle.disconnect();
  loadingDisplayIterationAlarm.unSetAlarm();
}

I am witnessing this blocking take place only when attempting to connect to the cloud. Is there any way I can avoid blocking or some type of workaround?

In order to eliminate blocking of the application loop thread you need to move the Particle.publish() out of loop into a worker thread. You can see how to do it in this library.

You need to move anything that directly accesses the cellular modem out of loop. For example, calling Cellular.RSSI() must be moved to a thread, though it possibly could be the same one used for publishing.

Also, in SEMI_AUTOMATIC mode you do not need to call Particle.connect() again when disconnected. You should only call it once to start the first connection. It will automatically reconnect when necessary.
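As a rough illustration of the worker-thread idea, here is a minimal sketch in portable C++ using std::thread instead of Device OS threads. It is not the library mentioned above. `PublishWorker` and its `Sender` callback are hypothetical names, and on a real device the callback would wrap the blocking Particle.publish() call. The point is only that the loop side pushes onto a bounded queue and returns immediately, while the slow call happens on the worker.

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical helper (not a Device OS API): a bounded queue of events
// drained by a dedicated worker thread.
class PublishWorker {
public:
    // Stand-in for a blocking publish call; returns true on success.
    using Sender = std::function<bool(const std::string&)>;

    PublishWorker(Sender sender, std::size_t maxQueued)
        : sender_(std::move(sender)), maxQueued_(maxQueued),
          worker_([this] { run(); }) {}

    // Drains whatever is still queued, then stops the worker.
    ~PublishWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Called from the application loop. Never waits on the sender;
    // returns false (event dropped) when the queue is full.
    bool enqueue(const std::string& event) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= maxQueued_) return false;
        queue_.push(event);
        cv_.notify_one();
        return true;
    }

    std::size_t sentCount() const { return sent_.load(); }

private:
    void run() {
        for (;;) {
            std::string event;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stop requested, nothing left
                event = std::move(queue_.front());
                queue_.pop();
            }
            // The slow call runs outside the lock, so enqueue() stays fast.
            if (sender_(event)) ++sent_;
        }
    }

    Sender sender_;
    std::size_t maxQueued_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<std::string> queue_;
    bool stop_ = false;
    std::atomic<std::size_t> sent_{0};
    std::thread worker_;  // declared last so it starts after the other members
};
```

The bounded queue is a design choice for this sketch: when the worker is stuck on a slow publish, new events are dropped rather than piling up without limit. Whether dropping or overwriting old events is right depends on the application.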

Thanks for responding. I will look into this library and give it a shot.

Should this calling of "Publish" be an issue if I am not publishing anything before checking that my board is connected to the Particle cloud?

Also, in one area of my code I am manually disconnecting from the Particle cloud. It is my understanding that in SEMI_AUTOMATIC mode, if you manually disconnect from the cloud, the board will not attempt to reconnect automatically. Therefore, I am calling Particle.connect() in a different part of my code after I disconnect. The only reason I added this was that it was taking a while (roughly 30 seconds to a minute) for my board to recognize that it had lost service and was therefore no longer connected to the cloud. So I added a manual round-trip ping that checks for a service connection; if it times out (10 seconds), the board disconnects from the cloud. During that 10-second timeout window, the board does not attempt to publish any messages. Then, the next time I try to publish a message, the code sees that it is not connected to the cloud and calls Particle.connect() again.

Does this make sense? Is this a poor approach?

I am worried that "Particle.publish()" when not connected to the cloud is not my issue, and instead is due to a caveat case described in the "System functions" documentation:

Asynchronous system functions do not block the application thread, even when the system thread is busy, so these can be used liberally without causing unexpected delays in the application. (Exception: when more than 20 asynchronous system functions are invoked, but not yet serviced by the application thread, the application will block for 5 seconds while attempting to put the function on the system thread queue.)

Synchronous system functions always block the caller until the system has performed the requested operation. These are the synchronous system functions:

WiFi.hasCredentials(), WiFi.setCredentials(), WiFi.clearCredentials()
Particle.function()
Particle.variable()
Particle.subscribe()
Particle.publish()
For example, when the system is busy connecting to Wi-Fi or establishing the cloud connection and the application calls Particle.variable() then the application will be blocked until the system finished connecting to the cloud (or gives up) so that it is free to service the Particle.variable() function call.

Could I be calling 20 synchronous functions? What is considered a synchronous function?

You need to check for Particle.connected() before calling Particle.publish(). If you do not, you will always block when disconnected, for a period from 20 seconds up to 10 minutes.
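A minimal sketch of that check-before-publish guard, written in portable C++ so it can be run off-device. `CloudApi`, `PublishResult`, and `guardedPublish` are hypothetical names for illustration; in firmware the two callbacks would correspond to Particle.connected() and Particle.publish().

```cpp
#include <functional>
#include <string>

// Outcome of one publish attempt, so the caller can tell "skipped because
// offline" apart from "tried and failed".
enum class PublishResult { Sent, Failed, SkippedOffline };

// Hypothetical stand-in for the two cloud calls used in this thread.
struct CloudApi {
    std::function<bool()> connected;
    std::function<bool(const std::string&, const std::string&)> publish;
};

// Only calls publish when the connection check passes.
PublishResult guardedPublish(const CloudApi& cloud,
                             const std::string& name,
                             const std::string& data) {
    if (!cloud.connected()) {
        return PublishResult::SkippedOffline;  // publish is never called
    }
    return cloud.publish(name, data) ? PublishResult::Sent
                                     : PublishResult::Failed;
}
```

Note that this guard only narrows the window. The connection can still drop between the check and the publish, so on its own it does not guarantee a non-blocking loop.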

However, the reason we recommend doing it from a worker thread is that there is an unavoidable race condition: the cloud can become inaccessible between the check and the publish, or the device may not yet have realized it was disconnected at the time of the check.

Ignore the text you quoted. It applies to non-threaded mode only, which you should never use.

I can't tell for sure whether your ping functionality will cause problems from the description, but beware of excess data usage or data operations usage from the ping. Also if your test is causing frequent disconnections from the cloud, that could also cause excessive data usage. The cloud reconnection algorithm will reset the cellular modem after 10 minutes of failing to connect. Frequent manual disconnections will prevent this from working, which may cause the modem to never fully reset and never be able to connect.

Thanks for all this!

So I'm assuming that publishVitals will cause this issue as well? I just realized that there is a part of my code where I am publishing vitals without verifying the connection to the cloud. I added a check to verify that we are connected to the cloud before calling publishVitals, and it seems to have fixed my blocking issue.

It wouldn't hurt to check before calling Particle.publishVitals(). It could block if the system thread is currently blocked. But it does check if cloud connected before trying to publish and returns SYSTEM_ERROR_INVALID_STATE if not connected, so it does behave differently than Particle.publish().

I see. It seems to be working much better now but I am still getting small blocks occasionally of <5 seconds.

The problem is that I am publishing vitals quite frequently (every 10 seconds), and it seems to take my board up to roughly 1-2 minutes before it recognizes that it is not connected to the cloud anymore.

This is why I had to implement a round-trip ping with my own timeout of 10 seconds to determine whether the board is still cloud connected. It seems like a real problem that publishing while not connected to the cloud can cause such a massive stall (20 seconds to 10 minutes of blocking) when it can also take the board a substantial amount of time to recognize that it is no longer connected.
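For reference, a round-trip ping timeout like the one described can be tracked without blocking by comparing timestamps on each pass through loop(). This is a minimal sketch in portable C++, not the code from my project. `PingTimeout` is a hypothetical name, and the timestamps are assumed to come from a 32-bit millisecond counter such as millis().

```cpp
#include <cstdint>

// Hypothetical helper: tracks one outstanding ping and reports when it has
// gone unanswered for longer than the timeout. Nothing here waits or sleeps.
class PingTimeout {
public:
    explicit PingTimeout(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call when the ping is published.
    void start(std::uint32_t nowMs) {
        startedMs_ = nowMs;
        pending_ = true;
    }

    // Call from the subscribe handler when the response arrives.
    void onResponse() { pending_ = false; }

    // Call on every loop pass. Returns true once per timed-out ping.
    bool expired(std::uint32_t nowMs) {
        if (!pending_) return false;
        // Unsigned subtraction stays correct when the counter wraps around.
        const std::uint32_t elapsed =
            static_cast<std::uint32_t>(nowMs - startedMs_);
        if (elapsed < timeoutMs_) return false;
        pending_ = false;
        return true;
    }

    bool pending() const { return pending_; }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t startedMs_ = 0;
    bool pending_ = false;
};
```

In loop() this would be used as `if (ping.expired(millis())) { ... }`, with whatever reaction to the timeout the application needs in the body.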

The worker thread solution solves this problem by putting the blocking in the worker thread and allowing the rest of your code to continue execution. That's what we do in all of our professional projects and in things like the Tracker Edge and Monitor Edge firmware.
