Particle stuck connecting to cloud when disconnected from cell

My device firmware config is the following:

  • SYSTEM_MODE(AUTOMATIC);
  • SYSTEM_THREAD(ENABLED);
  • 3rd Party SIM with keepAlive set in connection event
  • Using MQTT for data logging
  • The result of Cellular.RSSI() is translated to a color on an external RGB LED, where black indicates RSSI > 0 (an error or disconnection condition reported by the modem)

I consistently get the following series of states on my device after a few hours of proper operation:

Normal Operational State

  • Electron RGB LED = breathing cyan
  • Particle.connected() = true
  • MQTT client.isConnected() = true
  • ext Cell LED = Green (good connection)

Connection Lost, Particle “connected”

  • Electron RGB LED = breathing cyan
  • Particle.connected() = true
  • MQTT client.isConnected() = false
  • ext Cell LED = Black (Cellular.RSSI() returned > 0 )

Note: no cloud_status_disconnected event is thrown between these two states, at least not one that Serial could catch in time.

Connection Lost, Particle “connecting”

  • Electron RGB LED = blinking cyan
  • Particle.connected() = true
  • MQTT client.isConnected() = false
  • ext Cell LED = Black (Cellular.RSSI() returned > 0 )

The Particle Cloud then stays disconnected for 7 minutes, which triggers a timeout condition in a watchdog thread that resets the device. After the reset it works as normal for a while.
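For reference, a timeout like the one described above could be sketched roughly as follows. This is a minimal illustration, not the actual watchdog code from this device: the class name DisconnectWatchdog and its update() method are hypothetical, and the current time is passed in as a parameter so the logic can be exercised off-device.

```cpp
#include <cstdint>

// Hypothetical helper: tracks how long the connection has been down and
// reports when a timeout has elapsed. The caller supplies the current time
// in milliseconds (on the device this would come from millis()).
class DisconnectWatchdog {
public:
    explicit DisconnectWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call periodically. Returns true once the connection has been down
    // continuously for at least timeoutMs.
    bool update(bool connected, uint32_t nowMs) {
        if (connected) {
            down_ = false;            // any connected sample clears the timer
            return false;
        }
        if (!down_) {
            down_ = true;             // first sample of this outage
            downSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct across the 32-bit rollover.
        return (nowMs - downSinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t downSinceMs_ = 0;
    bool down_ = false;
};
```

On the device, the assumption is that something like update(Particle.connected(), millis()) would be polled from the watchdog thread, with System.reset() called once it returns true. Note that this only helps when Particle.connected() actually reports false, which is not the case in the second state listed above.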

My take

I find it strange that it’s possible for Particle.connected() to return true when Cellular.RSSI() would return a value indicating disconnection. Cellular.ready() appears to be returning true most of the time the above behavior is occurring.

My signal strength is good where I’m testing - the device should never disconnect due to signal unavailability.

Any thoughts on what could cause this? I will try and monitor with more reporting on the various network change events to see if I can learn anything more.

edit:
there doesn’t appear to be an event for network_status_disconnected - is there any way to detect the loss of cellular connection?

Also, as a workaround, I can use that RSSI value to explicitly disconnect from the Particle Cloud if it hits a value indicating a bad connection, and maybe also cycle the modem. Since I’m using automatic mode: if I disconnect from the Particle Cloud and reconnect, but the connect fails, will the system firmware keep managing the reconnection after that like normal, or do I need to manage the whole connectivity myself?

You could have a look at this

Thanks for the link. I’ve looked at that code before, and just went through it again - I’m not seeing anything that provides me with information beyond what I already have - I’m in a test bench with Serial logging and hooks attached to all the connection and cloud events and changes in state, so I’m seeing everything unfold. Only thing I haven’t been doing is pinging a specific server, but since I also have my MQTT connection that seems like similar information for me.

I’m currently seeing a scenario where Cellular.ready() returns true even though both the Particle Cloud and MQTT have disconnected. Since Cellular.ready() returns true, they just keep trying to reconnect and block for minutes at a time until my timeout condition is finally triggered. On reset, with or without modem reset, it immediately connects with no issues.

Is there any way to help the System firmware re-evaluate the condition underlying Cellular.ready()? Once I’ve hit a certain timeout for my MQTT connection being disconnected, I can take mitigating action, but I’d prefer to avoid resetting the modem completely if I have any other way. Again, Cellular.RSSI() returns a “not connected” value, so the firmware should be trying to reconnect in theory, I’m just not seeing that behavior. I suppose I can keep tabs on the Cellular.RSSI() value and manually call Cellular.disconnect() and then Cellular.connect() if I see it disconnected without Cellular.ready() indicating so. I’m just worried about the underlying cause of the behavior not being addressed.
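The "keep tabs on RSSI" idea could be sketched as below. This is only an illustration under a few assumptions: the StaleLinkDetector name is hypothetical, a reading of 0 or above is treated as an error value (matching the LED logic in this thread), and several consecutive bad samples are required before acting so that a single transient modem error does not trigger a reconnect. The strike count is an arbitrary choice, not something taken from the firmware.

```cpp
#include <cstdint>

// Hypothetical helper: flags the inconsistent state where the system reports
// the modem as ready while the RSSI reading is an error value (>= 0).
class StaleLinkDetector {
public:
    explicit StaleLinkDetector(uint8_t requiredStrikes)
        : required_(requiredStrikes) {}

    // Feed one sample per poll. Returns true once the inconsistent state has
    // been observed on requiredStrikes consecutive polls.
    bool sample(bool cellReady, int rssi) {
        const bool inconsistent = cellReady && rssi >= 0;
        if (!inconsistent) {
            strikes_ = 0;             // a good (or honestly "not ready") sample resets
            return false;
        }
        if (strikes_ < required_) {
            ++strikes_;               // saturate so the counter cannot overflow
        }
        return strikes_ >= required_;
    }

    // Clear the counter, e.g. after mitigating action has been started.
    void reset() { strikes_ = 0; }

private:
    uint8_t required_;
    uint8_t strikes_ = 0;
};
```

The inputs would presumably come from Cellular.ready() and the rssi field of Cellular.RSSI() on each pass through loop(). This only detects the symptom; it does not explain why the system firmware keeps reporting ready.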

EDIT: After some testing I’ve realized that all these calls are non-blocking, and that I should be using a waitUntil or set a flag and come back. However, this has led to my realizing that Cellular.disconnect seems to be wonky, so I made a new post for that to keep this on topic. Thus, the primary question here is more or less resolved, since I’ll just turn Cellular Off and back On fully. If for some reason that doesn’t work well, I’ll reopen this. Wish this didn’t happen at all, but I have the mitigation in place.
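Since the calls are non-blocking, the "set a flag and come back" approach for a full modem off/on cycle could look roughly like the state machine below. It is a sketch under stated assumptions, not tested on hardware: ModemHooks and ModemRecycler are hypothetical names, the hooks are stand-ins for the real modem calls, and the dwell and timeout values are placeholders to be tuned.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical hooks standing in for the modem calls, so the logic can be
// run off-device.
struct ModemHooks {
    std::function<void()> powerOff;           // e.g. wraps Cellular.off()
    std::function<void()> powerOnAndConnect;  // e.g. Cellular.on() + Cellular.connect()
    std::function<bool()> ready;              // e.g. wraps Cellular.ready()
};

class ModemRecycler {
public:
    enum class State { Idle, WaitingOff, WaitingReady, Failed };

    ModemRecycler(ModemHooks hooks, uint32_t offDwellMs, uint32_t readyTimeoutMs)
        : hooks_(std::move(hooks)),
          offDwellMs_(offDwellMs),
          readyTimeoutMs_(readyTimeoutMs) {}

    // Request a power cycle. Ignored while a cycle is already in progress.
    void start(uint32_t nowMs) {
        if (state_ == State::WaitingOff || state_ == State::WaitingReady) return;
        hooks_.powerOff();
        enter(State::WaitingOff, nowMs);
    }

    // Call on every pass through loop(); never blocks.
    State poll(uint32_t nowMs) {
        const uint32_t elapsed = nowMs - enteredMs_;  // rollover-safe
        switch (state_) {
        case State::WaitingOff:
            // Keep the modem off for a fixed dwell before powering back on.
            if (elapsed >= offDwellMs_) {
                hooks_.powerOnAndConnect();
                enter(State::WaitingReady, nowMs);
            }
            break;
        case State::WaitingReady:
            if (hooks_.ready()) {
                enter(State::Idle, nowMs);            // reconnected
            } else if (elapsed >= readyTimeoutMs_) {
                enter(State::Failed, nowMs);          // caller decides what next
            }
            break;
        default:
            break;                                    // Idle / Failed: nothing to do
        }
        return state_;
    }

private:
    void enter(State s, uint32_t nowMs) { state_ = s; enteredMs_ = nowMs; }

    ModemHooks hooks_;
    uint32_t offDwellMs_;
    uint32_t readyTimeoutMs_;
    uint32_t enteredMs_ = 0;
    State state_ = State::Idle;
};
```

The idea is that loop() calls poll(millis()) every pass and start(millis()) when the bad state is detected; a Failed result could then fall back to a full device reset.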

ORIGINAL REPLY:

Update: I’m able to catch the condition where Cellular.RSSI() returns > 0, but some strange behavior results. Here is the portion of my code that runs:

if (rssi < 0) {
    if (rssi > -85) cell_color = green;
    else if (rssi > -100) cell_color = yellow;
    else cell_color = red;
}
else {
    // we aren't connected at all or modem threw an error
    cell_color = black;

    if (Cellular.ready() == true)
    {
        // this is a problematic state where the system firmware doesn't
        //    realize that the modem is disconnected, so let's manually
        //    reconnect.
        debugPrint(MSG_TYPE_ERROR, "Cellular.RSSI returned error val but Cellular.ready() true");

        if (Particle.connected()) Particle.disconnect();
        if (Particle.connected()) debugPrint(MSG_TYPE_ERROR, "Particle cloud didn't disconnect");
        Cellular.disconnect();
        Cellular.connect();
        Particle.connect();
    }
}

In Serial, I get the “Cellular.RSSI returned…” message at the expected time. However, I don’t see any events for networking or cloud changes. My event handlers are set up as follows, and they do normally trigger, for example when I connect for the first time.

//..............................................................................
//..............................................................................
void cloud_status_handler(system_event_t event, int param)
{
    if (param == cloud_status_connecting)
    {
        debugPrint(MSG_TYPE_DEBUG, "Connecting to Particle Cloud...");
    }
    else if (param == cloud_status_connected)
    {
        // init the keepAlive interval to maintain connection to Particle cloud
        #if Wiring_Cellular
        Particle.keepAlive(keepAliveInterval);
        #endif
        debugPrint(MSG_TYPE_DEBUG, "Connected to Particle Cloud");
    }
    else if (param == cloud_status_disconnecting)
    {
        debugPrint(MSG_TYPE_DEBUG, "Disconnecting from Particle Cloud...");
    }
    else if (param == cloud_status_disconnected)
    {
        debugPrint(MSG_TYPE_DEBUG, "Disconnected from Particle Cloud");
    }
}
//..............................................................................



//..............................................................................
//..............................................................................
void network_status_handler(system_event_t event, int param)
{

    if (param == network_status_connecting)
    {
        debugPrint(MSG_TYPE_DEBUG, "Connecting to network...");
    }
    else if (param == network_status_connected)
    {
        debugPrint(MSG_TYPE_DEBUG, "Connected to network");
    }
    else if (param == network_status_off)
    {
        debugPrint(MSG_TYPE_DEBUG, "Network off");
    }
    else if (param == network_status_on)
    {
        debugPrint(MSG_TYPE_DEBUG, "Network on");
    }
    else if (param == network_status_powering_on)
    {
        debugPrint(MSG_TYPE_DEBUG, "Network powering on...");
    }
    else if (param == network_status_powering_off)
    {
        debugPrint(MSG_TYPE_DEBUG, "Network powering off...");
    }
}
//..............................................................................

I don’t see any event print to Serial after I hit the code block up top, or at any point before. I would expect a “Disconnecting from Particle Cloud” either at that line or before at some point if already disconnected (though the LED stays cyan this whole time). I would also expect that when I call Cellular.disconnect() followed by Cellular.connect() I should get a network_status_connecting event. Am I missing something, or is the System firmware not responding to those function calls? I’m going to try and catch the issue and print out as much info on the current state in that moment, but it will probably take a few hours to reproduce.

edit: and I know that Particle.connect() is too soon after Cellular.connect() to be successful, but since I’m in SYSTEM_MODE(AUTOMATIC) my assumption is that the line serves to notify the System firmware that I want to reconnect again.