Handling being stuck in the 'Flashing cyan' state

The issue: This week one of our remote devices got stuck flashing cyan (i.e. it had lost its connection to the cloud). It was unresponsive to any cloud interaction and also failed to send its weekly SMS, even though SMS messages are not sent via the cloud.

The concerns:

  • The software watchdog didn’t kick in (it was set to 60 seconds).
  • It was fixed with a simple power restart.
  • Non-cloud code failed to operate (SMS messages via a 3rd party SIM).
  • The device was in multi-threaded mode.

The current challenge:
How do we prevent a loss of cloud connectivity from blocking non-cloud code, and/or trigger a restart so the device gets a chance to recover when it is cut off from the cloud?

Initial thought:

  • Run some checking code every few minutes to see whether the cloud is disconnected and, if so, restart the device. But I’m not sure this will work, given that the watchdog didn’t trigger and the other non-cloud services didn’t run either.

Open to suggestions and ideas, especially if others have built in other ‘error handling’ code.

My current thoughts below, but it isn’t quite there yet.

    int countOfNotConnected = 0;
    const int maxCountOfNotConnectedBeforeRestart = 100; // 100 checks x 30 s = roughly 50 minutes offline
    const int timeBetweenCloudConnectionChecks = 30;     // in seconds
    int timeOfLastCloudConnectionCheck = 0;              // Time.now() at the last check

    // Call this from loop(). It only helps while loop() is still running.
    void cloudConnectionCheck()
    {
        // Only sample the connection state once per check interval
        if ((timeOfLastCloudConnectionCheck + timeBetweenCloudConnectionChecks) < Time.now())
        {
            if (!Particle.connected())
            {
                countOfNotConnected++;
            }
            else
            {
                countOfNotConnected = 0;
            }
            timeOfLastCloudConnectionCheck = Time.now();
        }

        // Too many consecutive failed checks: restart the device
        if (countOfNotConnected >= maxCountOfNotConnectedBeforeRestart)
        {
            countOfNotConnected = 0;
            System.reset();
        }
    }
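A variation on the same idea, as a minimal sketch only: track how long the cloud has been down using a millisecond counter instead of counting checks, so the logic does not depend on the real-time clock being valid. The class name `CloudDisconnectTimer` and its `update()` method are made up for illustration and are not part of any Particle API; the time and connection state are passed in so the logic can be exercised off-device.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Particle API): reports when the cloud has been
// unreachable for at least timeoutMs, based on a free-running ms counter.
class CloudDisconnectTimer {
public:
    explicit CloudDisconnectTimer(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }

    // Call regularly with the current millisecond count and connection state.
    // Returns true once the cloud has been down for timeoutMs or longer.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            down_ = false;          // connection is fine, forget any outage
            return false;
        }
        if (!down_) {
            down_ = true;           // first time we see the outage
            downSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the counter rolls over
        return static_cast<uint32_t>(nowMs - downSinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t downSinceMs_ = 0;
    bool down_ = false;
};
```

On the device this would presumably be called from loop() as `timer.update(millis(), Particle.connected())`, followed by `System.reset()` when it returns true. Like the code above, it can only help while loop() is still being executed.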

That is a common issue with the watchdog: its timer is always reset when your code drops out of loop() or calls Particle.process().
It would only trigger if your code had trapped the flow in some kind of loop that doesn't call Particle.process().
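To illustrate that reasoning, here is a toy model of a software watchdog that is checked in every time loop() returns. This is only a simulation under that assumption, not the Device OS implementation; `SimWatchdog` and `watchdogFires` are made-up names. Note that the cloud state is not an input at all, which is the point: a lost cloud connection alone never makes this kind of watchdog fire.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a software watchdog (not the Device OS code).
class SimWatchdog {
public:
    explicit SimWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }
    // Restart the countdown; modelled as happening whenever loop() returns.
    void checkin(uint32_t nowMs) { lastCheckinMs_ = nowMs; }
    // True if no check-in happened within the timeout.
    bool expired(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastCheckinMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastCheckinMs_ = 0;
};

// Simulate durationMs of run time in 1-second ticks.
// loopReturns == true : loop() returns every tick and checks in the watchdog.
// loopReturns == false: application code is stuck and never checks in.
// Returns true if the watchdog would have fired during the run.
bool watchdogFires(uint32_t timeoutMs, uint32_t durationMs, bool loopReturns) {
    SimWatchdog wd(timeoutMs);
    wd.checkin(0);
    for (uint32_t now = 1000; now <= durationMs; now += 1000) {
        if (wd.expired(now)) {
            return true;
        }
        if (loopReturns) {
            wd.checkin(now);
        }
    }
    return false;
}
```

With a 60 second timeout, an hour of loop() returning normally never trips the model, while a stuck loop trips it at the 60 second mark.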

We have addressed this issue with Particle already, but the current remedy for such cases seems to be an external watchdog that pulls the RESET line.
https://github.com/particle-iot/firmware/issues/1382
Let's hope 0.8.0-rc.3 will bring a viable solution.

I second that as the only current option to reliably reset the device under your condition.

I’d qualify that statement with the term “currently” :wink:

Does that mean we should expect the built-in watchdog to work in the near future or do you think the new product line will solve this first?

If you look at the issue I linked you’ll see that it’s currently slated for 0.8.0-rc.3

I just looked! That is GREAT News!

Excellent!

In the meantime, any suggestions for how to trigger this state again and test the non-cloud SMS functions?

I.e. is there a way to block cloud access without calling disconnect or removing the antenna? That way I could test the full scenario of a blocked cloud state.

Was your device still reporting Cellular.ready() == true?
It's not easy to trigger a cloud disconnect without a simultaneous cellular disconnect.
But since you have a 3rd party SIM, I'd assume a relatively short keep-alive period. So if you set Particle.keepAlive() too high and don't actively communicate with the cloud, you should lose the cloud connection after a while but keep the cellular connection intact.

I’ll give it a shot and see what happens :slight_smile:
Thanks!