Cellular Stopped (Breathing White)

After happily running a simple 60-second call-in script for 24 (ish) hours, I returned to find the Electron had stopped calling in for maybe 18 hours. Battery level is fine, data is not paused. Repowering brings it back to life. These kinds of incidents are very worrying when the end product is intended for remote locations and simply has to be reliable.

There are several other threads on Electrons just stopping but all have slightly different symptoms, so I decided to start a new thread.

Two questions:

Is anyone aware of an actual bug that can cause this?

Does anyone know of watchdog logic that can be used when the device is asleep (stop, deep or off)? I think I read that deep sleep actually reboots the whole Electron (is that right?). That would hopefully correct anything that had gone off the rails, but the cost is a lot of extra data usage reconnecting to the network.

Hey there JethroNull!

When you say “Calling in”, what do you mean? A Particle.publish event? A webhook? Am I correct in parsing that the ping was every 60 seconds?

Can you share the sketch that you were using? With respect to sleep logic, @BDub has done most of the work related to deep sleep on the Electron.

@will, @BDub, "calling in" means, to me, the whole process of waking the modem, connecting and sending a Particle.publish event. Yeah, that whole process runs every minute.

Sketch below:

// Version 2 with ScruffR's help to optimize.

const int voltOut = A5;
const int ldrPin = A0;
const int ledPin = 7; // This is your internal LED
const int led = D0; // This is where your external LED is plugged in. The other side goes to a resistor connected to GND.

int txsess, rxsess, txtot, rxtot;

CellularData data;
FuelGauge fuel;

void setup() {
    pinMode(voltOut, OUTPUT); //supply 3.3v at pin A5
    pinMode(ldrPin, INPUT); //assign A0 as an input
    pinMode(ledPin, OUTPUT);
    digitalWrite(voltOut, HIGH); //initialize A5 at 3.3v
    pinMode(led, OUTPUT); // Our LED pin is output (lighting up the LED)
    Cellular.resetDataUsage();
    delay(4000);
}

void loop() {

    if (!Particle.connected()) return;
    digitalWrite(ledPin, HIGH);
    int value = (analogRead(A0) / 10);
    double voltage     = fuel.getVCell();
    double SoC         = fuel.getSoC();
    CellularSignal sig = Cellular.RSSI();
    double rssi        = sig.rssi;
    double qual        = sig.qual;
    if (Cellular.getDataUsage(data))
    {
        txsess = data.tx_session;
        rxsess = data.rx_session;
        txtot = data.tx_total;
        rxtot = data.rx_total;
    }
    else
    {
        txsess = 2;
        rxsess = 2;
        txtot = 2;
        rxtot = 2;
    }
    //Particle.publish("GS Exp.", String::format("%4d, %4d, %4d, %4d, %.1fV, %.1f%%, %.0f, %.0f", txsess, rxsess, txtot, rxtot, voltage, SoC, rssi, qual), PRIVATE);
    Particle.publish("GS_Exp4", String(voltage), 60, PRIVATE);
    data.tx_session = 0;
    data.rx_session = 0;
    //data.tx_total = 0;
    //data.rx_total = 0;
    Cellular.setDataUsage(data);
    digitalWrite(ledPin, LOW);
    //Cellular.off();
    //System.sleep(SLEEP_MODE_DEEP, 600); //sleep for 10 minutes
    //System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY);
    System.sleep(D0, RISING, 60);

}

I don’t see the part of the code in there where you are shutting down the modem and restarting it…what I’m thinking is that unless you are maintaining a PDP context between those sessions, connecting and disconnecting from a tower every minute may be considered abusive cellular behavior on the network, and you may be being blocked by the cellular tower.

One way to find out for sure is by capturing the logs created by the Electron’s modem during startup. @BDub is also a good individual to chime in here.

Getting blocked by the network seems plausible. I had been trying to use SLEEP_NETWORK_STANDBY, but that seems to be buggy (see the Can’t Get CellularData to Work thread). Without it, I am not sure whether System.sleep retains the context or not.

Deep sleep does reset the system. When 0.6.x is released for the Electron, it will cost about 135 bytes to resume the session after deep sleep :smile: You will have to use SLEEP_NETWORK_STANDBY as well to keep the modem's PDP context active.

System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY); should work well for what you are currently trying to do. I'd want to debug why that's not working.

You can create your own simple watchdog that checks for being disconnected for too long, and then software reset the system. It's potentially not a good idea to reset too often, so I've set it here at 30 minutes. It should be higher than the longest typical time to connect which is 5 minutes. There should also be an incremental backoff time involved here as well.

// will require multi-threading
static uint32_t disconnectedTime = millis();
if ( !Particle.connected() ) {
    if ( millis() - disconnectedTime > (30*60*1000) ) {
        System.reset(); // reset the MCU if disconnected for 30 minutes
    }
}
else {
    disconnectedTime = millis(); // we are connected, reset the timer
}
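
The incremental backoff mentioned above could look roughly like the sketch below. It is only an illustration: `DisconnectWatchdog` and its fields are hypothetical names, not a Particle API, and the time is passed in so the logic can be checked off-device. On the Electron you would pass `Particle.connected()` and `millis()` and call `System.reset()` when `update()` returns true. Since a reset clears normal RAM, the current timeout would have to be kept somewhere that survives the reset for the backoff to carry over (that part is an assumption and not shown here).

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Particle firmware API).
// Tracks how long the device has been disconnected and reports when a
// reset is due. Every time a reset is requested, the next timeout doubles,
// up to a cap, so a device with no coverage does not reset every 30 minutes.
struct DisconnectWatchdog {
    uint32_t baseTimeoutMs;    // first timeout, e.g. 30 minutes
    uint32_t maxTimeoutMs;     // upper limit for the backoff
    uint32_t timeoutMs;        // timeout currently in force
    uint32_t lastConnectedMs;  // time stamp of the last "connected" sample

    DisconnectWatchdog(uint32_t baseMs, uint32_t maxMs, uint32_t nowMs)
        : baseTimeoutMs(baseMs), maxTimeoutMs(maxMs),
          timeoutMs(baseMs), lastConnectedMs(nowMs) {}

    // Call once per pass through loop(). Returns true when a reset is due.
    bool update(bool connected, uint32_t nowMs) {
        if (connected) {
            lastConnectedMs = nowMs;    // connected: restart the timer
            timeoutMs = baseTimeoutMs;  // and drop back to the base timeout
            return false;
        }
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (nowMs - lastConnectedMs > timeoutMs) {
            // Double the timeout for the next attempt, saturating at the cap.
            timeoutMs = (timeoutMs > maxTimeoutMs / 2) ? maxTimeoutMs
                                                       : timeoutMs * 2;
            lastConnectedMs = nowMs;    // start timing the next attempt
            return true;
        }
        return false;
    }
};
```

With a 30 minute base and a 2 hour cap, the resets would come after 30, 60 and then 120 minutes of continuous disconnection, and the timeout drops back to 30 minutes as soon as the device connects again.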

I’d already figured that System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY); should be the best bet and 0.6.x sounds even better. Any ETA? But, yeah, SLEEP_NETWORK_STANDBY still behaving weird for me. Let me know if I can do anything to help you debug that weirdness. Or if it’s something dumb I’m doing, don’t spare my blushes, I’m a hardware guy :wink:

What do I need to do to keep the PDP context? Thanks for the simple watchdog. That would only work for the cell modem but I guess I can keep a wider scope watchdog too for the whole thing.

I forgot to mention the above code should be used with multi-threading on, or user code will be blocked when disconnected and reconnecting.

It also slipped my mind that we have this already as an API here :smile:
https://docs.particle.io/reference/firmware/electron/#application-watchdog

So

if (Particle.connected()) {
    wd.checkin(); // resets the AWDT count
}

If you just leave the modem powered during sleep, it will keep the PDP context automatically as long as you don't sleep for too long: definitely less than 1 hour, but also possibly less than 23 minutes.

0.6.x is what we are currently wrapping up and testing now.

What are the symptoms with SLEEP_NETWORK_STANDBY?

OK, I’ll play with the 23min-1hour timing and see what we get.

I’ll check out the watchdog API.

For a pretty complete rundown of the problems we have with SLEEP_NETWORK_STANDBY, take a look at the Can't Get CellularData to Work thread, from about message 18 onward.

It started out to be about not getting CellularData (usage). @ScruffR pointed me to use SLEEP_NETWORK_STANDBY, which did make the data usage stuff work but added quite a few wrinkles. Sorry it’s a long thread but there is a lot of detail there.

@BDub, I’ve been trying System.sleep(D0, RISING, 1800, SLEEP_NETWORK_STANDBY);

30 minutes is a long time to wait for results, so this comment is iffy, but it looks like only every 2nd or 3rd publish event (or maybe fewer) is getting through. The Electron wakes for a while, pulses cyan, then goes back to sleep.

Is the 23 min PDP context life thing affecting the Electron even if the modem is not actually turned off?

Yes, try sleeping for 1320 seconds (22 minutes) and see if that helps. After 23 minutes the server-to-device connection times out, so publish acknowledgements will not be received. The publishes may still get through, but it sounds like for you they are not.


OK, looking good so far. What would be the optimum way to publish, say, every hour, or less? To save power and keep data costs down?

@BDub Suddenly DRASTICALLY reduced data usage. At the same time, the “Connected to Host” (or whatever it said) messages every few calls has gone. Did you do something marvelous with your end?

It kind of looks like you are seeing the advantage of not having to re-handshake with the server due to cellular network timeouts. If you get a timeout and try to publish, it will fail and ultimately force a full handshake. By keeping the network alive (ping every 23 minutes or less) you can publish without having to handshake all over again. If you sleep in stop mode for 20 minutes, then wake up and send a dummy publish of 1 character, that will be pretty close to the same amount of data as a keep-alive ping. I don’t believe there is an exposed way to send the ping, so you might as well send a dummy publish for the moment. Then you can go back to sleep/stop. Do that 3 times and on the third time send your real data. That’s how I’d sleep for 1 hour currently with the lowest data usage. BTW, love your charts :wink:
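
As a rough sketch of that schedule: split the reporting period into equal naps that each stay under the keep-alive limit, send a dummy publish on the intermediate wakes, and send the real data on the last one. The names `planNaps`, `NapPlan`, `WakeAction` and `actionForWake` are made up for illustration, and the 1320 second limit is just the 22 minute figure suggested earlier in the thread. Only the arithmetic is shown, so it can be checked off-device.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration; none of these are Particle APIs.

struct NapPlan {
    uint32_t naps;        // number of stop-mode sleeps per reporting period
    uint32_t napSeconds;  // length of each sleep in seconds
};

// Split a reporting period into the fewest equal naps that each stay at or
// below maxNapS (e.g. 1320 s, to stay under the 23 minute timeout).
NapPlan planNaps(uint32_t reportPeriodS, uint32_t maxNapS) {
    if (reportPeriodS == 0 || maxNapS == 0) {
        return {0, 0};  // nothing sensible to plan
    }
    uint32_t naps = (reportPeriodS + maxNapS - 1) / maxNapS;  // round up
    return {naps, reportPeriodS / naps};
}

enum class WakeAction { KeepAlive, Report };

// wakeCount is 1-based and counts wakes since the schedule started.
// Every napsPerReport-th wake carries the real data; the others only send
// a 1 character dummy publish to keep the connection alive.
WakeAction actionForWake(uint32_t wakeCount, uint32_t napsPerReport) {
    if (napsPerReport == 0) {
        return WakeAction::Report;
    }
    return (wakeCount % napsPerReport == 0) ? WakeAction::Report
                                            : WakeAction::KeepAlive;
}
```

For an hourly report with a 1320 second limit this gives 3 naps of 1200 seconds, which matches the 20 minute stop-mode sleep described above. In a sketch, the wake counter would need to be a variable that keeps its value across the sleep call; that should hold for stop mode but not for deep sleep, which resets the system.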

Hey @BDub. That all makes sense, except that that chart was with a 20min publish rate (before and after the drop in data usage). The bigger data publish events (probably full handshake events) came back a few times since the last screen grab as you can see below. I thought maybe the network was forcing handshakes due to poor signal strength, but you’ll also see that the signal strength changes very little and I can’t see any correlation.

Obviously there is a HUGE difference in data usage. We really need to keep it in the minimal data usage zone. Can you think of another experiment we can do to figure this out?

BTW, the charts are Grovestreams. Once I get this little wrinkle resolved I’m going to create a How-To for Particle-To-Grovestreams. They make a great pair.