Cellular Stopped (Breathing White)

JethroNull · June 14, 2016, 5:17pm

After happily running a simple 60-second call-in script for 24-ish hours, I returned to find the Electron had stopped calling in for maybe 18 hours. Battery level is fine, data is not paused. Repowering brings it back to life. These kinds of incidents are very worrying when the end product is intended to be used in remote locations and simply has to be relied upon.

There are several other threads on Electrons just stopping but all have slightly different symptoms, so I decided to start a new thread.

Two questions:

Is anyone aware of an actual bug that can cause this?

Does anyone know of a watchdog logic that can be used when the device is asleep (stop, deep or off)? I think I read that deep sleep actually reboots the whole Electron (is that right?), which would hopefully correct anything that had gone off the rails. But the cost is a lot of extra data usage reconnecting to the network.

will · June 15, 2016, 5:55pm

Hey there JethroNull!

When you say “Calling in”, what do you mean? A Particle.publish event? A webhook? Am I correct in parsing that the ping was every 60 seconds?

Can you share the sketch that you were using? With respect to sleep logic, @BDub has done most of the work related to deep sleep on the Electron.

JethroNull · June 15, 2016, 6:10pm

@will, @BDub, "calling in" means, to me, the whole process of waking the modem, connecting and passing a Particle.publish event. Yeah, pinging (that whole process) every minute.

Sketch below:

// Version 2 with ScruffR's help to optimize.

const int voltOut = A5;
const int ldrPin = A0;
const int ledPin = 7; // This is your internal LED
const int led = D0; // This is where your external LED is plugged in. The other side goes to a resistor connected to GND.

int txsess, rxsess, txtot, rxtot;

CellularData data;
FuelGauge fuel;

void setup() {
    pinMode(voltOut, OUTPUT);     // supply 3.3 V at pin A5
    pinMode(ldrPin, INPUT);       // assign A0 as an input
    pinMode(ledPin, OUTPUT);
    digitalWrite(voltOut, HIGH);  // initialize A5 at 3.3 V
    pinMode(led, OUTPUT);         // our LED pin is an output (lighting up the LED)
    Cellular.resetDataUsage();
    delay(4000);
}

void loop() {

    if (!Particle.connected()) return;

    digitalWrite(ledPin, HIGH);

    int value = (analogRead(A0) / 10);

    double voltage     = fuel.getVCell();
    double SoC         = fuel.getSoC();
    CellularSignal sig = Cellular.RSSI();
    double rssi        = sig.rssi;
    double qual        = sig.qual;

    if (Cellular.getDataUsage(data))
    {
        txsess = data.tx_session;
        rxsess = data.rx_session;
        txtot = data.tx_total;
        rxtot = data.rx_total;
    }
    else
    {
        txsess = 2;
        rxsess = 2;
        txtot = 2;
        rxtot = 2;
    }

    //Particle.publish("GS Exp.", String::format("%4d, %4d, %4d, %4d, %.1fV, %.1f%%, %.0f, %.0f", txsess, rxsess, txtot, rxtot, voltage, SoC, rssi, qual), PRIVATE);
    Particle.publish("GS_Exp4", String(voltage), 60, PRIVATE);

    data.tx_session = 0;
    data.rx_session = 0;
    //data.tx_total = 0;
    //data.rx_total = 0;
    Cellular.setDataUsage(data);

    digitalWrite(ledPin, LOW);

    //Cellular.off();
    //System.sleep(SLEEP_MODE_DEEP, 600); // sleep for 10 minutes
    //System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY);
    System.sleep(D0, RISING, 60);

}

will · June 15, 2016, 6:24pm

I don’t see the part of the code in there where you are shutting down the modem and restarting it…what I’m thinking is that unless you are maintaining a PDP context between those sessions, connecting and disconnecting from a tower every minute may be considered abusive cellular behavior on the network, and you may be being blocked by the cellular tower.

One way to find out for sure is by capturing the logs created by the Electron’s modem during startup. @BDub is also a good individual to chime in here.

JethroNull · June 15, 2016, 6:43pm

Getting blocked by the network seems plausible. Well, I had been trying to use SLEEP_NETWORK_STANDBY, but that seems to be buggy (see the "Can't Get CellularData to Work" thread). Without that I am not sure whether System.sleep retains context or not.

BDub · June 15, 2016, 8:41pm

Yes, deep sleep does reset the system. When 0.6.x is released for the Electron, it will cost about 135 bytes to resume the session after deep sleep. You will have to use SLEEP_NETWORK_STANDBY as well to keep the modem's PDP context active.

System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY); should work well for what you are currently trying to do. I'd want to debug why that's not working.

You can create your own simple watchdog that checks for being disconnected for too long, and then software reset the system. It's potentially not a good idea to reset too often, so I've set it here at 30 minutes. It should be higher than the longest typical time to connect which is 5 minutes. There should also be an incremental backoff time involved here as well.

// will require multi-threading
static uint32_t disconnectedTime = millis();
if ( !Particle.connected() ) {
    if ( millis() - disconnectedTime > (30*60*1000) ) {
        System.reset(); // reset the MCU if disconnected for 30 minutes
    }
}
else {
    disconnectedTime = millis(); // we are connected, reset the timer
}
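
A minimal sketch of the incremental backoff mentioned above, written as plain C++ so the logic can be checked off-device. The class name `DisconnectWatchdog`, its methods, and the idea of doubling the timeout up to a cap are assumptions for illustration, not part of the Particle API and not necessarily how the firmware team would do it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a Particle API): decides when a device that has
// been disconnected for too long should reset, and doubles the allowed
// disconnected time after each reset request so resets become less frequent.
class DisconnectWatchdog {
public:
    DisconnectWatchdog(uint32_t baseTimeoutMs, uint32_t maxTimeoutMs)
        : baseMs_(baseTimeoutMs), maxMs_(maxTimeoutMs),
          timeoutMs_(baseTimeoutMs), lastConnectedMs_(0) {
        assert(baseTimeoutMs > 0 && baseTimeoutMs <= maxTimeoutMs);
    }

    // Call once per pass with the current millisecond tick and the
    // connection state. Returns true when a reset is due.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            lastConnectedMs_ = nowMs;  // we are connected, restart the timer
            timeoutMs_ = baseMs_;      // and drop back to the base timeout
            return false;
        }
        // Unsigned subtraction stays correct across tick-counter rollover.
        if (nowMs - lastConnectedMs_ > timeoutMs_) {
            // Back off: double the timeout, capped at maxMs_.
            timeoutMs_ = (timeoutMs_ > maxMs_ / 2) ? maxMs_ : timeoutMs_ * 2;
            lastConnectedMs_ = nowMs;  // start timing the next attempt
            return true;
        }
        return false;
    }

    uint32_t timeoutMs() const { return timeoutMs_; }

private:
    uint32_t baseMs_;
    uint32_t maxMs_;
    uint32_t timeoutMs_;
    uint32_t lastConnectedMs_;
};
```

In firmware this would presumably be driven as `update(millis(), Particle.connected())`, calling `System.reset()` when it returns true. Note that a reset clears ordinary RAM, so the backed-off timeout would have to be kept somewhere that survives the reset; that part is not shown here.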

JethroNull · June 15, 2016, 8:52pm

I’d already figured that System.sleep(D0, RISING, 60, SLEEP_NETWORK_STANDBY); should be the best bet, and 0.6.x sounds even better. Any ETA? But, yeah, SLEEP_NETWORK_STANDBY is still behaving weirdly for me. Let me know if I can do anything to help you debug that weirdness. Or if it’s something dumb I’m doing, don’t spare my blushes, I’m a hardware guy.

What do I need to do to keep the PDP context? Thanks for the simple watchdog. That would only work for the cell modem but I guess I can keep a wider scope watchdog too for the whole thing.

BDub · June 15, 2016, 9:35pm

I forgot to mention the above code should be used with multi-threading on, or user code will be blocked when disconnected and reconnecting.

It also slipped my mind that we have this already as an API here
https://docs.particle.io/reference/firmware/electron/#application-watchdog

So

if (Particle.connected()) {
    wd.checkin(); // resets the AWDT count
}

On keeping the PDP context: if you just leave the modem powered during sleep, it will automatically take care of it as long as you don't sleep for too long. Definitely less than 1 hour, but also possibly less than 23 minutes.

As for 0.6.x, this is what we are currently wrapping up and testing now.

What are the symptoms you're seeing with SLEEP_NETWORK_STANDBY?

JethroNull · June 15, 2016, 9:45pm

OK, I’ll play with the 23min-1hour timing and see what we get.

I’ll check out the watchdog API.

For a pretty complete rundown of the problems we have with SLEEP_NETWORK_STANDBY, take a look at the "Can't Get CellularData to Work" thread, from about message 18 onward.

It started out to be about not getting CellularData (usage). @ScruffR pointed me to use SLEEP_NETWORK_STANDBY, which did make the data usage stuff work but added quite a few wrinkles. Sorry it’s a long thread but there is a lot of detail there.

JethroNull · June 21, 2016, 6:43pm

@BDub, I’ve been trying System.sleep(D0, RISING, 1800, SLEEP_NETWORK_STANDBY);

30 mins is a long time to wait for results so this comment is iffy, but it looks like only every 2nd or 3rd publish event (or maybe fewer) is getting through. The Electron wakes for a while, pulses cyan, then goes back to sleep.

Is the 23 min PDP context life thing affecting the Electron even if the modem is not actually turned off?

BDub · June 21, 2016, 10:34pm

Yes, try sleeping for 1320 seconds (22 minutes) and see if that helps. After 23 minutes the server-to-device connection times out, so publish acknowledgements will not be received. The publishes may still get through, but it sounds like for you they are not.

JethroNull · June 22, 2016, 5:50pm

OK, looking good so far. What would be the optimum way to publish, say, every hour, or less? To save power and keep data costs down?

JethroNull · June 24, 2016, 7:31pm

@BDub Suddenly DRASTICALLY reduced data usage. At the same time, the “Connected to Host” (or whatever it said) messages that appeared every few calls have gone. Did you do something marvelous on your end?

BDub · June 26, 2016, 7:32am

It kind of looks like you are seeing the advantage of not having to re-handshake with the server due to cellular network timeouts. If you get a timeout and try to publish… it will fail and ultimately force a full handshake. By keeping the network alive (ping every 23 minutes or less) you can publish without having to handshake all over again.

If you sleep in stop mode for 20 minutes, wake up and send a dummy publish of 1 character, that will be pretty close to the same amount of data as a keep-alive ping. I don’t believe there is an exposed way to send the ping… so you might as well send a dummy publish for the moment. Then you can go back to sleep/stop. Do that 3 times and on the third time send your real data. That’s how I’d sleep for 1 hour currently with the lowest data usage. BTW, love your charts.
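
The schedule described above (three 20-minute sleeps per hour, with real data only on the third wake-up) can be sketched as plain C++ arithmetic. The names `WakePlan`, `planWakes` and `isReportWake` are hypothetical helpers, not Particle APIs, and the sketch ignores the time spent awake between sleeps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types (not Particle APIs).
struct WakePlan {
    uint32_t wakesPerReport;  // wake-ups per report interval; the last one carries real data
    uint32_t sleepSeconds;    // stop-mode sleep before each wake-up
};

// Split a report interval into equal sleeps, none longer than maxSleepSeconds,
// so that something is sent before the connection can time out.
inline WakePlan planWakes(uint32_t reportIntervalSeconds, uint32_t maxSleepSeconds) {
    assert(reportIntervalSeconds > 0 && maxSleepSeconds > 0);
    // Ceiling division: the smallest number of sleeps that keeps each one short enough.
    uint32_t wakes = (reportIntervalSeconds + maxSleepSeconds - 1) / maxSleepSeconds;
    return WakePlan{wakes, reportIntervalSeconds / wakes};
}

// wakeCount starts at 1 for the first wake-up. Returns true when this wake-up
// should publish the real data; otherwise a 1-character dummy publish is enough.
inline bool isReportWake(uint32_t wakeCount, const WakePlan& plan) {
    return wakeCount % plan.wakesPerReport == 0;
}
```

With a one-hour report interval and a 22-minute ceiling, `planWakes(3600, 1320)` gives 3 wake-ups with 1200-second sleeps. On the device, the sleep itself would presumably be `System.sleep(D0, RISING, plan.sleepSeconds, SLEEP_NETWORK_STANDBY);`, with the wake counter kept in a variable that survives stop-mode sleep.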

JethroNull · June 26, 2016, 3:55pm

Hey @BDub. That all makes sense, except that that chart was with a 20min publish rate (before and after the drop in data usage). The bigger data publish events (probably full handshake events) came back a few times since the last screen grab as you can see below. I thought maybe the network was forcing handshakes due to poor signal strength, but you’ll also see that the signal strength changes very little and I can’t see any correlation.

Obviously there is a HUGE difference in data usage. We really need to keep it in the minimal data usage zone. Can you think of another experiment we can do to figure this out?

BTW, the charts are Grovestreams. Once I get this little wrinkle resolved I’m going to create a How-To for Particle-To-Grovestreams. They make a great pair.
