Electrons going permanently offline around midnight after a week using SLEEP_MODE_DEEP


#1

I have had 3 different Electrons stop communicating with the Particle cloud in the last month, and all of them dropped off right around midnight. Two of them use a 15 minute SLEEP_MODE_DEEP, and on the 3rd I was experimenting with a 10 minute sleep using SLEEP_MODE_DEEP with SLEEP_NETWORK_STANDBY. They were all on different Device OS versions as well: 0.7.0 on one, and 1.0.1 and 1.2.1-rc2 on the others. The device on 0.7.0 was up for 2 months just fine, and then it froze the day before one of the others froze, which is a very strange coincidence. The two units on the newer Device OS versions typically go about a week before they freeze. Given how infrequently it happens, it’s very hard to know how to troubleshoot, not to mention that the devices are sitting out in the middle of a field. To get them back online I have to disconnect the solar and unplug the battery. They then come right back online within a minute or two.

My code has no time-related activity, such as doing something special at midnight, that might be the culprit. All in all, my code is pretty simple. It uses the default automatic mode without the system thread: boot/wake up, talk to a number of I2C/serial/analog sensors, publish a few events to the cloud, and then sleep, in a loop.

Not sure what to do from here from a troubleshooting perspective. These things are solar powered and out in a field. Any troubleshooting advice that would help track down the problem as quickly as possible, given that it takes so long for the issue to surface? I am supposed to be rolling out a dozen or two of these in the next month and will have to put that on hold, since I can’t risk them locking up in the field (literally).


#2

I probably should have made it clear that the diagnostics event sent 15 minutes before they go offline shows their batteries are over 90% charged, so it’s not a lack of power in the middle of the night :slight_smile: That would have been my first thought if I had read this.

Anyway, I’ve decided to rework my design and add a watchdog timer that triggers two power switch ICs if there has been no watchdog kick in the last X minutes. I borrowed the idea and some of the design from this thread: Electron Carrier Board, so thanks for that @chipmc!

I will feel much better about deploying units in such remote areas with the addition of the watchdog timer, especially since I’ve been reading threads and it sounds like there are cases where the cellular modem can get into weird states, and a dance of Cellular.on(), pauses and Cellular.off() seems to be the only way to recover.

The one thing my design does differently from the carrier board is that I simply wire the watchdog’s RESET line directly to the power chips’ ON/OFF line, with no fancy self-kill from the Electron. With my firmware, there is no way it would wake up (WKUP) fast enough to kick the dog in 20ms, so I just ping the DONE line every X minutes, and a full solar/battery disconnect/reconnect will happen if no watchdog kick arrives.
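
To sanity-check the choice of X, here is a minimal host-side sketch of the timing logic described above. It is a model that runs on a PC, not Electron firmware, and it says nothing about how the actual watchdog chip behaves internally. The names `ExternalWatchdog`, `kick`, `shouldPowerCycle` and `firstPowerCycleMinute` are all made up for illustration, and the 60 minute timeout is an assumed value for X.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model of an external watchdog; hypothetical names throughout.
class ExternalWatchdog {
public:
    explicit ExternalWatchdog(std::uint32_t timeoutMinutes)
        : timeoutMinutes_(timeoutMinutes), lastKickMinute_(0) {
        assert(timeoutMinutes > 0);
    }

    // Models the firmware pulsing the DONE line at minute `nowMinute`.
    void kick(std::uint32_t nowMinute) { lastKickMinute_ = nowMinute; }

    // True once no kick has arrived for the whole timeout, i.e. the point
    // where RESET would toggle the power switches' ON/OFF line.
    bool shouldPowerCycle(std::uint32_t nowMinute) const {
        return nowMinute - lastKickMinute_ >= timeoutMinutes_;
    }

private:
    std::uint32_t timeoutMinutes_;
    std::uint32_t lastKickMinute_;
};

// Simulates a device that wakes every `wakeIntervalMinutes` and kicks the
// watchdog, then hangs at `hangMinute` and never kicks again.
// Returns the minute of the first power cycle, or -1 if none occurs
// within `horizonMinutes`.
long firstPowerCycleMinute(std::uint32_t timeoutMinutes,
                           std::uint32_t wakeIntervalMinutes,
                           std::uint32_t hangMinute,
                           std::uint32_t horizonMinutes) {
    ExternalWatchdog wd(timeoutMinutes);
    for (std::uint32_t t = 0; t <= horizonMinutes; ++t) {
        const bool wakes = (t % wakeIntervalMinutes == 0);
        if (wakes && t < hangMinute) {
            wd.kick(t);  // healthy wake cycle: ping DONE
        }
        if (wd.shouldPowerCycle(t)) {
            return static_cast<long>(t);
        }
    }
    return -1;
}
```

With a 15 minute sleep and an assumed X of 60 minutes, a device that hangs at minute 100 gets its last kick in at minute 90 and is power cycled at minute 150. The model also shows the obvious constraint: X has to be longer than the sleep interval, otherwise a perfectly healthy unit gets power cycled before its next wake.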

Fingers crossed…