I ran into this today while testing my new battery-powered weather station. I’m getting ready to deploy a new low-power weather station at my parents’ house using an Argon, a few sensors, and a big LiPoly battery plugged directly into the Argon. The Argon is plugged into a 5W USB solar panel. It had been running my charge test for about 24 hours and then stopped checking in (it should wake every 15 minutes, send me some weather data, and go back to sleep using ULTRA_LOW_POWER).
When I went to look at it, it was flashing rapid light blue: as far as I could tell, the core was connected but couldn’t finish the handshake. I was able to reset it using the reset button, and it reconnected fine after that. Judging by the battery drain when I hit reset, it had been in that state for the full hour (not much sun this afternoon). Is there a way to time-limit this potential error so the unit either goes back to sleep OR resets if it fails for more than a few seconds/minutes? This could be pretty bad, as this unit is going into a place where it will be impossible to even recognize that this happened, much less fix it (it will be in a sealed box for weather protection and stuck under the panel, away from people).
I’d recommend some sort of hardware watchdog for scenarios like this. There is a very good guide on how to do this in an application note from Rick: AN023 Watchdog Timers.
I personally use the AB1805-LiPo version with very good success. I now have a lot more confidence in deploying Particle devices, as it will do a deep reset (power down for 30 seconds) if it is unable to connect. I believe this is the same circuitry used in the Tracker, so it comes well supported with libraries, which makes for a much more straightforward implementation of a hardware watchdog than your own home-brew version.
I like that, and I have a design with it, but for time's sake (my parents head home in a few days) I was going to try to skip the extra circuit (a PCB would take too long). I have a feather doubler with my connectors on one side and the Argon on the other. I'll just live with this until I can get the new feather PCB ordered and tested.
I had hoped that it would be possible to time that operation out somehow, but for now, I'm not going to spend a lot of time worrying about it. I'll be back out there in June, and can replace the existing board with a new one at that time.
You can detect that situation in software. If you are using SYSTEM_THREAD(ENABLED), monitor both the network-ready call (WiFi.ready() on an Argon, Cellular.ready() on a cellular device) and Particle.connected(). If you are network connected but not cloud connected for more than a few minutes, the device is having trouble handshaking. Since you were planning on sleeping anyway, just put the device into sleep mode for 15 minutes and try again later rather than retrying continuously.
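As a rough sketch of that check (not tested on hardware): the helper below only does the timing, so it can be tried on a desktop compiler. The class name HandshakeTimeout and the 2-minute limit in the usage note are made up for illustration and are not part of Device OS. In firmware, the three arguments would come from WiFi.ready() (or Cellular.ready()), Particle.connected(), and millis().

```cpp
#include <cstdint>

// Hypothetical helper (not a Device OS API): tracks how long the network
// has been up while the cloud connection is still down.
class HandshakeTimeout {
public:
    explicit HandshakeTimeout(uint32_t limitMs) : limitMs_(limitMs) {}

    // Call on every pass through loop().
    // Returns true once the device has been network-ready but not
    // cloud-connected for at least limitMs, i.e. it is time to give up
    // and go back to sleep.
    bool shouldGiveUp(bool networkReady, bool cloudConnected, uint32_t nowMs) {
        if (!networkReady || cloudConnected) {
            stalled_ = false;          // not in the stuck-handshake state
            return false;
        }
        if (!stalled_) {
            stalled_ = true;           // first pass in the stuck state
            stallStartMs_ = nowMs;
        }
        // Unsigned subtraction stays correct across millis() rollover.
        return (nowMs - stallStartMs_) >= limitMs_;
    }

private:
    uint32_t limitMs_;
    uint32_t stallStartMs_ = 0;
    bool stalled_ = false;
};
```

In loop() you might create one instance with something like a 120000 ms limit and, when shouldGiveUp() returns true, build a SystemSleepConfiguration in ULTRA_LOW_POWER mode with a 15-minute duration and pass it to System.sleep(). Note that this only covers the "network up, cloud down" case; a separate overall timer would be needed if the network itself never comes up.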
@rickkas7, a question about this. I am testing right now, and have noticed that 3 or 4 times in a row, it resumes from where sleep was called, which is what I expect. However, every now and then, it will start from setup() again. I don’t see any errors reported before it goes to sleep, nor do I see anything when it wakes up. I’m watching for wakeup at my desk so I can jump on serial. So why would it effectively “reset”? If this is a bug in my program that might cause this, what should I look for?
Overall, this isn’t an actual problem. My code checks in at specific intervals. I’m perfectly fine if those oscillate by one or two seconds each way, but when it restarts from setup, the checkin interval is off by almost 20 seconds more. Not critical, but it throws off my data visualization a bit and makes it more difficult to read.
I’ll probably just modify how long it sleeps depending on where it starts from.
The system never does a full device reset (going back through setup() again). Even if it resets the modem by powering it down, it does not go through setup() again.
Logging the reset reason is a good first step, but it may not tell you anything unless it’s an SOS panic.
If you have a hardware watchdog or application watchdog, that would be something to check. Also if you have a low memory reset handler.
The device should not reset like that, but regardless it’s a good idea to base your sleep time off the actual time because even in the absence of a device reset, connecting to cellular can take a varying amount of time.
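One way to do that, as a sketch only: compute the sleep duration from the current epoch time so that each wake lands on the next multiple of the interval, no matter how long connecting (or a trip back through setup()) took. The function name secondsUntilNextSlot and the minSleepSec parameter are hypothetical; on a device, nowEpochSec would come from Time.now(), and this assumes the clock has already been synced.

```cpp
#include <cstdint>

// Hypothetical helper: seconds to sleep so the next wake lands on the next
// multiple of intervalSec (for 900 s that is :00, :15, :30, :45).
// intervalSec must be nonzero.
uint32_t secondsUntilNextSlot(uint32_t nowEpochSec,
                              uint32_t intervalSec,
                              uint32_t minSleepSec) {
    // Time left until the next interval boundary (1..intervalSec).
    uint32_t remaining = intervalSec - (nowEpochSec % intervalSec);

    // If the boundary is too close to be worth sleeping for,
    // skip ahead to the following one.
    if (remaining < minSleepSec) {
        remaining += intervalSec;
    }
    return remaining;
}
```

With a 900-second interval, a device that finishes publishing 20 seconds past the boundary would sleep 880 seconds, and one that finishes 40 seconds past (for example after an unexpected restart) would sleep 860, so both wake on the same boundary.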
I think it was the watchdog. Through all of this, I forgot I had that running, and I suspect that’s where it was breaking. I just instrumented it to handle the new logic correctly and am testing.