I have 50 Particle Electrons that are set to connect once per hour. Occasionally, during a connection attempt, the code will block and the only way to recover is to physically reboot the device. Since my Electrons are installed in remote locations, this unfortunately isn’t a routine option.
Both Particle.connect() and Particle.publish() appear to sometimes block the code; both my FW timer (inside void loop()) and Watchdog don’t kick in. This is also mentioned in several topics on this issue, but I’m not quite sure yet what the best way is to handle this instead. I posted my code below; could someone highlight what the best way for me is to mitigate this problem?
Without a hardware-based watchdog, it’s basically impossible to force a full system reset once the device locks up. The device has a hardware watchdog built in, but it’s not enabled or usable for some reason, which forces us to add external watchdog circuits.
I’m adding a watchdog chip to a Photon design now for this very reason.
I assume we shouldn’t expect the STM32 Hardware Watchdog feature in 0.8.0 anytime soon?
And I suppose regardless of the watchdog used, an Electron in Deep Sleep won’t pet the dog and therefore needs to be woken up in time to do so? I’ll get started on re-creating this HW design today (likely going for a 5 min interval).
From what I experienced, Particle.connect() locking up a device appears to be related to limited cellular connectivity? My devices that connect easily (e.g. <30s from a cold state) usually do fine, but devices that struggle to connect often are the ones locking up as well.
My understanding from the comments in your other Post was this would best be accomplished with MANUAL Mode :
But you have reasons to not want to modify your code for Manual Mode.
So, if you use an external watchdog, you still have the possibility of the Electron burning through battery for the full length of your timer interval during a code crash (a 1 hour cycle would be too long, especially considering the next connection attempt could fail or lock up).
If you perform 5 min Deep Sleep with an external watchdog as @RWB suggests, the only thing the Electron would do every wake cycle would be to Pet the H/W watchdog, increment a counter, and enter another 5 min Deep Sleep until it’s time to publish. That seems realistic without requiring too much modification of your Code and shouldn’t take too many milliseconds of precious battery power.
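To make the wake cycle described above concrete, here is a minimal sketch of just the counter logic, written as plain C++ so it can be checked off-device. The names `WakeDecision` and `onWake` are hypothetical, not Particle APIs. It assumes the counter is stored somewhere that survives Deep Sleep (for example retained memory, if that is enabled on your device), and that 12 wakes of 5 min make up the 1 hour publish interval.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper types; nothing here is a Particle API.
struct WakeDecision {
    bool petWatchdog;    // every wake services the external watchdog
    bool publishNow;     // true once enough short sleeps have elapsed
    uint32_t nextCount;  // counter value to store before sleeping again
};

// wakeCount:       wakes since the last publish (must survive Deep Sleep)
// wakesPerPublish: e.g. 12 for 5 min sleeps and an hourly publish
WakeDecision onWake(uint32_t wakeCount, uint32_t wakesPerPublish) {
    assert(wakesPerPublish > 0);
    uint32_t count = wakeCount + 1;
    bool due = count >= wakesPerPublish;
    // Reset the counter when a publish is due, otherwise carry it forward.
    return WakeDecision{true, due, due ? 0u : count};
}
```

On the device, the firmware would toggle the watchdog's "done" pin whenever `petWatchdog` is set, attempt the connect and publish only when `publishNow` is set, and then store `nextCount` before going back to Deep Sleep.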
I guess you have to decide what’s the power cost of waking at a short interval to kick the external watchdog, versus the power lost during a LONG interval while your Electron is “locked-up”. But either way, the external watchdog “should” eliminate the need to manually reset the Electron.
Or Manual & Threaded using waitFor(Particle.connected, connectionFail), with no hardware revision needed.
Switching to Manual mode is easier than adding more HW to my design, so I would favour that, but I wasn’t 100% sure if Manual mode is a 100% failsafe protection against blocking code. Reliability is key for my application. I believe Particle.connect() and Particle.publish() are the only two blocking commands that cause issues for me right now, but it’s hard to be sure.
I give my devices up to 3mins. to connect - so a 5min. timer seems appropriate to me. A timer like 1h is too long if the code blocks more than just a few times per year. Like you said, my idea would be to have 2 timers in my FW - one that fires every x min. to pet the HW watchdog, and one that fires every 1h. to publish.
But 5min. right now is a random number - I’ll have to do the calculations to see what the most optimal number is power-wise.
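The trade-off being weighed here can be put into a small calculation. This is only a sketch: the struct, the function names and every current value you feed into it are hypothetical placeholders, so substitute measured numbers for your own board before drawing conclusions.

```cpp
#include <cassert>

// All fields are placeholders to be filled with measured values.
struct PowerModel {
    double sleepCurrentMa;      // current while in Deep Sleep
    double awakeCurrentMa;      // current while awake to pet the watchdog
    double awakeSecondsPerPet;  // time spent awake per pet
};

// Average current (mA) when waking every intervalMin minutes only to pet
// the external watchdog and going straight back to sleep.
double averageCurrentMa(const PowerModel& m, double intervalMin) {
    assert(intervalMin > 0);
    double periodS = intervalMin * 60.0;
    assert(m.awakeSecondsPerPet <= periodS);
    double awakeS = m.awakeSecondsPerPet;
    return (m.awakeCurrentMa * awakeS +
            m.sleepCurrentMa * (periodS - awakeS)) / periodS;
}

// Worst-case charge (mAh) lost if the device locks up right after a pet
// and stays awake until the watchdog interval expires and resets it.
double lockupCostMah(double lockedCurrentMa, double intervalMin) {
    return lockedCurrentMa * intervalMin / 60.0;
}
```

A shorter interval raises the first number (more frequent wakes) and lowers the second (less time stuck before the reset), so the best interval depends on how often lock-ups actually happen in the field.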
Some of the Elites would have to speak to that, but my guess is that an external watchdog would be the only way to approach 100%. For me, though, Manual/Threading has worked for a few projects that were extremely sensitive to wasting precious battery (primary, no recharging available). But then again, I don’t have 50 Electrons running Manual/Threaded for a decent sample size.
I don’t know everything your code does, but what impact would using Manual/Threading with no timers, no ApplicationWatchdog, and no external watchdog have? It would just be a 1 hour deep sleep, wake up, allow up to 3 minutes for a successful connection, then go back to deep sleep for 1 hour no matter what (the 1-shot approach). You could throw in a System.reset() every 24 hours or once a week.
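The 1-shot approach boils down to a bounded wait followed by an unconditional sleep. Below is a host-side model of that control flow in plain C++, not firmware. `waitUntil` and `runWakeCycle` are hypothetical names, and the `connected` and `publish` callbacks stand in for whatever the firmware would check and call. On the device, the bounded wait would be the `waitFor(Particle.connected, ...)` call mentioned earlier in the thread.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll `ready` until it returns true or `timeout` elapses.
bool waitUntil(const std::function<bool()>& ready,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll = std::chrono::milliseconds(10)) {
    assert(timeout.count() >= 0);
    auto deadline = std::chrono::steady_clock::now() + timeout;
    while (std::chrono::steady_clock::now() < deadline) {
        if (ready()) return true;
        std::this_thread::sleep_for(poll);
    }
    return ready();  // one last check once the deadline has passed
}

enum class CycleResult { Published, TimedOut };

// One wake cycle: wait a bounded time for the connection, publish if it
// came up, and return either way so the caller can go back to sleep.
CycleResult runWakeCycle(const std::function<bool()>& connected,
                         const std::function<void()>& publish,
                         std::chrono::milliseconds connectTimeout) {
    if (waitUntil(connected, connectTimeout)) {
        publish();
        return CycleResult::Published;
    }
    return CycleResult::TimedOut;
}
```

Note the limit of this pattern: it only bounds the wait for the connection. If the connect or publish call itself never returns, no timeout running in the same thread can fire, which is the case an external watchdog is meant to cover.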
But there will always be a tiny chance that the cell modem can get stuck in a funny state and continue to waste power. H/W watchdog would be the best thing that I can think of to mitigate that, but you will need to recognize that situation first (I’m not sure how to).
So I did some calculations, and want to go with a setup that resets my device every 20mins (unless the dog was petted). The type of reset doesn’t matter much, because my device should go into Deep Sleep mode right away anyway (unless the accelerometer actually detects movement).
These are the latest schematics from the thread you linked:
Schematic part 1: Circuit Protection - I already have this, using the TPS61099.
Schematic part 2: Power Control - I don’t need extra Power buttons.
Schematic part 3: Carrier Board - I don’t need a Temperature sensor or FRAM. Just the Watchdog.
Am I correct in thinking that the only part from this entire schematic I need would be the TPL5010 (Watchdog) with its 3 resistors (values appropriate for a 20 min timer)?
I was curious if you (or anyone else) could explain to me if there’s any difference between these two pieces of code? With my original code, I would still have the occasional blocking Electron that required a manual reset to be done.
So when Particle.connect() ends up blocking, it still wouldn’t time out after 3 minutes would it? Seems like both pieces of code effectively do the same thing. I’ve had devices block with my Original code (System Threading + Manual) so I’m guessing FW-side there isn’t much else to do here to reduce the issue?