On Gen 3 devices (Argon, Boron, B Series SoM, Tracker SoM, Tracker One, Monitor One, E404X) and the P2 and Photon 2, you can now enable the hardware watchdog in the MCU (nRF52840 or RTL8721). This is significantly more effective than the application (software) watchdog, and does not require external hardware.
Is the watchdog still active during sleep modes? That is, do I need to call something like Watchdog.stop() before sleeping and then Watchdog.start() after waking up?
Is there a maximum timeout the watchdog can be set to? I think the AB1805 had a maximum of 124 seconds. Is there a limit here?
What is your recommendation for a “sleepy” Gen 3 device? Should we just set the watchdog timeout to be longer than the maximum duration of a sleep cycle? That would be ideal, since it is more of a “catch all” and the watchdog would never be turned off once set. If I sleep for, say, 20 minutes at a time, I’d like to just use Watchdog.init(WatchdogConfiguration().timeout(1260s)); which is 21 minutes.
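If the watchdog keeps running through sleep, its timeout has to cover one full sleep cycle plus the time needed to wake up and reach the next Watchdog.refresh(). A minimal, portable sketch of that arithmetic, assuming a caller-chosen margin; timeoutForSleep, fitsHardwareLimit, and the maxTimeout parameter are hypothetical names, and the real per-MCU maximum should be taken from the Device OS documentation rather than from this sketch:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper: a watchdog timeout that outlasts one sleep cycle.
// `margin` covers waking, reconnecting, and reaching the next
// Watchdog.refresh(); its size is an assumption to tune per application.
constexpr std::chrono::seconds timeoutForSleep(std::chrono::seconds sleepDuration,
                                               std::chrono::seconds margin) {
    return sleepDuration + margin;
}

// Hypothetical check: maxTimeout is a placeholder for the MCU's documented
// hardware limit, which may differ between MCUs.
constexpr bool fitsHardwareLimit(std::chrono::seconds timeout,
                                 std::chrono::seconds maxTimeout) {
    return timeout <= maxTimeout;
}

// The 20-minute example: 20 min sleep + 1 min margin = 1260 s.
static_assert(timeoutForSleep(20min, 1min) == 1260s, "expected 21 minutes");
```

On a device, the result would be passed the same way as the literal above, for example Watchdog.init(WatchdogConfiguration().timeout(timeoutForSleep(20min, 1min)));.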
Does Particle.process() or any other library call Watchdog.refresh() already, or do I need to call it explicitly? I’d very much prefer to keep it isolated if possible, so that we have full control over how we pet the watchdog. My thought is to use Watchdog.refresh() as a “cloud side” watchdog: if I don’t get an ACK from a Particle.publish(), or if I only call it from a Particle.subscribe() handler, a missing refresh means something in the cloud connection is hung, and this would be an easy way to catch that.
It does seem like you could do some sort of a cloud-based refresh of the watchdog. I haven’t tested it, but the timeouts are long enough that it could work. Of course that will use more data operations.
By default this could be false, so if someone doesn’t explicitly set it, the watchdog is disabled during sleep as it is now, but I personally would like to keep it enabled during sleep. That makes it more of a broad “catch all”: for example, the sleep duration might somehow be set wrong, or the device might fail to fall asleep or wake up properly. What do you think?
I’d love to keep using the AB1805 watchdog as the short-duration hardware watchdog (i.e., 124 seconds) and use this watchdog as a “cloud side” watchdog. With a maximum duration of over 2 hours, or even much longer on the nRF52840, I’d like to keep it enabled all the time, even during sleep modes, and pet it only on the ACK of a publish event or possibly from a dedicated Particle.function(). My backend that processes webhooks from the devices (Python + SQL) would keep track of when each watchdog was last pet and call a Particle.function() to pet it, maybe once per hour. I’d have no problem at all spending 24 data operations per device per day on this extra cloud-side watchdog. If needed, I could even do it every 2 hours.
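With a cloud-driven pet, it is worth checking how many scheduled pets can be lost before the watchdog fires. A sketch under a simple, assumed model (pets are scheduled every interval after the last good one, and a pet must arrive strictly before the timeout); maxConsecutiveMisses is a hypothetical helper, not part of Device OS:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper, assumed model: after the last successful refresh,
// pets are scheduled every `interval`, and a pet must arrive strictly before
// `timeout` elapses (one landing exactly at the deadline races the hardware).
// Returns the largest m with (m + 1) * interval < timeout, i.e. how many
// consecutive scheduled pets may be lost; -1 means even on-time pets are
// too late. Precondition: interval > 0.
constexpr long long maxConsecutiveMisses(std::chrono::seconds interval,
                                         std::chrono::seconds timeout) {
    return static_cast<long long>((timeout.count() - 1) / interval.count()) - 1;
}

static_assert(maxConsecutiveMisses(1h, 2h) == 0, "hourly pets, 2 h timeout: no slack");
static_assert(maxConsecutiveMisses(1h, 3h) == 1, "3 h timeout tolerates one miss");
```

Under this model, an hourly pet with a 2-hour timeout tolerates no missed calls, because the next pet would land right at the deadline; riding out one missed backend call needs a longer timeout, if the MCU allows it.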
Generally speaking, my customers are “OK” with an occasional hang-up of 1-2 hours, but it’s a pain in the butt if they have to travel to the site to power-cycle the device by hand. I’ve had a few cases where the device was “connected”: I could ping it and push new firmware to it, but it stopped sending data out and was unable to process Particle functions. The AB1805 wasn’t resetting it, so it seems the user application firmware was still running but locked up somehow. This would be the catch-all for that edge case.
I agree with @chipmc. I’m currently on the 4.x branch with the fleet. Although it’s very tempting to migrate to the 5.x branch now for this feature, I’m weighing the time and risk involved in testing, and in running the fleet on a non-LTS release. It would be great to see this on the 4.x LTS branch as well. I’d have a lot more confidence deploying it now, to hopefully fix those few edge cases of devices locking up on the cloud side, if it were a minor revision.
And yes, I’m keeping the AB1805 either way, as I also require its RTC, which is sub-second accurate. Good stuff!
Has anyone gotten it to work? I’m still having a problem where the watchdog triggers on a B524, even though I call Watchdog.refresh() more than often enough. Is it not possible to calculate a watchdog timeout ‘dynamically’, as I do below?
I’m starting the watchdog timer as follows:
```cpp
// Start watchdog timer
// Calculate watchdog timeout using sleep time and a timeout factor
std::chrono::seconds watchdogTimeout(DeviceConfigurationConstants::WATCHDOG_TIMEOUT_FACTOR * Class::instance().getSleepTime()); // 4 * 60s, i.e. 4 * a std::chrono::seconds of 60

// Clamp to min. 3 minutes
if (watchdogTimeout.count() < 180) {
    watchdogTimeout = 180s;
}

// Initialize watchdog
Watchdog.init(WatchdogConfiguration().timeout(watchdogTimeout));
// Start HW watchdog
Watchdog.start();
```
I might be missing something, but I’m scratching my head over this. I really want the timeout to be dynamic relative to the configured sleep time, but could my calculations with chrono literals be behaving oddly? This might deserve a separate topic, but this thread seems to be the main place for testing the new functionality.
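One way to rule out the chrono math is to pull the calculation into a pure function and check it at compile time. This sketch assumes WATCHDOG_TIMEOUT_FACTOR is an int and that the sleep time can be expressed as a std::chrono duration; dynamicWatchdogTimeout is a hypothetical name, and the types in the original code may differ:

```cpp
#include <algorithm>
#include <chrono>

// Hypothetical pure version of the calculation above. `factor` stands in for
// WATCHDOG_TIMEOUT_FACTOR and `sleepTime` for getSleepTime(); taking the
// sleep time as milliseconds also accepts seconds, minutes, etc.
constexpr std::chrono::seconds dynamicWatchdogTimeout(int factor,
                                                      std::chrono::milliseconds sleepTime) {
    // Scale, then truncate to whole seconds explicitly (an implicit
    // milliseconds -> seconds conversion would not compile).
    std::chrono::seconds scaled =
        std::chrono::duration_cast<std::chrono::seconds>(factor * sleepTime);
    // Clamp to a 3-minute floor, as in the original code.
    return std::max(scaled, std::chrono::seconds(180));
}

static_assert(dynamicWatchdogTimeout(4, std::chrono::seconds(60)) == std::chrono::seconds(240), "4 * 60 s");
static_assert(dynamicWatchdogTimeout(4, std::chrono::seconds(30)) == std::chrono::seconds(180), "clamped to 3 min");
```

If these values match what you expect, the timeout arithmetic itself is unlikely to be the cause of the resets.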
@jgskarda, a few years ago I started experimenting with an external relay timer as a “Cloud Side” watchdog. I would only pet the watchdog from a subscription to a webhook response. This ensured round-trip cloud connectivity and, more importantly, ensured that the backend service (which actually collects the normal data) was working. I used a separate “WDT” webhook so this schedule was decoupled from the normal data publishes (which remained NO_ACK).
It seems like a similar WDT Webhook (with Subscribe) could be used with the Hardware watchdog in Device OS 5.3.0+, since it’s a “free” addition at this point.
You could publish to the WDT Webhook every 6 hours to “Pet”, and reset after 12 hours if no Cloud & Backend connectivity (round-trip).
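To size the timeout for a round-trip schedule like this, it helps to make the number of tolerated lost round trips explicit. A minimal sketch, assuming a fixed publish interval and a hypothetical slack allowance for publish, webhook, and response latency; roundTripTimeout is a hypothetical helper:

```cpp
#include <chrono>

using namespace std::chrono_literals;

// Hypothetical helper: watchdog timeout for a fixed-interval round-trip
// ("WDT" webhook) schedule that should survive `missesAllowed` consecutive
// lost round trips. `slack` is an assumed allowance for publish, webhook,
// and response latency.
constexpr std::chrono::seconds roundTripTimeout(std::chrono::seconds publishInterval,
                                                int missesAllowed,
                                                std::chrono::seconds slack) {
    // After the last good response, the (missesAllowed + 1)-th publish must
    // still get its reply in before the watchdog expires.
    return (missesAllowed + 1) * publishInterval + slack;
}

// 6 h publishes, ride out one lost round trip, 30 min slack: 12.5 h.
static_assert(roundTripTimeout(6h, 1, 30min) == 45000s, "12.5 hours");
```

With 6-hour publishes and a strict 12-hour timeout, the reply to the 12-hour publish after one lost round trip would arrive right at the deadline; if the goal is to ride out exactly one lost round trip, a timeout slightly over 12 hours (12.5 hours in the static_assert above) avoids that race.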
This seems safer than relying on a Cloud-Side Function Call during the few times that you need to reset the device because of Cloud-Side or Backend Problems.
I prefer the “Round-Trip” Approach as a fail-safe (but on a longer schedule) because that’s what the IoT device is normally used for, pushing data to a backend service. If I can’t receive data after 12 hours…then automatically pull the plug and start over. It seems to me the “Round-Trip” is the best approach to prevent the dreaded “Somebody has to go on a field trip to reset the dang thing”, as a final measure.
And you can leave your existing external watchdog alone and let it handle firmware crashes on a tighter schedule. This one is a free add-on in Device OS 5.3.0.
This seems pretty cool. I’m not familiar with hardware watchdogs, so is there more documentation about what exactly they do and how they could be used? I read through the Particle API, but I’m still trying to understand how I might use it. For example, students often have devices lock up with infinite loops; would this be a use case for it?
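As a rough illustration: a hardware watchdog is a counter in the MCU that resets the device unless the firmware refreshes it before the timeout. Assuming Watchdog.refresh() is called once per pass through loop(), an infinite loop stops those calls and the device resets. The toy model below is plain C++, not Device OS code; ToyWatchdog and simulateResetMs are hypothetical names:

```cpp
#include <cstdint>

// Toy model of a hardware watchdog: a counter that resets the MCU when it
// reaches the timeout; every refresh restarts it.
struct ToyWatchdog {
    std::int64_t timeoutMs;
    std::int64_t lastRefreshMs;

    constexpr explicit ToyWatchdog(std::int64_t timeout) : timeoutMs(timeout), lastRefreshMs(0) {}
    constexpr void refresh(std::int64_t nowMs) { lastRefreshMs = nowMs; }
    constexpr bool expired(std::int64_t nowMs) const { return nowMs - lastRefreshMs >= timeoutMs; }
};

// Simulates loop() finishing a pass every loopMs and refreshing the
// watchdog, until the code hangs in an infinite loop at hangAtMs.
// Returns the time the device would be reset, or -1 if it never is.
constexpr std::int64_t simulateResetMs(std::int64_t timeoutMs, std::int64_t loopMs,
                                       std::int64_t hangAtMs, std::int64_t endMs) {
    ToyWatchdog wd(timeoutMs);
    for (std::int64_t now = 0; now <= endMs; now += loopMs) {
        if (wd.expired(now)) {
            return now;  // counter reached the timeout: hardware reset
        }
        if (now < hangAtMs) {
            wd.refresh(now);  // healthy pass: the Watchdog.refresh() call
        }
        // once hung, loop() never returns, so nothing refreshes the watchdog
    }
    return -1;
}

// Hang at 5 s with a 1 s timeout: reset 1 s after the last refresh at 4.9 s.
static_assert(simulateResetMs(1000, 100, 5000, 10000) == 5900, "reset after hang");
// A loop() pass slower than the timeout resets even a healthy device.
static_assert(simulateResetMs(1000, 2000, 100000, 10000) == 2000, "pass too slow");
```

The second check shows the flip side: the timeout must be longer than the slowest normal pass between refreshes, or the watchdog resets working firmware too.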