Over 20 Devices just went down

I'd say a power switch, considering Reset generally couldn't recover from this incident, or the incident 6 weeks prior on March 14.

I've added a cheap ($5) timer relay board to my critical Particle devices, which stay powered 24/7.
The countdown timer gets reset through a digital pin after each successful "WDT" webhook response.

I use the "WDT" publish/subscribe on a 30-minute schedule (adjustable) as a failsafe. If the timer runs out (meaning the webhook response didn't make it back to the Electron/Boron), the Particle's power is switched off and stays off for a user-defined length of time (I use 15 seconds).
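Here's a rough sketch of the timer behavior described above, written as a small C++ model rather than anything that runs on the relay board itself. The class name `TimerRelay`, its methods, and the 45-minute countdown in the usage note are all my own placeholders; the only values taken from this post are the 15-second off time and the 30-minute WDT schedule. It also assumes the board starts a fresh countdown when it restores power, which may not match every board.

```cpp
#include <cstdint>

// Hypothetical model of an external countdown timer relay (not the board's
// real firmware). All times are milliseconds on one shared clock.
class TimerRelay {
public:
    TimerRelay(uint64_t countdownMs, uint64_t offMs)
        : countdownMs_(countdownMs), offMs_(offMs), lastRestartMs_(0) {}

    // True while the relay is feeding power to the Particle device.
    // Assumption: after the off period the board powers up again and
    // starts a fresh countdown, repeating until it is kicked.
    bool powered(uint64_t nowMs) const {
        uint64_t phase = (nowMs - lastRestartMs_) % (countdownMs_ + offMs_);
        return phase < countdownMs_;
    }

    // Called when the device pulses the reset pin after a successful
    // "WDT" webhook response. Ignored while the device is unpowered,
    // because an unpowered device cannot drive the pin.
    void kick(uint64_t nowMs) {
        if (powered(nowMs)) {
            lastRestartMs_ = nowMs;  // restart the countdown from now
        }
    }

private:
    uint64_t countdownMs_;    // how long power stays on without a kick
    uint64_t offMs_;          // how long power stays off after a timeout
    uint64_t lastRestartMs_;  // time of the last accepted kick
};
```

With a countdown longer than the WDT interval (say 45 minutes against the 30-minute schedule), a single late response doesn't cut power, but a missed one does.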

The separate "WDT" publish/subscribe schedule also lets me keep sending regular sensor data (thousands of publishes per day) as No Acknowledge (NO_ACK) publishes, so the failsafe doesn't add significantly to cellular data usage.
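On the firmware side, the split between the two publish paths could look something like this. It's a hardware-agnostic sketch, not my actual firmware: `WdtHeartbeat` and its callbacks are hypothetical names. On a real device the `publishWdt` callback would wrap a normal acknowledged `Particle.publish("WDT", ...)`, `onWebhookResponse()` would be called from the `Particle.subscribe` handler for the webhook response, and `pulseTimerPin` would toggle the digital pin wired to the relay board. Sensor publishes stay outside this class and keep using the `NO_ACK` flag.

```cpp
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical helper that keeps the "WDT" heartbeat on its own schedule,
// separate from the high-volume sensor publishes.
class WdtHeartbeat {
public:
    WdtHeartbeat(uint64_t intervalMs,
                 std::function<void()> publishWdt,
                 std::function<void()> pulseTimerPin)
        : intervalMs_(intervalMs),
          publishWdt_(std::move(publishWdt)),
          pulseTimerPin_(std::move(pulseTimerPin)),
          started_(false),
          lastPublishMs_(0) {}

    // Call from loop() with the current millisecond clock.
    // Publishes immediately on the first call, then once per interval.
    void tick(uint64_t nowMs) {
        if (!started_ || nowMs - lastPublishMs_ >= intervalMs_) {
            started_ = true;
            lastPublishMs_ = nowMs;
            publishWdt_();
        }
    }

    // Call from the subscription handler when the webhook response arrives.
    // Only a completed round trip resets the external countdown timer.
    void onWebhookResponse() {
        pulseTimerPin_();
    }

private:
    uint64_t intervalMs_;
    std::function<void()> publishWdt_;
    std::function<void()> pulseTimerPin_;
    bool started_;
    uint64_t lastPublishMs_;
};
```

The point of the design is that the pin is only pulsed from the response path, never from the publish path, so a publish that leaves the device but gets no answer still lets the relay time out.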

This is not as neat or clean as a "real" watchdog IC solution. But the easily adjustable ON/OFF countdown timer values and the visual feedback are nice, and it only needs one wire plus power.

I can't risk another Cloud incident requiring physical visits to all my customers' sites.
