Help: Boron hardware watchdog unusable - always winds up stuck in DFU mode - despite documentation's suggestion

I am using the code that has been referred to on this forum to enable the internal hardware watchdog (not the fake OS-level software-based ApplicationWatchdog class from Particle) on the Boron.

It invariably winds up in DFU mode after resetting with the hardware watchdog.

Therefore, rather than guaranteeing stability, the hardware watchdog does the frustrating opposite.

It guarantees that your Boron will boot back up into an unrecoverable DFU mode that requires a manual reset, so you have to drive for hours or send a team to the remote site where you were expecting to use the Boron for remote monitoring.

I can find one other person who has reported this: nRF52840 Hardware Watchdog Question

Yet the Particle documentation still says:

The application watchdog requires interrupts to be active in order to function. Enabling the hardware watchdog in combination with this is recommended, so that the system resets in the event that interrupts are not firing.

https://docs.particle.io/reference/device-os/firmware/boron/#application-watchdog

So Particle officially recommends that I do something that is proven, with 100% repeatability, to put my device into a permanently useless, broken state requiring a manual human reset? Great!

This is particularly concerning because of all the cellular reconnection problems I have proven Particle has re-introduced into the Boron sometime after 1.3.1-rc1. (Proof: Stable cellular reconnection newly ruined with Boron 2.0.0-rc1, and perhaps earlier)

I am having Borons randomly die and need power cycling in far-away, remote locations, and the hardware watchdog on Particle’s product is failing me: the watchdog itself works, but then the Particle OS decides to put the device in dead DFU mode.

I understand @chipmc has a supervisory circuit board, but after reading that long thread, it is totally not a solution. That project apparently 1) never was finished, 2) never had a working external watchdog, and 3) never made it on the retail site where you could order one and they would fully assemble it (not the one just to get the boards, but with the components pre-built).

If I had that level of expertise and time I would simply hook up a literal power relay to a small Atmega and forcibly unpower-repower my failing Boron as required - totally defeating the point of the internal hardware watchdog.

Why doesn’t Particle’s flagship product have OS software on it that declines to render the hardware watchdog of the chip itself totally useless, by choosing to put the device into DFU mode on start? It seems like a cruel trick. It seems like the Particle code is saying upon such reboots, “Ah, so you activated the internal hardware watchdog instead of using our failing software-based timer watchdog, huh? You’re trying to actually use our hobbyist product for high-reliability applications by thinking you can bypass our untested OS code’s flaw by power-cycling with the internal hardware watchdog, aren’t you, huh? Well, take THIS!” (proceeds to boot Device OS into useless DFU mode, as if to mock and frustrate the user)

This wouldn’t be so pressing if Particle’s flagship product were stable, reconnecting to the cloud rather than entering permanent states of disconnectivity despite perfect power and cellular signal (e.g., this morning I had the embarrassing experience of having to text a client at a remote site to open up the enclosure and power-cycle a 1.5.0 Boron that had been flashing green for three days without reconnecting, which resulted in an instant and perfect connection after the manual power cycle).

But given Particle’s disastrous killing of their Boron product post-1.3.1-rc1 by making it enter states of permanent disconnectivity, the hardware watchdog is a must. I have tested the software watchdog, and it does NOTHING to recover Borons in such states, unlike a power cycle and, I’m assuming, the hardware watchdog.

The following code puts Boron LTE into permanent, endless, yellow-flashing DFU mode on just a few WDT restarts:

SYSTEM_MODE(MANUAL); SYSTEM_THREAD(ENABLED);
void setup() {
    WatchDoginitialize();
    WatchDogpet();
    RGB.control(true); RGB.color(255,0,0);
    delay(2000); //Red LED for 2s on startup to indicate correct startup, not having been killed into DFU mode
}
void loop() {
    RGB.control(true); RGB.color(255,255,0); delay(1000);
    RGB.color(255,0,255); delay(1000); //Alternate colors until 10s HWDT triggered
}
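#define WATCHDOG_TIMEOUT_MS (10*1000) // parenthesized so the macro is safe inside larger expressions
#define WDT_RREN_REG 0x40010508 // reload request enable
#define WDT_CRV_REG 0x40010504  // counter reload value
#define WDT_REG 0x40010000      // WDT base address, also TASKS_START
#define WDT_RR0_REG 0x40010600  // reload request register 0
#define WDT_RELOAD 0x6E524635   // value that must be written to RR0 to pet the watchdog
void WatchDoginitialize() { // https://youtu.be/Xb6dkEHLASU
    // volatile keeps the compiler from reordering or dropping the register writes
    *(volatile uint32_t *) WDT_RREN_REG = 0x00000001; // enable reload register RR0
    *(volatile uint32_t *) WDT_CRV_REG = (uint32_t) (WATCHDOG_TIMEOUT_MS * 32.768); // timeout in 32.768 kHz ticks
    *(volatile uint32_t *) WDT_REG = 0x00000001; // start the watchdog
}
void WatchDogpet() { *(volatile uint32_t *) WDT_RR0_REG = WDT_RELOAD; }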
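
For anyone who wants to check the register arithmetic off-device, here is a minimal sketch of the same write sequence. It assumes the nRF52840 relation timeout [s] = (CRV + 1) / 32768, so the reload value comes out one tick lower than in the code above. The names `wdtInitialize`, `wdtPet` and `wdtCrvForTimeoutMs` are made up for illustration and are not part of Device OS. Passing the register block in as a pointer means the logic can be exercised on a desktop against a plain array. It does not explain or fix the DFU behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Word offsets inside the nRF52840 WDT block (base 0x40010000), matching the
// addresses in the post: START 0x000, CRV 0x504, RREN 0x508, RR[0] 0x600.
constexpr std::size_t kWdtStart = 0x000 / 4;
constexpr std::size_t kWdtCrv   = 0x504 / 4;
constexpr std::size_t kWdtRren  = 0x508 / 4;
constexpr std::size_t kWdtRr0   = 0x600 / 4;
constexpr std::size_t kWdtWords = kWdtRr0 + 1;   // size needed for a fake block
constexpr std::uint32_t kWdtReloadKey = 0x6E524635u;
constexpr std::uint32_t kLfclkHz = 32768u;

// timeout [s] = (CRV + 1) / 32768, so CRV = timeout * 32768 - 1.
constexpr std::uint32_t wdtCrvForTimeoutMs(std::uint32_t timeoutMs) {
    return static_cast<std::uint32_t>(
        static_cast<std::uint64_t>(timeoutMs) * kLfclkHz / 1000u - 1u);
}

// Same three writes as WatchDoginitialize(), in the same order.
void wdtInitialize(volatile std::uint32_t* wdt, std::uint32_t timeoutMs) {
    assert(timeoutMs > 0);
    wdt[kWdtRren]  = 0x1u;                           // enable reload register RR[0]
    wdt[kWdtCrv]   = wdtCrvForTimeoutMs(timeoutMs);  // timeout in 32.768 kHz ticks
    wdt[kWdtStart] = 0x1u;                           // start the watchdog
}

// Same write as WatchDogpet().
void wdtPet(volatile std::uint32_t* wdt) {
    wdt[kWdtRr0] = kWdtReloadKey;
}
```

On the Boron itself the pointer would be `reinterpret_cast<volatile uint32_t*>(0x40010000)`. On a desktop, a zeroed `uint32_t` array of `kWdtWords` elements stands in for the hardware.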

Why is this happening? Is there any way to make this not happen, so that it never reboots into DFU mode but rather starts normally, so we can use the hardware watchdog?

@Paul_M,

First of all, I want to say as someone who places devices in remote places, I can totally understand how frustrating it is to have them go off-line and require a manual intervention. I have been working with the Particle platform for a few years now and have felt some of the pain you are referencing with deviceOS updates.

That said, I do think you can have a stable platform with this product and I think you will find that folks are ready to pitch in and help you in this effort. As you called out the work I am doing on the carrier board, I wanted to provide an update. I have been delayed in updating this work because of some of the deviceOS issues around sleep, the new power configuration API and PMIC lock-ups. However, I am convinced that these issues have been resolved or have reasonable workarounds. Therefore, I went ahead and placed an order for the 3rd Generation Carrier Boards and they will be delivered from MacroFab today. I was going to test them and update the carrier board thread to close out that effort as complete.

If you are interested, please send me a dm and I can sell you a board for testing. Once I field these for a month or so, I plan to place a larger order and will invite folks to pile on. I do not make money on hardware but if more people order together, the price goes down for everyone.

My carrier board does have a hardware based watchdog and my intent is to continue to improve this over time through watchdog firmware updates. I can say that I have devices that have worked reliably for years and my hardware and software is open source so you are welcome to any of it that is helpful.

Thanks,

Chip

Thank you @chipmc for your helpful offer and gracious contributions to the community. I share your optimism that Particle can be used as a stable cellular platform as long as certain elaborate precautions are taken. The first is the necessity of not going beyond 1.3.1-rc1 on the Boron until there is rigorous cellular reconnection study/analysis/proof done, because I have shown for a fact it got broken afterwards. The second is a hardware watchdog: hopefully the one in the Boron chip itself, but at this time, seemingly, external only.

I will send you a PM. However, if we are going to get into the topic of external-electronic-supervisory-watchdog device instead of internal-Boron-chip-hardware device for doing a watchdog, is anything more than this necessary? https://www.amazon.com/gp/product/B07BT32T1M/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

I saw someone else in a previous thread on this general topic use that board to accomplish an external hardware watchdog.

Regardless, I would like to try and test your board, even if for nothing but to assist, so I will PM you.

So, it’s not currently possible to use the Boron internal hardware chip watchdog AND have the Device OS not eventually boot up locked in DFU mode?

@Paul_M,

I am sorry but I have not used the internal watchdog timer hardware so I will have to defer to someone who has.

As for your link, I think that is a very cool piece of kit, but what you need is a timer that can be easily “set” or a watchdog that can easily be “pet” by your Particle device.

If you are looking for a timer module, I would suggest you take a look at this one - it is even cheaper:

or this one from Adafruit:

These devices can be connected to the reset or enable pins and you can control their operation from the Boron which I did not see an easy way to do with the Amazon timer you linked.

If these breakouts would work for you, I would suggest that an independent hardware based watchdog timer might give you more confidence than relying on a timer that is part of the nRF52840 which you are wishing to monitor / manage. In my experience, none of the devices I have used with an external watchdog timer has ever gone into the DFU state.

Again, I hope this helps, and I am happy to discuss further.

Chip

You will likely find that anything (manual resets included) that pulls the reset pin repeatedly, in a reset-restart-reset pattern similar to the internal watchdog’s, will result in DFU mode.

[Edit: Nope, it seems I can perform pin reset and System.reset() all I want on 1.5.2]

Chip, thanks for the TPL5110 note. I have determined the TPL5110 is not suitable because it isn’t really a full power switch. It rather, as you note, can send pulses to control the RST and EN pins. And, I have read enough on these forums to know that manipulating RST - and even EN - can be insufficient to recover from bad states.

Today, I actually got the U6030 working as an external hardware watchdog pursuant to @Rftop’s post here:

You need to put the U6030 in Mode 5 and, sadly, it needs a 12V input (not a problem for my current deployment, but I have some sites that are 6V only).

Does anyone know of a 5V equivalent version of the U6030?

This little device, along with usage of V1.3.1-rc1, appears to be the magic trick allowing me to use Boron LTE in a stable, reliable way.

And although it requires 12V in, I am currently triggering (petting) it with a 3.3V logic-level digital output from the Boron’s D8 pin, and it is working just great. I have T1 (watchdog interval) set to 300s and T2 (shutoff/reset duration) set to 30s.
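
A rough sketch of the petting logic on the Boron side, assuming the external timer should only be petted while the device looks healthy, so that a hung or disconnected Boron lets T1 expire. The `PetScheduler` type and the 60s pet interval are illustrative assumptions, not taken from the setup above. The only real constraint is that the interval be comfortably shorter than T1 = 300s.

```cpp
#include <cstdint>

// Decides when to pulse the external watchdog's trigger input.
struct PetScheduler {
    std::uint32_t petIntervalMs;  // how often to pet while healthy (assumed value)
    std::uint32_t lastPetMs;      // time of the most recent pet

    // Returns true when the caller should pulse the pet line now.
    // Unsigned subtraction keeps the comparison correct across millis() wrap.
    bool shouldPet(std::uint32_t nowMs, bool healthy) {
        if (!healthy) {
            return false;  // withhold pets so the external timer can expire
        }
        if (nowMs - lastPetMs < petIntervalMs) {
            return false;  // too soon since the last pet
        }
        lastPetMs = nowMs;
        return true;
    }
};
```

In firmware this could be called from `loop()` as `sched.shouldPet(millis(), Particle.connected())`, followed by a short `digitalWrite(D8, HIGH)` / `digitalWrite(D8, LOW)` pulse when it returns true. The pulse width and polarity the timer board expects are not stated in this thread, so check them against the board.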

I would love to find the exact same board in a version that can run on 5V or even 3.3V power.

@Paul_M,
I use Mode 8 [Edit: Mode 7] and connect the Boron to the Normally Closed (NC) pin on the Relay.
The Timer Relay Board uses 580 mW when the relay is energized and 150 mW when it’s not.
By using Mode 8 [Edit: Mode 7] and the NC output, the Timer Relay spends most of the time in the “lower” power state.
The only time the Relay is energized is when the Webhook response hasn’t made it back to the Boron, so the Relay is energized (580 mW) to completely Power Down the Boron for the selected amount of time.

The Relay Timer Board is obviously NOT a low powered device, but Mode 8 [Edit: Mode 7] helps.

For anyone that’s interested, as Paul mentioned the Relay Timer requires a 12V power source.
You will need to add 150 mW (~12.5 mA @ 12V) to your power budget.
But this usually isn’t a problem for Remote Installs since you will be moving up to a 12V SLA battery and 12V Solar Panel anyway. I normally don’t even bother with Sleep at that point.
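
The power-budget arithmetic can be sketched as follows, using the 150 mW (relay idle) and 580 mW (relay energized) figures quoted above. The function names are illustrative, and the fraction of time the relay spends energized is an input you would have to estimate for your own site.

```cpp
#include <cassert>

// Current drawn at a given supply voltage: I [mA] = P [mW] / V [V].
inline double currentMilliamps(double powerMilliwatts, double volts) {
    assert(volts > 0.0);
    return powerMilliwatts / volts;
}

// Time-weighted average power of the timer relay board.
// energizedFraction is the share of time the relay is energized (0.0 to 1.0).
inline double averagePowerMilliwatts(double idleMw, double energizedMw,
                                     double energizedFraction) {
    assert(energizedFraction >= 0.0 && energizedFraction <= 1.0);
    return idleMw + energizedFraction * (energizedMw - idleMw);
}
```

For example, `currentMilliamps(150.0, 12.0)` gives 12.5 mA for the idle board, and `averagePowerMilliwatts(150.0, 580.0, 0.0)` stays at 150 mW when the relay never has to cut power.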

It works fine for “mains” powered units using a 12V DC power supply.

Remember, you can’t also use the Boron’s Li-Po with this setup.
That would prevent the Boron from shutting down and rebooting during a cloud failure event as the Relay Cycles.

In my experience, I’ve never had a $5 Timer Relay board fail on the test bench or in the field.
I’ve purchased them in large lots for $2 each in the past.

Thanks @Rftop for that helpful information.

Why do you have it on Mode 8 instead of Mode 5?

My understanding is that Mode 8 will only power cycle after a timeout once, whereas Mode 5 will keep power cycling indefinitely until the signal is received.
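
To make that difference concrete, here is a small simulation under the reading of the modes given in this thread (not checked against the U6030 datasheet). It counts how many power cycles happen in a window where no pet ever arrives, assuming the T1 countdown restarts as soon as power is restored after T2. The function name and the `maxCycles` parameter are hypothetical. `maxCycles = 1` models the "once only" behavior and `maxCycles = 0` models "keep cycling indefinitely".

```cpp
#include <cassert>
#include <cstdint>

// Number of power cycles started within windowS seconds when no pet arrives.
// t1S: watchdog interval, t2S: power-off duration, maxCycles: 0 means no limit.
std::uint32_t powerCyclesWithoutPet(std::uint32_t windowS, std::uint32_t t1S,
                                    std::uint32_t t2S, std::uint32_t maxCycles) {
    assert(t1S > 0);
    std::uint32_t cycles = 0;
    std::uint64_t t = t1S;  // the first timeout expires after T1
    while (t <= windowS && (maxCycles == 0 || cycles < maxCycles)) {
        ++cycles;                                    // power is cut at time t
        t += static_cast<std::uint64_t>(t2S) + t1S;  // off for T2, then wait T1 again
    }
    return cycles;
}
```

With T1 = 300s and T2 = 30s, one hour of silence yields 11 power cycles in the unlimited case and exactly 1 in the once-only case, which is why the unlimited behavior matters for a device that may need several attempts to reconnect.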

What is your understanding of Mode 5 vs. 8?

Sorry, I fat-fingered. I use Mode 7.
But the answer to your question is:
I’ve never seen that nice datasheet that you linked to.
I started using the boards about 2 years ago and just blindly changed modes to try and determine what the functions were. I couldn’t find any documentation.
I set the Retries to 9999 in Mode 7 (basically unlimited), but your mode 5 looks even better.

Thanks for the Link to the datasheet !
