Help: Boron hardware watchdog unusable - always winds up stuck in DFU mode - despite documentation's suggestion

I am using the code that has been referred to on this forum to enable the internal hardware watchdog (not the fake OS-level software-based ApplicationWatchdog class from Particle) on the Boron.

It invariably winds up in DFU mode after resetting with the hardware watchdog.

Therefore, rather than guaranteeing stability, the hardware watchdog does the frustrating opposite.

It guarantees that your Boron will boot back up into a DFU mode that it cannot leave without a manual reset, requiring you to drive for hours or send a team to the remote site where you were expecting to use the Boron for remote monitoring.

I can find one other person who has reported this: nRF52840 Hardware Watchdog Question - #12 by Jack_v_D

Yet the Particle documentation still says:

The application watchdog requires interrupts to be active in order to function. Enabling the hardware watchdog in combination with this is recommended, so that the system resets in the event that interrupts are not firing.

https://docs.particle.io/reference/device-os/firmware/boron/#application-watchdog

So Particle officially recommends that I do something that is proven, with 100% repeatability, to put my device into a permanently useless, broken state requiring a manual human reset? Great!

This is particularly concerning because of all the cellular reconnection problems I have proven Particle has re-introduced into the Boron sometime after 1.3.1-rc1. (Proof: Stable cellular reconnection newly ruined with Boron 2.0.0-rc1, and perhaps earlier - #4 by robc)

I am having Borons randomly die and need power cycling in far-away, remote locations, and Particle's product's hardware watchdog is failing, because it works but then the Particle OS decides to put the device in dead DFU mode.

I understand @chipmc has a supervisory circuit board, but after reading that long thread, it is totally not a solution. That project apparently 1) never was finished, 2) never had a working external watchdog, and 3) never made it on the retail site where you could order one and they would fully assemble it (not the one just to get the boards, but with the components pre-built).

If I had that level of expertise and time I would simply hook up a literal power relay to a small Atmega and forcibly unpower-repower my failing Boron as required - totally defeating the point of the internal hardware watchdog.

Why doesn't Particle's flagship product have OS software on it that declines to render the chip's own hardware watchdog totally useless by choosing to put the device into DFU mode on start? It seems like a cruel trick. It seems like the Particle code is saying upon such reboots, "Ah, so you activated the internal hardware watchdog instead of using our failing software-based timer watchdog, huh? You're trying to actually use our hobbyist product for high-reliability applications by thinking you can bypass our untested OS code's flaw by power-cycling with the internal hardware watchdog, aren't you, huh? Well, take THIS!" (proceeds to boot Device OS into useless DFU mode, as if to mock and frustrate the user).

This wouldn't be so pressing if Particle's flagship product were stable, reconnecting to the cloud instead of entering permanent states of disconnectivity despite perfect power and cellular signal (e.g., this morning I had the embarrassing experience of having to text a client at a remote site to open up the enclosure and power-cycle a 1.5.0 Boron that had been flashing green for three days without reconnecting, which resulted in an instant and perfect connection after the manual power cycle).

But given Particle's disastrous killing of their Boron product post-1.3.1-rc1 by making it enter states of permanent disconnectivity, the hardware watchdog is a must. I have tested the software watchdog, and it does NOTHING to recover Borons in such states, unlike a power cycle and, I'm assuming, the hardware watchdog.

The following code puts Boron LTE into permanent, endless, yellow-flashing DFU mode on just a few WDT restarts:

SYSTEM_MODE(MANUAL); SYSTEM_THREAD(ENABLED);
void setup() {
    WatchDoginitialize();
    WatchDogpet();
    RGB.control(true); RGB.color(255,0,0);
    delay(2000); //Red LED for 2s on startup to indicate correct startup, not having been killed into DFU mode
}
void loop() {
    RGB.control(true); RGB.color(255,255,0); delay(1000);
    RGB.color(255,0,255); delay(1000); //Alternate colors until 10s HWDT triggered
}
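#define WATCHDOG_TIMEOUT_MS (10 * 1000)  // 10 s; parenthesized so the macro expands safely
// nRF52840 WDT registers (WDT base address 0x40010000)
#define WDT_RREN_REG 0x40010508  // RREN: reload request enable
#define WDT_CRV_REG 0x40010504   // CRV: counter reload value, in 32.768 kHz ticks
#define WDT_REG 0x40010000       // TASKS_START: writing 1 starts the watchdog
#define WDT_RR0_REG 0x40010600   // RR[0]: reload request register 0
#define WDT_RELOAD 0x6E524635    // magic value that reloads the counter
void WatchDoginitialize() { // https://youtu.be/Xb6dkEHLASU
    // volatile keeps the compiler from reordering or dropping the register writes
    *(volatile uint32_t *) WDT_RREN_REG = 0x00000001;  // enable RR[0]
    *(volatile uint32_t *) WDT_CRV_REG = (uint32_t) (WATCHDOG_TIMEOUT_MS * 32.768);
    *(volatile uint32_t *) WDT_REG = 0x00000001;       // start the watchdog
}
void WatchDogpet() { *(volatile uint32_t *) WDT_RR0_REG = WDT_RELOAD; }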
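
A side note on the arithmetic in the CRV line: the reload value is just the timeout expressed in ticks of the 32.768 kHz watchdog clock. The sketch below is host-side C++ rather than Device OS code, the helper names are made up for illustration, and the `+ 1` reflects my reading of the nRF52840 product specification, which gives the timeout as (CRV + 1) / 32768 seconds.

```cpp
#include <cstdint>

// Hypothetical helpers, not part of Device OS or the nRF SDK.
// Assumption: the WDT counts at 32.768 kHz and times out after
// (CRV + 1) / 32768 seconds.
constexpr uint32_t kWdtClockHz = 32768;

// Convert a timeout in milliseconds (at least 1) to a CRV register value.
constexpr uint32_t wdtCrvForTimeoutMs(uint32_t timeoutMs) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(timeoutMs) * kWdtClockHz) / 1000u - 1u);
}

// Convert a CRV register value back to a timeout in milliseconds.
constexpr uint32_t wdtTimeoutMsForCrv(uint32_t crv) {
    return static_cast<uint32_t>(
        ((static_cast<uint64_t>(crv) + 1u) * 1000u) / kWdtClockHz);
}
```

For the 10 s timeout used here, `wdtCrvForTimeoutMs(10000)` gives 327679. The floating-point expression in the posted code lands within a tick of that (10000 × 32.768 ≈ 327680), a difference of about 30 µs, so the timeout value itself is unlikely to be related to the DFU problem.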

Why is this happening? Is there any way to make this not happen, so that it never reboots into DFU mode but rather starts normally, so we can use the hardware watchdog?

@Paul_M,

First of all, I want to say as someone who places devices in remote places, I can totally understand how frustrating it is to have them go off-line and require a manual intervention. I have been working with the Particle platform for a few years now and have felt some of the pain you are referencing with deviceOS updates.

That said, I do think you can have a stable platform with this product and I think you will find that folks are ready to pitch in and help you in this effort. As you called out the work I am doing on the carrier board, I wanted to provide an update. I have been delayed in updating this work because of some of the deviceOS issues around sleep, the new power configuration API and PMIC lock-ups. However, I am convinced that these issues have been resolved or have reasonable workarounds. Therefore, I went ahead and placed an order for the 3rd Generation Carrier Boards and they will be delivered from MacroFab today. I was going to test them and update the carrier board thread to close out that effort as complete.

If you are interested, please send me a dm and I can sell you a board for testing. Once I field these for a month or so, I plan to place a larger order and will invite folks to pile on. I do not make money on hardware but if more people order together, the price goes down for everyone.

My carrier board does have a hardware based watchdog and my intent is to continue to improve this over time through watchdog firmware updates. I can say that I have devices that have worked reliably for years and my hardware and software is open source so you are welcome to any of it that is helpful.

Thanks,

Chip

Thank you @chipmc for your helpful offer and gracious contributions to the community. I share your optimism that Particle can be used as a stable cellular platform as long as certain elaborate precautions are taken. The first is the necessity of not going beyond 1.3.1-rc1 on the Boron until there is rigorous cellular reconnection study/analysis/proof done, because I have shown for a fact it got broken afterwards. The second is hardware watchdog. Hopefully using the Boron chip itself, but at this time, seemingly, external only.

I will send you a PM. However, if we are going to get into the topic of external-electronic-supervisory-watchdog device instead of internal-Boron-chip-hardware device for doing a watchdog, is anything more than this necessary? https://www.amazon.com/gp/product/B07BT32T1M/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1

I saw someone else in a previous thread on this general topic use that board to accomplish an external hardware watchdog.

Regardless, I would like to try and test your board, even if for nothing but to assist, so I will PM you.

So, it’s not currently possible to use the Boron internal hardware chip watchdog AND have the Device OS not eventually boot up to locked in DFU mode?

@Paul_M,

I am sorry but I have not used the internal watchdog timer hardware so I will have to defer to someone who has.

As for your link, I think that is a very cool piece of kit, but what you need is a timer that can be easily “set” or a watchdog that can easily be “pet” by your Particle device.

If you are looking for a timer module, I would suggest you take a look at this one - it is even cheaper:

or this one from Adafruit:

These devices can be connected to the reset or enable pins and you can control their operation from the Boron which I did not see an easy way to do with the Amazon timer you linked.

If these breakouts would work for you, I would suggest that an independent hardware based watchdog timer might give you more confidence than relying on a timer that is part of the nRF52840 which you are wishing to monitor / manage. In my experience, none of the devices I have used with an external watchdog timer has ever gone into the DFU state.

Again, I hope this helps, and I am happy to discuss further.

Chip

You will likely find that anything (also manually) that pulls the reset pin repeatedly enough times, in a similar reset-restart-reset pattern as with the internal watchdog, will result in DFU mode.

[Edit: Nope, it seems I can perform pin reset and System.reset() all I want on 1.5.2]

Chip, thanks for the TPL5110 note. I have determined the TPL5110 is not suitable because it isn't really a full power switch. It rather, as you note, can send pulses to control the RST and EN pins. And, I have read enough on these forums to know that manipulating RST - and even EN - can be insufficient to recover from bad states.

Today, I actually got the U6030 working as an external hardware watchdog, following @Rftop's post here:

You need to put the U6030 on Mode 5 and, sadly, it needs a 12v input (not a problem for my current deployment, but I have some sites which are 6V only).

Does anyone know of a 5V equivalent version of the U6030?

This little device, along with usage of V1.3.1-rc1, appears to be the magic trick allowing me to use Boron LTE in a stable, reliable way.

And although it requires 12V in, I am currently triggering it/petting it with 3.3v logic level digital Boron D8 out - and it is working just great. I have T1 (watchdog interval) set to 300s, and T2 (shutoff/reset duration) set to 30s.
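
To make the T1/T2 behavior concrete, here is a small host-side model of how I understand a retriggerable timer relay in this kind of mode to behave: power stays on for T1 after the last pet, is cut for T2, and the cycle repeats until a pet arrives. This is only a sketch based on the description in this thread, not on the U6030 datasheet, and the class and method names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of a retriggerable timer relay used as an external
// watchdog. Assumption: with no pet, power is on for T1, off for T2, and
// the cycle repeats indefinitely. Times are in seconds.
class TimerRelayModel {
public:
    TimerRelayModel(uint32_t t1Seconds, uint32_t t2Seconds)
        : t1_(t1Seconds), t2_(t2Seconds) {}

    // True if the monitored device has power at time `now`.
    bool isPowered(uint32_t now) const {
        const uint32_t phase = (now - lastPet_) % (t1_ + t2_);
        return phase < t1_;
    }

    // A pet restarts the T1 countdown. Assumption: a pet can only arrive
    // while the device is powered, so pets during the off window are ignored.
    void pet(uint32_t now) {
        if (isPowered(now)) {
            lastPet_ = now;
        }
    }

private:
    uint32_t t1_;
    uint32_t t2_;
    uint32_t lastPet_ = 0;
};
```

With T1 = 300 and T2 = 30 as above, a Boron that stops petting loses power 300 s after its last pet, gets it back 30 s later, and is cycled again every 330 s until it manages to pet.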

I would love to find the same exact board but able to use 5v or even 3.3v power.

@Paul_M,
I use Mode 8 [Edit: Mode 7] and connect the Boron to the Normally Closed (NC) pin on the Relay.
The Timer Relay Board uses 580 mW when the relay is energized and 150 mW when it’s not.
By using Mode 8 [Edit: Mode 7] and the NC output, the Timer Relay spends most of the time in the “lower” power state.
The only time the Relay is energized is when the Webhook response hasn’t made it back to the Boron, so the Relay is energized (580 mW) to completely Power Down the Boron for the selected amount of time.

The Relay Timer Board is obviously NOT a low powered device, but Mode 8 helps.

For anyone that’s interested, as Paul mentioned the Relay Timer requires a 12V power source.
You will need to add 150 mW (~12mA @ 12V) to your power budget.
But this usually isn’t a problem for Remote Installs since you will be moving up to a 12V SLA battery and 12V Solar Panel anyway. I normally don’t even bother with Sleep at that point.
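
For anyone redoing the power budget with different numbers, the arithmetic is simply current = power / voltage. The sketch below is plain host-side C++ with made-up helper names; the 150 mW and 580 mW figures are the ones quoted above, and everything else is just unit conversion.

```cpp
// Hypothetical helpers for a quick power budget.

// Current draw in milliamps for a load of `powerMilliwatts` at `volts`.
constexpr double currentMilliamps(double powerMilliwatts, double volts) {
    return powerMilliwatts / volts;
}

// Energy used per day in watt-hours for a constant load.
constexpr double wattHoursPerDay(double powerMilliwatts) {
    return powerMilliwatts / 1000.0 * 24.0;
}
```

`currentMilliamps(150.0, 12.0)` is 12.5 mA for the idle relay board, `currentMilliamps(580.0, 12.0)` is a little over 48 mA while the relay is energized, and `wattHoursPerDay(150.0)` is about 3.6 Wh per day of idle draw.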

It works fine for “mains” powered units using a 12V DC power supply.

Remember, you can’t also use the Boron’s Li-Po with this setup.
That would prevent the Boron from shutting down and rebooting during a cloud failure event as the Relay Cycles.

In my experience, I’ve never had a $5 Timer Relay board fail on the test bench or in the field.
I’ve purchased them in large lots for $2 each in the past.

Thanks @Rftop for that helpful information.

Why do you have it on Mode 8 instead of Mode 5?

My understanding is that Mode 8 will only power cycle after a timeout once, whereas Mode 5 will keep power cycling indefinitely until the signal is received.

What is your understanding of Mode 5 vs. 8?

Sorry, I fat-fingered. I use Mode 7.
But the answer to your question is:
I've never seen that nice datasheet that you linked to.
I started using the boards about 2 years ago and just blindly changed modes to try and determine what the functions were. I couldn't find any documentation.
I set the Retries to 9999 in Mode 7 (basically unlimited), but your mode 5 looks even better.

Thanks for the Link to the datasheet !

Can someone help out with example code for resetting the Boron with the U6030, and a wiring diagram?
I myself had 3 Borons all go offline at the same time in remote locations. Had to physically go there to reset them. Wasn’t happy!

Take it from me, you DON’T want to power-cycle watchdog the VUSB input pin. That will actually have the opposite effect, guaranteeing your Boron will be dead-forever-until-human on the first reset. They say use the EN pin but I think the only real way to do a watchdog for Boron is to make a robot that physically pulls it out of your pin sockets and pushes it back in, or else the leakage current from even one externally powered UART RX pin will prevent boot. See my thread here: (Video) New Boron FAILS to reboot on VUSB power cycle - external watchdog incapable of resetting device

Particle said over a month ago that there was going to be a technical memorandum/application note on Boron watchdog implementation, which to my knowledge does not yet exist: (Video) New Boron FAILS to reboot on VUSB power cycle - external watchdog incapable of resetting device

By far the best results (that is, the only acceptable results) I have been able to have with Boron are 1) MANDATORY usage of 1.3.1-rc1 and NOT newer, and 2) no watchdog.

Thanks for the info @Paul_M
I only have a water level sensor on the 2 Borons and a relay on the 3rd.
All units are powered with a 12v AGM battery and solar.
I thought that by using a relay to disconnect the power to Borons, it should reset them to get a new connection to the cloud.
Like the robot arm idea, although not very practical! :smile:
Surely there must be an easier way

You could use the U6030 to toggle VUSB ONLY if all your sensors are running off the 3.3V out on the Boron. If they are separately powered by the 12V AGM battery, leakage current will cause the condition I documented in my thread. My robot reference is a sarcastic and frustrated manifestation of my belief that there are less-than-ideal ways to accomplish this with the Boron given its sensitivity. You could use optical isolation I guess on all sensor inputs.

Thanks @Paul_M
I am in Europe and with the 2g/3g Borons at 50%-60% signal, it can take anywhere from 2mins to 5mins to get a connection. Been times it wouldn’t connect, so resetting and waiting again helped.
Could you give me an example code and timers recommended in such a case?
Thanking you in advance.
I also incorporate publish and subscribe between the 2 Borons.
It’s essential if 1 Boron is down and not responding to the other, to perhaps reset both.
1 Boron monitors a water holding tank and sends a publish when the water is down to 60%. This in turn triggers a relay on the other Boron to start the well pump to refill the tank and turn off at 90%.
You can see in this case that if the monitoring Boron goes down and level almost refilled, the pump keeps running and overflowing the holding tank. At 25tonnes per hour, it’s a problem!

Boron LTE here in New England connects in 30 seconds, so I'm sorry to hear about the worse situation.
The connection management is easy.
Manual mode takes care of it automatically, but my devices use SEMI_AUTOMATIC and the following loop:

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

void setup() {
    Cellular.on();
    Cellular.connect();
    Particle.connect();
}
void loop() {
    delay(1000);
    if (Cellular.ready()) {
        if (!Particle.connected()) {
            Particle.connect();
        }
    } else {
        Cellular.on();
        Cellular.connect();
        Particle.connect();
    }
}

@Paul_M
Like this?
When I went the other day to reset all 3 Borons, I noticed that they were flashing cyan for hours.
How can I include a timer to check for if (!Particle.connected()) to reset with a U6030?

#include <blynk.h>

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

char auth[] = "xxxxxxxx"; 

BlynkTimer Blynktimer; 

const uint32_t msRetryDelay = 60000; // retry every 1min
const uint32_t msRetryTime  = 30000; // stop trying after 30sec
bool keepAlive_set = false;

     
bool   retryRunning = false;
Timer retryTimer(msRetryDelay, retryConnect);  // timer to retry connecting
Timer stopTimer(msRetryTime, stopConnect);     // timer to stop a long running try

WidgetLED pumpled(V1);
#define CurrentSensor A2 
#define battVolts V2
#define vBatteryCalibrate V16
#define BatteryBankSensor A5    
#define NUM_SAMPLESv 90 //Battery Bank Voltage Samples
#define NUM_SAMPLES 90
float vout1 = 0.0;                      //Battery Volts A3
double BatteryBankVolts = 0.0;               
float AR1 = 200000.0;             
float AR2 = 10000.0;              
int value1 = 0;

float VoltageDividerCal = 2118;
float Voltage = 0.0;
float Amps = 0.0;
float Depth = 0.0;
int sum = 0;
int sumv = 0;
int RawValue = 0;
unsigned char sample_count = 0;
unsigned char sample_countv = 0;


void retryConnect(){
  if (!Particle.connected())   // if not connected to cloud
  {
    Serial.println("reconnect");
    stopTimer.start();         // start the timeout timer
    Particle.connect();        // start a reconnection attempt
  }
  else                         // if already connected
  {
    Particle.publish("connected", PRIVATE);
    Serial.println("connected");
    retryTimer.stop();         // no further attempts required
    retryRunning = false;
  }
}
void stopConnect(){
    Serial.println("stopped");
    if (!Particle.connected()) // if after retryTime no connection
        stopTimer.stop();
}

void setup(){
  Serial.begin(115200);
  Cellular.on();
  Cellular.connect();
  Blynk.config(auth);
  Particle.process();
  Particle.connect();
  Blynktimer.setInterval(15 * 60 * 1000L, sendUptime); // 15 minutes
  System.enableUpdates();
}

void loop(){
  static uint32_t ms500 = 0;
  if (millis() - ms500 < 500) return;
  ms500 = millis();
  Particle.process();
  readanalog_task();
  readcurrent_task();
  Blynk.run();
  Blynktimer.run();
  
  if (Cellular.ready())
  {
      if (!Particle.connected()){
          Particle.connect();
      }
  }
  else 
  {
      Cellular.on();
      Cellular.connect();
      Particle.connect();
  }
  if (!keepAlive_set && Particle.connected())
    {
        Particle.keepAlive(30);
        keepAlive_set = true;
    }
    
  if (!retryRunning && !Particle.connected()){
   // if we have not already scheduled a retry and are not connected
    Serial.println("schedule");
    stopTimer.start();         // set timeout for auto-retry by system
    retryRunning = true;
    retryTimer.start();        // schedule a retry
  }
  
}





void readanalog_task(){
    while (sample_countv < NUM_SAMPLESv) {
        sumv += analogRead(BatteryBankSensor);
        sample_countv++;
        delay(1);
    }
   value1 = ((float)sumv / (float)NUM_SAMPLESv); // read the value at analog input A3 Battery Volts    200k
   vout1 = (value1 * 3.326) / 4095;
   BatteryBankVolts = vout1 * ((VoltageDividerCal - 50)/ 100);     // 11.28282828282828
   sample_countv = 0;
   sumv = 0;
}
void readcurrent_task(){
    while (sample_count < NUM_SAMPLES) {
        sum += analogRead(CurrentSensor);
        sample_count++;
        delay(1);
    }
    RawValue = ((float)sum / (float)NUM_SAMPLES);
    Voltage = RawValue * (3.38 / 4095.0);
    Depth = map(Voltage,0.0,2.93,0.0,100.0);
    Amps = Voltage * 30;
    sample_count = 0;
    sum = 0;
}
void sendUptime(){
  Blynk.virtualWrite(battVolts, BatteryBankVolts);
  Blynk.virtualWrite(V66, Depth);
  Blynk.virtualWrite(V67, Voltage);
}
BLYNK_CONNECTED() { // runs once at device startup, once connected to server. 
  Blynk.syncVirtual(vBatteryCalibrate); 
}
BLYNK_WRITE(vBatteryCalibrate){     // Blynk app WRITES Slider widget on V16
    VoltageDividerCal = param.asInt();
}
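
On the question of timing how long `Particle.connected()` has been false: one possible approach is a small helper that remembers the last time the cloud was reachable. The sketch below is host-testable C++ with a hypothetical class name, and it takes the current `millis()` value and the connection state as plain arguments so it does not depend on Device OS. The unsigned subtraction is what keeps it correct across the `millis()` rollover.

```cpp
#include <cstdint>

// Hypothetical helper: reports when the cloud connection has been down
// for at least `limitMs` milliseconds.
class DisconnectTimer {
public:
    explicit DisconnectTimer(uint32_t limitMs) : limitMs_(limitMs) {}

    // Call once per loop() with millis() and Particle.connected().
    // Returns true once the device has been disconnected for limitMs.
    bool update(uint32_t nowMs, bool connected) {
        if (connected || !seeded_) {
            // Connected, or first call: (re)start the clock.
            lastConnectedMs_ = nowMs;
            seeded_ = true;
        }
        // Unsigned subtraction stays correct when millis() wraps.
        return !connected && (nowMs - lastConnectedMs_ >= limitMs_);
    }

private:
    uint32_t limitMs_;
    uint32_t lastConnectedMs_ = 0;
    bool seeded_ = false;
};
```

With an external relay watchdog such as the U6030, one way to use this would be to keep petting the relay only while `update(millis(), Particle.connected())` returns false. Once it returns true, the sketch stops petting and the relay cuts power on its own schedule. The limit should be comfortably longer than a normal connection time (several minutes on 2G/3G, going by the numbers in this thread).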

@hydrotruth, I use Automatic Mode and Threading with the U6030 Relay board.
Each Boron periodically Publishes to a general "WDT" Webhook integration and also Subscribes to the same using System.deviceID().
My particular "WDT" Webhook sends data to ThingSpeak, which replies with a specific response when successful.
The Boron Event Handler then Pets the U6030's CH1 Pin resetting the countdown timer.

If the Countdown timer ever reaches Zero (data isn't reaching ThingSpeak for any reason), the Relay Board powers down the Boron (and all 12V sensors) for 30 seconds, which is obviously adjustable.
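
The core of this scheme is that the relay is only petted after a full round trip succeeds. A minimal host-side sketch of that decision is below. The class name, the expected response string, and the callback are all placeholders for illustration; on a real device the callback would pulse the pin wired to the relay's trigger input, and the method would be called from the subscription handler.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical helper: pets the external watchdog only when the webhook
// response matches what a successful round trip should return.
class WebhookPetter {
public:
    WebhookPetter(std::string expected, std::function<void()> pet)
        : expected_(std::move(expected)), pet_(std::move(pet)) {}

    // Call with the payload of each webhook response event.
    // Returns true if the payload counted as success and the pet was sent.
    bool onResponse(const std::string& data) {
        if (data != expected_) {
            return false;  // failed or unexpected response: do not pet
        }
        pet_();
        return true;
    }

private:
    std::string expected_;
    std::function<void()> pet_;
};
```

Because nothing else ever pets the relay, any break in the chain (cellular, cloud, webhook, or the remote service) eventually lets the countdown expire and power-cycles the Boron.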

This might be an atomic bomb approach, but it removes the need for all the Manual or Semi-Automatic Connection Code, and it's the only reliable way I've been able to use a Boron.

I've handled this a few different ways with Water Tanks + Remote Wells.
The Simple Version:
The Tank's Boron can send a Heartbeat to the Well during the Pumping State to confirm that Water is still required. The Well can have a Pre-Set Time Limit Coded to stop pumping unless the Tank is actively still requesting water.
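
A minimal sketch of the well-side logic for this simple version might look like the following. It is host-testable C++ with hypothetical names, and it assumes the tank's heartbeat and "full" messages are delivered to the two `on...` methods by whatever pub/sub handler the well Boron uses. Each heartbeat renews permission to pump for a fixed time limit, and if heartbeats stop, the pump stops.

```cpp
#include <cstdint>

// Hypothetical fail-secure pump guard: the pump may run only while the
// tank keeps asking for water. Times are millis() values.
class PumpGuard {
public:
    explicit PumpGuard(uint32_t limitMs) : limitMs_(limitMs) {}

    // Tank says "still need water": renew permission to pump.
    void onHeartbeat(uint32_t nowMs) {
        requested_ = true;
        lastRequestMs_ = nowMs;
    }

    // Tank says it is full: stop immediately.
    void onTankFull() { requested_ = false; }

    // Call every loop(); returns true while the pump relay should be on.
    bool update(uint32_t nowMs) {
        if (requested_ && (nowMs - lastRequestMs_ >= limitMs_)) {
            // Latch off so a stale request cannot revive after rollover.
            requested_ = false;
        }
        return requested_;
    }

private:
    uint32_t limitMs_;
    uint32_t lastRequestMs_ = 0;
    bool requested_ = false;
};
```

This version stops pumping when telemetry is lost. For the opposite preference (keep filling when in doubt), the expiry branch would be the place to change the behavior.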

But you will need to decide if you prefer FailSafe or FailSecure.
For some Applications, I prefer to overflow the tank instead of running out of water.
Murphy's Law says the Telemetry System will want to fail during a Fire Event.....and the town burns down...... :joy:...... versus the possibility of overflowing the tank during a Telemetry Failure.

Do you mind sharing your code with this approach?
Never used a watchdog before. Seeing an example will assist me with proofing my project.

Here's the basic code and description in the bottom comment:
https://go.particle.io/shared_apps/5f84bd63e6f0b000092a1d81

The Pub/Sub as the WDT also works with the U6030 Relay Board without using ThingSpeak, but I'm normally logging other data to TS anyway.

As @Paul_M convinced me earlier in this Thread, Mode 5 is a better choice than Mode 7 as I originally used.

This topic was automatically closed 182 days after the last reply. New replies are no longer allowed.