(Video) New Boron FAILS to reboot on VUSB power cycle - external watchdog incapable of resetting device

Because words cannot describe how frustrated I am with Particle's failing hardware, I will skip the literary dramatics and cut to the chase.

I've been told recently it's critical to use an external power-switching watchdog on the Boron for critical applications:

I've done just that for my recent deployment, and as a result, the entire Boron is DEAD - in permanent disconnectivity - hours and hundreds of dollars away - and refuses to boot up in perfect power and cellular conditions.

I have reproduced a condition where the Boron enters some sort of brownout/crash/no-LED death state upon switching the +5V VUSB-in with the same U6030 external timer that was previously mentioned.

There is a lot of information (and a video) so please stick with me:

  1. Prior to deployment of my recent remote station, I observed where power switching VUSB does NOT ACCOMPLISH A RESET, and rather I needed to physically remove the Boron and plop it back in, or pull the power on the entire circuit to make this happen. It was connected over RX/TX and SCL/SDA to a powered Teensy 3.5 microcontroller. So it seems whatever tiny, pathetic, puny pull-up current from the RX/TX UART is preventing the Boron from rebooting ever. The SCL/SDA pullups were connected to 3V3 on the Boron so those couldn't have been powered when off.

  2. I was incredibly frustrated to witness this behavior, as it appeared just 1 day before my deployment and utterly ruined the hopes I had that using an external hardware watchdog would take care of these issues and get me a never-permanently-dying-until-manually-power-cycled-by-a-human Boron.

  3. Fickle, random, surprise, and unrepeatable behavior with the Boron: when I had first tested the U6030 as an external watchdog switch on +5V VUSB on the Boron, it worked fine. It would shut off the Boron, then when the power came back on, the Boron would boot normally. I tested this a handful of times a week or two ago.
    Then, randomly, when I tested days later, on 8/18, on the same exact board, after it had been up for a few days, switching the VUSB power back on would produce this dead-Boron result:

VIDEO: https://youtu.be/vZ-UY8WSM4A

  4. The video is from 8/18, before my deployment. After I took the video, I took the risk of keeping the U6030 power switch but with a brand-new-from-box Boron, which I tested did work when rebooted by the U6030 (just like the first one HAD, temporarily, until the video above). What's the definition of insanity again?

  5. After I left my arduous two-day deployment involving hours of driving, an expensive and logistically difficult car ferry, and a hotel reservation, the Boron was up for a total of 4 hours before disconnecting and never again reconnecting, an obvious result of the same behavior as I had captured on video above for the first Boron.

  6. The cellular signal strength was a whole 48% at the site - better than some of my other sites which always reconnect on Boron 1.3.1-rc1 - so that is not an issue.

  7. All referenced Borons used the 1.3.1-rc1 which successfully reconnects absent an external hardware watchdog (longest proof period of this claim for me has been 9 months from September '19 to June '20).

7b. (Added in post edit): When repowered, the yellow LED flashes (no LiPo connected), but the main LED is dead and the device does not boot.

  8. The frustrating irony is that, as a result of using an external hardware watchdog in addition to the stable V1.3.1-rc1 version as an attempt to make it EVEN MORE reliable, the result has been the total opposite.

  9. This is Particle's fault. An extremely tiny pullup current should not totally incapacitate the Boron from rebooting when VUSB is totally depowered by a physical relay switch and then repowered. Additionally, it is Particle's hardware's fault because whatever issue there is should be repeatable. But, as we saw above, both Borons at first would successfully reboot with the VUSB switch, and then mysteriously and randomly degrade with little use into the condition of never booting back up when power is toggled.

  10. Similar phenomena have been alluded to on this forum before, and Particle hasn't fixed it: Locked up Boron, ignores ~EN pin: how can this happen? - #9 by hwestbrook

  11. I originally had digital connections to a few of the A0-A4 input pins that had pullups. I sacrificed my laborious hand-soldered perfboard work by cutting these wires, leaving only the RX/TX and SCL/SDA pins connected to the powered Teensy. There was an SD card attached to the 3.3V and SPI pins, but that was not powered by anything other than the Boron's 3.3V regulator so it's not at fault. Further, the pullups on the SCL/SDA pins were from the Boron's 3.3V regulator. So how is it possible that connecting merely to another powered microcontroller over UART causes the Boron to be unable to reboot when power cycled through switching VUSB?

  12. Are there any suggestions on how to actually use the Particle Boron in a high-reliability environment, or is it a totally useless toy product for hobbyists, incapable of being reset with a power switching external watchdog?

  13. Does Particle expect me to, instead of using a VUSB switching watchdog, design a robotic arm inside my enclosure that will grasp the Boron, physically remove it from the pin headers, and push it back in, in order to accomplish fault-condition power switching?

  14. Is Particle going to fix the EN pin not always resetting the board to make this problem go away?

  15. Is there hope in trying a low-side switch on GND? Could not the same leakage condition still exist in reverse voltage? Is there even hope in trying this given the unpredictable variability I have spoken of before? It's the ultimate "boy who cried wolf" situation with Particle. I have faith that a method will finally work, and then I deploy, and then the failing Boron hours away disconnects and never reconnects, damaging my mental health. If I try a low-side switch and it seems to work, how do I know I won't deploy it (with a totally redone board for this purpose, mind you), only to find this blackout condition rearing its head again after I leave?

  16. Am I expected to install a massive array of physical power switching relays to be connected to each and every pin of the Boron, and expend a huge amount of current during reset conditions to disconnect and reconnect not only VUSB, but also every single pin?

  17. Would somehow using a double-switch on both GND and VUSB be guaranteed to fix this?

  18. How should I proceed with the goal of getting an external hardware watchdog on the Boron that actually makes it more reliable, instead of ensuring it will fail and enter permanent disconnectivity requiring an expensive servicing mission?

I'm thankful for all the help I can get. Thank you for your time and patience.

Here is a brand new video simplifying my whole original post and showing the issue in one clear comparison:

The video shows:

  1. The Boron failing to reboot when high-side switched on by VUSB;
  2. and then, immediately thereafter, successfully booting when connected directly to USB and nothing else.

Some general feedback on this post - my experience has been that asking more than one or two questions at a time within a post turns people off from answering sometimes. I find the most effective format is to provide a paragraph of context / problem summary, bullet points of the relevant configuration (fw version, connected peripherals, etc), and then have your theory and work so far in just a few paragraphs if possible. Not trying to be an ass or anything but want to make sure you aren't missing out on some of the input from some of the really wicked smart, dedicated members of the community (who know way more than I!)

The Particle stuff is frankly fairly consumer level a lot of the time. That doesn't mean it automatically sucks, but it does mean that those of us using them in industrial environments have to do our due diligence at figuring out how to make sure it ultimately works super reliably. It's the nature of the beast unfortunately.

Some suggestions on troubleshooting and bug reporting - you suspect the issue is related to UART. I would do the following if it was me:

  1. Use a scope to look at VUSB & the UART lines through power-on.
  2. On your PCB, flash tinker to the Boron and see if the issue is replicable.
  3. If it does not appear, add the UART init code into a blank sketch and see if the issue is replicable.
  4. Once you have reproduced the issue with the minimal firmware code (either of the above two), pop the boron on a breadboard and wire the usb power and the UART over to a Teensy (ideally also on a breadboard).

If you can reproduce the issue on the breadboard, it's really likely it's a legit bug. If you can't, there is a good chance there is some weird PCB issue or firmware issue with your design. While that may be something the Boron should tolerate, it could also be easily fixable.

Is there hope in trying a low-side switch on GND

Could be worth trying, but I doubt you'd see a difference.

Am I expected to install a massive array of physical power switching relays

Certainly not, and I've never heard of anyone having had to do this in the past 2 years of Borons. There is definitely some specific issue here.

@justicefreed_amper I appreciate the recommendation and help. Right as you posted, I posted a new video doing exactly what you recommended, reduced to the bare bones with a breadboard. However, I did not flash Tinker as you suggest. I can redo it with that good suggestion. Still, the video shows a proper control test: the Boron boots when powered by the USB connector, but not when VUSB is power switched on while RX/TX are connected to the Teensy. But, I will redo this with your suggestions. Thank you for your advice.

That new video is nicely specific, thanks. Because you are in multi-threaded mode, some of your initialization and setup code may execute before the state machine starts the cell connection process (and thus the LED change). Therefore, it’s possible that the UART connection is getting hung up in something in the application firmware.

If it works fine with Tinker firmware, I would try delaying the UART init in your main code for a second or something and see if that helps. There may be some dependency there. I also would strongly recommend just seeing if this behavior goes away in single-threaded operation.
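
A minimal sketch of the delayed-init idea above, written as plain C++ so the timing logic can be checked off-device. `DeferredUartInit` and `uartInitDue` are hypothetical names, not Device OS APIs, and the delay value is whatever "a second or something" turns out to be in testing. In firmware, `poll(millis())` would be called from `loop()`, and a `true` result would trigger the existing `Serial1.begin(115200)` call.

```cpp
#include <cstdint>

// Hypothetical helper: true once delayMs has elapsed since bootMs.
// Unsigned subtraction keeps the comparison valid across a millis() rollover.
bool uartInitDue(uint32_t nowMs, uint32_t bootMs, uint32_t delayMs) {
    return static_cast<uint32_t>(nowMs - bootMs) >= delayMs;
}

// Hypothetical one-shot gate for the UART init.
struct DeferredUartInit {
    uint32_t bootMs;   // timestamp captured at the start of setup()
    uint32_t delayMs;  // how long to wait before touching the UART
    bool done;

    DeferredUartInit(uint32_t boot, uint32_t delay)
        : bootMs(boot), delayMs(delay), done(false) {}

    // Returns true exactly once: on the first call after the delay has elapsed.
    // The caller would run Serial1.begin(115200) when this returns true.
    bool poll(uint32_t nowMs) {
        if (done || !uartInitDue(nowMs, bootMs, delayMs)) {
            return false;
        }
        done = true;
        return true;
    }
};
```

Keeping the gate non-blocking means the rest of `setup()` and `loop()` (watchdog check-ins, cellular connection) keeps running while the UART init is held back, which makes it easier to see whether the early `Serial1.begin()` is what the boot depends on.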

I will have to test this with Tinker because you have a good note about the UART. I really think it’s an electrical problem with the Boron design where a tiny amount of pin input leakage current from another MCU will cause a failure to cleanly boot, but in order to prove this theory, I do have to test with Tinker. Especially because you are right, my setup code inits UART before cellular:

    // Mark setup as done in the DCT
    const uint8_t val = 0x01;
    dct_write_app_data(&val, DCT_SETUP_DONE_OFFSET, 1);
    wd.checkin();
    pinMode(OUT_ALIVE_PIN, OUTPUT);
    digitalWrite(OUT_ALIVE_PIN, LOW);
    // UART to the Teensy is initialized here, before the cellular modem is turned on
    Serial1.begin(115200);
    serBuf.setup();
    Particle.function("getVoltage", readVoltage);
    wd.checkin();
    Cellular.on();
    wd.checkin();
    Cellular.connect();
    wd.checkin();
    Particle.connect();
    wd.checkin();
    // Cast so the format specifier matches the argument type
    snprintf(dummyStr, sizeof(dummyStr), "%ld", (long)Time.now());
    Particle.publish("ON_TIME", dummyStr, NO_ACK);
    delay(1000);
    // I2C slave at address 0x59, 400 kHz
    Wire.setSpeed(CLOCK_SPEED_400KHZ);
    Wire.begin(0x59);
    Wire.onReceive(receiveEvent);
    unsigned long tS = millis();
    // Publish device vitals every 30 minutes
    Particle.publishVitals(60*30);
    wd.checkin();

Is that code with SYSTEM_THREAD(ENABLED)?

Keep in mind that the UART pins literally go straight into the nRF52840 MCU, so it’s not like the Boron does anything special with it. Just straight into the processor, so if there is a hardware bug it would be with the MCU and therefore likely a defect.

Now, if some register is not getting initialized properly, I suppose there could be some weird stuff going on. But if you can’t see something happening on a scope, I’m skeptical of some mysterious “electrical” thing. I find it more likely that you are getting a hangup in the UART init process, where some failure isn’t handled properly in the system firmware (which you can probably then compensate for).

Yes, SYSTEM_THREAD(ENABLED) and SEMI_AUTOMATIC.

I will flash Tinker and prove that the UART pin brownout state causes switching VUSB to fail to reset the Boron in a manner it wouldn't if nothing but VUSB and GND were connected to anything.

Yeah, just good to make sure we can explicitly rule out some of this easy stuff :)

Also, if another boron doesn’t demonstrate the same behavior in this situation, it’s just really likely it’s broken somehow. Which could btw totally happen if the grounds aren’t matched on your Teensy and your power input. Or weren’t at any point in the past. Definitely double check you don’t have a loop or discrepancy in GND voltage.

Hey there @Paul_M,

At first glance it looks like the way you’ve implemented the HW watchdog might create some undesired issues that are contributing to the behavior in the video you posted.

The lowest hanging fruit is that I would recommend applying the watchdog on the EN pin, which is designed for resetting the device in an application like this one.

I’ve asked for @rickkas7, who owns our technical documentation, to write up an official application note (HW design + firmware) for implementing a HW watchdog as it is a fairly common need among our customers.

Additionally, I’ve asked for someone on our Sales Engineering team to post here with some best practices around implementing HW watchdogs in the interim while we work on that application note.

@will Thank you very much Will for your attention and response. We are clear that, as has been recently recognized, power cycling VUSB is a killer because the leakage current from input pins is enough to prevent rebooting.

Please understand that the only motivation of going through the extra hassle of switching VUSB rather than toggling EN (much easier, no relay required) is because of reports/sentiments on this forum that there are circumstances where switching EN is not enough to recover the device from a state needing such intervention in the first place.

I don’t know how true those sentiments are about EN not being an acceptable solution, but I look forward to your technical memorandum which will hopefully discuss this.

Will, thank you for your response. Particle has good support which should be acknowledged amid the frustrations I’ve had with the Boron product.

EN should be 100% reliable as long as you can guarantee that current will not flow back into the nRF52840 via GPIO pins or pull-ups to 3V3.

On the Boron, pulling EN low disconnects a load switch (U2, XC8107) that de-powers both the cellular modem supply (VSYS) and the power into the 3.3V regulator (U1, XC9258) powering 3V3.

The place where you can get into trouble is when you have external circuitry that is powered by something other than 3V3. If this is powered and connected to a GPIO, the power flowing back into the MCU from the GPIO, or through a pull-up to 3V3, can be enough power to keep the nRF52840 from resetting. As long as you avoid that, the MCU will definitely reset when the power is removed.
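
The rule above can be turned into a quick wiring check. The following is only an illustrative sketch in plain C++: the `Supply`, `ExternalConnection`, and `backPowerSuspects` names are hypothetical, and the model is deliberately crude (it only asks whether the far end of each connection is powered from the Boron's switched 3V3 rail or from a supply that stays up while the Boron is off). It says nothing about how much current actually flows.

```cpp
#include <string>
#include <vector>

// Where the circuitry on the far end of a connection gets its power.
enum class Supply {
    Boron3V3,     // powered from the Boron's 3V3 rail, so it drops when EN is low
    Independent   // powered separately, so it stays up while the Boron is off
};

// One external connection to a Boron pin (hypothetical description type).
struct ExternalConnection {
    std::string pin;
    Supply farEndSupply;
};

// Returns the pins whose far end stays powered while the Boron is off.
// Under the rule above, these are the candidates for back-powering the MCU.
std::vector<std::string> backPowerSuspects(
        const std::vector<ExternalConnection>& connections) {
    std::vector<std::string> suspects;
    for (const auto& c : connections) {
        if (c.farEndSupply == Supply::Independent) {
            suspects.push_back(c.pin);
        }
    }
    return suspects;
}
```

For the wiring described in this thread, the RX/TX and SCL/SDA lines go to the separately powered Teensy and would be listed under this simplified rule, while the SD card, which is powered only from the Boron's 3V3, would not.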
