Tips for deploying reliable LTE Boron that stays up

The video explains what’s going on better than I can, but yes, you are correct: you have been using a software watchdog and not the hardware watchdog, which is the better option.

I’m not sure why they have not given us direct access to the hardware watchdog, which the video shows you how to enable, but I’m sure there is a reason for it.

This will work for the Argon and the Boron.

@RWB Thank you for this! I can’t believe that this isn’t common knowledge. Maybe I was the odd idiot and everyone else knew this, but I doubt it, because I’ve spent a significant amount of time on these forums and working with Particle Boron since May 2019, and I never until now knew that ApplicationWatchdog was fake news.

I write this as I try to get the Azure integration functioning properly (the Particle platform is currently wrapping my JSON object in a string) for an industrial use case, and this thread is worrying the hell out of me.

I didn’t find that video until a few months ago, but then again I had put the IoT stuff down for a while as I dealt with the curve balls life threw at me over the last 9 months.

The Application Watchdog is doing what I need it to do now, and I haven’t needed to enable the hardware watchdog yet, but if I were doing a remote Boron setup I would certainly be using the hardware watchdog version.

Usually, if you can reset the device, it will eventually connect. So you just need a reliable way to make sure it resets if it gets hung up during a long bout of cellular connection issues, which will certainly happen over time.
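
A minimal sketch of that "reset after being offline too long" idea, written as plain C++ so the logic can be checked off-device. The class name `DisconnectTimer` and the timeout value are hypothetical, not anything from Device OS:

```cpp
#include <cstdint>

// Hypothetical helper: decides when a device that has been offline for too
// long should reset itself. Names and timeout values are illustrative only.
class DisconnectTimer {
public:
    explicit DisconnectTimer(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call once per loop with the current uptime (ms) and connection state.
    // Returns true once the device has been continuously disconnected for
    // at least timeoutMs.
    bool shouldReset(uint32_t nowMs, bool connected) {
        if (connected) {
            offline_ = false;          // any successful connection clears the timer
            return false;
        }
        if (!offline_) {
            offline_ = true;           // first loop pass where we noticed the drop
            offlineSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the ms counter rolls over.
        return (nowMs - offlineSinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t offlineSinceMs_ = 0;
    bool offline_ = false;
};
```

On a Boron this would presumably be fed from `loop()` with `millis()` and `Particle.connected()`, calling `System.reset()` when it returns true. Keep in mind that a check like this only runs while the firmware itself is still running, so it does not replace a watchdog.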

A search on “HW watchdog” may reveal that, in the past, it has not been enough on its own to make the Boron reliable. In addition, the Boron needs a 30-second power-down capability to be sure, like this example here:

@thrmttnw Thanks for sharing this. I had already read that thread and 1) concluded it didn’t help me, being unable to order custom PCBs and solder tiny SMD components, and 2) was unable to answer the following question:

Are you saying that a Boron power cycle must be at least 30 seconds for it to have a cleansing effect?

Regarding the hardware watchdog option which I was excited about above, I did read one post saying that it caused the Boron to reboot into DFU mode. Is that a misbehavior you are referring to?

Is there a way to stably and reliably use the hardware watchdog?

The Particle documentation does refer to it.

From posts and my own experience, 10 seconds is too little and 15 seconds may be just about enough, so use 30 seconds to be sure, as in the linked example.

No, I am referring to the fact that having a HW watchdog has saved many devices from most, but not all, disconnects. In some cases the longer power cycle has been needed, at least for the Boron, to recover from “having trouble connecting to the cloud”.

Just wanted to chime in here and say, if you do go the external watchdog route, don’t use the Deep Reset Tutorial posted above if you want maximum reliability.

The EN pin has an inherent flaw: it locks up the Boron if anything else attached to the Boron can backfeed more than a minimal amount of power. Lots of sensors and circuits backfeed enough power to lock up the Boron this way.

I think @rickkas7 should post a warning on that tutorial and Particle should post a warning on its Boron data sheet. Knowing this could have saved me $10K and a week of my time!

We have around 100 Borons in the field (and 1000+ electrons) in remote locations all with external hardware watchdogs. I think you may be spinning out here – If I were you, I would go back to the drawing board on your project and implement a proper hardware watchdog. Sorry if this is a painful message to hear.

@hwestbrook Thank you for sharing this. I think you’re right: there’s no way to use the Particle Boron LTE for reliable, permanent remote monitoring without an external supervisory circuit that depowers and repowers the Boron once a set period goes by without the Boron sending it any “I’m connected” pulses.
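
To make the idea concrete, here is a rough sketch of the logic such a supervisory circuit would run, in plain C++ with time passed in so it can be tested on a desktop. The `Supervisor` name, its fields, and the specific timings are my assumptions; the 30-second off period is the figure suggested elsewhere in this thread:

```cpp
#include <cstdint>

// Hypothetical supervisor logic for an external watchdog MCU or timer board.
// It only decides whether the Boron's power should be on or off.
struct Supervisor {
    uint32_t heartbeatTimeoutMs;  // longest allowed silence before cutting power
    uint32_t powerOffMs;          // how long power stays off (e.g. 30000)
    bool powerOn = true;
    uint32_t stateSinceMs = 0;    // time of last heartbeat, or of the last power change

    Supervisor(uint32_t timeoutMs, uint32_t offMs)
        : heartbeatTimeoutMs(timeoutMs), powerOffMs(offMs) {}

    // Call when an "I'm connected" pulse arrives from the Boron.
    void heartbeat(uint32_t nowMs) {
        if (powerOn) stateSinceMs = nowMs;  // pulses while powered off are ignored
    }

    // Call periodically; returns the power state the relay/switch should hold.
    bool tick(uint32_t nowMs) {
        uint32_t elapsed = nowMs - stateSinceMs;  // rollover-safe unsigned math
        if (powerOn && elapsed >= heartbeatTimeoutMs) {
            powerOn = false;                      // no heartbeat in time: cut power
            stateSinceMs = nowMs;
        } else if (!powerOn && elapsed >= powerOffMs) {
            powerOn = true;                       // off period done: restore power
            stateSinceMs = nowMs;                 // fresh window for boot + connect
        }
        return powerOn;
    }
};
```

One design consequence worth noting: because the window restarts on every power-up, a Boron that never connects gets power cycled indefinitely, so the heartbeat timeout needs to be comfortably longer than the worst-case boot and connection time.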

The question now for me is whether something as cheap and simple as the following $13 board from Amazon will be enough to accomplish this:

If so, it is pretty amazing that such a simple and cheap addition could have saved me so much time, money, and stress in my false reliance on Boron-only.

Something like that would probably work, as long as it is reliable. Cost doesn’t have a lot to do with reliability. More important is testing it under all the edge cases you can think of, like “what if power goes out”, “what happens in a brownout”, or “what happens if the Boron gets stuck in a reset loop and triggers this watchdog”.

I’ve used these Timer Relay Boards with Gen2 and Gen3. I have a post on this forum somewhere with a few details. I buy them in bulk for a few bucks, but you have to wait for the slow boat from China.

Previously, I used them for Electrons operating as private mesh gateways, which have “mains” power. But they’re also a good fit for solar Borons; you just have to change your power strategy to a 12V battery and panel. A good combo is this $28 Panel w/ Controller, plus a $15 12V SLA battery.

You can set up the Timer Relay to only restart its countdown when your final cloud endpoint (I use ThingSpeak) responds correctly. That way, it’s confirming you are actually getting the data to the endpoint, and not just connected to the Cloud. I typically use a separate webhook just for the WDT publish.
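
A small sketch of what "responds correctly" could mean in code. This assumes the endpoint replies with a positive integer (an entry ID) on success and `0` on failure, which is how I understand ThingSpeak’s update API to behave; check that against your own endpoint. The function name is hypothetical:

```cpp
#include <cctype>
#include <string>

// Hypothetical check: treat a response body as proof of delivery only if,
// after trimming trailing whitespace, it is all digits and not all zeros.
bool responseConfirmsDelivery(std::string body) {
    while (!body.empty() &&
           std::isspace(static_cast<unsigned char>(body.back()))) {
        body.pop_back();               // drop trailing CR/LF/spaces
    }
    if (body.empty()) return false;    // no body at all: not confirmed

    bool nonZero = false;
    for (unsigned char c : body) {
        if (!std::isdigit(c)) return false;  // error text, HTML, etc.
        if (c != '0') nonZero = true;
    }
    return nonZero;                    // "0" (or "00") means the write failed
}
```

In firmware, the handler for the webhook response would call this and pulse the relay’s trigger input only when it returns true, so a Cloud connection that silently fails to deliver data still lets the timer expire.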

If the external watchdog timer is connected to the RST pin, is that sufficient to cover all scenarios?

No. I believe there have been at least two (likely three) widespread Cloud outages/issues that required Boron LTEs to be manually power cycled. A reset didn’t fix them.

Hey @Paul_M – thanks for taking the time to draft this note. Given your question, I wanted to spend a little bit of time providing a high level perspective on connectivity management and why we’re investing in LTS releases for Device OS.

What should be true

Let’s start with a statement – our goal is to provide automatic, reliable, and resilient Cloud connectivity management for all Particle devices. This means that:

  • Devices should not require manual reset to recover their cloud connection
  • Customers should rarely, if ever, need to manually manage connection health
    • If they need to do so, it should be done via tightly scoped Device OS APIs and not through brute force management of the cellular modem or MCU (which can interfere with Device OS and create separate, negative interactions of their own)

Where we are not delivering against these standards, it is imperative for us to provide high quality support to better understand the root cause of the issue so that it can be resolved. This is doubly true as we invest in Device OS 2.x, our first long term support release for Device OS.

The role of LTS (Device OS 2.x)

One of the biggest factors that has affected reliability of Particle devices is the hit or miss nature of historical individual Device OS releases. Because we have typically combined bug fixes and feature development into a single release branch, customers have reported varying levels of success with different versions of Device OS depending on what features they are using and whether/where regressions were introduced.

In the Spring, we announced our intention to build an LTS release branch for Device OS. Long Term Support (LTS) releases for Device OS are independent branches of Device OS that are feature-frozen in time. They do not receive updates with new features, API changes, or improvements that change the function or standard behavior of the device. You can learn more about LTS releases in our documentation, here.

The 2.0.0-rc.1 release that you referenced in your post is the first, alpha candidate for Device OS 2.0. We have frozen feature development for this release line and will continue to test, identify, diagnose, and resolve issues over the next several months until it reaches GA quality and delivers the most reliable development experience of any Device OS, including v1.3.1-rc.1 which you have had success with in the past.

What version of Device OS should I use?

Once LTS is released in General Availability, this question will have a very simple answer – we recommend that customers who value reliability over the latest and greatest features use the latest version of Device OS from our LTS release line (currently 2.x).

Right now, because we have only very recently released the first 2.x alpha release, it may not yet be the best for your production application. While we encourage you to test against this new release, you should only deploy it to remote devices when you are satisfied that it meets or exceeds the reliability of your existing Particle application.

In the meantime, we will continue to patch issues that we identify and will gladly continue to support customers on older versions of Device OS with issues that they discover.

MKR1500 and Boron, different firmware and use cases, both got an ATtiny85 as a watchdog. A hardware watchdog should be standard on these; that it isn’t says a lot about where Particle wants to be in the market.

We also track watchdog resets and report them back to our server to identify code issues and network issues. It’s amazing how many fewer cellular issues we have in some areas now that rush hour is so light.
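
For anyone wanting to do similar tracking, a minimal sketch of the bookkeeping side in plain C++. The `ResetCause` and `ResetStats` names and the JSON field names are made up for illustration; this is not the poster’s actual code:

```cpp
#include <cstdint>
#include <string>

// Hypothetical reset causes; how each is detected depends on your hardware.
enum class ResetCause { PowerOn, Watchdog, Other };

// Hypothetical counters. On a real device these would need to live in
// storage that survives a reset, otherwise they restart at zero each boot.
struct ResetStats {
    uint32_t powerOn = 0;
    uint32_t watchdog = 0;
    uint32_t other = 0;

    // Call once at boot with the cause of the reset that just happened.
    void record(ResetCause cause) {
        switch (cause) {
            case ResetCause::PowerOn:  ++powerOn;  break;
            case ResetCause::Watchdog: ++watchdog; break;
            case ResetCause::Other:    ++other;    break;
        }
    }

    // Builds a small JSON payload to send to a server.
    std::string report() const {
        return "{\"powerOn\":" + std::to_string(powerOn) +
               ",\"watchdog\":" + std::to_string(watchdog) +
               ",\"other\":" + std::to_string(other) + "}";
    }
};
```

Note that when an external watchdog cuts power, the device itself only sees an ordinary power-on, so in that setup the count may need to come from the watchdog chip (or be inferred from unexpected reboots) rather than from the device’s own reset reason.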

@will Thanks Will, I appreciate this write-up and look forward to the future of your product. I can tell you that 1.3.1-rc1 delivered pretty well on your goal in terms of cellular. If you have your engineers literally take two Borons and compare cell reconnection on 1.3.1-rc1 against 2.0.0-rc1, or even against 1.5.0 where I have also noticed it, you will see the issue for yourself.

The LTS versioning is a good idea, but if Particle wants to sell to industrial, high-reliability, remote monitoring cellular clients, your two highest priorities MUST be:

  1. Enabling usage of the hardware watchdog by fixing the bug that eventually locks the device in DFU mode; and
  2. Fixing the sometime-post-1.3.1-rc1 breaking of the reconnection behavior.

I have made recent threads on each of these with further detail.

As for myself, I am doing much better now having learned these painful, expensive lessons, and knowing to use the external circuit + older working OS version.

Thanks for the perspective, @Paul_M – if you can link me to the threads you’ve referenced I’ll append them to the internal conversation we’re having about this thread and the issues raised in it.

@will Sure thing:

https://community.particle.io/t/stable-cellular-reconnection-newly-ruined-with-boron-2-0-0-rc1-and-perhaps-earlier/57264/6

https://community.particle.io/t/help-boron-hardware-watchdog-unusable-always-winds-up-stuck-in-dfu-mode-despite-documentations-suggestion/57330/10

Thanks.

This is a 3.3V watchdog with a Grove input; it works great:
https://www.amazon.com/dp/B00OL1N7R2/ref=cm_sw_r_cp_awdb_btf_t1_h0XmFb4JDV1MT