I have a system in the field that worked well for at least a year. Over the last 6 months, however, it has fallen into a cycle: the device connects and sends a message to Ubidots every 20 minutes for roughly 4 days, then drops offline and does not reconnect for roughly 2 days, then reconnects and works fine for another 4 days, and the cycle repeats.
Notes and what I have tried:
Carrier: AT&T
Cell tower: 310-410-8328-39356682
Using the internal SIM on the Boron
5 V wall power plus a 2000 mAh battery
Automatic mode
System thread enabled
Device OS 4.2.0
Ubidots library 3.2.3
About 82 KB of RAM remains free
The device is not resetting. It keeps running the local control program the entire time, with or without cellular communication. When it is online, I can retrieve the reset reason and confirm that the last reset was not caused by the watchdog, user code, an SOS, or another error.
The antenna has been upgraded to improve reception; signal strength is now 90% and signal quality 50%.
I was using the Ubidots library with the Particle integration method when the device cycled as described.
I then switched the Ubidots library to the TCP method. With that change the system ran for 9 days before disconnecting, and it has stayed disconnected since.
The average message rate is about 1 message every 20 minutes, each sending 1 variable; occasionally 2 variables are sent at the same time.
The problem seemed to appear when a local I2C display was added.
However, tests run with all I2C disabled gave the same results, so the timing may have been coincidental.
I installed a second system, a BRN404X, nearby to see how it behaved under similar conditions running the same code. It was not attached to any hardware but showed a similar issue: once it dropped offline, it stayed offline until reset.
I carefully checked for array overruns and similar bugs.
Next step: upgrade OS to 6.0.0.
I noticed only 1 or 2 disconnects for a few days, followed by 5 to 8 disconnects on the day that service cut off.
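To see this escalation on the device without a laptop attached, one option is to count connected-to-disconnected transitions per day and publish the count once the device is back online. This is a minimal sketch under stated assumptions, not code from the thread: the class DisconnectCounter is hypothetical, `connected` would presumably come from `Particle.connected()` sampled in `loop()`, and `dayIndex` stands for whatever non-decreasing day counter the firmware already keeps.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: counts cloud disconnect events (connected -> not
// connected transitions) for the current day from periodic samples.
// A disconnect that starts and ends between two samples is not counted.
class DisconnectCounter {
public:
    // connected: current cloud state, e.g. Particle.connected() read in loop().
    // dayIndex:  any non-decreasing day counter the firmware already keeps.
    void sample(bool connected, uint32_t dayIndex) {
        assert(dayIndex >= day_);  // day counter must not go backwards
        if (dayIndex != day_) {
            previousDay_ = today_;  // keep the last completed day's total
            today_ = 0;
            day_ = dayIndex;
        }
        if (wasConnected_ && !connected) {
            ++today_;
        }
        wasConnected_ = connected;
    }

    uint32_t today() const { return today_; }
    uint32_t previousDay() const { return previousDay_; }

private:
    bool wasConnected_ = false;
    uint32_t day_ = 0;
    uint32_t today_ = 0;
    uint32_t previousDay_ = 0;
};
```

Sampling only sees the state at each call, so a disconnect that begins and ends between two samples is missed; sampling on every pass through `loop()` keeps that window small.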
It is possible that I still have a program bug, or that an earlier bug caused some kind of corruption. However, I have exhausted everything I can think of.
Next step is to get to the site and run Device Doctor.
I would welcome any advice. Have you seen anything similar?
If you can attach a laptop to the device to receive the USB serial logs, the best way to troubleshoot this is to enable
SerialLogHandler logHandler(LOG_LEVEL_TRACE);
and then monitor the logs after the problem occurs.
Make sure you have an out-of-memory handler. If the device starts resetting after you add one, you have a memory leak or heap fragmentation.
When the device is not uploading to Ubidots, is it responsive to the Particle cloud? You may want to add a Particle.function that you can query remotely to see if the cloud is working when the problem is occurring.
I had this problem a year ago. My Boron BRN404X with a SARA-R510S-01B-00 modem locked up the same way you're showing in your serial trace (I reported that problem here too: https://community.particle.io/t/boron-brn404x-modem-lockup/67618). My unit had been running fine for several months before this happened. My firmware called System.reset(RESET_NO_WAIT) if it could not connect to the Particle cloud for more than an hour.
This did not fix it.
My unit was remote, and I couldn't visit it for another 3 months, during which it kept resetting itself in firmware every hour. Once I got there, I cycled its power and it connected successfully within 1 minute.
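The hourly check described above can be modeled in plain C++ so its timing logic is testable off-device. This is a minimal sketch, not the poster's firmware: the class name CloudOfflineTimer is hypothetical, and on a Boron `nowMs` would presumably come from `millis()`, `connected` from `Particle.connected()`, and a `true` result would lead to `System.reset(RESET_NO_WAIT)`. As the post notes, a reset of this kind did not clear the modem lockup; only a power cycle did.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reports when the cloud connection has been down for
// longer than a limit. Times are unsigned 32-bit milliseconds, as returned by
// millis(), so the subtraction below stays correct across counter rollover.
class CloudOfflineTimer {
public:
    explicit CloudOfflineTimer(uint32_t limitMs) : limitMs_(limitMs) {
        assert(limitMs > 0);
    }

    // Call periodically, e.g. from loop(). Returns true once the connection
    // has been down for more than limitMs since it was last seen connected.
    bool update(bool connected, uint32_t nowMs) {
        if (connected) {
            lastConnectedMs_ = nowMs;  // restart the offline window
            return false;
        }
        return static_cast<uint32_t>(nowMs - lastConnectedMs_) > limitMs_;
    }

private:
    uint32_t limitMs_;
    uint32_t lastConnectedMs_ = 0;  // assumes timing starts at boot (millis() == 0)
};

constexpr uint32_t kOneHourMs = 60UL * 60UL * 1000UL;
```

A caller would then do something like `if (timer.update(Particle.connected(), millis())) System.reset(RESET_NO_WAIT);`. Because the subtraction is done on unsigned 32-bit values, the check keeps working across the roughly 49.7-day rollover of a 32-bit millisecond counter.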
Next, I added an external device that cycles power to the Boron once a day. I can tolerate a lockup lasting a day, but not longer than that.
As Gus suggested at the time, I replaced the System.reset with a deliberate watchdog timeout for when this issue happens. Since last summer, my Boron has been disconnected from the Particle cloud for more than an hour only once, and since I don't communicate with the Boron every day, I can't tell whether this type of reset actually fixes the sudden modem lockup.
(If I can verify with certainty that it does, I'll report it here.)
I'm also experiencing a similarly regular connectivity loss in the field. I'm using 5 Boron BRN402 devices. All connect to cellular and the Particle cloud every hour as intended, then fail to connect (blinking green) starting at exactly hour 60 (2.5 days). I left one of these devices undisturbed, attempting to reconnect every 15 minutes, and it recovered after another 60 hours (5 days in total).
AT&T, internal SIM
Semi-automatic mode, system thread enabled
Device OS 6.1.1 and 6.2.1
10 Ah lithium-ion battery, nearly full (4.10 V to 4.15 V)
I checked my code and libraries for any heap use (malloc/calloc/new) apart from global variables initialized in setup. I added an out-of-memory handler, and it is not triggered. I also added logic to call System.reset after 7 hours of failed attempts; this logic is triggered, and the reset restores connectivity.
So if System.reset is sufficient in my case, could this still be a cellular modem failure in all 5 devices?
I will try changing the firmware to preemptively call System.reset every day and see if that is sufficient to prevent connectivity loss.
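A preemptive daily reboot like the one proposed above could be gated by a small policy function. This is a hypothetical sketch: the function name and the 25-hour fallback are assumptions, as is the choice to defer the reboot until just after a successful publish so queued measurements are not lost. On the device, `uptimeMs` would come from `millis()` and a `true` result would lead to `System.reset()`.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kHourMs = 60UL * 60UL * 1000UL;
constexpr uint32_t kDailyResetMs = 24 * kHourMs;  // preferred reboot age
constexpr uint32_t kHardLimitMs = 25 * kHourMs;   // hypothetical fallback
static_assert(kHardLimitMs > kDailyResetMs, "fallback must come after the daily point");

// Returns true when the firmware should reboot (System.reset() on a Boron).
// uptimeMs: milliseconds since boot, e.g. millis(); rebooting at least every
//           25 hours keeps it far below the ~49.7-day 32-bit rollover.
// justPublished: true right after a successful publish, so rebooting at
//           that point does not discard queued measurements (an assumption).
bool shouldDailyReset(uint32_t uptimeMs, bool justPublished) {
    if (uptimeMs >= kHardLimitMs) {
        return true;  // don't wait indefinitely for a clean reboot point
    }
    return justPublished && uptimeMs >= kDailyResetMs;
}
```

Waiting for a publish boundary trades a slightly irregular reboot time for not losing data; the fallback keeps a device that can no longer publish from postponing the reboot forever.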
It's unclear what would cause the behavior you are seeing.
The device will always reconnect to the cloud after 3 days, because the DTLS session expires and must be reauthenticated. However, that should only go back to blinking cyan, not blinking green, because the PDP context does not need to be reestablished.
It would be interesting to see if the daily reboot solves the issue. Normally I would suspect a memory leak, but your out-of-memory handler is not being triggered. It could still be memory corruption, however. In any case, it would be a good test even if it is not the actual solution.
My code wakes up every 15 minutes to take a measurement and publishes when there are 4 measurements pending (normally every hour). If there are 28 measurements pending, meaning it hasn't successfully connected and published in 7 hours, it restarts.
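The measurement and restart scheme described above can be sketched as a small counter. The names MeasurementQueue and Action are hypothetical, and the sleep, publish, and reset steps are left to the caller (on a Boron, presumably a sleep call, `Particle.publish()`, and `System.reset()`).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the scheme: one measurement every 15 minutes,
// publish once 4 are pending (about hourly), and restart once 28 are
// pending (28 x 15 min = 7 hours without a successful publish).
constexpr uint32_t kPublishThreshold = 4;
constexpr uint32_t kResetThreshold = 28;

enum class Action { Sleep, Publish, Reset };

class MeasurementQueue {
public:
    // Call after each measurement; returns what the firmware should do next.
    Action onMeasurement() {
        ++pending_;
        if (pending_ >= kResetThreshold) return Action::Reset;
        if (pending_ >= kPublishThreshold) return Action::Publish;
        return Action::Sleep;
    }

    // Call when a publish attempt finishes; only success clears the backlog.
    void onPublishResult(bool ok) {
        if (ok) pending_ = 0;
    }

    uint32_t pending() const { return pending_; }

private:
    uint32_t pending_ = 0;
};
```

A failed publish leaves the count in place, so each later measurement retries the publish until either one succeeds or the count reaches 28 and the device restarts.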
It looks like the daily reboot worked; there has been no connectivity loss since then. I'd prefer to fix the underlying issue, but this is good progress.
@bwhitsitt Does periodically calling System.reset prevent connectivity loss on your end? I'm wondering if there is an issue specific to the BRN402, perhaps with older modem firmware.
I can't confirm that System.reset would restore connectivity, but I can confirm that powering down and back up would.
However, a new problem came up recently: powering down no longer restores connectivity, and the device remains permanently unable to connect. I'm going to replace the device this weekend.
I could not trap any memory issues, array overruns, or similar problems. Since it worked well for so long, I didn't suspect a program cause anyway.
On 2/22/2025 I replaced the Boron 402 with a Boron 404X, and the connectivity issues have been resolved.
There is now an average of 1 cloud disconnect event per day, but the device recovers and reconnects.
It is worth noting that this failure mode first appeared 6 months ago as a loss of communication for about 2 days followed by normal communication for about 4 days, without any intervention. The pattern varied slightly but repeated for several months until it progressed to total communication loss, at which point neither a device reset nor a power cycle would recover cellular communication.
A new device running the same program now operates normally.
There are two differences between the 402 and the 404X: the SIM card and the cellular modem. In the United States, the 402 can only connect to AT&T, while the 404X can connect to AT&T, T-Mobile, or, for enterprise devices, Verizon. The cellular modem is probably the bigger difference: the 402 has the older u-blox R410-02-B-01, and the 404X has the newer R510S-01-B. The 404 (non-X) may have an R410-02-B-01 or R410-02-B-03.