Hi, we’ve had several Borons in the field for the past 18 months and have semi-recently begun observing new connectivity-related behavior. (This is related to a prior thread I started, "Confusing Boron connectivity issues", though the issue seems more pervasive than it did in that description and seems to warrant its own discussion.)
For ~15 months, we had devices running at 97±0.5% uptime, where I’m defining uptime as the percentage of anticipated hourly cellular transmissions that actually delivered data. These devices were running Device OS 4.2.0 through 2024 into early 2025. In mid-2025, we began seeing a severe dip in performance (some devices as low as 0% uptime), so we began servicing the instruments to (1) update to 6.3.3 and (2) increase what we call MAX_TIME_TO_PUBLISH_MS from 20 seconds (which, again, worked 97% of the time) to 600 seconds. [MAX_TIME_TO_PUBLISH_MS is a check against the device’s clock to see how long it’s been trying to connect; once the clock exceeds that time, the device goes to sleep so that it doesn’t burn too much more power.]
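To make the timeout check described above concrete, here is a minimal sketch. The helper name `publishWindowExpired` and its parameters are hypothetical (the actual firmware is not shown in this thread); only the name `MAX_TIME_TO_PUBLISH_MS` and the 20-second and 600-second values come from the post. On a device, the two timestamps would presumably come from `millis()`.

```cpp
#include <cstdint>

// Values from the post: the original 20-second limit and the later 600-second limit.
constexpr uint32_t MAX_TIME_TO_PUBLISH_MS_OLD = 20000;
constexpr uint32_t MAX_TIME_TO_PUBLISH_MS_NEW = 600000;

// Hypothetical helper: returns true once the connection attempt has run
// longer than the limit, at which point the firmware would go to sleep.
// Unsigned subtraction keeps the elapsed time correct even if the
// millisecond counter wraps around between startMs and nowMs.
bool publishWindowExpired(uint32_t startMs, uint32_t nowMs, uint32_t maxTimeToPublishMs) {
    return (nowMs - startMs) > maxTimeToPublishMs;
}
```

Comparing elapsed time rather than absolute timestamps is what makes the check safe across a counter rollover.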
Frustratingly, however, these seemingly significant changes have not fixed the issue. We are currently sitting at < 20% uptime across all devices, with some < 5%. (BTW, all other indicators I can think to check are fine; for instance, internal datalogging on a micro-SD card works 100% of the time across all sensors, so firmware seems to be doing what we want other than connecting to the cloud).
We’re trying to minimize power draw, so keeping the cell modem on for even 10 minutes is undesirable (the old 20-second timeout was great!) but we can stomach it if needed. Much over 10 minutes starts to become problematic, though, especially in the winter when we get less charging of our solar panels.
One other idea we’re testing is a once-per-day longer timeout in case the device needs a SIM/IMSI/modem reset, such as:
The jump from 4.2.0 to 6.3.3 is pretty significant, and there are many things that have changed, so it’s pretty hard to say with certainty what’s going on here.
The bandmask was opened up, so depending on the SKU, it will now scan for more bands than it did prior to the change (~5.8.0, I believe), so that’s one area that could affect this. That being said, 10 minutes is far above the expected ~90 seconds from a cold boot. Once the modem has a session with the local tower, it generally connects much faster.
Another issue could be memory usage; if you are running with more than ~85% of your memory allocated, that presents a challenge during handshakes, which are memory-intensive.
This has been the case since the EtherSIM launch; a SIM needs between 3-5mins to switch carriers - and cycling through all three can take up to 15mins, with a modem reset at 10mins usually being the kick a device requires in troublesome situations.
Thanks for the quick reply. Good point about the massive leap in Device OS version.
…
with a modem reset at 10mins usually being the kick a device requires in troublesome situations
What is the “optimal” timeout, in this case? Surely it must be > 10 mins to allow for it to complete the reset. Is there official advice on how long to keep it awake to allow full reset? Is the 15 mins you reference the way to go?
What’s the best way to get you more info? I can give device IDs via a DM if that is the best way to go. I don’t have a ton in the way of logs, but of course can access the history in console if that’s instructive.
You don't need to wait > 10 minutes on every failure to connect if you are also using sleep mode. You can instead stay awake longer in failure state once per day, or whatever period is appropriate for the maximum amount of time the device should be offline in a failure condition.
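One way to read this advice is sketched below. It is not the original poster's code (that chunk is not visible in this thread); the names `ConnectState` and `chooseTimeoutMs` are hypothetical, and the 20-second and 15-minute values are simply the figures discussed here. The state is assumed to survive sleep, for example in retained memory.

```cpp
#include <cstdint>

// Assumed values for illustration, taken from figures mentioned in the thread.
constexpr uint32_t SHORT_TIMEOUT_MS = 20000;               // normal hourly attempt
constexpr uint32_t LONG_TIMEOUT_MS = 900000;               // recovery attempt (15 minutes)
constexpr uint32_t LONG_ATTEMPT_PERIOD_S = 24 * 60 * 60;   // at most once per day

// Hypothetical state that must persist across sleep cycles.
struct ConnectState {
    uint32_t lastLongAttemptS = 0;   // time of the last long attempt, in seconds
    bool lastAttemptFailed = false;  // set by the caller after each attempt
};

// Picks the timeout for the next connection attempt: the long timeout only
// when the previous attempt failed and a full period has passed since the
// last long attempt; otherwise the short, power-friendly timeout.
uint32_t chooseTimeoutMs(ConnectState &state, uint32_t nowS) {
    bool due = (nowS - state.lastLongAttemptS) >= LONG_ATTEMPT_PERIOD_S;
    if (state.lastAttemptFailed && due) {
        state.lastLongAttemptS = nowS;
        return LONG_TIMEOUT_MS;
    }
    return SHORT_TIMEOUT_MS;
}
```

With this shape, a device that connects normally never pays for the long window, and a device in a failure state spends at most one long window per day.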
Thanks, @rickkas7. So, to translate from English to code, does my chunk above seem like a reasonable implementation, or are you recommending something else?
Also, maybe I bump that first one up to 900000 to give the modem time to reset? Appreciate the help!
Hi again—after two weeks of testing, I am not seeing significantly better results despite implementing this new approach to cellular timeout.
This is, to be frank, completely shocking because from 11-Mar-2024 until 14-July-2025, we had 97% uptime with a meager 20-second timeout. Then beginning 14-July-2025 and persisting until today, we’ve seen < 25% successful cell transmissions despite first lengthening the timeout from 20 seconds to 60 and now to the implementation above (as much as 15 mins once per day).
worth testing? We’re on a Boron 404x, and it appears that wake from hibernate based on time is not an option, and the device is deployed remotely so a button press isn’t viable either (in other words, I’m not really sure how to implement a hibernate option).
One other potential clue is that the device usually works for a few days after flashing new firmware but then goes down (typically permanently) thereafter. So, again, I continue to have evidence that there’s a cell signal there, but for whatever reason, after a couple of days it totally loses it.
Did you implement an out-of-memory handler? If the heap is fragmented and Device OS cannot allocate the memory it needs to connect, the connection will fail; resetting the modem will not help, and the device will not reconnect until it is reset.
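The usual shape of such a handler is a flag set in the event callback and acted on later from the main loop. The sketch below models only that logic in plain C++; the names `onOutOfMemory` and `resetNeeded` are hypothetical. On a device, to my understanding, the handler would be registered with `System.on(out_of_memory, ...)` and the loop would call `System.reset()` once the flag is set, but check the Device OS documentation for the exact signatures.

```cpp
// Size of the allocation that failed, or -1 if no failure has been seen.
// volatile because the handler may run outside the main loop's context.
volatile int outOfMemoryBytes = -1;

// Hypothetical handler: only records the event. It is kept minimal on the
// assumption that heavier work is unsafe at the moment an allocation fails.
void onOutOfMemory(int requestedBytes) {
    outOfMemoryBytes = requestedBytes;
}

// Called from the main loop; returns true when the device should reset
// (on a device, this is where the actual reset call would go).
bool resetNeeded() {
    return outOfMemoryBytes >= 0;
}
```

Deferring the reset to the main loop leaves room to log the failed allocation size or finish an SD-card write before restarting.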