Hi, folks. We have four Boron 404x's installed in the field in the USA. All four are running Device OS 4.2.0 (it wasn't broke, so we didn't fix it...). Two devices were installed 13 months ago and have been operating nearly continuously apart from a few presumed cellular signal drops over the year (we estimate > 95% uptime). Two additional devices were installed 2 months ago. One of these has been performing as expected, but the other behaves oddly: it keeps logging data internally to an SD card (a sign that it's mostly running as expected) but stops transmitting data over cellular after about 1 week of deployment.
We've seen the same issue twice: deploy the sensor, observe nominal performance for 1–2 weeks, then a complete and indefinite cessation of cellular transmissions lasting several weeks (zero transmissions). The two instances used two different Boron 404x's, so the problem isn't specific to one unit. We've also swapped antennas and observed the same phenomenon. During the windows when it was working, the device-diagnostics file in the Console showed 30–47.5% signal strength at that site. Signal dropout is of course not out of the question, but it's unusual for the device to work perfectly for 1 or 2 weeks and then stop transmitting completely (while otherwise working as expected, including internal logging) for the next couple of weeks. Any ideas, or other places we should look for further troubleshooting?
What is the status LED when stuck? Is it blinking green, blinking cyan, or some other combination?
What is the free memory in Device Vitals before the devices stop transmitting? If the firmware has a memory leak it could not have enough memory to reconnect to the cloud, but your local code could still function normally.
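If you export memory readings from Device Vitals (or log `System.freeMemory()` once per wake cycle), a least-squares slope is one simple way to tell a steady leak from normal jitter. This is a host-side C++ sketch for analyzing those readings, not device firmware; the function names and the leak threshold are hypothetical and would need tuning for your data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Least-squares slope of free-memory readings, in bytes per sample.
// A clearly negative slope over many wake cycles hints at a leak;
// a flat slope argues against one.
double freeMemorySlope(const std::vector<std::uint32_t>& freeBytes) {
    const std::size_t n = freeBytes.size();
    if (n < 2) return 0.0;  // not enough data to estimate a trend
    double sumX = 0.0, sumY = 0.0, sumXY = 0.0, sumXX = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double x = static_cast<double>(i);
        const double y = static_cast<double>(freeBytes[i]);
        sumX += x;
        sumY += y;
        sumXY += x * y;
        sumXX += x * x;
    }
    const double dn = static_cast<double>(n);
    const double denom = dn * sumXX - sumX * sumX;
    assert(denom > 0.0);  // always true for n >= 2 distinct sample indices
    return (dn * sumXY - sumX * sumY) / denom;
}

// Hypothetical rule: call it a leak if free memory drops by more than
// bytesPerSample on average per sample.
bool looksLikeLeak(const std::vector<std::uint32_t>& freeBytes, double bytesPerSample) {
    return freeMemorySlope(freeBytes) < -bytesPerSample;
}
```

Using a slope rather than comparing the first and last reading keeps a single noisy sample from triggering a false alarm.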
The other thing to do is add:
```cpp
SerialLogHandler logHandler(LOG_LEVEL_TRACE);
```
to your firmware. When the problem happens, attach a laptop to the USB port and capture the logs. Get more than 10 minutes of logs to make sure you get a full modem reset sequence.
However, you should really upgrade to 6.2.1. There have been so many bug fixes; these are the cumulative release notes between those two versions.
Thanks, @rickkas7!
The LED blinks green for about 20 seconds, turns white for a moment, and then everything turns off (the sleep interval begins). Our firmware uses a 20-second MAX_TIME_TO_PUBLISH_MS "timeout" to ensure we don't get stuck indefinitely in a trying-to-connect state. We could try extending this timeout, but I find it a bit strange that it would work perfectly for one or two straight weeks at almost 100% uptime and then have zero successful transmissions for the following two weeks. If it were just a periodic cell connection issue, I would expect more intermittent performance. The device is deployed within city limits (though not in a densely populated area), and we haven't observed other cell coverage issues at this site.
Here are some interesting metrics from the diagnostics (copying just a few potentially relevant columns to try to avoid giving away private data). The device transmitted all expected data from 1–14 Apr then halted suddenly.
| timestamp | device.network.connection.disconnects | device.network.connection.attempts | device.cloud.coap.transmit | device.cloud.coap.retransmit | device.system.memory.used | device.system.memory.total |
|---|---|---|---|---|---|---|
| 2025-04-13T17:00:29.845Z | 288 | 577 | 1198 | 41 | 70164 | 165340 |
| 2025-04-10T17:00:30.164Z | 216 | 433 | 901 | 30 | 70164 | 165340 |
| 2025-04-07T17:00:30.178Z | 144 | 289 | 604 | 27 | 70164 | 165340 |
| 2025-04-04T17:00:32.013Z | 72 | 145 | 307 | 8 | 70180 | 165340 |
| 2025-04-01T17:19:18.731Z | 0 | 1 | 10 | 0 | 70140 | 165340 |
| 2025-03-31T13:34:40.724Z | 0 | 1 | 10 | 0 | 70156 | 165340 |
| 2025-03-31T13:28:53.682Z | 0 | 1 | 10 | 0 | 70308 | 165340 |
| 2025-03-28T16:36:44.486Z | 0 | 1 | 10 | 0 | 70140 | 165340 |
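Because these counters are cumulative, differencing consecutive snapshots makes the pattern easier to see. A host-side sketch using the four roughly 3-day-apart April rows from the table: each interval adds 72 disconnects (about 24 per day, which would fit one connect/disconnect cycle per hour if the device wakes hourly; the thread doesn't state the sleep interval), 144 attempts and 297 CoAP transmits. Every row also satisfies attempts = 2 × disconnects + 1, though why Device OS counts it that way is not established here. Memory used stays near 70.1 KB of 165.3 KB throughout, which does not look like a leak, but the snapshots end on the 13th, before the halt. The struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Cumulative counters copied from the diagnostics table (oldest first).
struct Snapshot {
    std::string timestamp;  // ISO-8601 as exported from the Console
    long disconnects;       // device.network.connection.disconnects
    long attempts;          // device.network.connection.attempts
    long transmit;          // device.cloud.coap.transmit
    long retransmit;        // device.cloud.coap.retransmit
};

struct Delta {
    long disconnects, attempts, transmit, retransmit;
};

// Difference each snapshot against the previous one. A negative delta
// would mean the counters were reset (for example by a reboot).
std::vector<Delta> deltas(const std::vector<Snapshot>& rows) {
    std::vector<Delta> out;
    for (std::size_t i = 1; i < rows.size(); ++i) {
        // ISO-8601 UTC timestamps sort correctly as plain strings.
        assert(rows[i - 1].timestamp < rows[i].timestamp);
        out.push_back({rows[i].disconnects - rows[i - 1].disconnects,
                       rows[i].attempts - rows[i - 1].attempts,
                       rows[i].transmit - rows[i - 1].transmit,
                       rows[i].retransmit - rows[i - 1].retransmit});
    }
    return out;
}

// Cumulative share of CoAP messages that needed a retransmit.
double retransmitRatio(const Snapshot& s) {
    return s.transmit == 0 ? 0.0 : static_cast<double>(s.retransmit) / s.transmit;
}

// The April rows from the table, oldest first.
const std::vector<Snapshot> kApril = {
    {"2025-04-04T17:00:32.013Z", 72, 145, 307, 8},
    {"2025-04-07T17:00:30.178Z", 144, 289, 604, 27},
    {"2025-04-10T17:00:30.164Z", 216, 433, 901, 30},
    {"2025-04-13T17:00:29.845Z", 288, 577, 1198, 41},
};
```

The retransmit ratio at the last snapshot works out to 41 / 1198, roughly 3.4%.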
We will follow your other suggestions as well—thanks again!
Here's the most relevant firmware chunk for what it's worth:
```cpp
case PUBLISH_STATE: {
    // Prep for cellular transmission
    bool isMaxTime = false;
    stateTime = millis();
    while (!isMaxTime) {
        // Connect to the Particle cloud if not already connected
        if (!Particle.connected()) {
            Particle.connect();
            Log.info("Trying to connect");
        }
        if (Particle.connected()) {
            // Connected: publish the data buffer
            Log.info("publishing data");
            // WITH_ACK: the bool result requires the cloud's acknowledgment
            bool success = Particle.publish(eventName, data, 60, PRIVATE, WITH_ACK);
            Log.info("publish result %d", success);
            isMaxTime = true;
            state = SLEEP_STATE;
        }
        else if (millis() - stateTime >= MAX_TIME_TO_PUBLISH_MS) {
            // Took too long to connect: go to sleep to save battery
            Log.info("max time for publishing reached without success; go to sleep");
            isMaxTime = true;
            state = SLEEP_STATE;
        }
        else {
            // Still within the time budget: wait briefly, then retry
            Log.info("Not max time, try again to connect and publish");
            delay(500);
        }
    }
}
break;
```
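One way to check timeout behavior before redeploying is to pull the loop's timing decision out of PUBLISH_STATE into a small class that takes the clock and connection status as inputs. The names `PublishAttempt`, `Action` and `step()` are hypothetical; on the Boron you would pass in `millis()` and `Particle.connected()`, while on a desktop you can unit-test it with made-up times.

```cpp
#include <cassert>
#include <cstdint>

// What the publish loop should do on this pass.
enum class Action { KeepTrying, Publish, GiveUp };

// Hardware-independent version of the PUBLISH_STATE timing logic.
class PublishAttempt {
public:
    PublishAttempt(std::uint32_t startMs, std::uint32_t budgetMs)
        : startMs_(startMs), budgetMs_(budgetMs) {
        assert(budgetMs > 0);
    }

    // nowMs: current millisecond counter; connected: cloud connection status.
    Action step(std::uint32_t nowMs, bool connected) const {
        if (connected) return Action::Publish;
        // Unsigned subtraction stays correct across the 32-bit counter rollover.
        if (nowMs - startMs_ >= budgetMs_) return Action::GiveUp;
        return Action::KeepTrying;
    }

private:
    std::uint32_t startMs_;
    std::uint32_t budgetMs_;
};
```

Since the budget is a constructor argument, changing the timeout becomes a one-line change, and the rollover case can be tested without waiting 49.7 days.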
20 seconds is not nearly enough time under any circumstances.
You should allow a minimum of 11 minutes to connect to cellular. Even if typically it takes 20 seconds, if the SIM requires an IMSI switch it will take at least 2-3 minutes. If the modem needs a full hardware reset, this will not happen until after 10 minutes of attempting to connect.
If you don't have enough battery power to do this every time, you should at least do it once an hour or every few hours when you are unable to connect; otherwise you may never be able to connect.
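For the battery-constrained case, the advice above can be expressed as a budget policy: a full 11-minute attempt whenever the device has gone an hour without a successful connection and hasn't made a full attempt in that hour, and a shorter attempt otherwise. This is a sketch under assumptions: the 11-minute and 1-hour values come from the advice, but the 5-minute short budget is a hypothetical choice (an IMSI switch alone can take 2–3 minutes or more), and how the firmware tracks elapsed time across sleep is not shown. If power allows, using the full budget every cycle is the simpler option.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kFullAttemptMs = 11UL * 60UL * 1000UL;       // at least 11 minutes
constexpr std::uint32_t kShortAttemptMs = 5UL * 60UL * 1000UL;       // hypothetical battery-saving budget
constexpr std::uint32_t kFullAttemptEveryMs = 60UL * 60UL * 1000UL;  // at least once an hour

static_assert(kShortAttemptMs < kFullAttemptMs, "short budget must be shorter");

// Decide how long this wake cycle may spend trying to connect.
// msSinceLastSuccess: time since the last successful cloud connection.
// msSinceLastFullAttempt: time since the last full-length attempt.
std::uint32_t connectBudgetMs(std::uint32_t msSinceLastSuccess,
                              std::uint32_t msSinceLastFullAttempt) {
    const bool overdue = msSinceLastSuccess >= kFullAttemptEveryMs &&
                         msSinceLastFullAttempt >= kFullAttemptEveryMs;
    return overdue ? kFullAttemptMs : kShortAttemptMs;
}
```

Requiring both conditions keeps a device that keeps failing from spending 11 minutes on every wake, while still guaranteeing a full modem reset window at least hourly.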
Fascinating—never knew this. OK, we'll bump our publish timeout way up. I seem to have misinterpreted the docs in this regard; thanks for this invaluable clarification!
We'll give that and the Device OS update a shot and see how things go.
I suppose an incomplete IMSI switch or HW reset could explain the suddenness of the failure, too. Thanks!