Extremely long LTE reconnect times on Boron 1.5.2 - need explanation of low-level modem behavior in Firmware

This is a more refined version of my previous thread where I incorrectly reported that a new 1.5.2 OS Boron wasn’t reconnecting to the cloud after new code flashes.

It does always reconnect, and it also does always take 20 - 40 minutes to do so after a new program flash from the Particle Web IDE.

However, the Boron is in an area with stable cellular connection and always connects to cloud within 30 mere seconds when power cycled.

So there is a decisively software-originated disparity in the reconnect interval between the following two situations: 1) power cycle (connects ALWAYS within 30 seconds), and 2) new code flash (takes ALWAYS 20 - 40 minutes to reconnect).

This is proof that this is a software issue because the reception is stable enough to instantly connect always upon power cycle.

While Particle fixed the memory leak bug that caused permanent lack of reconnect sans power cycle, it appears to have introduced this new issue which will cause significant problems for my remote sites which frequently experience RF-related disconnects and are reliant upon the Boron reconnect behavior.

To this end, my V1.3.1.-rc1 Boron has behaved phenomenally since October when I last touched it, and it has been disconnecting and reconnecting faithfully on its own for now 8 months. (Very good!)

But I am now servicing that station and nervous to deploy V1.5.2 to it.

Can a low-level Particle engineer such as @rickkas7 please explain to me what is the Boron firmware algorithm and software behavior for reconnecting, how it interacts with the U-Blox modem, and whether there is a setting we can use to aggressively reconnect at the expense of higher power/data usage, so that it’s not taking 20 - 40 whole minutes to reconnect every time signal is lost, under cellular reception circumstances so perfect that a power cycle reconnect will happen within 30 seconds?

Thank you.

What is the code doing before the OTA update happens?
What is the new code doing right after it comes back?
Are you using the Particle SIM or third party?

Internal default Particle SIM. I am 99% sure this is unrelated to my simple MQTT loop code but I will have to test with blank sketches to be able to tell you the same is observed on blank sketches (I bet you a dollar it is). I only use Boron to read UART packets from another MCU and upload them to the web through MQTT.

Regardless of when I am able to do so, is anybody still able to “explain to me what is the Boron firmware algorithm and software behavior for reconnecting, how it interacts with the U-Blox modem, and whether there is a setting we can use to aggressively reconnect at the expense of higher power/data usage”?

I’m not using LTE but expercing similar really long reconnect times after a cloud flash (boron 2G/3G).

0000002957 [hal] TRACE: Modem powered off
0000002959 [hal] TRACE: Powering modem on
0000003060 [hal] TRACE: Modem powered on
0000003062 [hal] TRACE: Setting UART voltage translator state 1
0000004064 [ncp.at] TRACE: > AT
0000005065 [ncp.at] TRACE: > AT
0000006066 [ncp.at] TRACE: > AT
0000007067 [ncp.at] TRACE: > AT
0000008068 [ncp.at] TRACE: > AT
0000008070 [ncp.at] TRACE: < OK
0000009071 [hal] TRACE: NCP ready to accept AT commands
0000009072 [ncp.at] TRACE: > AT+UGPIOC?
0000009076 [ncp.at] TRACE: < +UGPIOC:
0000009078 [ncp.at] TRACE: < 16,255
0000009079 [ncp.at] TRACE: < 23,255
0000009081 [ncp.at] TRACE: < 24,4
0000009082 [ncp.at] TRACE: < 25,5
0000009084 [ncp.at] TRACE: < 34,12
0000009085 [ncp.at] TRACE: < 35,12
0000009087 [ncp.at] TRACE: < 36,12
0000009088 [ncp.at] TRACE: < 37,12
0000009090 [ncp.at] TRACE: < 42,7
0000009091 [ncp.at] TRACE: < OK
0000009092 [hal] INFO: Using internal SIM card
0000009094 [ncp.at] TRACE: > AT+CPIN?
0000009098 [ncp.at] TRACE: < +CPIN: READY
0000009100 [ncp.at] TRACE: < OK
0000009101 [ncp.at] TRACE: > AT+CCID
0000009120 [ncp.at] TRACE: < +CCID: XXXXXXX
0000009122 [ncp.at] TRACE: < OK
0000009123 [ncp.at] TRACE: > AT+COPS=3,2
0000009127 [ncp.at] TRACE: < OK
0000009128 [ncp.at] TRACE: > AT+IPR=115200
0000009132 [ncp.at] TRACE: < OK
0000010133 [ncp.at] TRACE: > AT
0000010135 [ncp.at] TRACE: < OK
0000010136 [ncp.at] TRACE: > AT+IFC=2,2
0000010139 [ncp.at] TRACE: < OK
0000010141 [ncp.at] TRACE: > AT
0000011142 [ncp.at] TRACE: > AT
0000011144 [ncp.at] TRACE: < OK
0000011145 [ncp.at] TRACE: > AT+UPSV=0
0000011149 [ncp.at] TRACE: < OK
0000011150 [ncp.at] TRACE: > AT+CMUX=0,0,,1509,,,,,
0000011281 [ncp.at] TRACE: < OK
0000011382 [ncp.at] TRACE: > AT
0000011433 [ncp.at] TRACE: < OK
0000011434 [hal] TRACE: NCP state changed: 1
0000011436 [net.ppp.client] TRACE: PPP thread event LOWER_DOWN
0000011439 [net.ppp.client] TRACE: PPP thread event ADM_DOWN
0000011442 [net.ppp.client] TRACE: PPP thread event ADM_UP
0000011444 [net.ppp.client] TRACE: State NONE -> READY
0000011440 [ncp.at] TRACE: > AT+CIMI
0000011483 [ncp.at] TRACE: < XXXXXXXXXXX
0000011485 [ncp.at] TRACE: < OK
0000011487 [ncp.at] TRACE: > AT+CGDCONT=1,"IP","CHAP:vfd1.korem2m.com"
0000011533 [ncp.at] TRACE: < OK
0000011534 [ncp.at] TRACE: > AT+CREG=2
0000011583 [ncp.at] TRACE: < OK
0000011584 [ncp.at] TRACE: > AT+CGREG=2
0000011633 [ncp.at] TRACE: < OK
0000011634 [hal] TRACE: NCP connection state changed: 1
0000011636 [ncp.at] TRACE: > AT+COPS=0,2
0000011637 [net.ppp.client] TRACE: PPP thread event LOWER_DOWN

It get’s “stuck” on this latest lane for long time to then continue on with below to connect

0000103233 [ncp.at] TRACE: < OK
0000103234 [ncp.at] TRACE: > AT+CREG?
0000103236 [ncp.at] TRACE: < +CREG: 5,"015E","FBAC9",6
0000103239 [ncp.at] TRACE: < +CGREG: 5,"015E","FBAC9",6,"32"
0000103241 [hal] TRACE: NCP connection state changed: 2
0000103333 [ncp.at] TRACE: < +CREG: 1,"015E","FBAC9",6
0000103334 [net.ppp.client] TRACE: PPP thread event LOWER_UP
0000103337 [ncp.at] TRACE: < +CGREG: 0
0000103339 [hal] TRACE: NCP connection state changed: 1
0000103341 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000103343 [ncp.at] TRACE: < OK
0000103337 [net.ppp.client] TRACE: State READY -> CONNECT
0000103347 [net.ppp.client] TRACE: TX: 13
0000103346 [ncp.at] TRACE: > AT+CGREG?
0000103350 [net.ppp.client] TRACE: State CONNECT -> CONNECTING
0000103353 [net.ppp.client] TRACE: PPP phase -> 2
0000103355 [net.ppp.client] TRACE: PPP thread event LOWER_DOWN
0000103357 [net.ppp.client] TRACE: State CONNECTING -> DISCONNECT
0000103360 [net.ppp.client] TRACE: State DISCONNECT -> DISCONNECTING
0000103363 [net.ppp.client] TRACE: PPP phase -> 0
0000103365 [net.ppp.client] TRACE: PPP status -> 5
0000103367 [net.ppp.client] TRACE: PPP thread event DOWN
0000103369 [net.ppp.client] TRACE: State DISCONNECTING -> DISCONNECTED
0000103372 [net.ppp.client] TRACE: State DISCONNECTED -> READY
0000103384 [ncp.at] TRACE: < +CGREG: 2,0
0000103385 [ncp.at] TRACE: < OK
0000118486 [ncp.at] TRACE: > AT+CREG?
0000118534 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000118536 [ncp.at] TRACE: < OK
0000118537 [ncp.at] TRACE: > AT+CGREG?
0000118584 [ncp.at] TRACE: < +CGREG: 2,0
0000118585 [ncp.at] TRACE: < OK
0000133686 [ncp.at] TRACE: > AT+CREG?
0000133734 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000133736 [ncp.at] TRACE: < OK
0000133737 [ncp.at] TRACE: > AT+CGREG?
0000133784 [ncp.at] TRACE: < +CGREG: 2,0
0000133785 [ncp.at] TRACE: < OK
0000148886 [ncp.at] TRACE: > AT+CREG?
0000148934 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000148936 [ncp.at] TRACE: < OK
0000148937 [ncp.at] TRACE: > AT+CGREG?
0000148984 [ncp.at] TRACE: < +CGREG: 2,0
0000148986 [ncp.at] TRACE: < OK
0000164087 [ncp.at] TRACE: > AT+CREG?
0000164134 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000164136 [ncp.at] TRACE: < OK
0000164137 [ncp.at] TRACE: > AT+CGREG?
0000164184 [ncp.at] TRACE: < +CGREG: 2,0
0000164185 [ncp.at] TRACE: < OK
0000179286 [ncp.at] TRACE: > AT+CREG?
0000179334 [ncp.at] TRACE: < +CREG: 2,1,"015E","FBAC9",6
0000179336 [ncp.at] TRACE: < OK
0000179337 [ncp.at] TRACE: > AT+CGREG?
0000179384 [ncp.at] TRACE: < +CGREG: 2,0
0000179385 [ncp.at] TRACE: < OK

It’s because of an automatic network selection?

Do you experience similar traces @Paul_M?

You can get more lower level debug information by adding

SerialLogHandler logHandler(LOG_LEVEL_TRACE);

to your firmware code.

@mcen I don’t have time to get these traces, because I have subsequently determined two universal natural laws of the Particle universe:

  1. If you are using cellular, use 1.3.1-rc1, or else your Boron will most certainly be destroyed and enter permanent states of disconnectivity (which I have observed with 1.5.0 and newer);
  2. If you are using cellular, use an external power-switching hardware watchdog.

I am of high confidence that your reflashing issues above are similar to mine and are because Particle ruined the cellular firmware post-1.3.1-rc1. Downgrade and install an external power switching watchdog to save you extreme amounts of future pain and frustration. Take it from me. You’ll thank me later. (See my recent threads for more info)

1 Like

Hi @mcen, our support and engineering teams are interested in knowing more about what’s going on here. We’d like to investigate this further. Would you mind submitting a support ticket with your logs along with the associated device ID for this unit? We’d like investigate this device’s behavior and potentially reach out to our telephony partner to gather additional insight.

@mcen to clarify – this may be the result of a connectivity management issue in Device OS, but there does seem to be something weird happening on the network side making it possible that is an issue with the SIM.

The logs reveal that the device is connecting as a roaming connection, but then immediately disconnecting GSM and reconnecting as a HOME connection. This is an abnormal connect sequence that we’d like to explore deeper to find and resolve the root cause.

The purpose behind collecting more information via support is so that we can investigate corresponding cloud side logs and request additional network logging from our telephony partner.