Particle.keepAlive blocking for a long time

Hello,

Running an app in multi-threaded, SEMI_AUTOMATIC modes and have some custom code that manages the connection process to the cloud and does each step incrementally.

Steps are:

Particle.disconnect();
Cellular.off();
wait 4000 ms
Cellular.on();
wait 4000 ms
Check for response to simple “AT” command
Get IMSI, parse it, and figure out which credentials to use based on result

Then, the code runs the following sequence:

myLog.info("Attempting to set cellular credentials");
Particle.disconnect();
myLog.info("Particle.disconnect() returned");
cellular_credentials_set(cellCredentials->APN_name, cellCredentials->userName, cellCredentials->passWord, NULL);
myLog.info("cellular_credentials_set() returned");
Particle.keepAlive(cellCredentials->keepAlive);
myLog.info("Particle.keepAlive() returned");

Which results in the following debug messages:

0000013309 [app.network] INFO: Attempting to set cellular credentials
0000013309 [app.network] INFO: Particle.disconnect() returned
0000013310 [app.network] INFO: cellular_credentials_set() returned
0000023272 [app.network] INFO: Particle.keepAlive() returned
0000023272 [app.network] INFO: Changing networkManager state == CONNECTION_MODEM_READY -->changeState()
0000023273 [app.network] INFO: Exiting changeState() with... currentStateTimeout =  0
0000023275 [app.main] WARN: It has been 10429 since last loop!

How come calling Particle.keepAlive blocks for 10 seconds, especially if Particle.connect() hasn’t been called yet?

1 Like

Update:

It appears that calling Particle.keepAlive() is sometimes crashing the MCU:

0000202233 [app.network] INFO: Attempting to lookup cell credentials using stored firmware entries
0000202233 [app.network] INFO: WE HAVE A ZANTEL SIM CARD
0000202235 [app.network] INFO: Attempting to set cellular credentials
0000202235 [app.network] INFO: Particle.disconnect() returned
0000202235 [app.network] INFO: cellular_credentials_set() returned
0000326792 [system] WARN: Resetting WLAN due to WLAN_WD_TO()
Serial connection closed. (NOTE: This is where it resets)
Caught Interrupt.  Cleaning up.
Sebastians-MacBook-Pro:~ Sebastian$ Particle serial monitor
Opening serial monitor for com port: "/dev/cu.usbmodem1421"
Serial monitor opened successfully:
0000010065 [app.main] WARN: Waiting for serial input timed out!
0000010066 [app.main] INFO: Device ID: 3200360009513 
0000010066 [app.main] INFO: SRAM WAS NOT REINITIALIZED ON BOOT
0000010066 [app.main] INFO: SYSTEM RESET REASON IS: 
0000010067 [app.main] INFO: System panic
0000010067 [app.main] INFO: Now initializing global utilities!

It looks to me like the cellular modem stopped talking to you and so the watchdog timer went off which caused a reboot.

Is this reproducible using Tinker instead of your own code?

If you mean using Tinker with my custom network connection process built on top of it then yes, it is reproducible with thinker.

If you mean using Tinker and just hard-coding the APN credentials into the firmware with a STARTUP(set_cellular_credentials() )
call then no, I can’t reproduce it with tinker.

I guess what I’m asking is if there is a better way to setup dynamic APN credentials than the steps I took (see first post)

Well, then I would say that something in your custom network connection process is causing the cellular modem to go bonkers occasionally. I think the long delay you are seeing is a timeout that is ended by the watchdog and reboot.

I would try running the parts of your code that check the AT response and IMSI in as single-threaded a way as possible so that no other process can interfere or be interfered with by your grabbing AT response codes out of the modem. If you have your own threads you can block or delay them.

Do you take care to make sure that the AT response you get is the exact one that you expect and not just any valid response?

My thinking here is that the modem is intrinsically a single stream resource and you may be occasionally grabbing the data that should go to system thread. Even though the Particle cloud is not connected, the modem driver is still talking to modem.

Maybe @rickkas7 has some thoughts that could help here.

1 Like

I think you’re right about my code somehow affecting the state machine/comms between the Electron spark firmware and the modem.

I know that my AT command processing stuff is working fine as it gets passed that stage no problem. Here is the full log of that process described in step one (when its working):

0034731113 [app.network] INFO: Checking if the modem is on...
AT command response type = 0X110000 (TYPE_OK)
AT command response buffer = 
OK

Modem is on!
0034731125 [app.network] INFO: Response code was -2 (RESP_OK)
0034731127 [app.network] INFO: Returning modemResponsive() = TRUE
0034731127 [app.network] INFO: About to request the 4th byte of E.F. 6FAD from SIM card via modem
AT command response type = 0X400000 (TYPE_PLUS)
AT command response buffer = 
+CRSM: 144,0,"03"
:
Found MNC length == 3
AT command response type = 0X110000 (TYPE_OK)
AT command response buffer = 
OK
M: 144,0,"03"
:
0034731161 [app.network] INFO: Successfully parsed the MNC length..
================>  MNC length is: 3
0034731163 [app.network] INFO: About to request the IMSI number from the modem
AT command response type = 0 (TYPE_UNKNOWN)
AT command response buffer = 
302720607325223

Successfully Parsed IMSI: 302720607325223
AT command response type = 0X110000 (TYPE_OK)
AT command response buffer = 
OK
20607325223

0034731187 [app.network] INFO: Successfully parsed a potential IMSI number..
================>  parsed string is: 302720607325223
  IMSI = 302720607325223  |  MCC = 302  |  MNC = 720  | MNC_len = 3
0034731189 [app.network] INFO: Attempting to lookup cell credentials using stored firmware entries
0034731191 [app.network] INFO: WE HAVE A ROGERS SIM CARD
0034731191 [app.network] INFO: Attempting to set cellular credentials
0034731193 [app.network] INFO: Particle.disconnect() returned
0034731193 [app.network] INFO: cellular_credentials_set() returned
0034735478 [comm.protocol.handshake] INFO: Sending HELLO message
0034735864 [comm.protocol.handshake] INFO: Handshake completed
0034735864 [system] INFO: Send spark/device/claim/code event
0034736078 [system] INFO: Send spark/hardware/max_binary event
0034736291 [system] INFO: Send subscriptions
0034736502 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0034736502 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0034736503 [comm] INFO: Sending TIME request
0034736755 [comm.protocol] INFO: rcv'd message type=13
0034736755 [system] INFO: Cloud connected
0034736797 [comm.protocol] INFO: rcv'd message type=13
0034736797 [system] INFO: Cloud: disconnecting
0034737068 [system] INFO: Cloud: disconnected
0034737068 [app.network] INFO: Particle.keepAlive() returned
0034737068 [app.network] INFO: Changing networkManager state == CONNECTION_MODEM_READY -->changeState()
0034737069 [app.network] INFO: Exiting changeState() with... currentStateTimeout =  0
0034737072 [app.main] WARN: It has been 5967 since last loop!

It would appear that calling Particle.keepAlive() only blocks for 1590 ms in this case.

It would seem that the Spark firmware is waiting for the cloud disconnection to happen before setting the keepAlive(). When the cloud is already connected (as in above logs), this disconenction process takes 1500 ms or so. If the cloud connection process is in the middle of trying to reconnect to the cloud when keepAlive is called, that seems to be when things get stuck for a long time.

In terms of the application watchdog reset theory: if the system resets due to application watchdog, I don’t think it returns RESET_REASON_PANIC from System.resetReason(). I think it returns RESET_REASON_USER or RESET_REASON_WATCHDOG doesn’t it?

This line means that the watchdog went off in manage_network_connection().

Ah I see you mean a modem reboot, not an Electron reboot. I think you are correct!

But the thing is that the spark firmware shouldn’t be trying to connect to the network yet since I am between a call to Particle.disconnect() and Particle.connect() isn’t that right?

So there shouldn’t be any process active that would time out?

I am not sure why you say that. Particle.disconnect() just means that the system firmware won't try to open the UDP port to cloud. Getting an IP address from the network, for instance, is separate from connecting the cloud. Even with the cloud not connected, the modem driver will talk to modem.

1 Like

Indeed, so is there a way to set the spark firmware connection handling process into an idle state so that I can safely pass my desired credentials and keepAlive values to the spark firmware?

In other words, can I get the spark firmware into a state where there are no active processes related to connection occuring?

Perhaps calling Cellular.off() would have this effect?

Cellular.off() would depower the modem and you would probably not be able to perform any Cellular.command() calls, but Cellular.disconnect() would stop any cell connection.

2 Likes

Do I need to call Particle.disconnect() AND Cellular.disconnect() in order to reset all the connection state machines/processes, or will calling Cellular.disconnect() suffice?

Hmm, I guess this is a rethoric question :wink:

How would the device talk to the cloud without cell connection?

Just as you can call Particle.connect() to switch on the modem, initiate the cell connection and hook up to the cloud, the reverse will be true too

My concern is more so that if I call Cellular.disconnect() it will break something in the cloud connection state machine and cause the timeout behaviour I was seeing when a subsequent call to Particle.keepAlive() was made?

The Particle cloud is quite resilient to unexpected disconnects - it has to be able to cope with a dropping out device after all as well.

1 Like

Are you using SYSTEM_THREAD(ENABLED)? You have to be really careful with that because functions like Particle.disconnect() don’t block in system thread enabled mode and you’ll still be connected and in the process of disconnecting when the call returns. There’s no good way to know when it’s complete, you kind of just have to wait several seconds, as far as I know.

3 Likes

Yes I have threading enabled. I think perhaps there should be an addition to spark firmware of a Particle.disconnected() function, as there is with Particle.connected().

Then we could simply call that or use waitUntil(Particle.disconnected()) before continuing

1 Like