In one particular situation our Electrons just go around in a loop when we call Particle.connect()
.
Context:
Electron (3 different devices)
Firmware 0.8.0-rc.11
SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);
Here’s the log which shows it happening:
0000352734 [system] INFO: Cloud: connecting
0000353005 [app.deferredUpdate] ERROR: Couldn't clear session
0000353006 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0000354005 [app.deferredUpdate] ERROR: Update timed out...
0000354006 [app.deferredUpdate] ERROR: Will force handshake and try with a fresh boot
0000354199 [system] INFO: Resolved host 320033001047373334363431.udp.particle.io to 18.206.150.141
0000354219 [system] INFO: Cloud socket connected
0000354220 [system] INFO: Starting handshake: presense_announce=0
0000354220 [comm.protocol.handshake] INFO: Establish secure connection
0000354231 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=2
0000361673 [comm.protocol.handshake] INFO: Sending HELLO message
0000362961 [comm.protocol.handshake] INFO: Handshake completed
0000362962 [system] INFO: Send spark/device/claim/code event
0000363187 [system] INFO: Send subscriptions
0000363398 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000363398 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000363399 [comm] INFO: Sending TIME request
0000363692 [comm.protocol] INFO: Sending 'M' describe message
0000363937 [comm.protocol] INFO: rcv'd message type=1
0000363937 [system] INFO: Cloud connected
0000364222 [comm.protocol] INFO: rcv'd message type=13
0000364404 [comm.protocol] INFO: Sending 'S' describe message
0000364704 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000364722 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000364724 [comm.protocol] INFO: rcv'd message type=1
0000364888 [comm.protocol] INFO: rcv'd message type=13
0000365050 [comm.protocol] INFO: Received TIME response: 1541804046
0000365052 [comm.protocol] INFO: rcv'd message type=12
0000365964 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 1
0000365966 [comm.protocol] INFO: rcv'd message type=9
0000365966 [comm.protocol] ERROR: Event loop error 21
0000365966 [system] WARN: Communication loop error, closing cloud socket
0000365968 [system] INFO: Cloud: disconnecting
0000365968 [system] INFO: Cloud: disconnected
0000366068 [system] INFO: Cloud: connecting
0000366341 [app.deferredUpdate] ERROR: Couldn't clear session
0000366342 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0000366375 [system] INFO: Resolved host 320033001047373334363431.udp.particle.io to 18.206.150.141
0000366395 [system] INFO: Cloud socket connected
0000366396 [system] INFO: Starting handshake: presense_announce=0
0000366396 [comm.protocol.handshake] INFO: Establish secure connection
0000366407 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=2
0000367341 [app.deferredUpdate] ERROR: Update timed out...
0000367342 [app.deferredUpdate] ERROR: Will force handshake and try with a fresh boot
0000372792 [comm.protocol.handshake] INFO: Sending HELLO message
0000373479 [comm.protocol.handshake] INFO: Handshake completed
0000373480 [system] INFO: Send spark/device/claim/code event
0000373695 [system] INFO: Send subscriptions
0000373906 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000373906 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000373907 [comm] INFO: Sending TIME request
0000374201 [comm.protocol] INFO: Sending 'M' describe message
0000374425 [comm.protocol] INFO: rcv'd message type=1
0000374425 [system] INFO: Cloud connected
0000374487 [comm.protocol] INFO: rcv'd message type=13
0000374763 [comm.protocol] INFO: rcv'd message type=13
0000374905 [comm.protocol] INFO: Received TIME response: 1541804056
0000374907 [comm.protocol] INFO: rcv'd message type=12
0000375087 [comm.protocol] INFO: Sending 'S' describe message
0000375389 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000375407 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000375409 [comm.protocol] INFO: rcv'd message type=1
0000375573 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 1
0000375575 [comm.protocol] INFO: rcv'd message type=9
0000375575 [comm.protocol] ERROR: Event loop error 21
0000375575 [system] WARN: Communication loop error, closing cloud socket
0000375577 [system] INFO: Cloud: disconnecting
0000375577 [system] INFO: Cloud: disconnected
0000375677 [system] INFO: Cloud: connecting
And that same loop just carries on forever (or it would if we didn’t realize and System.reset
). The proximate cause seems to be the message type=9
(9 = KEY_CHANGE
) as it’s always followed by Event loop error 21
(21 = IO_ERROR_LIGHTSSL_BLOCKING_SEND).
As this doesn’t happen when I connect at other times I presume it’s the particular loop our app is in at this point. We are intentionally in our own loop and not in the main loop()
. Reason being we can’t have a reset happen at any random time (like when valves are open) so we System.disableUpdates()
. When we see a pending update and get to an appropriate time where a reset would be OK we System.enableUpdates()
and head off to this particular loop:
void waitForUpdate() {
log.info("Expecting update to start...");
if (! statusUpdateTs) {
statusUpdateTs = millis();
}
int publishAttempts = 0;
while (true) {
Particle.process();
timestamp expectedFinishTs = statusUpdateTs + (isComplete ? COMPELTE_TIMEOUT : IN_PROGRESS_TIMEOUT);
if (millis() > expectedFinishTs) {
log.error("Update timed out...");
if (isFreshBootAttempt()) {
log.error("Already tried fresh boot, will resume normal operations");
delay(500);
System.reset();
} else {
log.error("Will force handshake and try with a fresh boot");
bool publishSuccess = Particle.publish("spark/device/session/end", PRIVATE | WITH_ACK);
if (publishSuccess) {
log.info("Cleared session to force handshake next boot");
delay(500);
System.reset(FRESH_BOOT_RESET_MAGIC);
} else {
log.error("Couldn't clear session");
if (publishAttempts++ >= 3) {
log.error("Giving up on clearing seession");
delay(500);
System.reset(FRESH_BOOT_RESET_MAGIC);
}
}
}
}
delay(1000);
}
}
statusUpdateTs
and isComplete
are being updated whenever a system firmware event happens (through System.on(firmware_update_pending + firmware_update, firmwareEventHandler)
). isFreshBootAttempt()
is false
when this issue is happening.
The point to this loop is to wait for the update to apply but if it doesn’t within a timeout (which happens all the time for us) we publish an event to end our session and reboot to pick the update up next boot (next boot we won’t disable updates because of FRESH_BOOT_RESET_MAGIC
so the firmware update won’t fail – which it does for us most times on the first attempt when updates are re-enabled).
In that loop you can see it’s log messages sprinkled through the log output above. As per the log output because the Particle.publish
is WITH_ACK
it takes about 10 seconds to return with a failure result, and then another 1000ms delay at the end of the loop means Particle.process()
is only being called every 10s or so. Could that be the problem and causing the KEY_CHANGE
to fail?