LTE Freezing with Spotty connections

electron

#1

Device OS v1.0.0, E Series LTE module

I am testing cellular connectivity on my devices. When LTE coverage is spotty and there are many disconnects, the MCU freezes, which forces a watchdog System.reset().

I have the following output. Does anyone know how to interpret these errors?

These seem to be the main contributors.

0000226114 [comm.protocol] ERROR: Event loop error 3
0000226115 [system] WARN: Communication loop error, closing cloud socket

It froze in this block:

0000189735 [comm.protocol] INFO: rcv'd message type=13
0000191639 [comm.protocol] INFO: rcv'd message type=13
0000226114 [comm.protocol] ERROR: Event loop error 3
0000226115 [system] WARN: Communication loop error, closing cloud socket
0000226115 [system] INFO: Cloud: disconnecting
0000226115 [system] INFO: Cloud: disconnected
0000226216 [system] INFO: Cloud: connecting
0000226238 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0000227277 [system] INFO: Resolved host "particleID changed".udp.particle.io to 34.237.141.248
0000227601 [system] INFO: Cloud socket connected
0000227601 [system] INFO: Starting handshake: presense_announce=0
0000227603 [comm.protocol.handshake] INFO: Establish secure connection
0000227627 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=2
0000234944 [comm.protocol.handshake] INFO: Sending HELLO message
0000235340 [comm.protocol.handshake] INFO: Handshake completed
0000235342 [system] INFO: Send spark/device/claim/code event
0000235438 [system] INFO: Send spark/device/last_reset event
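One way to read a log like this is to look for long silences between consecutive lines, since a stalled device stops printing. The sketch below is a host-side helper, not Device OS code. It assumes the leading number on each line is a millisecond uptime timestamp, and the names `Gap`, `parseTimestamp`, and `findGaps` are made up for illustration.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// One stall: the silence between two consecutive timestamped log lines.
struct Gap {
    std::uint64_t startMs;   // timestamp of the last line before the silence
    std::uint64_t lengthMs;  // time until the next line appeared
};

// Parse the leading timestamp of a line such as
// "0000226114 [comm.protocol] ERROR: Event loop error 3".
// Assumption: the value is milliseconds. Returns false if the line
// does not start with a digit.
bool parseTimestamp(const std::string& line, std::uint64_t& outMs) {
    if (line.empty() || line[0] < '0' || line[0] > '9') {
        return false;
    }
    outMs = std::strtoull(line.c_str(), nullptr, 10);
    return true;
}

// Collect every gap between consecutive timestamped lines that lasts
// at least thresholdMs. Lines without a timestamp are skipped.
std::vector<Gap> findGaps(const std::vector<std::string>& lines,
                          std::uint64_t thresholdMs) {
    std::vector<Gap> gaps;
    bool havePrev = false;
    std::uint64_t prev = 0;
    for (const std::string& line : lines) {
        std::uint64_t ts = 0;
        if (!parseTimestamp(line, ts)) {
            continue;
        }
        if (havePrev && ts >= prev && ts - prev >= thresholdMs) {
            gaps.push_back({prev, ts - prev});
        }
        prev = ts;
        havePrev = true;
    }
    return gaps;
}
```

Run over the block above with a 10000 ms threshold, this reports a single gap: 34475 ms of silence starting at 0000191639, immediately before Event loop error 3.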

#2

This is the function that gets called:

/**
 * This is the internal function called by the background loop to pump cloud events.
 */
void Spark_Process_Events()
{
    if (SPARK_CLOUD_SOCKETED && !Spark_Communication_Loop())
    {
        WARN("Communication loop error, closing cloud socket");
        cloud_disconnect(false, false, CLOUD_DISCONNECT_REASON_ERROR);
    }
    else
    {
        lastCloudEvent = millis();
    }
}
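To make the branch structure easier to follow, here is a small host-side model of the same logic. It is only a sketch: `CloudState` and `processEvents` are hypothetical names, the `commLoopOk` argument stands in for the return value of Spark_Communication_Loop(), and it assumes the disconnect clears the socketed flag.

```cpp
#include <cstdint>

// Hypothetical stand-in for the system state the real function touches.
struct CloudState {
    bool socketed = false;             // plays the role of SPARK_CLOUD_SOCKETED
    std::uint32_t lastCloudEvent = 0;  // plays the role of lastCloudEvent
    int disconnects = 0;               // counts simulated cloud_disconnect() calls
};

// Mirrors the if/else in Spark_Process_Events().
// Returns true when the call ended in a disconnect, which is the path
// that logs "Communication loop error, closing cloud socket".
bool processEvents(CloudState& s, bool commLoopOk, std::uint32_t nowMs) {
    if (s.socketed && !commLoopOk) {
        // Assumption: disconnecting clears the socketed flag.
        s.socketed = false;
        ++s.disconnects;
        return true;
    }
    // Reached both when the loop succeeded and when there is no socket at all.
    s.lastCloudEvent = nowMs;
    return false;
}
```

In this model the disconnect happens only when a socket is open and the communication loop reports failure. In every other case, including when there is no socket, the last-event timestamp is refreshed.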

#3

Here is some more information:

The connection was lost at ERROR: Event loop error 3, and the device froze.
The device then alternated between freezing and about a second of control before freezing again.
That is when ERROR: Unable to create socket was printed.
At each "Unable to create socket" message, I had control for a moment before it froze again.

0000833371 [comm.protocol] INFO: rcv'd message type=13
0000907134 [comm.protocol] ERROR: Event loop error 3
0000907134 [system] WARN: Communication loop error, closing cloud socket
0000907135 [system] INFO: Cloud: disconnecting
0000907135 [system] INFO: Cloud: disconnected
0000907235 [system] INFO: Cloud: connecting
0000917246 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0001007337 ERROR: Unable to create socket
0001088027 ERROR: Unable to create socket

#4

Hey Wesner,

Thanks for sharing this. I’m going to check and see if I can’t get some eyes from our internal team on this.


#5

@wesner0019 This seems likely to be one of the LTE bugs we are currently working on resolving. A bug fix patch to 1.0.0 will be coming within the next week (hopefully Monday).

If you can send in more detailed logs so we can confirm, that would be helpful. Make sure you have applied the system firmware from the GitHub release page or via particle update on the CLI to ensure DEBUG_BUILD=y is enabled.

Then add SerialLogHandler logHandler(115200, LOG_LEVEL_ALL); globally at the top of your app, or Serial1LogHandler logHandler(115200, LOG_LEVEL_ALL); if you want to capture from the TX output. Please provide as large a file as you can, and if you can point out what is happening at various timestamps in the logs, that would also be helpful. Please PM/DM me the logs.


#6

@BDub, I am working on getting the log files created. Will let you know when it's done.


#7

Thanks for the logs! This issue should be resolved in the next prerelease, coming soon. It manifests when sending data while there is no signal, but the system should come back on its own in about 10 minutes… so it’s not actually locked up, but it does appear that way for far too long.


#8

v1.0.1-rc.1 addresses these issues and is out now. @wesner0019 already knows about this, but if anyone else hits this thread using <= v1.0.0 please give v1.0.1-rc.1 a test: