Hello there
We have a few Argons and Borons scattered around different locations.
We use these devices to control some sensors for which it is essential that the Particle device is responding to the requests we make via cloud functions
Lately we have been seeing an issues where the device is unresponsive for a few seconds. I checked the vitals in the dashboard and noticed that every 3 days the device disconnects from the cloud. Immediately after te disconnect it reconnects (about 10 seconds average), which works great. The only issue here being that we are not able to control the Particle device which is a problem when this disconnect happens and need to control a connected sensor.
Since the devices are placed on locations that we haven’t always acces to it is hard to get the logs out of the device. So I have made a setup in my office with an Argon which had this issue as well on a location it was placed on, to be able to log what is happening with the device. From the logs I noticed that when the disconnect happens it is because of an error on the system:
0055442972 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055442973 [comm.protocol] TRACE: message id 41252 complete with code 0.00
0055442973 [comm.protocol] TRACE: rcv'd message type=13
0055468079 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055468080 [comm.protocol] TRACE: message id 41253 complete with code 0.00
0055468080 [comm.protocol] TRACE: rcv'd message type=13
0055493186 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055493186 [comm.protocol] TRACE: message id 41254 complete with code 0.00
0055493187 [comm.protocol] TRACE: rcv'd message type=13
0055505693 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 1
0055505694 [comm.protocol] TRACE: rcv'd message type=9
0055505695 [comm.protocol] ERROR: Event loop error 21
0055505695 [system] WARN: Communication loop error, closing cloud socket
0055505696 [system] INFO: Cloud: disconnecting
0055505696 [system] INFO: Cloud: disconnected
0055505697 [system] INFO: Cloud: connecting
0055505698 [system] INFO: Read Server Address = type:1,domain:$id.udp-mesh.particle.io
0055505699 [system] WARN: Failed to load session data from persistent storage
0055505699 [system] INFO: Discarding session data
0055505701 [system] TRACE: Resolving xxxxxxx.v2.udp-mesh.particle.io#5684
0055505749 [system] TRACE: Address type: 3
0055505749 [system] TRACE: Cloud socket=0, family=2, type=2, protocol=17
0055505750 [system] INFO: Cloud socket=0, connecting to xxxxxx#5684
0055505750 [system] TRACE: Cloud socket=0, connected to xxxxxx#5684
0055505751 [system] TRACE: Updating cloud keepalive for AF_INET: 25000 -> 25000
0055505751 [system] TRACE: Applying new keepalive interval now
0055505752 [system] INFO: Cloud socket connected
0055505752 [system] INFO: Starting handshake: presense_announce=0
0055505753 [comm.protocol.handshake] INFO: Establish secure connection
0055505756 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=2
0055506323 [comm.protocol.handshake] INFO: Sending HELLO message
0055506446 [comm.protocol.handshake] INFO: Handshake completed
0055506447 [comm.protocol.handshake] TRACE: Updating protocol flags
0055506447 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0055506448 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0055506614 [hal] TRACE: Cached ESP32 NCP firmware version: 5
0055506616 [comm.protocol] INFO: Posting 'S' describe message
0055506621 [system] INFO: Send spark/hardware/max_binary event
0055506624 [system] INFO: Send spark/device/last_reset event
0055506631 [comm] INFO: Sending TIME request
0055506634 [system] INFO: Sending application DESCRIBE
0055506634 [comm.protocol] INFO: Posting 'A' describe message
0055506637 [system] INFO: Sending subscriptions
0055506637 [comm.protocol] INFO: Sending subscriptions
0055506642 [system] TRACE: Waiting until all handshake messages are processed by the protocol layer
0055506759 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055506759 [comm.protocol] TRACE: message id 41256 complete with code 0.00
0055506760 [comm.protocol] TRACE: Updating system DESCRIBE checksum
0055506760 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0055506926 [hal] TRACE: Cached ESP32 NCP firmware version: 5
0055506928 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0055506929 [comm.protocol] TRACE: rcv'd message type=13
0055506929 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055506930 [comm.protocol] TRACE: message id 41257 complete with code 0.00
0055506930 [comm.protocol] TRACE: rcv'd message type=13
0055506931 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0055506932 [comm.protocol] TRACE: message id 41258 complete with code 0.00
0055506932 [comm.protocol] TRACE: rcv'd message type=13
I can’t figure out why that is. I have done an extensive code review with the Particle documentation next to me to see if there are important notes we missed for some logic that causes memory leaks or threading issues and I don’t think that’s the case. The code is also fairly simple, all the Particle connection logic is handled automatically and we have one cloud function that controls max 3 pins of the device. Good to add here is that the device is reset at midnight to make sure the device doesn’t suffer from any memory issues or something that makes the device not function or function incorrectly throughout the day. That is also backtracked in the logs as well, so no issues there that could cause something to happen after 3 days. And it runs on a fairly old DeviceOs version -> 2.3.0. The test device (same as on location and Borons over cellular) is connected via Ethernet and when logging I checked if there was an outage with the internet, but that wasn't the case.
Particle is fairly new to me, although I have some experience with other hardware like Arduino, ESP32, ESP8266 and developing software in general. Would be great to get some new insights and solve this issue with help of the community. I looked around the forum already but couldn’t find find a issue that is similar (unfortunately).
Huge thanks in advance for the help!