We have been working with Particle products for around 2 years. The documentation and forum support are great. Our products are in the USA, South Africa, Australia, etc. We updated our system firmware to version 0.7.0 and began saving the device logs.
There were a lot of online events on most of the devices. We optimised our firmware, thinking it was a memory issue (thanks to @ScruffR for his help), but this didn’t help much. We then downgraded some devices to system firmware 0.6.3 and upgraded others to 0.8.0-rc11. Almost all of them became stable. Our online logs dropped from more than 100 per day to 7 per 20 days. Some devices didn’t publish a single online event for 15 days.
There are still devices which show online events no matter what. The WiFi signal is strong and we are using external antennas.
Is there any way we could identify the cause of these online events? If it is due to modem config, DHCP restrictions, IP collisions, etc., what can we do in these cases? What may cause these continuous online events?
@Rahul_G For those devices that you have on 0.8.0-RC.11, I would suggest you go to the console, download the history log as a CSV, and then look at the reasons why they have been disconnecting. My experience with WiFi is that whilst instantaneous signal strength (RSSI) might appear good or strong, there can be short periods where the signal is attenuated and the connection is lost. The router logs might be a better source for answering your listed reasons. The network connection errors will likely be 1006/7 or 1023/24: AP not available/out of range, or credentials wrong.
Are you able to log the signal strength and data quality for each device back to your monitoring app? I did that for some problematic devices on customer sites, and it highlighted that APs get turned off for maintenance and, for security reasons, out of hours. It also showed that during the day there are huge fluctuations in signal strength due to people attenuating the signal and to other devices (WiFi, Bluetooth).
My hope is that with the 3rd gen mesh hardware we can move away from using WiFi as it is just not reliable.
It takes a bit of digging to find - you are looking for WICED error codes if you want to search the web. Below are the ones I have found:
SUCCESS 0 // Success
PENDING -1 // Pending
TIMEOUT -2 // Timeout
PARTIAL_RESULTS -3 // Partial results
ERROR -4 // Error
BADARG -5 // Bad Arguments
BADOPTION -6 // Mode not supported
UNSUPPORTED -7 // Unsupported function
OUT_OF_HEAP_SPACE -8 // Dynamic memory space exhausted
NOTUP -9 // Interface is not currently Up
UNFINISHED -10 // Operation not finished yet
CONNECTION_LOST -11 // Connection to server lost
NOT_FOUND -12 // Item not found
PACKET_BUFFER_CORRUPT -13 // Packet buffer corrupted
ROUTING_ERROR -14 // Routing error
BADVALUE -15 // Bad value
WOULD_BLOCK -16 // Function would block
ABORTED -17 // Operation aborted
CONNECTION_RESET -18 // Connection has been reset
CONNECTION_CLOSED -19 // Connection is closed
NOT_CONNECTED -20 // Connection is not connected
ADDRESS_IN_USE -21 // Address is in use
NETWORK_INTERFACE_ERROR -22 // Network interface error
ALREADY_CONNECTED -23 // Socket is already connected
INVALID_INTERFACE -24 // Interface specified is invalid
SOCKET_CREATE_FAIL -25 // Socket creation failed
INVALID_SOCKET -26 // Socket is invalid
CORRUPT_PACKET_BUFFER -27 // Packet buffer is corrupted
UNKNOWN_NETWORK_STACK_ERROR -28 // Unknown network stack error
NO_STORED_AP_IN_DCT -29 // DCT contains no AP credentials
STA_JOIN_FAILED -30 // Join failed
PACKET_BUFFER_OVERFLOW -31 // Packet buffer overflow
ERROR_WWD_INVALID_KEY 1004 // You will get this error if the password you provided for your AP is invalid
ERROR_WWD_AUTHENTICATION_FAILED 1006 // You will get this error if authentication failed trying to connect to the AP
ERROR_WWD_NETWORK_NOT_FOUND 1024 // You will get this error if the requested AP could not be found in an AP scan. A likely cause of this error message is that you are out of range of the AP
ERROR_WWD_UNABLE_TO_JOIN 1025 // You will get this error if you are unable to join the requested AP. A likely cause of this error message is that you are out of range of the AP
ERROR_WWD_ACCESS_POINT_NOT_FOUND 1066 // This error message indicates that the requested AP could not be found
ERROR_TLS_UNTRUSTED_CERTIFICATE 5035 // Indicates that the certificate from the remote secure server could not be validated against any of the root certificates available to WICED. You may need to add another root certificate
Have a look for a @rickkas7 design for a message log stored in retained RAM, if this is what you mean by saving data before publishing. I use an event log on an SD card, where I can store years of events. The process works well and can recover history once reconnected.
When an AP is out of range, the WICED stack will typically return error 1024.
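To give an idea of what such a log can look like, here is a minimal fixed-size ring buffer. This is not @rickkas7's actual design, just a sketch under my own assumptions: `LogEvent` and `EventLog` are hypothetical names, and the struct is kept free of constructors so that, in principle, an instance could be placed in retained RAM on the device. A real retained-RAM version would also need some validity marker, because that memory is not initialised on a cold boot.

```cpp
#include <cstddef>
#include <cstdint>

// One logged event: when it happened and a numeric code (e.g. an error code).
struct LogEvent {
    uint32_t timestamp;
    int16_t code;
};

// Fixed-capacity ring buffer; when full, the oldest event is overwritten.
template <std::size_t N>
struct EventLog {
    LogEvent events[N];
    std::size_t head;   // index of the oldest stored event
    std::size_t count;  // number of stored events

    // Must be called once before first use (no constructor on purpose).
    void clear() {
        head = 0;
        count = 0;
    }

    // Append an event, dropping the oldest one if the buffer is full.
    void push(const LogEvent& e) {
        std::size_t tail = (head + count) % N;
        events[tail] = e;
        if (count < N) {
            ++count;
        } else {
            head = (head + 1) % N;
        }
    }

    // Remove the oldest event into `out`; returns false if the log is empty.
    bool pop(LogEvent& out) {
        if (count == 0) return false;
        out = events[head];
        head = (head + 1) % N;
        --count;
        return true;
    }
};
```

The idea is to `push` every event while offline and, once the connection is back, `pop` and publish until the log is empty. Overwriting the oldest entry is a design choice; if early history matters more, refuse new entries when full instead.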