Knowing whether your device is online is a big deal. That’s why we’re so excited to share that the Electron and all other Gen 2 and Gen 3 devices now accurately report their status in the Console, Web IDE, CLI, and Cloud API.
While this feature is not new, we made important improvements to the accuracy of the online indicator. One of the challenges we overcame, and the reason this is not a simple problem to solve, is that the low-level protocol used by these Particle devices is UDP. UDP is a connectionless, stateless protocol, unlike the more familiar TCP used by the Photon.
TCP gets its stateful nature partly by sending “extra” data over the wire, such as connection handshakes and acknowledgements. UDP does not need to send this extra data, which means UDP devices can have lower data costs on metered connections.
Reduced data consumption is an awesome feature of UDP, but being stateless makes it more difficult to know if a device is online or not. While we could lean on the networking stack to keep an eye on TCP devices, UDP requires us to track the state of devices on our own.
Our Solution
To solve this problem we changed how the online state of a device is determined. Instead of checking on the existence of a live connection in the network stack, we leverage a normal behavior of the Particle Device OS: keep-alive messages.
By default, keep-alive messages are sent at regular intervals by the Device OS to the cloud. These messages tell routers between the device and the Particle Device Cloud to keep the UDP connection open. Routers can only hold so many connections in memory at a time and will actively close connections they think are no longer required. For TCP connections, routers just watch the inherent state of the connection. Because UDP connections are stateless, routers have to depend on the existence of such keep-alive messages in order to know if the connection is still required.
Our online indicator uses a similar approach: if we receive data from a device, we mark it online. If no data has been received from that device within some period of time, we can reliably estimate that the device has gone offline.
The default timing of the Device OS keep-alive messages is well defined but varies somewhat between platforms. Given these values, we can accurately predict when a device will send its next keep-alive message.
An Electron, for example, sends a keep-alive message every 23 minutes by default.
Once we start receiving data from an Electron, we can expect at least one keep-alive message every 23 minutes. Other data may, of course, be received from the device during that time, but as long as at least one message arrives, we can safely assume the device is still online.
However, if we do not receive any data within that 23-minute window, we can assume the Electron has gone offline.
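The last-seen approach described above can be sketched in a few lines of Python. This is only an illustration, not Particle's actual implementation: the `OnlineTracker` class, its method names, and the use of plain timestamps in seconds are all assumptions made for the example. Only the 23-minute Electron default comes from the text.

```python
from dataclasses import dataclass, field
from typing import Dict

# Default Electron keep-alive from the text above: 23 minutes, in seconds.
ELECTRON_KEEPALIVE_S = 23 * 60


@dataclass
class OnlineTracker:
    """Hypothetical last-seen tracker (illustrative, not Particle's code)."""

    keepalive_s: int = ELECTRON_KEEPALIVE_S
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_message(self, device_id: str, now: float) -> None:
        # Any data from the device counts, not only keep-alive messages.
        self.last_seen[device_id] = now

    def is_online(self, device_id: str, now: float) -> bool:
        seen = self.last_seen.get(device_id)
        if seen is None:
            return False  # we have never heard from this device
        # Online if something arrived within one keep-alive window.
        return (now - seen) <= self.keepalive_s
```

For instance, after `tracker.record_message("e1", 0.0)`, a call to `tracker.is_online("e1", 22 * 60)` is true, while the same check at `24 * 60` seconds is false. A real service would also need to handle the complications discussed later in this post, such as clock drift and custom keep-alive intervals.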
Changes to Event Behavior
Robots and computers may not be great at seeing the breathing cyan online indicators on the Console, so we’ve also made changes to the behavior of the spark/status events.
These events are published after a device switches between the online and offline states. The timing of these events is also based on the default keep-alive. In some situations, these events may be published immediately after the device changes state, but on average they will be published after about 1.5x the default keep-alive timing for the device.
For Electrons, with their default keep-alive of 23 minutes, these events should be published between 23 and 46 minutes after the device changes state.
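As a quick check of the arithmetic, the window can be computed from the keep-alive interval. The helper below is hypothetical, and it assumes the upper bound is twice the keep-alive, which is inferred from the 23 to 46 minute range quoted for the Electron rather than stated as a general rule.

```python
from typing import Tuple


def status_event_window(keepalive_min: float) -> Tuple[float, float, float]:
    """Return (earliest, average, latest) expected delay in minutes.

    Assumption: the latest delay is 2x the keep-alive, inferred from the
    Electron figures in the text; the 1.5x average is quoted directly.
    """
    earliest = keepalive_min
    average = 1.5 * keepalive_min
    latest = 2 * keepalive_min
    return earliest, average, latest


print(status_event_window(23))  # (23, 34.5, 46)
```

So for an Electron on the default keep-alive, an event arriving roughly 34.5 minutes after the state change would be typical.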
Timing is Everything
Those are the basics for tracking the online state of UDP-based Particle devices!
However, the implementation details are a little more complex. Clock drift between devices and the Cloud, sleepy devices, devices that often disconnect and reconnect their Cloud connection, certain types of network issues, mesh device topologies, and user applications that call the Particle.keepAlive() API all have to be taken into account.
All of these situations impact the timing of a device’s keep-alive and make it more difficult to predict when a device is truly offline.
We have tackled some of these challenges so far, and we have further improvements in the pipeline to address the rest and improve the reliability of the online indicator.
As we tackle these challenges, the timing of the spark/status events will also improve. Our goal is to publish these events as soon as the device actually changes state. However, we must be careful not to lose the lower data usage advantages we gain from the stateless nature of UDP connections.
For now, the online indicators and the updated spark/status events should be reliable for most applications!
If you have questions, please reply in this thread, and we’ll do our best to provide clarity.