Greetings!
Environment:
- Device: Electron
- System Firmware: 0.6.4
Background:
Many months of successful development using Electron (very satisfied). System test is underway with dozens of devices in the field. Initial deployment of several hundred Electrons is imminent, then expanding into the thousands based on the success of the initial deployment.
As we expand system test, we are seeing an unexpected problem: Udp.sendPacket() occasionally blocks for 10 seconds before returning an error code of -1. Digging through the Particle forums, I found the following list of UDP error codes:
[Particle UDP Error Codes](https://community.particle.io/t/udp-error-codes-listed-anywhere/18775/3)
According to this list, an error code of -1 means “Pending”. I can imagine many possible meanings for “Pending”; however, my bigger concern is that Udp.sendPacket() blocks for 10 seconds before returning it. According to the Particle Firmware documentation, Udp.sendPacket() is supposed to be “unbuffered”, meaning that the packet is sent directly from user memory; when the call returns, either the packet has been sent, or it has not and the return code is negative.
True to the spirit of UDP, delivery of packets is unreliable, meaning that software in the sender and receiver must be designed to handle lost, duplicate, and out-of-order packets. I’m pleased to say that our software architecture and implementation for the Electron and our “Cloud” successfully recover from the lost “Pending” packet. Unfortunately, the problem still exists: why is sendPacket() blocking for 10 seconds? It shouldn’t take 10 seconds to send a 512-byte UDP packet, or to detect an error.
We always call Cellular.ready() before attempting to send. It would appear that there might be a firmware error in Cellular.ready() or Udp.sendPacket(). It is also possible that the cellular connection is lost in the small window of time between Cellular.ready() and Udp.sendPacket(). In any case, our code can recover from the situation, but only if the System Firmware doesn’t block for such a long time.
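To make the symptom concrete, here is a minimal sketch of the kind of measurement involved, written as portable C++ so it can run off-device. The names `timedSend`, `TimedSend`, and `simulatedBlockingSend` are hypothetical and exist only for this illustration. The `send` callable stands in for the real Udp.sendPacket() call, and on the Electron the same measurement would use millis() around that call.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Result of one timed send attempt (hypothetical type for this sketch).
struct TimedSend {
    int  result;      // value returned by the send call (negative on error)
    long elapsedMs;   // wall-clock time spent inside the call
    bool overBudget;  // true if the call ran longer than the loop budget
};

// Runs `send` and measures how long it blocks.
// `send` is a stand-in for a call such as Udp.sendPacket(...).
inline TimedSend timedSend(const std::function<int()>& send, long budgetMs) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    const int result = send();
    const long elapsedMs = static_cast<long>(
        std::chrono::duration_cast<std::chrono::milliseconds>(
            Clock::now() - start).count());
    return TimedSend{result, elapsedMs, elapsedMs > budgetMs};
}

// Stand-in for the misbehaving call: blocks for a while, then reports -1.
// The delay is shortened to 50 ms so the sketch runs quickly.
inline int simulatedBlockingSend() {
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return -1;
}
```

Calling `timedSend(simulatedBlockingSend, 10)` yields a result of -1 with `overBudget` set, which is the pattern we see in the field with a 250 ms budget and a 10 second stall.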
This occasional blocking invalidates a very long design and implementation effort in which we have meticulously designed the loop() { } construct as a single-threaded state machine where nothing blocks or takes longer than 250 ms. Our implementation sports 7 sensors and 2 servo-based actuators. As far as we can tell, we are satisfying Particle’s design principles. That is, until we started detecting the occasional long-term blocking of sendPacket().
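For readers unfamiliar with the pattern, the following is a heavily simplified sketch of a single-threaded state machine of this kind, not our actual firmware. The state names and the `LoopMachine` type are hypothetical. Each call to `step()` corresponds to one pass through loop() and does a small, bounded amount of work.

```cpp
#include <cassert>

// Hypothetical states; a real design has more (sensors, actuators, UI).
enum class State { ReadSensors, DriveServos, SendPacket };

struct LoopMachine {
    State state = State::ReadSensors;
    int packetsSent = 0;

    // One call models one pass through loop(): do a short, bounded piece
    // of work, advance the state, and return so nothing else is starved.
    void step() {
        switch (state) {
        case State::ReadSensors:
            // Sample sensors here (short, bounded work).
            state = State::DriveServos;
            break;
        case State::DriveServos:
            // Update servo positions here.
            state = State::SendPacket;
            break;
        case State::SendPacket:
            // Stand-in for the Udp.sendPacket() call. If this step
            // blocks, every other state is frozen until it returns.
            ++packetsSent;
            state = State::ReadSensors;
            break;
        }
    }
};
```

The design only works if every step returns quickly, which is why a single call that stalls for 10 seconds defeats the whole structure.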
For many other kinds of projects, a processor that occasionally freezes for 10 seconds is no big deal. While I can’t get into details, it violates our design principles because our “user interface” is frozen for those 10 seconds. For us, this is a big deal. No, we are not monitoring mission-critical information, but the success of our product centers on both functionality and a solid user interface experience.
I am posting this issue here because we’ve made great progress learning about how to solve various problems on these forums. I think that the answers posted here will be of value to the Particle community. Specifically,
- Is the single-threaded state-machine loop() { } valid in all circumstances?
- Is there some kind of undocumented “timeout” parameter I can add to Particle Firmware calls to ensure they never block?
- Does the Firmware API expose FreeRTOS so that I can create my own threads (and synchronization primitives) to make calls to the Particle Firmware? Is the System Firmware “thread-safe”? Or is attempting to solve this problem with threads a really bad idea?
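To clarify what the last question has in mind, here is a rough sketch of the threaded approach using standard C++ threads as a stand-in. `SendWorker` and `Packet` are hypothetical names, and the `send` callable stands in for Udp.sendPacket(). Whether the equivalent is possible or safe on the Electron is exactly what I am asking, so this is an illustration of the idea, not a working firmware solution.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

using Packet = std::vector<std::uint8_t>;

// Hypothetical worker that owns the (possibly blocking) send call so that
// the main loop never waits on the network.
class SendWorker {
public:
    explicit SendWorker(std::function<int(const Packet&)> send)
        : send_(std::move(send)), thread_([this] { run(); }) {}

    // Drains any queued packets, then stops the worker thread.
    ~SendWorker() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        ready_.notify_one();
        thread_.join();
    }

    // Called from the main loop: hands the packet over and returns
    // without waiting for the send to complete.
    void enqueue(Packet p) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(p));
        }
        ready_.notify_one();
    }

private:
    void run() {
        for (;;) {
            Packet p;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                ready_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and fully drained
                p = std::move(queue_.front());
                queue_.pop();
            }
            // May block for a long time; only this worker thread waits.
            // A real design would report the return code to the state machine.
            send_(p);
        }
    }

    std::function<int(const Packet&)> send_;
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Packet> queue_;
    bool stopping_ = false;
    std::thread thread_;  // declared last so the other members exist first
};
```

The main loop would call `enqueue()` from its send state and move on immediately, leaving the worker to absorb any stall.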
Thank you all for your invaluable input.