I saw that RC27 was released last night, so this morning I immediately started reflashing my mesh network with my Marco-Polo mesh heartbeat test code to keep the mesh stability testing going.
First, I updated my Argon gateway. As expected, the mesh dropped and all the nodes went to blinking green. After a lengthy magenta blinking session on the Argon, the device came back up and started running user code again.
The mesh did not recover automatically, though: the nodes stayed blinking green. Only when I reset one of the Xenons did the rest of the devices, almost immediately, give a quick cyan flash and then go to breathing cyan. It’s strange that the mesh didn’t recover by itself, and strange that resetting just one Xenon recovered the whole mesh network.
Then I started updating the nodes. The first node I flashed caused the entire mesh network to drop. The Argon was running the heartbeat code and the nodes were responding (indicated by the “beating” D7), but the Argon wasn’t receiving the responses. I assume this was because the transfer to the node being updated was bogging down the limited bandwidth of the mesh network. On a later flash, I accidentally flashed one of the nodes with RC26 (which it already had) because I forgot to switch the OS target in the Web IDE. During that RC26 flash, the response latency went up to about 400ms, but the mesh stayed up and all node responses were received. Flashing RC27 on the remaining nodes didn’t seem to degrade the mesh network as much as the first one had: 3 out of 4 nodes responded as I expected and kept responding while the single node being updated went through the flashing magenta sequence.
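For anyone who wants to tally the same thing I was watching (who answered and how slowly), here is a rough sketch of the bookkeeping for one heartbeat round. It is plain Python, not my actual firmware, and every name in it (`summarize_round`, `RoundSummary`, the node names, the 1000 ms timeout) is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class RoundSummary:
    responded: List[str]              # nodes whose reply arrived in time
    missing: List[str]                # nodes with no reply, or a late one
    worst_latency_ms: Optional[int]   # slowest in-time reply, None if no replies


def summarize_round(sent_at_ms: int,
                    replies: Dict[str, int],
                    expected_nodes: List[str],
                    timeout_ms: int = 1000) -> RoundSummary:
    """Summarize one heartbeat round.

    `replies` maps a node name to the time (ms) its reply reached the
    gateway. The timeout value is an assumption, not a measured figure.
    """
    latencies: Dict[str, int] = {}
    for node in expected_nodes:
        arrived = replies.get(node)
        if arrived is not None and arrived - sent_at_ms <= timeout_ms:
            latencies[node] = arrived - sent_at_ms

    responded = [n for n in expected_nodes if n in latencies]
    missing = [n for n in expected_nodes if n not in latencies]
    worst = max(latencies.values()) if latencies else None
    return RoundSummary(responded, missing, worst)


# Example: three of four nodes answer, the slowest after 400 ms.
summary = summarize_round(
    sent_at_ms=0,
    replies={"xenon1": 400, "xenon2": 120, "xenon3": 90},
    expected_nodes=["xenon1", "xenon2", "xenon3", "xenon4"],
)
```

Logging `missing` and `worst_latency_ms` per round would make it easier to compare a plain user-code flash against one that also changes the OS.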
The point is, if you are running critical mesh communications, an OTA flash of new user code will add some latency. However, an OTA flash of new user code that also requires a Device OS change may significantly degrade the mesh network while Safe Mode Healer updates the device.
This also poses an interesting problem for updating a mesh network that is part of a “product” (when that functionality comes around). It wouldn’t be a good idea for every mesh node to try to flash OTA at the same time. I am currently doing a “rolling upgrade” by manually flashing one node at a time. Product updates for mesh might have to use a similar approach to make sure every node gets its OTA without causing excessive failures due to bandwidth limitations.
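To make the rolling-upgrade idea concrete, here is a minimal sketch of the loop I have in mind. It is only an illustration: `rolling_upgrade`, `flash`, and `mesh_is_healthy` are hypothetical names, the two callbacks stand in for whatever actually triggers an OTA and checks the heartbeat, and nothing here is a real Particle API.

```python
from typing import Callable, Iterable, List, Tuple


def rolling_upgrade(nodes: Iterable[str],
                    flash: Callable[[str], bool],
                    mesh_is_healthy: Callable[[], bool],
                    max_checks: int = 5) -> Tuple[List[str], List[str], List[str]]:
    """Flash nodes one at a time, waiting for the mesh between flashes.

    Returns (updated, failed, not_attempted). The loop stops early if the
    mesh does not report healthy within `max_checks` polls, so the
    remaining nodes are left untouched.
    """
    updated: List[str] = []
    failed: List[str] = []
    pending = list(nodes)

    while pending:
        node = pending.pop(0)
        # Only one node is ever being flashed at a time.
        if flash(node):
            updated.append(node)
        else:
            failed.append(node)

        # Poll until the mesh looks healthy again. A real version would
        # sleep between polls; that is left out to keep the sketch short.
        recovered = any(mesh_is_healthy() for _ in range(max_checks))
        if not recovered:
            break

    return updated, failed, pending


# Example with stand-in callbacks that always succeed.
flashed_order: List[str] = []


def fake_flash(node: str) -> bool:
    flashed_order.append(node)
    return True


result = rolling_upgrade(["xenon1", "xenon2", "xenon3"], fake_flash, lambda: True)
```

Stopping when the mesh does not recover is a deliberate choice in this sketch: given what happened after my Argon update, I would rather leave the remaining nodes on the old OS than keep flashing into a mesh that is down.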