Mesh round robin findings (was rc.25, now rc.26)


#1

I have a mesh of 3 Xenons and an Argon in very close proximity. The Argon subscribes to MQTT, where the message content is a colour. It then propagates this colour value around the 3 Xenons, using a mesh publish/subscribe from each, and then back to the Argon. The LEDs change accordingly. The result is then published onto MQTT as a different colour, so the loop should continue endlessly. In practice I get a loop time of around 300ms, so that’s OK-ish. However, the loop only runs for about 60 seconds before a subscribe is missed. So it looks like some form of delivery confirmation (ACK) may be required… or I hope this improves in the next release. The missed publish doesn’t seem to be ‘detectable’ as such; it behaves more like UDP. It’s not dependable delivery.
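Since a lost publish stalls the whole loop rather than raising an error, one way to make it detectable is to time the gap since the last message. The following is only a minimal sketch under my own assumptions: `StallDetector` and the 1500ms timeout are hypothetical, and on a device the timestamps would presumably come from `millis()`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of any Particle API): flags a stalled loop
// when no message has arrived within timeoutMs of the previous one.
class StallDetector {
public:
    explicit StallDetector(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call from the subscribe handler with the current tick count in ms.
    void onMessage(uint32_t nowMs) {
        lastMs_ = nowMs;
        seen_ = true;
    }

    // Call periodically from loop(). Unsigned subtraction keeps the elapsed
    // time correct even when the 32-bit tick counter wraps around.
    bool isStalled(uint32_t nowMs) const {
        return seen_ && (nowMs - lastMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastMs_ = 0;
    bool seen_ = false;  // no stall is reported before the first message
};
```

With a loop time of about 300ms, a timeout of a few loop periods (1500ms is just a guess) would flag a missed publish within a second or two, at which point the originating node could re-send the last colour.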

Some observations: the lack of pulsing cyan or flashing green is no indicator that a mesh doesn’t exist, and pulsing cyan doesn’t guarantee cloud connectivity. The cloud is significantly confused about which devices are available at any time, so the indications of mesh status are very unreliable. The mesh frequently fails to recover, requiring a router power cycle (and occasionally a repeater power cycle).


#2

The stability of a mesh containing an Argon as the gateway is greatly improved in 0.8.0-rc.26, which should be out soon.

You may want to use SYSTEM_THREAD(ENABLED) so your code is less likely to be blocked by lengthy system operations, which could cause you to lose an MQTT message.

The breathing cyan indicator is not particularly accurate for mesh devices at this time. Mesh devices will have many of the same issues with online indication as cellular devices, since none of them has a TCP connection to the cloud as the Photon/P1/Core did. It will likely get better, but it will never be perfect.


#3

I’ll try SYSTEM_THREAD(ENABLED), but it seems my loop is failing between two Xenons on the thread.publish/subscribe most of the time, as the LEDs are different colours. I hope rc.26 improves things.

Is there any delivery check/guarantee mechanism in place on the mesh so that devices won’t miss these messages? Unless of course they are dropping off the mesh, which I’m not seeing because I’ve re-appropriated the LED.


#4

Oh, I see. Yes, Mesh.publish is based on UDP multicast, so it’s not reliable delivery. However, I suspect it’s far less reliable than it should be right now, because a problem in the 0.8.0-rc.25 gateway code on the Argon is causing mesh communication difficulties.


#5

So I’ve updated this to rc.26. It went OK, apart from a couple of errant Web IDE messages (update failed, and still listing rc.25 bottom left).

Changes to the (4 mesh unit) round robin behaviour are interesting. It took a long time to get started, with loop times of several seconds. I assume the mesh was stabilising. Then it settled down to a loop time of maybe 600ms, which is half the speed of rc.25 (300ms). So a definite throughput drop.

The loop time is oddly erratic, but it doesn’t stall as often, although unfortunately it still does. Previously on rc.25 it would run for about 2 mins before a publish was missed. Now it seems to run for about 5 mins, though these are early results. I know the UDP publish/subscribe will always be prone to failure, so next I’ll try an ACK confirmation similar to another implementation on here. I think the stall that causes the variable loop times is in the Argon/MQTT interaction, whereas the failure is always between mesh nodes and not the MQTT broker. Interestingly, it is always the same mesh nodes. All nodes are within 1m.
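For reference, the ACK idea can be sketched as a simple send-and-retry wrapper. This is not the other implementation mentioned above, just a rough illustration under my own assumptions: `SendAndWaitAck`, `sendReliable` and `LossyLink` are hypothetical names, and the transport is injected as a callback so the retry logic can be checked off-device. On real hardware the callback would have to wrap a mesh publish plus a wait for an ACK message from the next node.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical transport hook: transmits one message and returns true if an
// ACK for that sequence number arrived before some timeout.
using SendAndWaitAck = std::function<bool(uint32_t seq, int colour)>;

struct DeliveryResult {
    bool delivered;  // true if any attempt was acknowledged
    int attempts;    // number of transmissions actually made
};

// Retransmit the same sequence number until it is acknowledged or the
// attempt budget runs out.
DeliveryResult sendReliable(const SendAndWaitAck& link, uint32_t seq,
                            int colour, int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        if (link(seq, colour)) {
            return {true, attempt};
        }
    }
    return {false, maxAttempts};
}

// Test double standing in for the mesh: drops the first dropFirst sends.
struct LossyLink {
    explicit LossyLink(int n) : dropFirst(n), sent(0) {}
    bool operator()(uint32_t /*seq*/, int /*colour*/) {
        ++sent;
        return sent > dropFirst;
    }
    int dropFirst;
    int sent;
};
```

Because an ACK can itself be lost, the receiver may see the same message twice. Reusing the sequence number on each retry lets it ignore duplicates instead of advancing the colour twice.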

My issues with mesh/internet connectivity that required power cycling seem to have been fixed so far.