Mesh pub-sub at the gateway

armor · November 18, 2019, 12:10pm

When setting up a Mesh network gateway application that subscribes to Mesh events and then receives publishes from the nodes in the Mesh network - are there any specific guidelines / restrictions for the design of the handler function to ensure that published Mesh messages are received? I have a suspicion that at times of high message volumes some are being missed/lost - is a Mesh event inbound queue required?

//Mesh event subscribe in setup()
result = Mesh.subscribe(meshNodePublishName, meshNodePublishHandler);

//Handler for the receipt of mesh messages from end nodes
void meshNodePublishHandler(const char *event, const char *command)
{
    if (strncmp(event, meshNodePublishName, strlen(meshNodePublishName)) == 0)
    {
        // parse the JSON or copy the message data and flag received for the loop() to handle
    }
}
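
A minimal sketch of what such an inbound queue could look like, written as plain C++ so it can be tried off-device. The names `InboundQueue`, `kSlots` and `kMaxLen` are made up for illustration, the sizes are guesses, and it assumes the handler and loop() run in the same thread (so no locking). The handler only copies the payload and returns; parsing happens later in loop().

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical sizes: tune to the real payload length and expected burst size.
constexpr std::size_t kSlots = 8;
constexpr std::size_t kMaxLen = 256;

struct InboundQueue {
    char data[kSlots][kMaxLen];
    std::size_t head = 0;    // next slot to write
    std::size_t tail = 0;    // next slot to read
    std::size_t count = 0;   // messages currently queued
    unsigned dropped = 0;    // messages rejected because the queue was full

    // Intended to be called from the subscribe handler: copy and return quickly.
    bool push(const char *msg) {
        assert(msg != nullptr);
        if (count == kSlots) { ++dropped; return false; }
        std::strncpy(data[head], msg, kMaxLen - 1);
        data[head][kMaxLen - 1] = '\0';   // always terminate, even if truncated
        head = (head + 1) % kSlots;
        ++count;
        return true;
    }

    // Intended to be called from loop(): returns false when nothing is queued.
    bool pop(char *out, std::size_t outLen) {
        if (count == 0 || outLen == 0) return false;
        std::strncpy(out, data[tail], outLen - 1);
        out[outLen - 1] = '\0';
        tail = (tail + 1) % kSlots;
        --count;
        return true;
    }
};
```

The handler body would then reduce to a single `push(command)` call, and loop() would `pop()` and parse one message per pass. The `dropped` counter gives a rough indication of whether the queue is too small for the message volume.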

peekay123 · November 18, 2019, 2:33pm

@armor, since Mesh publishes are sent as stateless UDP messages, there is always a chance of loss. The only way to handle this is to create an ACK response process between your gateway and the nodes. I haven’t had the time to create an example of this yet but perhaps @rickkas7 or @ScruffR have. It would be nice to have a “with_ACK” parameter in Mesh.publish() so that Device OS handles the ACK automatically!

armor · November 18, 2019, 3:13pm

Agreed - but won’t doing multicast with an ACK response very quickly flood a Mesh network? I don’t have experience with designing such an acknowledgement process. I also don’t understand the impact it might have on a Mesh network’s effective throughput and latency. Hence the shout-out.

ScruffR · November 18, 2019, 3:34pm

When you have a 1:1 relation between publisher and subscriber both ways you should be able to avoid flooding the mesh. The more subscribers you'll have the more difficult things will become since you'd also need to define how many and/or which of the subscriber's ACK you value the most or all the same.

You also want to make sure that the ACK doesn't get ACKed itself again.

armor · November 18, 2019, 3:46pm

A very quick stab at such an ACK process - not Async. I am only proposing doing this from node to gateway on a “sensordata” message.

Node
    setup()
        Mesh subscribe to “gateway_ack” with ack_handler

    ack_handler()
        Parses the device ID and the unique message ID from the data argument
        Checks device ID is the same as this node
        Checks if unique message ID is in the queue waiting for an ACK
        If found then delete message from publish buffer
        Flag message ACK’d

    loop()
        When required add message to Mesh message buffer and assign unique message ID
        Mesh publish message
        if (waitFor (message ACK’d, timeout))
            Delete ACK’d message
        If timeout then retry Mesh publish message
        Qs - how many retries, what if no ACK after X retries?

Gateway
    Handler for message with ACK ()
        Receive message
        Mesh publish ACK for message
        Parse message
        Use data payload
        Exit handler
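
One consequence of the retry step is that the gateway can receive the same message twice if the ACK itself is lost. A rough sketch of how the gateway handler might filter repeats, assuming each message carries the device ID and a per-node message ID as in the outline. `SeenTable`, `kMaxNodes` and `kIdLen` are invented names, and the table size is a guess.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kMaxNodes = 16;  // hypothetical upper bound on nodes
constexpr std::size_t kIdLen = 25;     // 24 hex characters plus terminator

struct SeenTable {
    struct Entry {
        char deviceId[kIdLen];
        std::uint16_t lastMsgId;
        bool inUse;
    };
    Entry entries[kMaxNodes] = {};

    // Returns true if the message should be processed, false if it repeats the
    // last message ID seen from that device. The ACK would be sent either way.
    bool isNew(const char *deviceId, std::uint16_t msgId) {
        assert(deviceId != nullptr);
        Entry *freeSlot = nullptr;
        for (auto &e : entries) {
            if (e.inUse && std::strncmp(e.deviceId, deviceId, kIdLen) == 0) {
                if (e.lastMsgId == msgId) return false;  // retransmission
                e.lastMsgId = msgId;
                return true;
            }
            if (!e.inUse && freeSlot == nullptr) freeSlot = &e;
        }
        if (freeSlot != nullptr) {   // first message from this device
            std::strncpy(freeSlot->deviceId, deviceId, kIdLen - 1);
            freeSlot->deviceId[kIdLen - 1] = '\0';
            freeSlot->lastMsgId = msgId;
            freeSlot->inUse = true;
        }
        return true;  // if the table is full, the message is processed untracked
    }
};
```

This only remembers the most recent ID per node, which is enough when a node has at most one unacknowledged message in flight.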

ScruffR · November 18, 2019, 3:51pm

I’d reduce the parsing effort by having the subscribe filter already contain the device ID.
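
As a sketch of that idea: each node could subscribe to its own ACK topic, so that the handler never has to parse out and compare a device ID. The topic layout `gateway_ack/<deviceId>` and the helper names below are assumptions made for illustration, and the prefix matching mirrors how Particle event subscriptions match names (assumed here to apply to Mesh as well).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical prefix: the outline above only names "gateway_ack".
static const char kAckPrefix[] = "gateway_ack/";

// Builds "gateway_ack/<deviceId>" into out. Returns false if it does not fit.
bool buildAckTopic(const char *deviceId, char *out, std::size_t outLen) {
    assert(deviceId != nullptr && out != nullptr);
    int n = std::snprintf(out, outLen, "%s%s", kAckPrefix, deviceId);
    return n > 0 && static_cast<std::size_t>(n) < outLen;
}

// True when the event name starts with the subscription filter.
bool topicMatches(const char *event, const char *filter) {
    return std::strncmp(event, filter, std::strlen(filter)) == 0;
}
```

On the node, the topic would be built once in setup() from the device's own ID and passed to Mesh.subscribe(); the gateway would build the same topic from the device ID in the received message before publishing the ACK.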

peekay123 · November 18, 2019, 4:42pm

@armor, there are pros and cons when using Low Power Wireless WAN for mesh. One of them is the bandwidth of the medium used, which in this case is 802.15.4. The other is the protocol above that, which in this case is OpenThread over UDP. The great thing about UDP is that it is stateless, lightweight and you simply fire-and-forget. The bad thing is that the “forget” part can mean your data never gets to its intended target.

So this brings a new dimension when designing meshes - data classification. If data is non-essential, then fire-and-forget is fine. If it is collected and averaged, the same may apply. If the data is single-point critical (you can’t miss a measurement) then you need to ensure the target has received it. Most discussions I have seen so far pertain to node-to-gateway or node-to-(gateway-to)-cloud. I have yet to see node-to-collector node-to-gateway or cloud with intermediary processing and forwarding. This, I believe, is in part because there are no Particle mesh analysis tools. Designing for low(ish) bandwidth networks is always a challenge but it seems that most folks are designing as if they have unlimited bandwidth! We will see some creative solutions as Particle Mesh matures.

Rftop · November 18, 2019, 11:35pm

@peekay123, I've tried to dream up a way for this to work, but I can't think of a solution given that the Mesh messages are UDP.

It might be possible with a child Mesh, but that means a device would need to be allowed to bridge 2 independent Mesh networks. Do you know if the architecture of OpenThread prevents that for future Particle Device OS updates?

peekay123 · November 19, 2019, 1:00am

@Rftop, doing node-to-collector means assigning a Xenon as a collector node whose “sub” nodes Mesh.publish() to it specifically. Ideally those neighbours are meshed with that node, likely within proximity, creating a “cell” of Xenons. The collector node receives the publishes, processes the aggregated data and passes on the “pre-chewed” data to the gateway or another collector node. The idea is to keep the traffic to the collector node within the cell as much as possible, reducing traffic on the rest of the mesh.

However, this assumes that the collector node doesn’t dumbly pass the Mesh.publish() on to the rest of the nodes, which it most likely does. How meshes can be optimized using the Mesh pub/sub constructs is something that Particle hasn’t communicated. Nor have mesh analysis tools been shared. Hopefully, it’s just a matter of time.

armor · November 19, 2019, 12:56pm

I tried the scheme I outlined above and it doesn’t work. The reason (I have only just realised) is that the subscribe handler does not run separately from the application loop, so if I Mesh publish I can’t then wait with a timeout, in the same loop(), for the subscribe handler to flag that the publish has been acknowledged - Duh. And thus simple ideas need to become more complex, but not necessarily more certain to work!

The reason I didn’t want to make the process async is that it would mean another thread and potentially more fragility in the solution.

I guess I would like to see some critical event ACK mechanism for Mesh - without it making the application code too complicated and convoluted. Does anyone know what the OpenThread 1.2 specification brings?

ScruffR · November 19, 2019, 1:10pm

You can make that async without additional threads and it won't even be complicated.
You'd just need to remember when you sent the message, do your other stuff, and after a set time (millis() - timestamp) check whether the ACK has arrived since.
If it has, remove the message from the queue; if not, act accordingly (e.g. re-send).
No need to wait.
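
A minimal sketch of that timestamp approach, as plain C++ so it can be tried off-device. All names (`AckTracker`, `Pending`, `Action`) and the timeout and retry values are assumptions for illustration; the thread leaves those choices open. On the device, `nowMs` would come from millis().

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical tuning values.
constexpr std::size_t kMaxPending = 4;
constexpr std::uint32_t kAckTimeoutMs = 500;
constexpr std::uint8_t kMaxRetries = 3;

enum class Action { None, Resend, GiveUp };

struct Pending {
    bool inUse = false;
    std::uint16_t msgId = 0;
    std::uint32_t sentAtMs = 0;
    std::uint8_t retries = 0;
};

struct AckTracker {
    Pending slots[kMaxPending];

    // Record a message right after publishing it. False if the table is full.
    bool track(std::uint16_t msgId, std::uint32_t nowMs) {
        for (auto &p : slots) {
            if (!p.inUse) {
                p.inUse = true;
                p.msgId = msgId;
                p.sentAtMs = nowMs;
                p.retries = 0;
                return true;
            }
        }
        return false;
    }

    // Intended to be called from the ACK subscribe handler.
    bool acknowledge(std::uint16_t msgId) {
        for (auto &p : slots) {
            if (p.inUse && p.msgId == msgId) { p.inUse = false; return true; }
        }
        return false;
    }

    // Intended to be called from loop() for each slot; never blocks.
    Action poll(std::size_t i, std::uint32_t nowMs) {
        assert(i < kMaxPending);
        Pending &p = slots[i];
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (!p.inUse || nowMs - p.sentAtMs < kAckTimeoutMs) return Action::None;
        if (p.retries >= kMaxRetries) { p.inUse = false; return Action::GiveUp; }
        ++p.retries;
        p.sentAtMs = nowMs;
        return Action::Resend;
    }
};
```

loop() would call `poll()` on each slot per pass and re-publish on `Resend`. What to do on `GiveUp` (log, store for later, raise an error state) is an application decision.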

armor · November 19, 2019, 1:25pm

Just to illustrate the issue with the application: it is sensing human presence, and the Serial Log from the end node looks like this

sensorDataPublishWithAck {"DID":"***04d9","DTS":"2019-11-19T13:13:36","TA":"18.8","SC":"0","MS":"0","AC":"Presence Ended","ASN":"node31","BV":"4.14"} at 32401

For device sensordata ***04d9 Ack received at 32427

sensorDataPublishWithAck {"DID":"***04d9","DTS":"2019-11-19T13:14:05","TA":"18.1","SC":"1","MS":"20","AC":"Presence Started","ASN":"node31","BV":"4.14"} at 60983

For device sensordata ***04d9 Ack received at 61011

The Ack from the gateway is quick, 26 to 28 ms (admittedly there is only 1 node on the Mesh), even with logging.

I need the start and end of presence to be certain otherwise the stats the gateway consolidates will be rubbish.

I understand your point. The further complexity is that I now need a Mesh publish buffer (admittedly it could be a small circular buffer, which I already have for queuing), then the subsequent processing and resending, a decision on how many retries, and the node can’t sleep whilst it holds unAck’d data…

peekay123 · November 19, 2019, 1:31pm

@armor, you don’t need another thread, you can use an FSM! I’m currently exploring the use of Active Objects (with and without threads) which take FSMs to another level. There is a lightweight Arduino version available from Quantum Leaps.

armor · November 19, 2019, 1:41pm

The application already uses an FSM (2 levels) - there is a general run state (error, standby, sleep, gotosleep) and then a sensor state - since the sensor has its own normal, standby and sleep modes and I am trying to sip very carefully from the LiPo battery whilst not missing changes in presence! It is certainly how I would implement what @ScruffR has suggested! Time to wrap a wet towel round my head!

Still I am impressed by the speed of the gateway Ack.
