I have a project where I need to control 48 Xenons in a mesh network. Each Xenon is supposed to receive a command via mesh subscribe, operate some field equipment, and report its progress (or lack thereof) via mesh publish.
Xenons are installed outdoors, in a flat field, in a row with 5.5m between each unit. I initially planned to have an Argon gateway in the middle, with the end units being 130m away from the gateway (but still 5.5m from the nearest Xenon):
X <- 5.5m -> X <- 5.5m -> X … X <- 5.5m -> A <- 5.5m -> X <- 5.5m -> X … X <- 5.5m -> X
I started with 6 Xenons, spaced out nearly the full length of the field, and it worked OK.
Now I have added 18 more (the full set of 24 on one side), and it appears that not all Xenons are connecting to the mesh, or at least not all at the same time. Oddly, sometimes ones further away connect, and closer ones do not. It seems to be a function of time rather than distance (?!?).
Secondly, I can see that even when most Xenons are connected to the mesh (they do their jobs as commanded and report back), I can only see the Argon and a handful of Xenons connected to the cloud at any one time in my Web IDE. This means I can OTA flash the Argon, but not the Xenons, and this is a problem (this is a remote, unattended site).
I suspect that the Argon can’t cope with this many Xenons attempting to use it as a gateway.
Or possibly there is simply too much traffic on the mesh?
I’m also not sure if there is a way to boost Xenon mesh antennas, and if this would even make a difference.
Questions:
Can I add more Argons (gateways) to the same mesh? (And if so, how? Last time I tried, the app said one Argon per mesh.)
If not, should I split my field of 48 Xenons into 4 (or more) separate mesh networks? Will there be issues with 4 mesh networks operating in close(ish) proximity?
I’m trying to avoid having multiple independent Argons getting separate instructions from the outside world and publishing to separate mesh networks (that opens up a whole new set of Modbus-related problems…)
What you are proposing is what Particle calls High Availability Networks (multiple gateways in one mesh), but that is a future feature currently not supported.
@vbarac, the nRF52840 runs OpenThread on the 802.15.4 (low-rate wireless personal area network) protocol with a maximum 250 kbps bitrate. With a large mesh creating a lot of traffic, it is not hard to see how quickly the mesh can saturate and drop (UDP) messages. With every node using Particle.publish() or Particle.subscribe(), mesh traffic quickly adds up. This, by the way, is an issue with ANY large mesh.
One way to reduce traffic is to broker all Cloud requests in the gateway and Mesh.publish() into and from the mesh nodes only when necessary. Another is to create aggregator nodes that collect data, reduce it (e.g. averaging) and then send it on to the Cloud. This is why mesh topology and design are important, and also why folks have been asking for mesh analysis tools.
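As a rough illustration of the brokering idea (the event names, the simple averaging and the 10-minute upload interval below are placeholders I made up, not anything prescribed by Particle), the gateway side could look something like this:

```cpp
// Gateway (Argon): broker between the Particle Cloud and the local mesh.
#include "Particle.h"

SYSTEM_THREAD(ENABLED);

int reportsReceived = 0;
long valueSum = 0;

// Cloud -> mesh: relay a command onto the mesh once, so the end nodes
// never need their own Particle.subscribe().
void cloudCmdHandler(const char *event, const char *data) {
  Mesh.publish("field/cmd", data);
}

// Mesh -> cloud: collect node reports locally instead of forwarding each one.
void meshStatusHandler(const char *event, const char *data) {
  valueSum += atoi(data);
  reportsReceived++;
}

void setup() {
  Particle.subscribe("field/cmd", cloudCmdHandler, MY_DEVICES);
  Mesh.subscribe("node/status", meshStatusHandler);
}

void loop() {
  static unsigned long lastUpload = 0;
  if (millis() - lastUpload > 10 * 60 * 1000UL) {   // push a reduced summary every 10 minutes
    lastUpload = millis();
    if (reportsReceived > 0 && Particle.connected()) {
      Particle.publish("field/avg", String(valueSum / reportsReceived), PRIVATE);
      valueSum = 0;
      reportsReceived = 0;
    }
  }
}
```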
Even with HA, “large” Mesh networks will still suffer due to the fact that OpenThread uses UDP multicasting for sending messages. An ideal mesh would support “bridge” devices to keep smaller sub-mesh traffic out of the larger network whenever possible. There is a lot of work yet to be done by Particle in both the High Availability and High Reliability mesh software. However, a good understanding of Mesh principles, bitrate limitations, topologies and design can go a long way in implementing larger mesh networks.
I have been working on an architecture to support around 30 Xenon end nodes off a single gateway based on a Xenon plus an Ethernet FeatherWing. The plan has been to have only local mesh traffic between the end nodes and the gateway. I spent some time looking at the Thread services that Particle hasn’t implemented but which are key to supporting a reasonably sized mesh network. I haven’t got as far as volume testing with any more than 5 devices. A couple of things to note from your questions:
End nodes are mesh-only connected until an OTA command is sent to the gateway, which can then instruct an end node to cloud connect and enable flashing (a rough sketch of this pattern is at the end of this post). This is a slow business with the mesh bandwidth and thus isn’t something to be done often, and certainly not across a large mesh all at once!
I have a heartbeat mesh event but even this can be turned off to save bandwidth for instructions/commands and for sensor data back to the gateway.
I haven’t implemented an ACK on the mesh publishes to the gateway and I suspect that events are occasionally missed or lost.
I haven’t implemented an ACK on the mesh command publishes by the gateway and I suspect that events are occasionally missed - although there is a retry mechanism for commands.
You need to sync the time on mesh devices, and each Xenon would need an RTC for that. Therefore, providing time services to sync with the gateway and allow for timestamping of data is a non-trivial exercise.
I don’t claim the way I am doing things is the right way and debugging the mesh network behaviour is very difficult without analysis tools. Hopefully this overview gives you an idea of some of the things you will need.
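To make the first point a little more concrete, here is a rough sketch of what a mesh-only end node that cloud connects on demand could look like. The event names, the "cloud_on"/"cloud_off"/"hb_off" commands and the 60-second heartbeat are my own placeholders, not something tested at scale:

```cpp
// End node (Xenon): mesh-only until the gateway asks it to cloud connect for OTA.
#include "Particle.h"

SYSTEM_MODE(SEMI_AUTOMATIC);   // don't connect to the Cloud on boot
SYSTEM_THREAD(ENABLED);

bool heartbeatEnabled = true;

void cmdHandler(const char *event, const char *data) {
  String cmd(data);
  if (cmd == "cloud_on") {
    Particle.connect();        // open a cloud session so the device can be OTA flashed
  } else if (cmd == "cloud_off") {
    Particle.disconnect();     // drop back to mesh-only operation
  } else if (cmd == "hb_off") {
    heartbeatEnabled = false;  // silence heartbeats to free mesh bandwidth
  }
}

void setup() {
  Mesh.connect();
  Mesh.subscribe("gw/cmd", cmdHandler);
}

void loop() {
  static unsigned long lastBeat = 0;
  if (heartbeatEnabled && millis() - lastBeat > 60000UL) {
    lastBeat = millis();
    Mesh.publish("node/heartbeat", System.deviceID().c_str());
  }
}
```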
I'd think this might be an optional "demand" depending on how critical exact timestamps really are.
If you can get away with timestamping the individual readings on reception by the gateway it would be the only one that needs an RTC. With some wiggle room on the timing you should be able to get away with the "precision" of the real-time-counters on the Xenons.
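A minimal sketch of that gateway-side approach - assuming, purely for illustration, that the nodes publish a bare numeric value on a "node/reading" mesh event - could be:

```cpp
// Gateway: timestamp node readings on arrival, so only the gateway needs good time.
#include "Particle.h"

void readingHandler(const char *event, const char *data) {
  time_t receivedAt = Time.now();   // gateway clock is cloud-synced
  String record = String::format("{\"t\":%lu,\"v\":%s}", (unsigned long)receivedAt, data);
  if (Particle.connected()) {
    Particle.publish("field/reading", record, PRIVATE);
  }
}

void setup() {
  Mesh.subscribe("node/reading", readingHandler);
}

void loop() {
}
```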
I agree. The lack of a “real” RTC in the nRF52840 is telling. Why did Nordic skip on this? I suspect it is because in many applications, time accuracy is not necessary and given that STOP vs STANDBY sleep don’t differ by much current on the Xenon, timed waking via the Real Time Counter is adequate. Personally, I believe the missing “real” RTC is a design miss.
However, given the bandwidth restrictions, a mesh design requires that “real time” events be managed locally, at the node and not over mesh or cloud. Latency exists and it is not deterministic so a designer needs to take this into account. Of course, “real time” is whatever is defined by the designer and could be minutes, hours, seconds, milliseconds or even microseconds.
Or by the measurement taken. When measuring the water level in a huge reservoir it doesn't really matter whether the value is 10 seconds or 2 minutes old, but the endstops on some heavy machinery should not take that long to get reported.
Hence, in order to give correct guidance, the data being acquired should be known - with fuzzy input only rough estimates can be made.
Thanks for the advice.
I’m not using Particle.subscribe or publish at all, just publishing/subscribing to the mesh. However, I still need to publish a command from the gateway every 3 minutes, and collect status reports from end nodes at least once every 10 minutes. I tried spreading the load (publish the command to only 8 nodes; have them reply 15 seconds apart; publish to the next 8…), but it did not help at all - it looked like only some of the end nodes happened to be connected to the mesh at any one time, so many missed the command entirely.
Internally to the mesh, when a publish/subscribe happens, is there any internal handshake/acknowledge? I thought there was, but from what peekay is saying, it is just a UDP broadcast? So if an endpoint has momentarily dropped its mesh connection and a broadcast happens, the endpoint simply misses it? Not good at all… To achieve reliability outside of a lab environment I would need to spam publishes so nodes (eventually) receive them, but in doing so I saturate the network. Catch-22.
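I suppose I could roll my own acknowledgement at the application level: each node would Mesh.publish an "ack" with its device ID from its command handler, and the gateway would re-send the command until it has heard from everyone. Something like this on the gateway (the event names, 48-node count, 5-second wait and 3-minute cycle are just guesses from my setup, completely untested):

```cpp
// Gateway: re-send a command until all end nodes have acknowledged it.
#include "Particle.h"

SYSTEM_THREAD(ENABLED);

const int NODE_COUNT = 48;
volatile int acksReceived = 0;

void ackHandler(const char *event, const char *data) {
  acksReceived++;                  // real code would track *which* device IDs replied
}

void sendCommandWithRetry(const char *cmd, int maxRetries) {
  acksReceived = 0;
  for (int attempt = 0; attempt < maxRetries; attempt++) {
    Mesh.publish("gw/cmd", cmd);
    unsigned long start = millis();
    while (millis() - start < 5000UL && acksReceived < NODE_COUNT) {
      Particle.process();          // let incoming ack events be delivered
    }
    if (acksReceived >= NODE_COUNT) break;   // everyone answered, stop re-sending
  }
}

void setup() {
  Mesh.subscribe("cmd/ack", ackHandler);
}

void loop() {
  sendCommandWithRetry("operate", 3);
  delay(3 * 60 * 1000UL);          // the 3-minute command cycle
}
```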
I suppose the only way I can get this project going will be splitting one big mesh network into several smaller ones, perhaps no more than 10 nodes each.
Although this is slightly off topic regarding multiple gateways on one mesh network, I would like to add a quick solution.
I’ve been running some tests on Gen 3 devices as I needed redundancy.
Although an HA network is not currently available (some inside information from Particle also suggests this won’t be available/supported in the near future), a workaround is possible.
I needed a cloud connection at all times for my projects, so I used an Argon as my preferred gateway.
When my Argon loses its cloud connection, it turns on a relay to power a Boron running the exact same code.
If and when the Argon comes back online, it turns off the Boron.
This might not be very elegant, but it solved my problem!
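In rough terms, the Argon side is just something like this (the relay pin and the 60-second grace period are specific to my setup, so treat them as placeholders):

```cpp
// Argon: power a standby Boron through a relay while the cloud connection is down.
#include "Particle.h"

SYSTEM_THREAD(ENABLED);

const int RELAY_PIN = D7;                    // relay that switches the Boron's supply
const unsigned long OFFLINE_GRACE_MS = 60000UL;

unsigned long offlineSince = 0;

void setup() {
  pinMode(RELAY_PIN, OUTPUT);
  digitalWrite(RELAY_PIN, LOW);              // Boron stays off while the Argon is healthy
}

void loop() {
  if (Particle.connected()) {
    offlineSince = 0;
    digitalWrite(RELAY_PIN, LOW);            // back online: power the Boron down again
  } else if (offlineSince == 0) {
    offlineSince = millis();                 // start the offline timer
  } else if (millis() - offlineSince > OFFLINE_GRACE_MS) {
    digitalWrite(RELAY_PIN, HIGH);           // offline too long: power up the Boron
  }
}
```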
Now, regarding the mesh network, I guess breaking up your endpoints into multiple networks, each with its own gateway, could be implemented.