Large Mesh Networks Thread

Right now I wanted to just make sure that many could connect and there wouldn’t be any unforeseen issues.

I will be breaking the network up into multiple, smaller networks to install into buildings for measuring environmental data.

Will do, let me circle back to this over the weekend.

Great to hear @emile! Looking forward to seeing the network grow! :smiley:

The last few days I have been pushing the network further, deploying my app & passing data over the mesh network. At a high level, the network seems to have trouble staying connected to the Particle Cloud over the mesh network.

If there is a better way to do any of this, I’m all ears and happy to test. I’ve put my code at the bottom of the post because it could be the bottleneck; I am not sure.

The goal for this project

The purpose of this is to read pulse meters, collect the number of pulses, and periodically send the data out over the gateway. The Xenon then resets back to 0, and starts counting pulses again.

My setup

To make testing easier, all of the devices are in a single room, which could be introducing other environmental variables/noise.

My gateway is a Xenon connected to a Raspberry Pi over Ethernet & USB.

  • Over ethernet to connect to the network

  • Over USB for power & to write to the USB port on the Pi.

The Pi is connected to the internet over WiFi.

Initially

Initially I wanted to see if 55 devices could stay connected to each other & the Particle cloud without any code installed. All of the devices stayed connected and did not drop off.

However once I pushed my app to the edge devices, I have seen the following issues -

All 55 devices cannot be connected to the Particle Cloud & running my app

Now that they are pushing data across the mesh, there seems to be a bottleneck. When I power up all of the devices, the network goes crazy and the devices are unable to reconnect to the Particle cloud. Some connect, then others drop off.

It seems like the magic number is somewhere between 37 and 45 devices that can comfortably stay connected to Particle at the same time.

All 55 devices cannot be updated at the same time.

Because some are not connected to the Particle cloud, you cannot run a bulk OTA update. The Xenons not connected will not get the update.

This meant devices would get certain updates and not others. So the Xenons were all running different versions of the app as I pushed updates.

Is there a way to push updates over the mesh network without an edge Xenon being connected to the cloud?

Dropped the network down to 37 devices

After seeing the errors above, I decided to drop the network size down to 37 edge devices. At this point, all of the devices can stay connected to the Particle cloud at the same time.

The following issues were visible when the network was both 37 & 55 devices.

Individual devices ‘Request Timeout’ during bulk update

To update all of the Xenons, I run a bash script that loops over all of the Xenon device ids.

for name in $names
do
  particle flash "$name" prod-transmitter.ino
done

This will walk through each one. Occasionally, and seemingly at random, a flash will time out.

Devices taking over 10 minutes to reconnect to Particle cloud after the deploy

After a mass OTA deploy, all of the devices would be trying to reconnect over the gateway to the Particle network at the same time. It can take anywhere from 3 - 10+ minutes. Why such a large variation, I am not sure.

To try and speed this up I have put in place 2 solutions:

  • A reset function. If the device hasn’t connected to Particle in 3 minutes, it does a System Reset

  • When the gateway deploys, it sends a Mesh.publish alerting all the edges to System Reset

The Gateway Push does seem to lower the reconnection time, most of the time to 2-4 minutes with the 37 device network. However, there are instances where it can take 10+ minutes for the last node to reconnect.

Random devices flashing 1 red light between cyan flashes (Hard fault?)

I am not entirely sure why these pop up but it does happen when devices are trying to reconnect.

Any ideas on this?

My code

For the Gateway:

#include "Particle.h"

SYSTEM_THREAD(ENABLED);
SerialLogHandler logHandler(LOG_LEVEL_ALL);

// For the heartbeat ("pulse") message
const uint32_t connectivity_message_interval = 180000UL; // 3 min
uint32_t last_connectivity_message_time = 0;  // last time the heartbeat was published

// Log any heartbeat message heard on the mesh
void handlePulse(const char *event, const char *data) {
    Serial.printlnf("event=%s data=%s", event, data ? data : "NULL");
}

// Alert the edges they need to reset
int handleReset(String command) {
    Mesh.publish("reset", "ok");
    return 1;
}

void writeToUSB(const char *event, const char *data) {
    Serial.write(data);
}

void setup() {
    Serial.begin(9600);

    // Listen for the periodic heartbeat ("pulse") messages
    Mesh.subscribe("pulse", handlePulse);

    // When data comes in from the Edge Devices, forward it over USB serial
    Mesh.subscribe("meter-data", writeToUSB);

    // Cloud function: when called, tells the Edges to reset
    Particle.function("reset", handleReset);
}

void loop() {
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Mesh.publish("pulse", "hello");
    }
}

For the Edge devices:

SYSTEM_THREAD(ENABLED);
STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));

// Settings
#define __FILENAME__ (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define updatePeriod 14 // publish every 14 seconds

// pins
int WATER_SENSOR_PIN = D3;

// general vars
retained volatile int pulseCount = 0;  // volatile: modified inside the pulse() interrupt handler
retained int lastUpdate = 0;

const char *version = "0.1.4";
char deviceInfo[120];  //adjust size as required
char json[256];

// Variables for the reset
const uint32_t connectivity_message_interval = 5000;  // 5 sec
uint32_t last_connectivity_message_time = 0;  // controls how often we send the offline message

const uint32_t connectivity_timeout = 180000UL;  // 3 min b/c 30sec is pretty short, but up to you.
uint32_t last_connectivity_change_time = 0;

bool is_particle_connected; // flag to handle the moment of connectivity change

// Heartbeat from the gateway; just log that it was heard
void handlePulse(const char *event, const char *data) {
    Serial.println("Heard");
}

// This is when there is an issue and to reset the Xenon
void handleNeedReset(const char *event, const char *data) {
    System.reset();
}

// setup() runs once, when the device is first turned on.
void setup() {

  Serial.begin(9600);
  pinMode(WATER_SENSOR_PIN, INPUT);
  attachInterrupt(WATER_SENSOR_PIN, pulse, CHANGE);

//   Mesh.on();  // potentially needed due to bug when mesh module is not already powered up.
//   Mesh.connect();
  Particle.connect();
  Particle.variable("version", version);
  Particle.variable("deviceInfo", deviceInfo);
    snprintf(deviceInfo, sizeof(deviceInfo)
        ,"App: %s, Date: %s, Time: %s, Sysver: %s"
        ,__FILENAME__
        ,__DATE__
        ,__TIME__
        ,(const char*)System.version());
  is_particle_connected = Particle.connected();

  Particle.syncTime();
  Particle.publishVitals(3600);

  // // turn off core LED
  // RGB.control(true);
  // RGB.color(0, 0, 0);

  pulseCount = 0;
  lastUpdate  = Time.now();

  // Mesh
  Mesh.subscribe("pulse", handlePulse);
  Mesh.subscribe("reset", handleNeedReset);
  Serial.println("Finished setup");
}

void handle_reset() {
  // If the device is in a bad state, reset it
  Particle.process();

  if (!Particle.connected()) {
    if (is_particle_connected) {
      // handle moment of disconnection
      is_particle_connected = false;
      last_connectivity_change_time = millis();
      Serial.println("Just Went Offline");
    }
    if ((millis() - last_connectivity_change_time) > connectivity_timeout) {
      // probably an issue connecting, we'll try to fix by resetting
      Serial.println("Resetting due to connectivity timeout...");
      delay(1000);  // allow serial message to get read out
      System.reset();
    }
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Offline");
    }
  }
  else {
    if (!is_particle_connected) {
      // handle moment of connection
      is_particle_connected = true;
      last_connectivity_change_time = millis();
      Serial.println("Just Came Online");
    }
    // We are connected!  Time for normal connected stuff...
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Online");
    }
  }
}

// loop() runs over and over again, as quickly as it can execute.
void loop() {
   handle_reset();

   // Loop to see if there should be any information published
   if (Time.now() >= (lastUpdate + updatePeriod)) {
     publish();
     lastUpdate = Time.now();
   }
}

void pulse(void) {
    // increment pulse count (runs in interrupt context)
    pulseCount++;
}

void publish() {
  // Try to push data to the rest of the Mesh Network

  // Snapshot the counter with interrupts off only briefly, so interrupts
  // are not held off for the whole duration of the publish.
  noInterrupts();
  int count = pulseCount;
  interrupts();

  snprintf(json, sizeof(json), "{\"time\":%d,\"device\":\"%s\",\"count\":%d}", (int)Time.now(), System.deviceID().c_str(), count);

  // Mesh.publish() returns an int error code; 0 is assumed to mean success here.
  int result = Mesh.publish("meter-data", json);
  if (result == 0) {
    // Subtract only what was sent, so pulses counted during the publish are kept
    noInterrupts();
    pulseCount -= count;
    interrupts();
    lastUpdate  = Time.now();
    Serial.println(json);
  } else {
    Serial.println("Not pushed to Mesh");
  }
}

EDIT: Potential Solution

Thinking this through, I think I have a workaround that I need to test: 2 functions that can be triggered over the Mesh network for Particle.connect() and Particle.disconnect().

If the bottleneck is around the cloud, I can just send a Mesh.publish to a subset of the devices that will let them connect to the Particle cloud for OTA updates, and then disconnect afterwards.

void handleParticleConnect(const char *event, const char *data) {
    Particle.connect();
}

void handleParticleDisconnect(const char *event, const char *data) {
    Particle.disconnect();
}

I’ll test this tomorrow and see if I can have a pure mesh with all 55 devices.
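A rough sketch of how the "subset of the devices" part could be decided on each edge device. The payload format ("<group>/<groupCount>") and the helper names groupOf and shouldConnect are assumptions made up for illustration, not part of Device OS. On a Xenon the ID string would come from System.deviceID(), and the handler would call Particle.connect() only when shouldConnect returns true.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: map a device ID onto one of `groupCount` groups by
// summing its characters. Any stable hash would do.
unsigned groupOf(const std::string& deviceId, unsigned groupCount) {
    unsigned sum = 0;
    for (unsigned char c : deviceId) sum += c;
    return groupCount ? sum % groupCount : 0;
}

// Hypothetical payload format: "<group>/<groupCount>", e.g. "2/5".
// Returns true when this device belongs to the group named in the payload.
bool shouldConnect(const std::string& deviceId, const char* payload) {
    unsigned group = 0, count = 0;
    if (!payload || std::sscanf(payload, "%u/%u", &group, &count) != 2 || count == 0) {
        return false;
    }
    return groupOf(deviceId, count) == group;
}
```

With 55 devices and 5 groups, the gateway would publish "0/5" through "4/5" one after another, so only part of the network is on the cloud at any one time. The groups will not be perfectly even with a character-sum hash.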

@emile, I have a couple of questions for your final deployment and maybe a suggestion.

Are you thinking that your final installed Xenon locations can operate in a Star Topology, where each Xenon is within RF range of the Gateway ?

If the Star layout works for your location, you wouldn't need to have the Xenons on the MESH unless it's time to send your Pulse Count.
Even if you need to utilize the Mesh Topology because of distance from the Gateway, you can do so with a few Xenons that are powered 24/7 to provide the Mesh Service Area (the overall footprint). Then, your Edge Devices run Manual Mode counting Pulses from your meters. They Connect to the Cloud whenever they decide to Publish their Data, or when your Update Flag is set.

In Automatic Mode, I'm normally seeing 4 seconds for a Sleeping Xenon to Wake up from Pin Sleep and send a Cloud Publish. That 4 seconds includes a coded 2 second delay after the Publish to make sure it's sent. Occasionally the runtime spikes to 9 seconds. This is for 0 Hops (Xenon communicates with the Gateway directly). I assume Manual mode would have similar connection times ?

But the basic concept is to not force the Mesh to maintain a huge number of devices at any particular time, since it's not required in most cases. That only adds Overhead to the Mesh Network.

Your 55 Xenons should be able to quickly join the Mesh when required, perform their Cloud duties, and drop off. If your final physical area is too large for a Star Topology (Zero Hops), then you install a few Xenons that are always ON to maintain the Mesh footprint and forward the Edge Device data to the Gateway as they randomly connect and publish.

One thing to check out is @rickkas7's PublishQueueAsyncRK library, which is a work of art really. It lets you store pulse count data and only remove it AFTER it has been successfully sent out to the Particle Cloud with ACKNOWLEDGEMENT that it was received.
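To show the store-until-acknowledged idea in isolation, here is a generic sketch in plain C++. This is not the PublishQueueAsyncRK API; the class name PendingQueue and its methods are invented for illustration, and the library itself should be checked for how it actually stores and retries events.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Minimal store-and-forward queue: an entry is removed only after the caller
// reports that the send was acknowledged.
class PendingQueue {
public:
    explicit PendingQueue(std::size_t capacity) : capacity_(capacity) {}

    // Queue a reading; drops the oldest entry when the queue is full.
    void push(const std::string& payload) {
        if (capacity_ == 0) return;
        if (items_.size() == capacity_) items_.pop_front();
        items_.push_back(payload);
    }

    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

    // Oldest unacknowledged payload; only valid when !empty().
    const std::string& front() const { return items_.front(); }

    // Call with the result of the send attempt for front().
    void onSendResult(bool acknowledged) {
        if (acknowledged && !items_.empty()) items_.pop_front();
    }

private:
    std::size_t capacity_;
    std::deque<std::string> items_;
};
```

A failed send leaves the entry at the front, so the next attempt retries the same reading instead of losing it.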

I'm currently thinking a mesh topology.

Great idea. I will definitely look into this. Sounds like I was working towards your solution, but you've filled in the gaps.

Thanks again!

Brilliant - I will definitely check this out. Love it. Thanks @rickkas7!

I pushed an update to all of my edge Xenons, and disconnected them from Particle.

Particle.disconnect() in the setup function.

All are now connected to each other in the mesh network and transmitting data. So it appears that something with the Particle cloud is a blocker. @marekparticle maybe something to look into?

Next I’ll test batch updates by reconnecting them to Particle, pushing updates and then disconnecting them.

Can you step through the Xenons sequentially, to reduce the Mesh Traffic?
Or insert a randomized delay for each Xenon to join the batch update ?
I'm just thinking out loud.
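The randomized delay could look something like the sketch below. It is plain C++ using the standard <random> header rather than any Device OS call, and the function name randomStartDelayMs is made up for illustration. On real hardware each node would need its own seed (for example one derived from the device ID), otherwise every Xenon would pick the same delay.

```cpp
#include <cstdint>
#include <random>

// Pick a start delay in [0, maxDelayMs] so that devices do not all begin
// reconnecting or updating at the same moment.
std::uint32_t randomStartDelayMs(std::mt19937& rng, std::uint32_t maxDelayMs) {
    std::uniform_int_distribution<std::uint32_t> dist(0, maxDelayMs);
    return dist(rng);
}
```

Each device would wait for its own delay before joining the batch, which spreads the traffic over the chosen window instead of hitting the gateway all at once.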

This is what I'm thinking right now. :+1:

you could also wake each xenon on some second or minute boundary based on the last 2 digits of the device ID …
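A small sketch of that idea, assuming the device ID is a hex string (as returned by System.deviceID()) and that the last two characters are read as one hex byte. The function names are invented for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Interpret the last two characters of a device ID as a hex byte (0..255).
// Returns -1 if the ID is too short or the characters are not hex digits.
int lastByteOfId(const std::string& id) {
    if (id.size() < 2) return -1;
    int value = 0;
    for (std::size_t i = id.size() - 2; i < id.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(id[i]);
        if (!std::isxdigit(c)) return -1;
        value = value * 16 + (std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10);
    }
    return value;
}

// Offset in seconds within a repeating window, e.g. a 60 s window gives each
// device a wake-up second between 0 and 59. Falls back to 0 on bad input.
int wakeOffsetSeconds(const std::string& id, int windowSeconds) {
    int b = lastByteOfId(id);
    if (b < 0 || windowSeconds <= 0) return 0;
    return b % windowSeconds;
}
```

With 55 devices and a 60 second window some IDs will land on the same second, so this spreads the load rather than guaranteeing one device per slot.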

Given Xenons only have Mesh comms how would this work?

At the last G3CC meeting we were asked to provide ideas for future mesh development. One of the items I requested was a Device OS API with the ability to stop a mesh node from being a repeater (i.e. force a star topology).

Gateways and Repeater nodes must be continually awake otherwise the sleepy endnode publish doesn't work and currently it fails silently since the bool return from Mesh.publish() doesn't work.

A sleeping Xenon can easily wake, connect to the cloud via the Gateway, and perform a Particle.publish.
It needs to be within radio distance to the Gateway, or another Xenon that's powered 24/7 (router) that's on the Mesh.

Most average users wouldn't need to perform any Mesh.publish at all.

IMHO, I don't see an immediate need to force a Star Topology. That's easily done with the physical layout, but I can't see a reason to need to force a Star.

I personally don't see a need for stopping a Xenon from being a Repeater.
If any particular Xenon has a mains/utility power source, properly designed Solar Solution, etc (powered 24/7), then it helps fortify the Mesh Network. As long as you don't take that too far and flood the Mesh with unnecessary overhead.

When we perform a site visit for a Mesh Deployment, we install "routers" (powered 24/7) spaced properly (depending on the environment) to define the Mesh Service Area (footprint). Each "router" needs to be able to communicate with 2-3 neighbors. You also want to ensure several paths back to the Gateway (i.e. the Gateway needs a few neighbors). Once that's done, no other Xenons (in this example) remain on the Mesh; they are endnodes. The endnodes wake as required and perform a Particle.publish to send sensor data and check the Gateway for a firmware update.
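The neighbor rule can be checked with a few lines once a neighbor table exists. The sketch below assumes the table is filled in by hand from a site survey, since the thread notes there is no tool yet to ask a Xenon for its neighbors. The type and function names are made up for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Router name -> names of the nodes it can hear (entered manually).
using NeighborTable = std::map<std::string, std::vector<std::string>>;

// Return the names of routers that hear fewer than `minNeighbors` other nodes.
std::vector<std::string> underConnected(const NeighborTable& table, std::size_t minNeighbors) {
    std::vector<std::string> weak;
    for (const auto& entry : table) {
        if (entry.second.size() < minNeighbors) weak.push_back(entry.first);
    }
    return weak;
}
```

Any router returned by underConnected(table, 2) is a single point of failure for whatever sits behind it and is a candidate for an extra router nearby.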

Network Topology Tools would be helpful for the Gen3. During installation, we need to know/determine a Xenon's neighbors. This is the most important part of any Mesh Install.

The point of sleeping endnodes is to conserve power usage and therefore enable true wireless use by being battery powered. Even if you are in a building with a mains/utility power source it is still much more convenient to be able to deploy devices (endnodes) without needing to mains power them. Needing to connect through the gateway to the Cloud on waking surely uses a lot more power and doesn’t appear to be necessary. I agree for most average Mesh users not needing such a solution - the topic here is Large Mesh networks and that is a different matter. Could you expand on your experience of deploying Particle/Other? Mesh networks and the Network Topology Tools you use - I am keen to feed the planning going on at Particle.

What do you mean by this?

I agree and didn't intend to imply that :grinning:
The Xenon shines as a Low-Powered device. I absolutely love it.

The only time a Xenon (edge device, endnode, sensor node, etc) needs to connect to the Gateway is when it needs to push data to the outside world.

Thus the suggestion to not have 55 devices continuously on the Mesh at any one time, it only complicates things. Users will get much better performance having Xenons join the mesh only when necessary to push data to the outside world. Only the Gateway and the Xenons required to develop the physical service area should remain on the Mesh Network. You still have a 55 device network, they normally don't have any reason to all be on the Mesh at once.

IMO, the leader is Digi. Take a look at XCTU.
The Particle topology tools don't need to be as elaborate, but we do need a quick way to ask a Xenon what other Gen3 devices are within its radio range, especially during the original deployment for the "routers".

I apologize if I sound like I'm preaching.
I promise that's not my intention.
Just sharing experience that I learned the Hard Way over the years with other Mesh Networks.

Sounds like I should just have the devices disconnect on setup, reconnect to the Mesh when I need to pass data back to the gateway. The issue with this right now is that I want to push as much data as possible so is there a lower limit where this doesn’t make sense?

Ex. if I am pushing every 10 seconds (as an example), then there is time needed to reconnect to the network, and I have to make sure every device is online to get the data back to the gateway. Does it make sense to:

  • disconnect from the network
  • reconnect
  • check there is a connection
  • send the data
  • get the response back from the gateway
  • then have some buffer to make sure there is no other data coming across from other Xenons in the network
  • and then go to sleep
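The "lower limit" question above can be put into rough numbers. The sketch below uses the figures mentioned earlier in the thread (about 4 seconds awake per wake-and-publish cycle, occasionally 9 seconds, measured for a cloud publish at 0 hops) as inputs. The threshold for "worthwhile" is a made-up parameter, and a Mesh-only publish may take a different amount of time.

```cpp
#include <cassert>

// Fraction of each reporting period that a node spends awake (capped at 1).
double awakeFraction(double awakeSeconds, double periodSeconds) {
    assert(periodSeconds > 0.0);
    double f = awakeSeconds / periodSeconds;
    return f > 1.0 ? 1.0 : f;
}

// Sleeping only pays off if a wake/publish cycle fits into the period with
// margin; maxAwakeFraction is a hypothetical threshold chosen by the caller.
bool sleepIsWorthwhile(double awakeSeconds, double periodSeconds, double maxAwakeFraction) {
    return awakeFraction(awakeSeconds, periodSeconds) <= maxAwakeFraction;
}
```

At a 10 second period, a 4 second cycle keeps the node awake 40% of the time and a 9 second spike nearly fills the whole period, so sleeping between publishes buys little. At a 10 minute period the same cycle is under 1% awake.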

This is my first mesh network so learning on the fly. Thanks everyone, this is turning into a great, little thread.

@armor and @Rftop, a single gateway topology definitely calls for a planning and measuring tool, and we have been pushing for this on both the Elite and G3CC sides. Besides the obvious need for HA, I also believe there is a need for a Xenon-based mesh bridge to allow two meshes to connect. With a gateway on each mesh to keep traffic-per-gateway low, a bridge would also allow for a mesh to fall back to the connected mesh as another type of HA.

I totally support the idea of creating a Xenon repeater “backbone” first then adding Xenon endnodes to those. That is simply good mesh planning IMO which I have been preaching since day one (:wink:). Having sleepy-node capability will add one of the key missing pieces in addressing “larger” meshes.

The biggest mistake I see folks making is treating mesh like a high speed network, which it is not. Something that has not been discussed is data aggregation at backbone nodes for processing and passing upstream to reduce data bandwidth. BTW, a Particle.publish() from a node still uses an underlying Mesh.publish()-like transmission, except that the data needs to get to a gateway to make its way to the Cloud. So consideration of ALL data over the 250 kbps bandwidth of a mesh is a key factor. This is a major concern when trying to apply OTA to mesh devices simultaneously, as the data can quickly swamp the mesh, especially if it’s already loaded. OTA to mesh may need a “smarter” sequenced approach, which has not been discussed as of yet.
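A back-of-envelope way to look at the bandwidth point. Everything fed into these functions is an assumption to be replaced with measured values: the payload size, the firmware image size, and above all the usable throughput, which will be well below the raw 250 kbps once headers, retransmissions, routing traffic and multi-hop forwarding are counted. The function names are made up for illustration.

```cpp
#include <cassert>

// Raw application payload rate in bits per second for `devices` nodes each
// sending `bytesPerMessage` every `periodSeconds`. Ignores all overhead.
double offeredLoadBps(int devices, int bytesPerMessage, double periodSeconds) {
    assert(periodSeconds > 0.0);
    return devices * bytesPerMessage * 8.0 / periodSeconds;
}

// Seconds needed to move `imageBytes` to each of `devices` nodes one after
// another at an assumed usable throughput in bits per second.
double sequentialTransferSeconds(int devices, long imageBytes, double usableBps) {
    assert(usableBps > 0.0);
    return devices * (imageBytes * 8.0) / usableBps;
}
```

For example, 55 devices sending a hypothetical 100-byte message every 14 seconds offer roughly 3.1 kbps of payload, while pushing a hypothetical 50 KB image to all 55 at an assumed 50 kbps usable rate takes about 450 seconds. That is one way to see why regular reporting can be fine while a simultaneous OTA swamps the same mesh.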

I just want to say thank you to @emile for starting this area of exploration and for his time devoted to testing different ideas!

That should be a Sticky :sunglasses:

I recently explained a Mesh Network like this:
A Mesh has limited Bandwidth (a lot is used for Network Overhead), but unlimited Time on its hands.
Just do what you can to space out your traffic....there's plenty of time in a day.
