Large Mesh Networks Thread

I have shared this in another thread, but figured it was worth sharing as a separate thread to see what others are doing.

I now have a 56-Xenon mesh network running without any issues.

  • 55 edge devices
  • 1 gateway with an Ethernet bridge, connected to a Raspberry Pi over USB & Ethernet.

If anyone has other large networks, I’m interested to hear about your setup and what you’ve seen so far.

13 Likes

This is so cool! I’ve gotta share this internally. :slight_smile:

4 Likes

Cool, what are all the nodes doing, if you don’t mind me asking?

Sounds like you’ve got the largest Particle Mesh network so far!

It would be really good to know (if you are willing to share this) how much memory is available on each device (endnode) and also on the gateway. I gather you have only recently got this set up. Could you run the Marco Polo test software to see what the response is like?

2 Likes

Right now I wanted to just make sure that many could connect and there wouldn’t be any unforeseen issues.

I will be breaking the network up into multiple, smaller networks to install into buildings for measuring environmental data.

3 Likes

Will do, let me circle back to this over the weekend.

2 Likes

Great to hear @emile! Looking forward to seeing the network grow! :smiley:

2 Likes

Over the last few days I have been pushing the network further, deploying my app & passing data over the mesh network. At a high level, the devices seem to have trouble staying connected to the Particle Cloud over the mesh network.

If there is a better way to do any of this, I’m all ears and happy to test. I’ve put my code at the bottom of the post because it could be the bottleneck. I am not sure.

The goal for this project

The purpose of this is to read pulse meters, collect the number of pulses, and periodically send the data out over the gateway. The Xenon then resets back to 0, and starts counting pulses again.

My setup

To make testing easier, all of the devices are in a single room, so that could be introducing other environmental variables/noise.

My gateway is a Xenon connected to a Raspberry Pi over Ethernet & USB.

  • Over Ethernet to connect to the network

  • Over USB for power & to write to the USB port on the Pi.

The Pi is connected to the internet over WiFi.

Initially

Initially I wanted to see if 55 devices could stay connected to each other & the Particle cloud without any code installed. All of the devices stayed connected and did not drop off.

However, once I pushed my app to the edge devices, I saw the following issues:

All 55 devices cannot be connected to the Particle Cloud & running my app

Now that they are pushing data across the mesh, there seems to be a bottleneck. When I power up all of the devices, the network goes crazy and the devices are unable to all reconnect to the Particle cloud. Some connect, then others drop off.

It seems like the magic number is somewhere between 37-45 devices that can comfortably stay connected to Particle at the same time.

All 55 devices cannot be updated at the same time.

Because some are not connected to the Particle cloud, you cannot run a bulk OTA update. The Xenons not connected will not get the update.

This meant devices would get certain updates and not others. So the Xenons were all running different versions of the app as I pushed updates.

Is there a way to push updates over the mesh network without an edge Xenon being connected to the cloud?

Dropped the network down to 37 devices

After seeing the errors above, I decided to drop the network size down to 37 edge devices. At this point, all of the devices can stay connected to the Particle cloud at the same time.

The following issues were visible when the network was both 37 & 55 devices.

Individual devices ‘Request Timeout’ during bulk update

To update all of the Xenons, I run a bash script that loops over all of the Xenon device ids.


for name in $names
do
    particle flash "$name" prod-transmitter.ino
done

This will walk through each one. Occasionally, a flash will time out.

Devices taking over 10 minutes to reconnect to Particle cloud after the deploy

After a mass OTA deploy, all of the devices would be trying to reconnect over the gateway to the Particle network at the same time. It can take anywhere from 3 - 10+ minutes. Why such a large variation, I am not sure.

To try and speed this up I have put in place 2 solutions:

  • A reset function. If the device hasn’t connected to Particle in 3 minutes, it calls System.reset().

  • When the gateway deploys, it sends a Mesh.publish alerting all the edges to call System.reset().

The gateway publish does seem to lower the reconnection time to 2-4 minutes most of the time with the 37 device network. However, there are instances where it can take 10+ minutes for the last node to reconnect.
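One thing that might soften the reconnect rush is to avoid having every edge reset on the same fixed 3 minute timer. The sketch below is plain C++ (no Device OS calls) and only computes a delay: it doubles the wait on each failed attempt up to a cap, then adds a per-device offset derived from the device ID. The names resetDelayMs and hashDeviceId, and all of the timing values, are made up for illustration and have not been tested on this network.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// FNV-1a hash of the device ID string; used only to spread devices apart.
uint32_t hashDeviceId(const std::string& deviceId) {
    uint32_t h = 2166136261u;
    for (unsigned char c : deviceId) {
        h ^= c;
        h *= 16777619u;
    }
    return h;
}

// Hypothetical helper: how long to wait before the next reset attempt.
// attempt 0 -> baseMs, attempt 1 -> 2 * baseMs, and so on, capped at maxMs,
// plus a per-device jitter in [0, jitterMs).
uint32_t resetDelayMs(const std::string& deviceId, unsigned attempt,
                      uint32_t baseMs, uint32_t maxMs, uint32_t jitterMs) {
    assert(baseMs > 0 && maxMs >= baseMs && jitterMs > 0);
    uint64_t backoff = baseMs;
    for (unsigned i = 0; i < attempt && backoff < maxMs; ++i) {
        backoff *= 2;
    }
    if (backoff > maxMs) {
        backoff = maxMs;
    }
    return static_cast<uint32_t>(backoff) + hashDeviceId(deviceId) % jitterMs;
}
```

On a Xenon, the result would replace the fixed connectivity_timeout, with the attempt counter kept in retained memory so it survives the reset.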

Random devices flashing 1 red light between cyan flashes (Hard fault?)

I am not entirely sure why these pop up but it does happen when devices are trying to reconnect.

Any ideas on this?

My code

For the Gateway:

#include "Particle.h"

SYSTEM_THREAD(ENABLED);
SerialLogHandler logHandler(LOG_LEVEL_ALL);

// Heartbeat broadcast to the mesh
const uint32_t connectivity_message_interval = 180000UL; // 3 min
uint32_t last_connectivity_message_time = 0;  // when the last heartbeat was sent

// Log "pulse" events heard on the mesh
void handlePulse(const char *event, const char *data) {
    Serial.printlnf("event=%s data=%s", event, data ? data : "NULL");
}

// Cloud function: alert the edges they need to reset
int handleReset(String command) {
    Mesh.publish("reset", "ok");
    return 1;
}

// Forward meter data from the edges to the Pi over USB serial
void writeToUSB(const char *event, const char *data) {
    if (data) {
        Serial.write(data);
    }
}

void setup() {
    Serial.begin(9600);

    // Log the periodic heartbeat
    Mesh.subscribe("pulse", handlePulse);

    // When data comes in from the Edge Devices
    Mesh.subscribe("meter-data", writeToUSB);

    // Cloud-callable function that tells the Edges to reset
    Particle.function("reset", handleReset);
}

void loop() {
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Mesh.publish("pulse", "hello");
    }
}

For the Edge devices:

SYSTEM_THREAD(ENABLED);
STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));

// Settings
#define __FILENAME__ (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define updatePeriod 14 // publish every 14 seconds

// pins
int WATER_SENSOR_PIN = D3;

// general vars
retained volatile int pulseCount = 0;  // volatile: incremented in the interrupt handler
retained int lastUpdate = 0;

const char *version = "0.1.4";
char deviceInfo[120];  //adjust size as required
char json[256];

// Variables for the reset
const uint32_t connectivity_message_interval = 5000;  // 5 sec
uint32_t last_connectivity_message_time = 0;  // controls how often we print the online/offline message

const uint32_t connectivity_timeout = 180000UL;  // 3 min b/c 30sec is pretty short, but up to you.
uint32_t last_connectivity_change_time = 0;

bool is_particle_connected; // flag to handle the moment of connectivity change

// Heartbeat from the gateway; just log that it was heard
void handlePulse(const char *event, const char *data) {
    Serial.println("Heard");
}

// This is when there is an issue and to reset the Xenon
void handleNeedReset(const char *event, const char *data) {
    System.reset();
}

// setup() runs once, when the device is first turned on.
void setup() {

  Serial.begin(9600);
  pinMode(WATER_SENSOR_PIN, INPUT);
  attachInterrupt(WATER_SENSOR_PIN, pulse, CHANGE);

//   Mesh.on();  // potentially needed due to bug when mesh module is not already powered up.
//   Mesh.connect();
  Particle.connect();
  Particle.variable("version", version);
  Particle.variable("deviceInfo", deviceInfo);
    snprintf(deviceInfo, sizeof(deviceInfo)
        ,"App: %s, Date: %s, Time: %s, Sysver: %s"
        ,__FILENAME__
        ,__DATE__
        ,__TIME__
        ,(const char*)System.version());
  is_particle_connected = Particle.connected();

  Particle.syncTime();
  Particle.publishVitals(3600);

  // // turn off core LED
  // RGB.control(true);
  // RGB.color(0, 0, 0);

  pulseCount = 0;
  lastUpdate  = Time.now();

  // Mesh
  Mesh.subscribe("pulse", handlePulse);
  Mesh.subscribe("reset", handleNeedReset);
  Serial.println("Finished setup");
}

void handle_reset() {
  // If the device is in a bad state, reset it
  Particle.process();

  if (!Particle.connected()) {
    if (is_particle_connected) {
      // handle moment of disconnection
      is_particle_connected = false;
      last_connectivity_change_time = millis();
      Serial.println("Just Went Offline");
    }
    if ((millis() - last_connectivity_change_time) > connectivity_timeout) {
      // probably an issue connecting, we'll try to fix by resetting
      Serial.println("Resetting due to connectivity timeout...");
      delay(1000);  // allow serial message to get read out
      System.reset();
    }
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Offline");
    }
  }
  else {
    if (!is_particle_connected) {
      // handle moment of connection
      is_particle_connected = true;
      last_connectivity_change_time = millis();
      Serial.println("Just Came Online");
    }
    // We are connected!  Time for normal connected stuff...
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Online");
    }
  }
}

// loop() runs over and over again, as quickly as it can execute.
void loop() {
   handle_reset();

   // Loop to see if there should be any information published
   if (Time.now() >= (lastUpdate + updatePeriod)) {
     publish();  // interrupts stay enabled so pulses keep counting during the publish
     lastUpdate = Time.now();
   }
}

void pulse(void) {
    // increment pulse count
    pulseCount++;
}

void publish() {
  // Try to push data to the rest of the Mesh Network

  // Snapshot the counter with interrupts masked only for the copy
  noInterrupts();
  int count = pulseCount;
  interrupts();

  snprintf(json, sizeof(json), "{\"time\":%d,\"device\":\"%s\",\"count\":%d}", (int)Time.now(), System.deviceID().c_str(), count);
  int result = Mesh.publish("meter-data", json);  // 0 is treated as success
  if (result == 0) {
    // Subtract only what was sent, so pulses that arrived meanwhile are kept
    noInterrupts();
    pulseCount -= count;
    interrupts();
    lastUpdate  = Time.now();
    Serial.println(json);
  } else {
    Serial.println("Not pushed to Mesh");
  }
}

EDIT: Potential Solution

Thinking this through, I think I have a workaround that I need to test: 2 handlers that can be triggered over the Mesh network to call Particle.connect() and Particle.disconnect().

If the bottleneck is around the cloud, I can just send a Mesh.publish to a subset of the devices that lets them connect to the Particle cloud for OTA updates, and then disconnect afterwards.

void handleParticleConnect(const char *event, const char *data) {
    Particle.connect();
}

void handleParticleDisconnect(const char *event, const char *data) {
    Particle.disconnect();
}
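Since a mesh publish reaches every subscriber, each device would need to decide for itself whether a connect command applies to it. Here is a minimal sketch of one way to do that in plain C++, assuming a made-up payload format of "batch/total" (for example "2/5") and a simple bucket computed from the device ID. The function names and the payload format are hypothetical and not part of Device OS.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Parse a payload of the form "batch/total", e.g. "2/5" (hypothetical format).
bool parseBatch(const char* data, unsigned& batch, unsigned& total) {
    if (data == nullptr) {
        return false;
    }
    if (std::sscanf(data, "%u/%u", &batch, &total) != 2) {
        return false;
    }
    return total > 0 && batch < total;
}

// Assign a device to one of `total` buckets from the characters of its ID.
unsigned deviceBucket(const std::string& deviceId, unsigned total) {
    assert(total > 0);
    unsigned sum = 0;
    for (unsigned char c : deviceId) {
        sum += c;
    }
    return sum % total;
}

// True if this device belongs to the batch named in the payload.
bool shouldConnect(const std::string& deviceId, const char* data) {
    unsigned batch = 0;
    unsigned total = 0;
    return parseBatch(data, batch, total) &&
           deviceBucket(deviceId, total) == batch;
}
```

In handleParticleConnect, the call to Particle.connect() would then sit behind a check like shouldConnect(System.deviceID().c_str(), data), and the gateway would step through batches 0 to total - 1.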

I’ll test this tomorrow and see if I can have a pure mesh with all 55 devices.

@emile, I have a couple of questions for your final deployment and maybe a suggestion.

Are you thinking that your final installed Xenon locations can operate in a Star Topology, where each Xenon is within RF range of the Gateway?


If the Star layout works for your location, you wouldn't need to have the Xenons on the MESH unless it's time to send your Pulse Count.
Even if you need to utilize the Mesh Topology because of distance from the Gateway, you can do so with a few Xenons that are powered 24/7 to provide the Mesh Service Area (the overall footprint). Then, your Edge Devices run Manual Mode counting Pulses from your meters. They Connect to the Cloud whenever they decide to Publish their Data, or when your Update Flag is set.

In Automatic Mode, I'm normally seeing 4 seconds for a Sleeping Xenon to Wake up from Pin Sleep and send a Cloud Publish. That 4 seconds includes a coded 2 second delay after the Publish to make sure it's sent. Occasionally the runtime spikes to 9 seconds. This is for 0 Hops (Xenon communicates with the Gateway directly). I assume Manual mode would have similar connection times?

But the basic concept is to not force the Mesh to maintain a huge number of devices at any particular time, since it's not required in most cases. That only adds Overhead to the Mesh Network.

Your 55 Xenons should be able to quickly join the Mesh when required, perform their Cloud duties, and drop off. If your final physical area is too large for a Star Topology (Zero Hops), then you install a few Xenons that are always ON to maintain the Mesh footprint and forward the Edge Device data to the Gateway as they randomly connect and publish.

1 Like

One thing to check out is @rickkas7's PublishQueueAsyncRK library, which is a work of art really. It lets you store pulse count data and only remove it AFTER it has been successfully sent out to the Particle Cloud with ACKNOWLEDGEMENT that it was received.
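To illustrate the general idea of removing data only after it has been acknowledged, here is a small stand-alone C++ sketch. It is not the PublishQueueAsyncRK API. The PendingQueue class and its methods are invented for this example, and the send callback stands in for whatever publish call reports success.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical bounded queue of readings waiting to be acknowledged.
class PendingQueue {
public:
    explicit PendingQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Queue a reading; when full, drop the oldest one to bound memory use.
    void push(const std::string& payload) {
        if (items_.size() == capacity_) {
            items_.pop_front();
        }
        items_.push_back(payload);
    }

    // Try to send the oldest reading. It is removed only if send returns true.
    bool sendOne(const std::function<bool(const std::string&)>& send) {
        if (items_.empty()) {
            return false;
        }
        if (!send(items_.front())) {
            return false;  // keep the reading for the next attempt
        }
        items_.pop_front();
        return true;
    }

    std::size_t size() const { return items_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> items_;
};
```

Dropping the oldest reading when the queue is full is only one possible policy. For a pulse counter it may be better to merge counts instead, so no pulses are lost.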

2 Likes

I'm currently thinking a mesh topology.

Great idea. I will definitely look into this. Sounds like I was working towards your solution, but you've filled in the gaps.

Thanks again!

Brilliant - I will definitely check this out. Love it. Thanks @rickkas7!

I pushed an update to all of my edge Xenons that disconnects them from Particle by calling Particle.disconnect() in the setup function.

All are now connected to each other in the mesh network and transmitting data. So it appears that something with the Particle cloud is a blocker. @marekparticle maybe something to look into?

Next I’ll test batch updates by reconnecting them to Particle, pushing updates and then disconnecting them.

1 Like

Can you step through the Xenons sequentially, to reduce the Mesh Traffic?
Or insert a randomized delay for each Xenon to join the batch update?
I'm just thinking out loud.

2 Likes

This is what I'm thinking right now. :+1:

2 Likes

You could also wake each Xenon on some second or minute boundary based on the last 2 digits of the device ID …
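A rough sketch of that idea in plain C++, assuming the device ID is a hex string so its last two characters give a value from 0 to 255. The helper names hexValue and wakeSlot are made up for this example.

```cpp
#include <cassert>
#include <string>

// Convert one hex character to its value, or -1 if it is not a hex digit.
int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Map the last two hex digits of the ID (0..255) onto one of `slots` slots.
// Returns -1 if the ID is too short or does not end in two hex digits.
int wakeSlot(const std::string& deviceId, int slots) {
    assert(slots > 0);
    if (deviceId.size() < 2) {
        return -1;
    }
    int hi = hexValue(deviceId[deviceId.size() - 2]);
    int lo = hexValue(deviceId[deviceId.size() - 1]);
    if (hi < 0 || lo < 0) {
        return -1;
    }
    return (hi * 16 + lo) % slots;
}
```

With 60 slots, a device would wake when the current second (or minute) equals its slot. Because 256 values do not divide evenly into 60 slots, some slots get one more device than others, and two devices whose IDs end the same way will always collide.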

1 Like

Given Xenons only have Mesh comms, how would this work?

At the last G3CC meeting we were asked to provide ideas for future mesh development. One of the items I requested was a Device OS API with the ability to stop a mesh node being a repeater (i.e. force a star topology).

Gateways and Repeater nodes must be continually awake otherwise the sleepy endnode publish doesn't work and currently it fails silently since the bool return from Mesh.publish() doesn't work.

1 Like

A sleeping Xenon can easily wake, connect to the cloud via the Gateway, and perform a Particle.publish.
It needs to be within radio distance to the Gateway, or another Xenon that's powered 24/7 (router) that's on the Mesh.

Most average users wouldn't need to perform any Mesh.publish at all.

IMHO, I don't see an immediate need to force a Star Topology. That's easily done with the physical layout, but I can't see a reason to need to force a Star.

I personally don't see a need for stopping a Xenon from being a Repeater.
If any particular Xenon has a mains/utility power source, properly designed Solar Solution, etc (powered 24/7), then it helps fortify the Mesh Network. As long as you don't take that too far and flood the Mesh with unnecessary overhead.

When we perform a site visit for a Mesh Deployment, we install "routers" (powered 24/7) spaced properly (depending on the environment) to define the Mesh Service Area (footprint). Each "router" needs to be able to communicate with 2-3 neighbors. You also want to ensure several paths back to the Gateway (i.e. the Gateway needs a few neighbors). Once that's done, no other Xenons (in this example) remain on the Mesh; they are endnodes. The endnodes wake as required and perform a Particle.publish to send sensor data and check the Gateway for a firmware update.
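The neighbor rule above can be checked on paper before an install. Below is a small C++ sketch that assumes you have written down, from a site survey, which routers can hear each other. It checks that every node has a minimum number of neighbors and that every node can reach the gateway. The Links type and both function names are invented for this example, and links are assumed to be listed in both directions.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// links[i] lists the indices of the nodes that node i can hear directly.
using Links = std::vector<std::vector<std::size_t>>;

// True if every node has at least minNeighbors direct neighbors.
bool hasEnoughNeighbors(const Links& links, std::size_t minNeighbors) {
    for (const auto& neighbors : links) {
        if (neighbors.size() < minNeighbors) {
            return false;
        }
    }
    return true;
}

// True if every node can reach the gateway over some chain of links.
// Uses a breadth-first search starting from the gateway.
bool allReachGateway(const Links& links, std::size_t gateway) {
    assert(gateway < links.size());
    std::vector<bool> seen(links.size(), false);
    std::queue<std::size_t> pending;
    seen[gateway] = true;
    pending.push(gateway);
    std::size_t reached = 1;
    while (!pending.empty()) {
        std::size_t node = pending.front();
        pending.pop();
        for (std::size_t next : links[node]) {
            if (next < links.size() && !seen[next]) {
                seen[next] = true;
                ++reached;
                pending.push(next);
            }
        }
    }
    return reached == links.size();
}
```

This only covers connectivity. It says nothing about signal quality or how many hops a given path takes.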

Network Topology Tools would be helpful for the Gen3. During installation, we need to know/determine a Xenon's neighbors. This is the most important part of any Mesh Install.

2 Likes

The point of sleeping endnodes is to conserve power and therefore enable true wireless use on battery power. Even if you are in a building with a mains/utility power source, it is still much more convenient to be able to deploy devices (endnodes) without needing to mains power them. Needing to connect through the gateway to the Cloud on waking surely uses a lot more power and doesn’t appear to be necessary. I agree that most average Mesh users don't need such a solution, but the topic here is Large Mesh networks and that is a different matter. Could you expand on your experience of deploying Particle/Other? Mesh networks and the Network Topology Tools you use - I am keen to feed the planning going on at Particle.

What do you mean by this?