For the last few days I have been pushing the network further, deploying my app & passing data over the mesh network. At a high level, the devices seem to have trouble staying connected to the Particle Cloud over the mesh.
If there is a better way to do any of this, I’m all ears and happy to test. I’ve put my code at the bottom of the post because it could be the bottleneck. I am not sure.
The goal for this project
The purpose of this is to read pulse meters, count the pulses, and periodically send the count out through the gateway. Each Xenon then resets its count to 0 and starts counting pulses again.
My setup
To make testing easier, all of the devices are in a single room, which could be introducing other environmental variables/noise.
My gateway is a Xenon connected to a Raspberry Pi over Ethernet & USB.
The Pi is connected to the internet over WiFi.
Initially
Initially I wanted to see if 55 devices could stay connected to each other & the Particle cloud without any app code installed. All of the devices stayed connected and none dropped off.
However once I pushed my app to the edge devices, I have seen the following issues -
All 55 devices cannot be connected to the Particle Cloud & running my app
Now that they are pushing data across the mesh, there seems to be a bottleneck. When I power up all of the devices, the network goes crazy and the devices are unable to reconnect to the Particle cloud. Some connect, then others drop off.
It seems like the magic number is somewhere between 37-45 devices that can comfortably stay connected to Particle at the same time.
All 55 devices cannot be updated at the same time.
Because some are not connected to the Particle cloud, you cannot run a bulk OTA update. The Xenons not connected will not get the update.
This meant devices would get certain updates and not others. So the Xenons were all running different versions of the app as I pushed updates.
Is there a way to push updates over the mesh network without an edge Xenon being connected to the cloud?
Dropped the network down to 37 devices
After seeing the errors above, I decided to drop the network size down to 37 edge devices. At this point, all of the devices can stay connected to the Particle cloud at the same time.
The following issues were visible when the network was both 37 & 55 devices.
Individual devices ‘Request Timeout’ during bulk update
To update all of the Xenons, I run a bash script that loops over all of the Xenon device ids.
for name in $names
do
    particle flash "$name" prod-transmitter.ino
done
This walks through the devices one at a time. Occasionally, and seemingly at random, a flash will time out.
Devices taking over 10 minutes to reconnect to Particle cloud after the deploy
After a mass OTA deploy, all of the devices would be trying to reconnect over the gateway to the Particle network at the same time. It can take anywhere from 3 - 10+ minutes. Why such a large variation, I am not sure.
To try and speed this up I have put in place 2 solutions:
- A reset function: if a device hasn’t connected to the Particle cloud within 3 minutes, it calls System.reset().
- A gateway push: after a deploy, the gateway sends a Mesh.publish telling all of the edges to call System.reset().
The Gateway Push does seem to lower the reconnection time, most of the time to 2-4 minutes with the 37 device network. However, there are instances where it can take 10+ minutes for the last node to reconnect.
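One idea I have not tested yet: since the gateway push makes every edge reset at the same moment, they all hit the gateway together again. A rough sketch of staggering the resets is below. It derives a per-device delay from the device ID so each Xenon waits a different amount of time before resetting. The names hashDeviceId and resetJitterMs are my own helpers (not part of Device OS), and the 30 second window in the usage note is a guess.

```cpp
#include <cstdint>
#include <string>

// 32-bit FNV-1a hash of the device ID string.
// Hypothetical helper, used only to spread devices across the jitter window.
uint32_t hashDeviceId(const std::string &deviceId) {
    uint32_t hash = 2166136261u;          // FNV offset basis
    for (unsigned char c : deviceId) {
        hash ^= c;
        hash *= 16777619u;                // FNV prime
    }
    return hash;
}

// Returns a delay in milliseconds in the range [0, maxJitterMs).
// The same device ID always maps to the same delay.
uint32_t resetJitterMs(const std::string &deviceId, uint32_t maxJitterMs) {
    if (maxJitterMs == 0) {
        return 0;                         // no window means no delay
    }
    return hashDeviceId(deviceId) % maxJitterMs;
}
```

On the edge, handleNeedReset could record millis() plus resetJitterMs(System.deviceID().c_str(), 30000) as a deadline and let loop() call System.reset() once it passes, rather than blocking inside the subscribe handler. Whether this actually shortens the worst-case reconnect is something I would have to measure.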
Random devices flashing 1 red light between cyan flashes (Hard fault?)
I am not entirely sure why these pop up but it does happen when devices are trying to reconnect.
Any ideas on this?
My code
For the Gateway:
#include "Particle.h"

SYSTEM_THREAD(ENABLED);
SerialLogHandler logHandler(LOG_LEVEL_ALL);

// Heartbeat ("pulse") sent to the edges
const uint32_t connectivity_message_interval = 180000UL; // 3 min
uint32_t last_connectivity_message_time = 0; // last time the heartbeat was sent

// Log heartbeat events heard on the mesh
void handlePulse(const char *event, const char *data) {
    Serial.printlnf("event=%s data=%s", event, data ? data : "NULL");
}

// Alert the edges they need to reset
int handleReset(String command) {
    Mesh.publish("reset", "ok");
    return 1;
}

// Forward meter data from the edges to the Pi over USB serial
void writeToUSB(const char *event, const char *data) {
    if (data) {
        Serial.write(data);
    }
}

void setup() {
    Serial.begin(9600);
    // Listen for heartbeat events
    Mesh.subscribe("pulse", handlePulse);
    // When data comes in from the Edge Devices
    Mesh.subscribe("meter-data", writeToUSB);
    // Cloud function that tells the Edges to reset
    Particle.function("reset", handleReset);
}

void loop() {
    // Send the heartbeat every 3 minutes
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
        last_connectivity_message_time = millis();
        Mesh.publish("pulse", "hello");
    }
}
For the Edge devices:
SYSTEM_THREAD(ENABLED);
STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));

// Settings
#define __FILENAME__ (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : __FILE__)
#define updatePeriod 14 // seconds between publishes

// pins
int WATER_SENSOR_PIN = D3;

// general vars
retained int pulseCount = 0;
retained int lastUpdate = 0;
const char *version = "0.1.4";
char deviceInfo[120]; // adjust size as required
char json[256];

// Variables for the reset
const uint32_t connectivity_message_interval = 5000; // 5 sec
uint32_t last_connectivity_message_time = 0; // last time the online/offline status was printed
const uint32_t connectivity_timeout = 180000UL; // 3 min b/c 30sec is pretty short, but up to you.
uint32_t last_connectivity_change_time = 0;
bool is_particle_connected; // flag to handle the moment of connectivity change
// Heartbeat from the gateway
void handlePulse(const char *event, const char *data) {
    Serial.println("Heard");
}

// This is when there is an issue and to reset the Xenon
void handleNeedReset(const char *event, const char *data) {
    System.reset();
}
// setup() runs once, when the device is first turned on.
void setup() {
    Serial.begin(9600);
    pinMode(WATER_SENSOR_PIN, INPUT);
    attachInterrupt(WATER_SENSOR_PIN, pulse, CHANGE);
    // Mesh.on(); // potentially needed due to bug when mesh module is not already powered up.
    // Mesh.connect();
    Particle.connect();
    Particle.variable("version", version);
    Particle.variable("deviceInfo", deviceInfo);
    snprintf(deviceInfo, sizeof(deviceInfo)
        ,"App: %s, Date: %s, Time: %s, Sysver: %s"
        ,__FILENAME__
        ,__DATE__
        ,__TIME__
        ,(const char*)System.version());
    is_particle_connected = Particle.connected();
    Particle.syncTime();
    Particle.publishVitals(3600);
    // // turn off core LED
    // RGB.control(true);
    // RGB.color(0, 0, 0);
    pulseCount = 0;
    lastUpdate = Time.now();
    // Mesh
    Mesh.subscribe("pulse", handlePulse);
    Mesh.subscribe("reset", handleNeedReset);
    Serial.println("Finished setup");
}
void handle_reset() {
    // If the device is in a bad state, reset it
    Particle.process();
    if (!Particle.connected()) {
        if (is_particle_connected) {
            // handle moment of disconnection
            is_particle_connected = false;
            last_connectivity_change_time = millis();
            Serial.println("Just Went Offline");
        }
        if ((millis() - last_connectivity_change_time) > connectivity_timeout) {
            // probably an issue connecting, we'll try to fix by resetting
            Serial.println("Resetting due to connectivity timeout...");
            delay(1000); // allow serial message to get read out
            System.reset();
        }
        if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
            last_connectivity_message_time = millis();
            Serial.println("Still Offline");
        }
    }
    else {
        if (!is_particle_connected) {
            // handle moment of connection
            is_particle_connected = true;
            last_connectivity_change_time = millis();
            Serial.println("Just Came Online");
        }
        // We are connected! Time for normal connected stuff...
        if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
            last_connectivity_message_time = millis();
            Serial.println("Still Online");
        }
    }
}
// loop() runs over and over again, as quickly as it can execute.
void loop() {
    handle_reset();
    // Check whether it is time to publish
    if (Time.now() >= (lastUpdate + updatePeriod)) {
        noInterrupts(); // Disables interrupts while publishing data
        publish();
        lastUpdate = Time.now();
        interrupts(); // Re-enables interrupts when data is published
    }
}

void pulse(void) {
    // increment pulse count
    pulseCount++;
}
void publish() {
    // Try to push data to the rest of the Mesh Network
    snprintf(json, sizeof(json), "{\"time\":%d,\"device\":\"%s\",\"count\":%d}", Time.now(), System.deviceID().c_str(), pulseCount);
    // Mesh.publish() returns 0 on success, non-zero on error
    int result = Mesh.publish("meter-data", json);
    if (result == 0) {
        // Published: start counting from 0 again
        pulseCount = 0;
        lastUpdate = Time.now();
        Serial.println(json);
    } else {
        Serial.println("Not pushed to Mesh");
    }
}
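One thing I am unsure about in the edge code: loop() turns interrupts off for the whole publish() call, so any pulse that arrives while a publish is in flight would be missed, and I do not know whether blocking interrupts during mesh traffic is safe. Below is a rough sketch of an alternative that takes a snapshot of the count, publishes it, and then subtracts only what was published. PulseCounter and publishPulses are hypothetical names of mine, and I am assuming std::atomic is usable on the Xenon; I have not checked how it interacts with retained memory. A short noInterrupts()/interrupts() pair around just the read and the subtract would be the equivalent without std::atomic.

```cpp
#include <atomic>
#include <cstdint>

// Pulse counter shared between the interrupt handler and the main loop.
class PulseCounter {
public:
    // Called from the interrupt handler for every pulse.
    void increment() { count_.fetch_add(1, std::memory_order_relaxed); }

    // Read the current count without clearing it.
    uint32_t snapshot() const { return count_.load(std::memory_order_relaxed); }

    // Subtract only the pulses that were successfully published, so
    // pulses counted during the publish stay in the counter.
    void consume(uint32_t published) {
        count_.fetch_sub(published, std::memory_order_relaxed);
    }

private:
    std::atomic<uint32_t> count_{0};
};

// Sends the pending count through sendFn (which returns true on success).
// The counter is only reduced when the send succeeded.
template <typename SendFn>
bool publishPulses(PulseCounter &counter, SendFn sendFn) {
    const uint32_t pending = counter.snapshot();
    if (!sendFn(pending)) {
        return false;                     // keep the count and retry later
    }
    counter.consume(pending);
    return true;
}
```

In the firmware, sendFn would wrap the snprintf and Mesh.publish calls and return true when Mesh.publish returns 0.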
EDIT: Potential Solution
Thinking this through, I think I have a workaround that I need to test: 2 handlers that can be triggered over the Mesh network, one calling Particle.connect() and one calling Particle.disconnect().
If the bottleneck is around the cloud, I can just send a Mesh.publish to a subset of the devices that lets them connect to the Particle cloud for OTA updates, and then disconnect afterwards.
void handleParticleConnect(const char *event, const char *data) {
    Particle.connect();
}

void handleParticleDisconnect(const char *event, const char *data) {
    Particle.disconnect();
}
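Since a Mesh.publish reaches every subscriber, the handlers above would need some way to tell whether a given device is in the subset. One possible sketch: the event data carries a list of hex digits, and a device acts only if the last character of its device ID is in that list. isTargeted is a hypothetical helper of mine, and I am assuming the last hex digit of the IDs is spread evenly enough across my devices to make useful batches.

```cpp
#include <cctype>
#include <string>

// Returns true when this device belongs to the subset named in the payload.
// payload: lowercase hex digits, e.g. "0123" targets devices whose ID ends
// in 0, 1, 2 or 3. An empty device ID never matches.
bool isTargeted(const std::string &deviceId, const std::string &payload) {
    if (deviceId.empty()) {
        return false;
    }
    // Normalise the last character so upper-case IDs also match.
    const char last = static_cast<char>(
        std::tolower(static_cast<unsigned char>(deviceId.back())));
    return payload.find(last) != std::string::npos;
}
```

handleParticleConnect would then call Particle.connect() only when isTargeted(System.deviceID().c_str(), data ? data : "") is true, and the gateway would walk through the digit groups one batch at a time.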
I’ll test this tomorrow and see if I can have a pure mesh with all 55 devices.