Boron + 5-Xenon Setup Questions


#21

I checked the console and it showed no rate limiting going on.

I also slowed down the mesh publishes to prevent this from happening. Even when publishing faster I did not see any rate limiting.

I reset some Xenons to bring them all back online and it worked for awhile.

It doesn’t take long for one or two of the Xenons to stop responding. Sometimes they come back later, and sometimes a reset is required.

Maybe I’ll try slowing the mesh publish down some more and see what happens. They are publishing every 5-6 seconds per xenon now.

I wish I could Particle.publish() directly from the Xenons, but they always end up in the flashing cyan state when running off the Argon.

What is working best for you?


#22

That’s not surprising. The console never reports the 4/sec rate limit because the cloud doesn’t even know about it: when the device itself detects the violation, it simply doesn’t send the event, to avoid flooding the cloud.
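To picture why nothing shows up in the console, here is a minimal sketch of a device-side limiter that drops events locally before they ever leave the device. It is only an illustration: `PublishLimiter`, the one-second window, and the limit of 4 (taken from the figure quoted above) are assumptions, not the actual Device OS implementation.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical client-side limiter: allows at most kMaxEvents publishes
// within any kWindowMs window and silently drops the rest.
class PublishLimiter {
public:
    // Returns true if a publish at time nowMs (milliseconds) may be sent.
    bool allow(uint32_t nowMs) {
        if (count_ == kMaxEvents) {
            // sent_[head_] is the oldest remembered send time.
            // Unsigned subtraction stays correct across millis() rollover.
            if (nowMs - sent_[head_] < kWindowMs) {
                ++dropped_;          // dropped locally, the cloud never sees it
                return false;
            }
            sent_[head_] = nowMs;    // oldest entry expired, reuse its slot
            head_ = (head_ + 1) % kMaxEvents;
            return true;
        }
        sent_[(head_ + count_) % kMaxEvents] = nowMs;
        ++count_;
        return true;
    }

    uint32_t dropped() const { return dropped_; }

private:
    static constexpr size_t kMaxEvents = 4;      // assumed limit
    static constexpr uint32_t kWindowMs = 1000;  // assumed window
    uint32_t sent_[kMaxEvents] = {};
    size_t head_ = 0;
    size_t count_ = 0;
    uint32_t dropped_ = 0;
};
```

Checking `dropped()` on the device (for example over Serial) would be one way to see throttling that the console cannot show.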

Borons seem more stable and are currently the only way I can sensibly test, due to the IPv6 issue with my home WiFi. Adding a secondary router or using a mobile hotspot are only workarounds that introduce other issues for me.
Hence I’m waiting for rc.26 and hope that will resolve these sore points.


#23

I wrote the heartbeat code to publish on a single node, with the gateway and other nodes listening, for testing only. @twoyoon’s code is great, though instead of publishing on receiving every node heartbeat, I will modify the code to maintain counters, the average time between heartbeats and the peak time between heartbeats for every node. I will keep these as Particle.variables or publish them at fixed intervals.
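A minimal sketch of the per-node bookkeeping described above (count, average gap, peak gap) might look like the following. `HeartbeatStats` and its field names are hypothetical; on a device, `nowMs` would come from `millis()`.

```cpp
#include <cstdint>

// Hypothetical per-node heartbeat statistics.
struct HeartbeatStats {
    uint32_t count = 0;       // heartbeats received so far
    uint32_t lastMs = 0;      // arrival time of the latest heartbeat
    uint32_t peakGapMs = 0;   // longest gap seen between two heartbeats
    uint64_t totalGapMs = 0;  // sum of all gaps, used for the average

    // Call once per received heartbeat with the current time in ms.
    void record(uint32_t nowMs) {
        if (count > 0) {
            uint32_t gap = nowMs - lastMs;  // rollover-safe unsigned math
            totalGapMs += gap;
            if (gap > peakGapMs) {
                peakGapMs = gap;
            }
        }
        lastMs = nowMs;
        ++count;
    }

    // Average gap in ms; 0 until at least two heartbeats have arrived.
    uint32_t averageGapMs() const {
        return count > 1 ? static_cast<uint32_t>(totalGapMs / (count - 1)) : 0;
    }
};
```

One instance per node would be kept on the listening device, and the three values could then be exposed or published at a fixed interval instead of on every heartbeat.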

As for my Boron + single Xenon mesh, when the Boron Cloud connection was lost briefly, the Boron came back up but the Xenon node did not reconnect to the Boron, not unlike what happens with the Argon. Resetting the node fixed the problem.


#24

Here’s my quick version of a Mesh Marco-Polo test. The gateway gets flashed with the “Marco” code, where it requests that each node report with a “Polo” heartbeat. The nodes get flashed with the “Polo” code, where they simply publish to the mesh when requested. The Marco code records the amount of time it takes to report, stores an array of known nodes and a count of how many known nodes responded.

UPDATED CODE ON 11/28/2018 11:46AM. Enabled System Threading for testing. Changed Particle.publish() format for better consumption in Losant. Added System.reset() if cloud or mesh is lost for more than 10 minutes.

Marco code (gateway):

#include "Particle.h"


SYSTEM_THREAD(ENABLED);


bool heartbeat = false;
bool cloudPub = false;

unsigned long beatInterval = 10000;
unsigned long lastBeatTime = 0;
unsigned long beatTimeout = 5000;
unsigned long lastPoloTime = 0;

char knownNodes[10][50];
uint8_t knownNodeCount = 0;
bool reportingNodes[10];
uint16_t nodeReportCount = 0;

bool cloudLost = false;
unsigned long cloudLostTime = 0;
unsigned long cloudResetTimeout = 600000; //10 min = 600000

const char version[] = "MeshMarcoPoloHeartbeat_Marco 0.3";


void setup() {
    Serial.begin(9600);

    pinMode(D7, OUTPUT);
    
    Particle.variable("version", version);
    Particle.publish("Marco-Polo heartbeat test started.");
    Mesh.subscribe("Polo", ProcessBeat);
    
    ResetReportingNodes();
}


void loop() {

    //Send heartbeat collection message every beatInterval.
    if (!heartbeat && ((millis() - lastBeatTime) >= beatInterval)) {
        ResetReportingNodes();
        nodeReportCount = 0;
        
        heartbeat = true;
        lastBeatTime = millis();
        lastPoloTime = lastBeatTime;
        digitalWrite(D7, HIGH);
        if (Mesh.ready()) {
            Mesh.publish("Marco");
        }
    }
    
    //Turn off LED after beat timeout.
    if(heartbeat && ((millis() - lastBeatTime) >= beatTimeout)) {
        heartbeat = false;
        cloudPub = true;
        digitalWrite(D7, LOW);
    }
    
    //Publish collected heartbeat results to cloud.
    if(cloudPub) {
        if (Particle.connected()) {
            char msg[80];
            //The elapsed time is an unsigned long, so it is printed with %lu.
            snprintf(msg, arraySize(msg)-1, "Nodes:%d of %d;Millis:%lu", nodeReportCount, knownNodeCount, lastPoloTime - lastBeatTime);
            Particle.publish("MarcoPoloHeartbeat", msg, PRIVATE);
            cloudPub = false;
            
            //TODO: Report which knownNodes did not report.
        }
        else {
            if (!cloudLost) {
                cloudLostTime = millis();
            }
            cloudPub = false;
            cloudLost = true;
        }
    }
    
    //Check for lost cloud. Reset if down for more than cloudResetTimeout (default 10 min).
    if (cloudLost) {
        if (Particle.connected()) {
            cloudLost = false;
        } else {
            if (millis() - cloudLostTime > cloudResetTimeout) {
                System.reset();
            }
        }
    }
}


void ProcessBeat(const char *name, const char *data) {
    //Loop through known nodes array and look for matches.
    for (size_t i = 0; i < arraySize(knownNodes); i++) {
        //If we get to a blank array slot, record this node there.
        if (strcmp(knownNodes[i],"") == 0) {
            //Copy via "%s" so the received data is never treated as a format string.
            snprintf(knownNodes[i], arraySize(knownNodes[i]), "%s", data);
            //knownNodes[i] = data;
            reportingNodes[i] = true;
            nodeReportCount++;
            knownNodeCount++;
            lastPoloTime = millis();
            break;
        }
        
        //If we encounter a node already known, just count it.
        if (strcmp(knownNodes[i], data) == 0) {
            nodeReportCount++;
            reportingNodes[i] = true;
            lastPoloTime = millis();
            break;
        }
    }
}

void ResetReportingNodes() {
    for (size_t i = 0; i < arraySize(reportingNodes); i++) {
        reportingNodes[i] = false;
    }
}
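For the TODO in the Marco loop (reporting which known nodes did not answer), one possible approach is to walk the two arrays and collect the IDs whose flag is still false. This is only a sketch: `listMissingNodes` and its parameters are hypothetical and not part of the code above.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: writes a comma-separated list of the known nodes that
// did not report into out (null-terminated when outSize > 0) and returns how
// many nodes were missing. The list is truncated if out is too small.
size_t listMissingNodes(const char known[][50], const bool reported[],
                        size_t knownCount, char *out, size_t outSize) {
    size_t missing = 0;
    size_t used = 0;
    if (outSize > 0) {
        out[0] = '\0';
    }
    for (size_t i = 0; i < knownCount; i++) {
        if (reported[i]) {
            continue;
        }
        missing++;
        if (outSize == 0 || used >= outSize - 1) {
            continue;  // buffer full: keep counting, stop writing
        }
        int n = snprintf(out + used, outSize - used, "%s%s",
                         used ? "," : "", known[i]);
        if (n > 0) {
            used += static_cast<size_t>(n);
            if (used > outSize - 1) {
                used = outSize - 1;  // snprintf truncated the output
            }
        }
    }
    return missing;
}
```

Called with `knownNodes`, `reportingNodes` and `knownNodeCount` once the beat timeout has expired, the resulting string could go out in a second publish alongside the summary message.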

Polo code (nodes):

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

bool heartbeat = false;
bool meshPub = false;

unsigned long lastBeatTime = 0;
unsigned long beatTimeout = 1000;

bool meshLost = false;
unsigned long meshLostTime = 0;
unsigned long meshResetTimeout = 600000;  //10 min = 600000

const char version[] = "MeshMarcoPoloHeartbeat_Polo 0.3";


void setup() {
    Serial.begin(9600);

    pinMode(D7, OUTPUT);
    
    Mesh.subscribe("Marco", ProcessBeat);
}


void loop() {
    
    if (heartbeat && meshPub) {
        if (Mesh.ready()) {
            Mesh.publish("Polo", System.deviceID());
            meshPub = false;
            meshLost = false;
        } else {
            if (!meshLost) {
                meshLostTime = millis();
            }
            meshPub = false;
            meshLost = true;
        }
    }
    
    //Turn off LED after beat timeout.
    if(heartbeat && ((millis() - lastBeatTime) >= beatTimeout)) {
        heartbeat = false;
        digitalWrite(D7, LOW);
    }
    
    //Reset if mesh network is down longer than meshResetTimeout (default 10 min).
    if (meshLost) {
        if (Mesh.ready()) {
            meshLost = false;
        } else {
            if (millis() - meshLostTime > meshResetTimeout) {
                System.reset();
            }
        }
    }
}


void ProcessBeat(const char *name, const char *data) {
    heartbeat = true;
    meshPub = true;
    lastBeatTime = millis();
    digitalWrite(D7,HIGH);
}

Footnote… since this is a multi-national community and colloquialisms might not translate well: Marco-Polo is a game played in a swimming pool. The person who is “it” closes their eyes and calls out “Marco!” All other persons in the pool who are part of the game are obligated to respond with “Polo!”. The “it” person then tries to tag one of the other players while keeping their eyes closed, homing in on them by repeatedly calling out “Marco!”. A tagged person is either “out” or they become the “it” person depending on who makes the rules.

Update: The response time is generally under 100 ms but I have seen it spike up to 100-150 ms. I assume the time goes up with the number of devices. When doing OTA updates or adding a device to the network, the latency goes way up.


#25

@ScruffR, thanks for the PLATFORM_ID tip. Haven’t used that before. While that seemed like a good idea last night, which devices you are targeting gets a little murky with the Mesh. Last night it was clear, Argon as gateway and Xenon as node. Now, today, playing with a Xenon on an Ethernet Featherwing, all the devices are Xenons. I have to be very careful which device I am targeting with either gateway or node code.
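Since a Xenon can be either gateway or node, PLATFORM_ID alone cannot decide the role. One option is an explicit build-time flag that overrides a PLATFORM_ID-based default. The sketch below assumes the Gen 3 platform IDs (Argon 12, Boron 13, Xenon 14); `MESH_ROLE_GATEWAY`, `isGateway()` and `roleName()` are hypothetical names, and the fallback definition of PLATFORM_ID exists only so the sketch also builds off-device.

```cpp
// Platform IDs for Gen 3 devices (assumed: Argon 12, Boron 13, Xenon 14).
#define ID_ARGON 12
#define ID_BORON 13
#define ID_XENON 14

// Off-device builds have no PLATFORM_ID; assume a Xenon for illustration.
#ifndef PLATFORM_ID
#define PLATFORM_ID ID_XENON
#endif

// MESH_ROLE_GATEWAY is a hypothetical flag. Define it as 1 or 0 to force the
// role (for example a Xenon on an Ethernet FeatherWing acting as gateway);
// otherwise Argon and Boron default to gateway and Xenon to node.
#ifndef MESH_ROLE_GATEWAY
#if PLATFORM_ID == ID_ARGON || PLATFORM_ID == ID_BORON
#define MESH_ROLE_GATEWAY 1
#else
#define MESH_ROLE_GATEWAY 0
#endif
#endif

constexpr bool isGateway() { return MESH_ROLE_GATEWAY != 0; }

// Human-readable role, e.g. for a version string or a startup log line.
const char *roleName() { return isGateway() ? "Marco (gateway)" : "Polo (node)"; }
```

Putting the role name into the `version` variable would make it easy to confirm from the console which code a given device is actually running.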


#26

@ninjatill I have your code running on 5 Xenons and an Argon now.

All Xenons are flashing green because they have no Particle Cloud connection but the MESH is working.

I’ll keep an eye on how long all 5 stay connected and reporting back.


#27

I have 2 different Mesh nets up and running all running the Marco-Polo code. One at my house (StarshipPittsburgh 1 Argon, 1 Xenon) and one at my office (StarshipOrlando 3 Xenon). Both networks seem pretty stable. Every once in a while I’ll get a very high round trip time as you see in the screencap. Must be interference. I do have a “smart home”; I use Insteon outlets and switches throughout. Insteon is also a mesh network of sorts but runs at 915MHz both wireless and via the power lines. https://www.insteon.com/technology/


#28

I’m seeing none of the Xenons breathing cyan, it’s a constant flashing Green while the Argon is breathing cyan all the time.

The mesh is staying up fine with one missing the publish events here and there before coming back online.

The Argon is requiring manual resets when the WiFi is dropped and then back available again.

Running your code I’ve seen the Argon and Xenons flash the Red SOS error message after the WiFi comes back online but they restart and reconnect again after that.

In the image below you can see 4 of 5 Xenons are reporting back. So one is not responding a lot of the time.


#29

I haven’t had my devices farther than a couple of feet apart on my desk, so I’m not experiencing any of the dropouts. All my Xenons are flashing cyan. When the gateway drops, they go to flashing green, but they’ve always come back so far. I’ll have to start experimenting with distance.


#30

My Xenons are all 15-20 feet apart.

Spread yours out and see what happens.

I also just found one of the Xenons locked up with the blue D7 LED constantly ON. A manual reset of the Xenon was required to bring it back online.


#31

@peekay123 you say that it is important that SYSTEM_THREAD(ENABLED) is not running.

Is this a general statement for mesh devices or just for your application?

Thanks in advance for shedding some light on this.


#32

@Jseiler, I think he was just pointing out that it was not enabled in his application… not that you have to keep it disabled.


#33

@ninjatill, that is correct. I am not confident of the threading reliability just yet so I disabled it. It would be good for either you or @RWB to report on the stability of your mesh with and without SYSTEM_THREAD enabled.


#34

Thanks @peekay123, @ninjatill.

I got my Christmas presents 2 days ago (alas without the LTE bundle) and set up an Argon as the gateway and two Xenon nodes last night. The Argon and one Xenon are still running Tinker, and the other Xenon is running the Adafruit OLED example without SYSTEM_THREAD(ENABLED). This morning everything was still happy.

I’ll test this setup with SYSTEM_THREAD(ENABLED) active tonight and either heartbeat and/or data moving through the system and let everyone know how this works out.


#35

So stability wise, I had my first big hiccup this morning. The Xenon gateway at my office stopped publishing to the cloud. It looked like the heartbeat on the mesh was still being sent and received because the D7 LEDs still kept “beating”. I checked the console and it wasn’t getting any events, even though the gateway Xenon and all the endpoints were breathing cyan. A reset took care of the gateway but one of the endpoints never recovered. Neither a reset nor a power down recovered it. It wasn’t until I powered off both endpoints and then reapplied power that they all started breathing cyan again. I know exactly when the hiccup occurred because I started logging the heartbeat publishes to Losant last night (first experience with Losant and I think it’s pretty good). It stopped reporting for about an hour before I caught it. The big spikes on the graph are during the reconnect process after recovering all 3 devices on the network. Also to note, I moved one of the endpoints into an adjacent office about an hour ago (11/28/18 11:30AM ish). The endpoint is now about 15 feet away through a partition wall with steel studs and 1/2" drywall, sitting next to a Surface Pro 3 with WiFi active.

I just enabled system threading on my office mesh network and we’ll see how that goes. I also added some code to do a System.reset() if the cloud is lost for 10 minutes. And also on the endpoints, a System.reset() if the mesh is lost for 10 minutes. The Particle.publish() is also reformatted for better consumption in Losant. I’ll update my code in the above post with the latest.

Office:

Home (just setup in Losant this morning):


#36

After manually resetting some Xenons so all 5 of them were reporting back, here is what I see about 6 hours later: only 3 of the Xenons are consistently reporting back.

All 5 Xenons flash Green and NEVER have a breathing cyan cloud connected status.

I’m going to run with System Thread Enabled today and see what happens.


#37

@RWB, Try flashing the updated code in my post above. I enabled system threading and added some resets when networks are lost for more than 10 minutes.


#38

I will do that now.

Do I need to set it up as an input in Losant for it to work properly? I’m short on free time today.


#39

Losant is a non-issue. You do not need to set anything up there. The payload of the Particle.publish() just looks a little different. It may be better for you because the string is more compact and would fit better in your screen captures.


#40

I have Losant set up and have been using it for a year now with great success; I just need more time to set it up like you have. Loading your code now.