Mesh Network Testing - MarcoPolo Heartbeat Code


#1

I have posted my “MarcoPolo” heartbeat testing code in various threads but never in its own. So consider this the official place for updates. The MarcoPolo code is what I’ve been using to test mesh network reliability. It may have real-world applications to verify if remote nodes are alive and responding, but for me, it has been a neat side project to get familiar with the Mesh devices.

I have just created v0.4.3, which implements an acknowledgement system. With v0.3.x, I was seeing about 99%+ reliability. That is probably adequate in most situations. After all, you could wait until several heartbeats are missed for an individual node before sending some type of alert. I have been contemplating how to get up to 100%, and I came up with this acknowledgement system. The v0.4.x code should be backwards compatible with nodes running v0.3.x. However, a node running v0.3.x will not be aware of the acknowledgement system and may create superfluous traffic on the mesh.

I have a more detailed post on my GitHub under Issue #1. I created an “ImplementAck” branch that I will merge into the master branch soon (more of an exercise in GitHub functionality). Here is the general overview of the acknowledgement process:

  1. A “Marco” event is published from the Marco node. The event data now includes a unique ID (UID) for the “Marco” attempt, the current retry count (starts at 0, increments by one on each subsequent retry), and the retry interval timeout.
  2. The Polo node responds to the Marco event exactly as before (with a “Polo” event).
  3. The Marco node catalogs each response and then sends a “PoloAck” event with the device ID of the Polo node being acknowledged. This step doubles the amount of mesh network traffic.
  4. The Polo node accepts the PoloAck event and sets a flag so that it will not respond to any subsequent Marco messages with the same UID.
  5. The Marco node will check if all nodes have reported at the ack.retryInterval. If the number of reporting nodes is less than the number of known nodes, another “Marco” event is published. The UID is kept the same but the ack.retryCount is incremented by one. This step repeats every time the retryInterval is reached and the reporting vs known node counts do not match.
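
The five steps above can be sketched in plain C++ (no Device OS calls, so it runs on a desktop). This is only an illustration of the bookkeeping, not the code in the repository: the struct and class names, the field layout, and the idea that the Polo node remembers the UID of the last Marco it answered are all assumptions made for the example.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical payload carried by each "Marco" event (step 1).
struct MarcoPayload {
    uint32_t uid;             // unique ID for this heartbeat attempt
    uint32_t retryCount;      // 0 on the first publish, +1 on each retry
    uint32_t retryIntervalMs; // how long Marco waits before retrying
};

// Marco-side bookkeeping for a single heartbeat.
class MarcoBeat {
public:
    MarcoBeat(uint32_t uid, uint32_t retryIntervalMs, std::size_t knownNodeCount)
        : payload_{uid, 0, retryIntervalMs}, knownNodeCount_(knownNodeCount) {}

    const MarcoPayload& payload() const { return payload_; }

    // Step 3: catalog the response and return the device ID to send in "PoloAck".
    std::string onPolo(const std::string& deviceId) {
        reported_.insert(deviceId);
        return deviceId;
    }

    // Step 5: call when the retry interval elapses.
    // Returns true if another "Marco" (same UID, retryCount + 1) should be published.
    bool onRetryInterval() {
        if (reported_.size() >= knownNodeCount_) return false;
        ++payload_.retryCount;
        return true;
    }

private:
    MarcoPayload payload_;
    std::set<std::string> reported_; // device IDs that have answered this UID
    std::size_t knownNodeCount_;
};

// Polo-side bookkeeping.
class PoloNode {
public:
    explicit PoloNode(std::string deviceId) : deviceId_(std::move(deviceId)) {}

    // Steps 2 and 4: returns true if a "Polo" event should be published.
    bool onMarco(const MarcoPayload& p) {
        if (acked_ && p.uid == ackedUid_) return false; // already acknowledged
        lastUid_ = p.uid;
        sawMarco_ = true;
        return true;
    }

    // Step 4: a "PoloAck" naming this device silences it for the current UID.
    void onPoloAck(const std::string& ackedDeviceId) {
        if (sawMarco_ && ackedDeviceId == deviceId_) {
            acked_ = true;
            ackedUid_ = lastUid_;
        }
    }

private:
    std::string deviceId_;
    uint32_t lastUid_ = 0;
    uint32_t ackedUid_ = 0;
    bool sawMarco_ = false;
    bool acked_ = false;
};
```

With two known nodes where only one answers the first Marco, `onRetryInterval()` returns true and bumps `retryCount` to 1 while the UID stays the same. The node that was already acknowledged ignores the retry, and the one that missed the first Marco still answers.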

The official repository:


#2

Now that Device OS v0.9.0 is released, it’s time to play with sleep! I just posted MarcoPoloHeartbeat v0.4.8, which implements sleep on the Polo nodes. When enabled, the Polo node dynamically calculates the number of seconds to sleep so that it wakes up before the next heartbeat. There are two ways to enable it:

  1. On the Marco node, set the MarcoPolo Sleep Select (MPSS) pin HIGH (default is D3). This will enable sleep for every Polo node on the mesh by adding a few parameters onto the Marco event payload.
  2. On the Polo node, set the MarcoPolo Sleep Select (MPSS) pin HIGH (default is D3). This will enable sleep for only a single Polo node. Because the required parameters are not included in the Marco event payload, the beatInterval and preHeartbeatWakupBuffer will need to be hard-coded in the polo.ino code.

I’ve only put this through some preliminary testing and I’m already seeing some inconsistent behavior. For example, if the beatInterval is set to 10 seconds, the Polo node will only sleep about every other heartbeat. On the very first heartbeat with a “sleep” payload, the Polo node sleeps and wakes as expected. On the 2nd heartbeat, it takes the Polo node several seconds to actually enter sleep. It then sleeps for the duration calculated prior to the System.sleep() call. But since the system is inducing an unexpected delay, the Polo node does not wake in time to catch the next heartbeat. If acknowledgements are enabled, it may catch one of the subsequent retries. If the beatInterval is set to 20 seconds, the sleep behavior is much more consistent.
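
To make the timing problem concrete, here is a rough sketch of the sleep arithmetic in plain C++. It is not the polo.ino code: the function names and parameters are hypothetical, and it assumes the sleep duration is simply the beat interval minus the wake buffer and the time already spent since the Marco event arrived, rounded down to whole seconds.

```cpp
#include <cstdint>

// Whole seconds to sleep so the node wakes wakeBufferMs before the next beat.
// Returns 0 when there is not enough time left in the interval to sleep at all.
inline uint32_t secondsToSleep(uint32_t beatIntervalMs,
                               uint32_t wakeBufferMs,
                               uint32_t elapsedSinceMarcoMs) {
    const uint64_t reserved = uint64_t(wakeBufferMs) + elapsedSinceMarcoMs;
    if (reserved >= beatIntervalMs) return 0;
    return static_cast<uint32_t>((beatIntervalMs - reserved) / 1000);
}

// True if the node is awake again no later than the next beat, given an
// extra delay between computing the duration and actually entering sleep.
inline bool wakesBeforeNextBeat(uint32_t beatIntervalMs,
                                uint32_t elapsedSinceMarcoMs,
                                uint32_t sleepEntryDelayMs,
                                uint32_t sleepSec) {
    const uint64_t wakeAtMs = uint64_t(elapsedSinceMarcoMs) + sleepEntryDelayMs
                            + uint64_t(sleepSec) * 1000;
    return wakeAtMs <= beatIntervalMs;
}
```

With a 10000 ms beatInterval, a 2000 ms wake buffer, and 1000 ms already elapsed, `secondsToSleep` gives 7. If the node then takes 3000 ms to enter sleep, it wakes at 11000 ms and misses the beat, because the entry delay is larger than the buffer.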

Comments and testing are welcome.


#3

I haven’t tried the MarcoPolo code, so I’m at risk of asking a dumb question:
Is it possible for the Gateway to broadcast the next Wake Time in UTC?
Then the Xenons would calculate their exact Sleep Time upon receipt, mitigating network latency.


#4

@peekay123 mentioned seeing the sleep function take a while to actually go into sleep mode and reach a low, stable uA sleep current. It sounds like a minimum sleep time may currently be required for stable operation?


#5

It’s certainly possible. I hadn’t thought about doing that. I would need to test how accurate the clock synchronization between the Marco and Polo nodes is when the Polo node constantly sleeps and wakes.
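
A minimal sketch of what the Polo side of that idea might look like, in plain C++. Everything here is hypothetical: the function name, the assumption that the wake time arrives as Unix epoch seconds, and above all the assumption that both clocks agree, which is exactly the part that still needs testing.

```cpp
#include <cstdint>

// Seconds to sleep so the node wakes wakeBufferSec before the broadcast
// wake time. Both timestamps are Unix epoch seconds (UTC). Returns 0 if
// the wake time has already passed or falls inside the buffer.
inline uint32_t sleepUntilUtc(int64_t wakeUtcSec,
                              int64_t nowUtcSec,
                              uint32_t wakeBufferSec) {
    const int64_t remaining = wakeUtcSec - nowUtcSec - int64_t(wakeBufferSec);
    return remaining > 0 ? static_cast<uint32_t>(remaining) : 0;
}
```

Because the duration is computed from the local clock when the message is received, a late delivery shortens the sleep instead of shifting the wake-up. For example, a wake time 17 seconds away with a 2-second buffer gives 15 seconds of sleep.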


#6

The delay only applies to DEEP SLEEP, not STOP SLEEP. For now the observed difference in current consumption is only 10uA (less in deep sleep). Given all the ways that STOP sleep can be awakened, I would say that 10uA is a small price to pay for that flexibility.


#7

The delays going into sleep I’m experiencing might have something to do with the USB port and handshaking with Windows or serial output. If I power the Xenon from a 5V phone charger, it seems to go to sleep very consistently.

Power changed to phone charger at the blue tick… testing is ongoing…


#8

Good to know.

I’ll be joining these tests with you in the next few days.


#9

@ninjatill I have a Boron + 7 Xenons up and running now and they are all looking pretty stable so far which is great news!

I want to set up your code to do some testing of the LTE connection.

Can you provide screenshots of your settings for the Losant workflow so I can set up the same thing on my account? Then we can share uniform data graphs over time and compare them more easily.


#10

Sure. Just FYI, I need to publish an update to the Polo sleep code. After re-reading the docs last night, I realized I needed to tweak the code after sleep, since the unit doesn’t fully reset after a Stop mode sleep. Rather, the device continues with the code… like a delay. I was tweaking the code last night but don’t yet have a stable Polo node. The sleep seems to be a bit erratic (works sometimes, stays awake, goes to blinking green, etc.).

Here’s the Losant workflow exported: https://www.dropbox.com/s/mo5atxi73z430fy/mesh_starshippittsburgh-develop.flow?dl=0

Here’s the flow in diagram form… the string operations just break the string at the “:”.

The device setup:


#11

Excellent. Looks simple enough.

So are you saying don’t use the latest code where you added the sleep functions and wait for you to work out the kinks?


#12

Yes. Don’t use v0.4.8. You can use any of the previous versions.


#13

@ninjatill Since I’m publishing from a Boron LTE device and not Wi-Fi, I need to change the interval at which Marco publishes to Losant.

What do you recommend and where exactly should I change the publish interval in the Marco code?


#14

Simply change the beatInterval to a larger value. It’s in millis and the default is 10 seconds = 10000 millis. The beatInterval is basically the reporting interval as well. If you want to change the length of the beat (waiting period to see if all units report) then you would change the beatTimeout.
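
As a rough illustration of how the two settings relate (this is not the actual marco.ino logic; the struct, the function names, and the 8000 ms timeout value are made up for the example):

```cpp
#include <cstdint>

// Hypothetical grouping of the two timing settings, both in milliseconds.
struct BeatConfig {
    uint32_t beatIntervalMs; // time between Marco publishes (default 10000)
    uint32_t beatTimeoutMs;  // how long Marco waits for nodes to report
};

// True when it is time to publish the next Marco event.
// Unsigned subtraction keeps this correct across a millis() rollover.
inline bool beatDue(uint32_t nowMs, uint32_t lastBeatMs, const BeatConfig& cfg) {
    return nowMs - lastBeatMs >= cfg.beatIntervalMs;
}

// True when the waiting period for responses to the current beat is over.
inline bool responseWindowClosed(uint32_t nowMs, uint32_t lastBeatMs,
                                 const BeatConfig& cfg) {
    return nowMs - lastBeatMs >= cfg.beatTimeoutMs;
}
```

For a cellular gateway you might use something like `BeatConfig{60000, 8000}`: one heartbeat (and one report) per minute, with the same short window for nodes to respond.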

I’ll add some more code comments to the next version.


#15

Thanks for the tip.

I have it running now.

I reset the Boron and see none of the 7 mesh nodes connecting back automatically. It was working fine before I reset the Boron.


#16

I have a Boron from the preorder but haven’t activated it yet. I will be activating the Boron from the G3CC so I’ll test the Boron eventually. There’s nothing in the MarcoPolo code (v0.4.3 or lower) that should prevent the Polo nodes from reconnecting.


#17

Wonder if it has to do with having 7 nodes?


#18

Got it working.

What I found is that after I flashed the Boron with the Marco firmware and it restarted on the new firmware, all the Xenons had to be manually restarted before they would respond to any of the Marco messages the Boron sent out.

If I disconnect the Boron long enough that all the Xenons start flashing green rapidly and then plug the Boron back in, all the Xenons automatically reconnect to the Boron and receive and reply to the Marco messages. It took the Xenons about 15-20 minutes to start flashing green after I unplugged the Boron.

I’m not sure whether the Xenons will also stop replying to Marco requests when the Boron goes offline due to cellular signal loss and then reconnects, the same way they do after new firmware is flashed to the Boron, but we will see over time.

I need to finish setting up the Losant Dashboard. I was able to import your workflow.


#19

I published an update for sleeping Polo nodes. The code has been updated to version v0.5.0 to better reflect the feature addition. I removed all cloud functionality from the Polo node and went with SYSTEM_MODE(SEMI_AUTOMATIC) in pursuit of more consistent sleep behavior. I am seeing that the nodes go to sleep about 50% of the time. I’m not sure if there’s something better I could be doing in code to fix this. I started to call Mesh.disconnect() and Mesh.off() before the sleep call to see if that would sleep more consistently; but I am not waiting for those conditions before calling sleep. Perhaps removing all calls to Serial.println() would help since you can’t really watch the serial terminal with all the wake/sleep cycles.

This is the sleep pattern I’m seeing… it looks as though when the device doesn’t sleep, it doesn’t respond to the heartbeat as it should. Very puzzling, but I don’t have enough time to really understand the inconsistencies:

I will say in the brief periods where both nodes wake, respond and sleep perfectly, it’s great to see the little coordinated dance take place!


#20

Even without using the sleep modes, I’m only seeing 5-6 of 7 nodes responding on a consistent basis, even though all the Xenons are fairly close together.

Without signal strength it’s hard to tell which nodes are not responding.