Mesh Network Testing - MarcoPolo Heartbeat Code


#1

I have posted my “MarcoPolo” heartbeat testing code in various threads but never in its own. So consider this the official place for updates. The MarcoPolo code is what I’ve been using to test mesh network reliability. It may have real-world applications to verify if remote nodes are alive and responding, but for me, it has been a neat side project to get familiar with the Mesh devices.

I have just created v0.4.3, which implements an acknowledgement system. With v0.3.x, I was seeing about 99%+ reliability. That is probably adequate in most situations. After all, you could wait until several heartbeats are missed for an individual node before sending some type of alert. I have been contemplating how to get up to 100%, and I came up with this acknowledgement system. The v0.4.x code should be backwards compatible with nodes running v0.3.x. However, a node running v0.3.x will not be aware of the acknowledgement system and may create superfluous traffic on the mesh.
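To illustrate the "wait until several heartbeats are missed" idea, here is a minimal sketch in plain C++. It is not part of the MarcoPolo code; the class name `MissedBeatMonitor`, its method, and the threshold value are all hypothetical.

```cpp
#include <map>
#include <string>

// Hypothetical helper, not from the MarcoPolo repository.
// Counts consecutive missed heartbeats per node and signals an alert
// once a node reaches the configured threshold.
class MissedBeatMonitor {
public:
    explicit MissedBeatMonitor(int threshold) : threshold_(threshold) {}

    // Call once per heartbeat for each known node.
    // Returns true only on the heartbeat where the node's consecutive
    // miss count reaches the threshold.
    bool recordBeat(const std::string& deviceId, bool responded) {
        int& missed = missed_[deviceId];  // starts at 0 for a new node
        if (responded) {
            missed = 0;                   // any response resets the streak
            return false;
        }
        ++missed;
        return missed == threshold_;
    }

private:
    int threshold_;
    std::map<std::string, int> missed_;
};
```

With a threshold of 3, a single dropped heartbeat is ignored, and an alert would only fire after three misses in a row for the same node.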

I have a more detailed post on my GitHub under Issue #1. I created an “ImplementAck” branch that I will merge into the master branch soon (more of an exercise in GitHub functionality). Here is a general overview of the acknowledgement process:

  1. A “Marco” event is published from the Marco node. The event data now includes a unique ID (UID) for the “Marco” attempt, the current retry count (starts at 0, increments by one on each subsequent retry), and the retry interval timeout.
  2. The Polo node responds to the Marco event exactly as before (with a “Polo” event).
  3. The Marco node catalogs each response and then sends a “PoloAck” event with the device ID of the Polo node being acknowledged. This step doubles the amount of mesh network traffic.
  4. The Polo node accepts the PoloAck event and sets a flag so that it will not respond to any subsequent Marco messages with the same UID.
  5. The Marco node will check if all nodes have reported at the ack.retryInterval. If the number of reporting nodes is less than the number of known nodes, another “Marco” event is published. The UID is kept the same but the ack.retryCount is incremented by one. This step repeats every time the retryInterval is reached and the reporting vs known node counts do not match.
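
The five steps above can be sketched as two small state machines. This is plain C++ with no Device OS calls, so the actual event publishing is left to the caller. It is a rough illustration under my own assumptions, not the code in the repository: the class names `MarcoTracker` and `PoloResponder` and all of their methods are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// All names here are hypothetical; this is not the MarcoPolo source.

// Marco side: tracks one heartbeat attempt and decides when to retry.
class MarcoTracker {
public:
    explicit MarcoTracker(std::set<std::string> knownNodes)
        : known_(std::move(knownNodes)) {}

    // Step 1: begin a new attempt with a fresh UID and retry count 0.
    void startAttempt(uint32_t uid) {
        uid_ = uid;
        retryCount_ = 0;
        reported_.clear();
    }

    // Step 3: catalog a Polo response. The caller would then publish
    // a PoloAck carrying this device ID.
    void onPolo(const std::string& deviceId) { reported_.insert(deviceId); }

    // Step 5: call when the retry interval elapses. Returns true if
    // another Marco (same UID, incremented retry count) should be sent.
    bool onRetryInterval() {
        if (reported_.size() >= known_.size()) return false;
        ++retryCount_;
        return true;
    }

    uint32_t uid() const { return uid_; }
    int retryCount() const { return retryCount_; }

private:
    std::set<std::string> known_;
    std::set<std::string> reported_;
    uint32_t uid_ = 0;
    int retryCount_ = 0;
};

// Polo side: answers Marco events until it has been acknowledged.
class PoloResponder {
public:
    explicit PoloResponder(std::string myId) : myId_(std::move(myId)) {}

    // Steps 2 and 4: returns true if a Polo event should be published.
    // A new UID clears the acknowledged flag.
    bool onMarco(uint32_t uid) {
        if (!seenAny_ || uid != lastUid_) {
            lastUid_ = uid;
            acked_ = false;
            seenAny_ = true;
        }
        return !acked_;
    }

    // Step 4: a PoloAck naming this device sets the flag.
    void onPoloAck(const std::string& ackedDeviceId) {
        if (ackedDeviceId == myId_) acked_ = true;
    }

private:
    std::string myId_;
    uint32_t lastUid_ = 0;
    bool seenAny_ = false;
    bool acked_ = false;
};
```

In this sketch, a Polo node whose response was lost simply answers the retry, while a node that already received its PoloAck stays quiet until a Marco with a new UID arrives.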

The official repository:


#2

Now that Device OS v0.9.0 is released, it’s time to play with sleep! I just posted MarcoPoloHeartbeat v0.4.8, which implements sleep on the Polo nodes. When enabled, the Polo node will dynamically calculate the number of seconds to sleep in order to wake up prior to the next heartbeat. There are two ways to enable it:

  1. On the Marco node, set the MarcoPolo Sleep Select (MPSS) pin HIGH (default is D3). This will enable sleep for every Polo node on the mesh by adding a few parameters onto the Marco event payload.
  2. On the Polo node, set the MarcoPolo Sleep Select (MPSS) pin HIGH (default is D3). This will enable sleep for only a single Polo node. Because the required parameters are not included in the Marco event payload, the beatInterval and preHeartbeatWakupBuffer will need to be hard-coded in the polo.ino code.

I’ve only put this through some preliminary testing, and already I’m seeing some inconsistent behavior. For example, if the beatInterval is set to 10 seconds, the Polo node will only sleep about every other heartbeat. On the very first heartbeat with a “sleep” payload, the Polo node sleeps and wakes as expected. On the 2nd heartbeat, it takes the Polo node several seconds to actually enter sleep. It then sleeps for the duration calculated prior to the System.sleep() call. But since the system is inducing an unexpected delay, the Polo node does not wake in time to catch the next heartbeat. If acknowledgements are enabled, it may catch one of the subsequent retries. If the beatInterval is set to 20 seconds, the sleep behavior is much more consistent.
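
For anyone who wants to experiment, the sleep calculation can be sketched roughly like this in plain C++. It is not the code in polo.ino: the function name and parameters are hypothetical, and the `sleepEntryDelaySec` term is my assumption for budgeting the delay described above, not something the current code does.

```cpp
#include <cstdint>

// Hypothetical helper, not the code in polo.ino.
// Returns the number of seconds to sleep so the node is awake
// wakeBufferSec before the next heartbeat; 0 means "do not sleep".
//
// beatIntervalSec     - seconds between heartbeats
// elapsedSinceBeatSec - seconds already spent since the last heartbeat
// wakeBufferSec       - how early the node should wake before the beat
// sleepEntryDelaySec  - assumed allowance for the time the system
//                       takes to actually enter sleep
uint32_t secondsToSleep(uint32_t beatIntervalSec,
                        uint32_t elapsedSinceBeatSec,
                        uint32_t wakeBufferSec,
                        uint32_t sleepEntryDelaySec) {
    uint32_t overhead = elapsedSinceBeatSec + wakeBufferSec + sleepEntryDelaySec;
    if (overhead >= beatIntervalSec) {
        return 0;  // not enough time left; stay awake for this beat
    }
    return beatIntervalSec - overhead;
}
```

With a 10 second interval, 1 second elapsed, and a 2 second buffer, an 8 second entry delay leaves no room to sleep, while the same delay budget of 5 seconds in a 20 second interval still leaves 12 seconds. That would be consistent with the longer interval behaving better.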

Comments and testing are welcome.


#3

I haven’t tried the MarcoPolo code, so I’m at risk of asking a dumb question:
Is it possible for the Gateway to broadcast the next Wake Time in UTC?
Then the Xenons would calculate their exact Sleep Time upon receipt, mitigating network latency.
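
A rough sketch of what I mean, in plain C++. The function and parameter names are hypothetical. I am assuming the broadcast wake time is in epoch seconds and that each node has a synchronized clock to compare against (on Device OS, `Time.now()` returns epoch seconds).

```cpp
#include <cstdint>

// Hypothetical helper: computes how long to sleep given an absolute
// wake time broadcast by the gateway.
//
// nextWakeUtc   - epoch seconds at which the next heartbeat is expected
// nowUtc        - the node's current clock reading in epoch seconds
// wakeBufferSec - how early the node should wake before nextWakeUtc
//
// Returns 0 if the wake time (minus the buffer) has already passed.
uint32_t sleepSecondsUntil(int64_t nextWakeUtc,
                           int64_t nowUtc,
                           int64_t wakeBufferSec) {
    int64_t remaining = nextWakeUtc - nowUtc - wakeBufferSec;
    return remaining > 0 ? static_cast<uint32_t>(remaining) : 0;
}
```

Because the wake time is absolute, a message that arrives late just shortens the sleep instead of shifting the wake-up.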


#4

@peekay123 mentioned seeing the sleep function take a while to actually go into sleep mode and reach a low, stable uA sleep current. Sounds like a minimum sleep time may currently be required for stable operation?


#5

It’s certainly possible. I hadn’t thought about doing that. I would need to test how accurate the clock synchronization between the Marco and Polo nodes is when the Polo node constantly sleeps and wakes.


#6

The delay only applies to DEEP SLEEP, not STOP SLEEP. For now the observed difference in current consumption is only 10uA (less in deep sleep). Given all the ways that STOP sleep can be awakened, I would say that 10uA is a small price to pay for that flexibility.