I have posted my “MarcoPolo” heartbeat testing code in various threads but never in its own. So consider this the official place for updates. The MarcoPolo code is what I’ve been using to test mesh network reliability. It may have real-world applications to verify if remote nodes are alive and responding, but for me, it has been a neat side project to get familiar with the Mesh devices.
I have just created v0.4.3, which implements an acknowledgement system. With v0.3.x, I was seeing 99%+ reliability. That is probably adequate in most situations. After all, you could wait until an individual node has missed several heartbeats before sending some type of alert. I have been contemplating how to get up to 100%, and I came up with this acknowledgement system. The v0.4.x code should be backwards compatible with nodes running v0.3.x. However, a node running v0.3.x will not be aware of the acknowledgement system and may create superfluous traffic on the mesh.
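As a rough illustration of the "wait until several heartbeats are missed" idea, here is a minimal host-side C++ sketch. It is not taken from the MarcoPolo code; the `MissTracker` class, its `record` method, and the threshold value are all hypothetical names chosen for this example.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of MarcoPolo): counts consecutive
// missed heartbeats per node and signals when a threshold is reached.
class MissTracker {
public:
    explicit MissTracker(int threshold) : threshold_(threshold) {}

    // Record the outcome of one heartbeat round for a node.
    // Returns true when the node has missed `threshold` rounds in a row.
    bool record(const std::string& deviceId, bool responded) {
        int& misses = misses_[deviceId];      // starts at 0 for a new node
        misses = responded ? 0 : misses + 1;  // any response resets the count
        return misses >= threshold_;
    }

private:
    int threshold_;
    std::map<std::string, int> misses_;
};
```

With a threshold of 3, a node that drops a single heartbeat and then answers the next one never triggers an alert, which is the tolerance described above.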
I have a more detailed post on my GitHub under Issue #1. I created an “ImplementAck” branch that I will merge into the master branch soon (more of an exercise in GitHub functionality). Here is the general overview of the acknowledgement process:
- A “Marco” event is published from the Marco node. It now carries event data that includes a unique ID (UID) for the “Marco” attempt, the current retry count (starts at 0 and increments by one on each subsequent retry), and the retry interval timeout.
- The Polo node responds to the Marco event exactly as before (with a “Polo” event).
- The Marco node catalogs each response and then sends a “PoloAck” event with the device ID of the Polo node being acknowledged. This step doubles the amount of mesh network traffic.
- The Polo node accepts the PoloAck event and sets a flag so that it will not respond to any subsequent Marco messages with the same UID.
- When the ack.retryInterval elapses, the Marco node checks whether all nodes have reported. If the number of reporting nodes is less than the number of known nodes, another “Marco” event is published. The UID is kept the same but the ack.retryCount is incremented by one. This step repeats each time the retryInterval elapses while the reporting and known node counts still do not match.
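The steps above can be sketched as two small state machines. This is a simplified host-side C++ model, not the actual firmware: it makes no Particle API calls, and the `MarcoData`, `PoloNode`, and `MarcoNode` types and their method names are assumptions made up for illustration. The overview does not mention a retry limit, so none is modeled here.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical layout of the data carried by a "Marco" event.
struct MarcoData {
    uint32_t uid;              // unique ID for this Marco attempt
    uint8_t  retryCount;       // 0 on the first attempt, +1 per retry
    uint32_t retryIntervalMs;  // retry interval timeout
};

// Polo side: respond to Marco until acknowledged for that UID.
class PoloNode {
public:
    explicit PoloNode(std::string id) : id_(std::move(id)) {}

    // Returns true if this node should publish a "Polo" event.
    bool onMarco(const MarcoData& m) {
        if (!seen_ || m.uid != lastUid_) {  // a new Marco attempt
            seen_ = true;
            lastUid_ = m.uid;
            acked_ = false;
        }
        return !acked_;  // stay silent once acknowledged for this UID
    }

    // A "PoloAck" carries the device ID of the node being acknowledged.
    void onPoloAck(const std::string& deviceId) {
        if (seen_ && deviceId == id_) acked_ = true;
    }

    const std::string& id() const { return id_; }

private:
    std::string id_;
    uint32_t lastUid_ = 0;
    bool seen_ = false;
    bool acked_ = false;
};

// Marco side: catalog responses and retry while nodes are missing.
class MarcoNode {
public:
    MarcoNode(std::size_t knownNodes, uint32_t retryIntervalMs)
        : knownNodes_(knownNodes), retryIntervalMs_(retryIntervalMs) {}

    // Begin a new attempt; returns the data for the first "Marco" event.
    MarcoData start(uint32_t uid) {
        reported_.clear();
        current_ = MarcoData{uid, 0, retryIntervalMs_};
        return current_;
    }

    // Catalog a "Polo"; returns the device ID to send in the "PoloAck".
    std::string onPolo(const std::string& deviceId) {
        reported_.insert(deviceId);
        return deviceId;
    }

    // Call when the retry interval elapses. Returns true and fills `next`
    // if fewer nodes reported than are known (same UID, retryCount + 1).
    bool onRetryInterval(MarcoData& next) {
        if (reported_.size() >= knownNodes_) return false;
        ++current_.retryCount;
        next = current_;
        return true;
    }

    std::size_t reportedCount() const { return reported_.size(); }

private:
    std::size_t knownNodes_;
    uint32_t retryIntervalMs_;
    MarcoData current_{};
    std::set<std::string> reported_;
};
```

In this model, a node whose “Polo” was lost keeps answering each retry because it never received a “PoloAck”, while nodes that were already acknowledged stay quiet for the same UID. That keeps retry traffic limited to the nodes that are actually missing.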
The official repository: