2019 priorities for 3rd Gen hardware stability


#1

Hey folks! Our US headquarters is back from the holiday break as of this morning. We’re catching up after the holidays and collecting feedback on v0.8.0-rc.27, which we released before the break, and we would love to hear from you.

Do you have priorities that you think we should look at now that we’re back? Feel free to let us know! We’re excited to pick up in 2019 where we left off in 2018 – working hard to make our 3rd Generation of hardware the best yet.


#2

My thoughts:

  • Implement sleep modes for all mesh devices
  • Mesh “robustness” - proven failure recovery for Argon, Boron and Xenon (including hw watchdog)
  • Prove reliability of threading with various system modes
  • Mesh management API
  • Mesh topology mapper (using OpenThread traffic monitoring with a Nordic dongle and the Nordic mapper?)
  • BLE API
  • NFC API
  • Repeater/sleepy node capability
  • Functional completeness, i.e. any “stubbed” functions, retained RAM, etc.
  • User access to 2MB external flash (data or for XIP code execution)
  • Hardware debugging in Workbench
  • Revisit RGB signaling for Mesh devices (vs Gen2 signaling)

:smile:


#3

+1 on the mesh management API and hw watchdog.

Also, HA (high-availability) networks, so I don’t have to write as much redundancy management code myself on the project I’m working on :grin:


#4

+1 to all of the above.

Also, take another look at how OTA updates affect the mesh network. If mesh is stressed, the OTA can fail and/or messages are dropped. Perhaps look at data prioritization on the mesh network if feasible. Can an OTA be flagged as lower priority so that it happens as a “background” transfer? Can data packets be flagged as high priority so they are not dropped in favor of an OTA?
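To make the prioritization idea concrete, here is a small C++ sketch of an outbound queue in which data packets always leave before queued OTA chunks, while order within each class is preserved. This is not an existing Device OS or OpenThread API; `Priority`, `Packet`, and `OutboundQueue` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstdint>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Hypothetical traffic classes; not part of Device OS.
enum class Priority : uint8_t { Background = 0, Normal = 1, High = 2 };

struct Packet {
    Priority priority;
    uint64_t seq;          // arrival order, keeps FIFO within a class
    std::string payload;
};

// "a < b" means a should go out AFTER b: lower priority sinks first,
// and within the same priority the later arrival sinks.
struct PacketOrder {
    bool operator()(const Packet& a, const Packet& b) const {
        if (a.priority != b.priority) {
            return a.priority < b.priority;
        }
        return a.seq > b.seq;
    }
};

class OutboundQueue {
public:
    void push(Priority p, std::string payload) {
        q_.push(Packet{p, nextSeq_++, std::move(payload)});
    }

    bool empty() const { return q_.empty(); }

    // Removes and returns the next packet to put on the air.
    Packet pop() {
        assert(!q_.empty());
        Packet next = q_.top();
        q_.pop();
        return next;
    }

private:
    std::priority_queue<Packet, std::vector<Packet>, PacketOrder> q_;
    uint64_t nextSeq_ = 0;
};
```

With strict priority like this, OTA chunks can be starved indefinitely while the mesh is busy, so a practical scheme would probably also need a minimum share or aging for background traffic.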

And… look at minimizing OTA data usage impact when rolling out identical firmware to each endpoint.


#5

I agree generally with @peekay123’s list. My personal main priorities would be:

  • Obviously, general stability improvements – intra-Mesh communication should continue to function regardless of the failure of a gateway or other nodes, or of Cloud/WiFi/Cellular availability. Nodes should recover quickly from various error conditions
  • Sleep modes, sleep modes, sleep modes!
  • More Mesh introspection (RSSI between nodes, topology info, message routing metadata, etc.)
  • Boron Cellular API parity with Electron (to the degree that it’s possible, considering system differences)
  • BLE APIs
  • NFC APIs

And based on other threads I’ve read recently, I’d like to add:

  • Ability to set up Xenons / create Mesh network without a gateway device
  • Ability to flag Argon / Boron as non-gateway, to use like a Xenon router/leaf node (just because it’s not practical in most use-cases doesn’t mean it’s not useful)

Lastly, I guess this is really out of scope for this particular discussion, and maybe a little pie-in-the-sky, but tangentially related… I think a lot of us here wish we could have two Mesh network gateways in the free tier. I fully understand that Particle has to set business models and make a profit to pay for the Cloud servers and whatnot. But the ability to have two gateways would be useful and reassuring not only for us lowly hobby users, but also for those who are testing the Mesh devices to build their own products.


#6

+1 to this. I think having two free gateways would certainly be put to good use.


#7

My suggestion would be to focus on improving the foundation, which is mesh and network connectivity. Particle’s value add is making networking as easy as using a serial port; everything else is icing on the cake.

The more tools Particle can provide to make networking simple and bulletproof, the more inclined I think the dev community will be to figure out workarounds for “minor” problems such as running out of flash, lack of sleep modes, or mesh network visualization.

That being said, there are urgent features and important features. Providing rudimentary access to BLE and NFC seems very important, even if it is not very urgent. Perhaps building an API sufficient for a few PoC examples would be enough to remove the most significant roadblocks.


#8
  • Sleep modes

  • BLE API

  • RGB mode signaling to differentiate mesh connect, cloud connect, etc



#9

  • Sleep Mode
  • BLE API
  • 2MB flash


#10

I would put sleep mode and robustness at the top of the list.

@dougal’s “mesh introspection” would also be a great help in setting up networks.


#11

For the LTE Boron the following is needed:

  1. Sleep Modes.
  2. Cellular Data Usage in Console so we can see how much data our code structure is using and adjust accordingly.
  3. Access to the 2MB external flash chip.
  4. Hardware Watchdog on all Particle Devices would be great.
  5. Bring on the Bluetooth. It could be useful for easy device setup from a smartphone when embedded into products.

#12
  1. Overall Mesh stability
  2. (deep) sleep mode(s)

#13

@peekay123 pretty much covered what I would like to see. I know you will sort out reliability and stability and complete the functionality. Beyond that, I would like to understand the vision for Mesh: the roadmap Particle has for it (whatever you can share). Thanks!


#14

I didn’t answer the general feedback portion of the question. I’ve seen much-improved stability on RC27… but not perfect. Good job getting to this point! My heartbeat code generally runs at something like 99% reliability. I have the Argon gateway and all 4 Xenon nodes sitting on my desk, less than 18 inches apart and all plugged into my PC, and I’m still regularly getting “lost” messages from one endpoint. The Argon sends a heartbeat every 10 seconds, and all 4 Xenon endpoints respond with their device ID. The endpoint that doesn’t respond isn’t always the same.

In the screenshot of my Losant dashboard, I added blue tick marks on the X axis everywhere I can see a missed heartbeat response. Over the 6 hours of this graph, there are 8 missing responses: 6 hrs × 60 minutes/hr × 6 heartbeats/minute = 2160 heartbeats, and counting each heartbeat with a missed reply as a failure, 2152 / 2160 = 99.6% reliability.
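The 99.6% figure counts heartbeat cycles. Because each of the 4 Xenons answers every heartbeat, the same 8 misses look smaller when counted per individual reply. A small C++ sketch of both calculations, assuming the 8 misses fell in 8 different heartbeat cycles (the function names are illustrative only):

```cpp
#include <cassert>

// Heartbeats sent by the gateway over a test window.
long heartbeatsSent(double hours, int intervalSeconds) {
    assert(hours >= 0 && intervalSeconds > 0);
    return static_cast<long>(hours * 3600.0 / intervalSeconds);
}

// Per-cycle view: a heartbeat counts as failed if any node's reply is missing.
// Assumes each missed reply fell in a different heartbeat cycle.
double cycleReliability(long heartbeats, long missedReplies) {
    assert(heartbeats > 0);
    return 100.0 * (heartbeats - missedReplies) / heartbeats;
}

// Per-response view: every node's reply to every heartbeat is counted.
double responseReliability(long heartbeats, int nodes, long missedReplies) {
    const long expected = heartbeats * nodes;
    assert(expected > 0);
    return 100.0 * (expected - missedReplies) / expected;
}
```

With 6 hours, a 10-second interval, 4 nodes, and 8 misses, this gives 2160 heartbeats, about 99.63% per cycle and about 99.91% per response (8632 of 8640 replies).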

In comparison, I have another mesh network at home with an Argon gateway and a single Xenon endpoint. That network is rock-solid and never misses a heartbeat response.

One possible explanation for the missing messages is that 2 response messages arrive at exactly the same time. I wrote the subscribe callback function to record the data and exit as quickly as possible, so I would think it’s something else. I just don’t know how to diagnose it any further for you Particle experts.

However, on my 2-device home network, I have noticed several times that my gateway goes into blinking green and stops publishing to the cloud. The Xenon endpoint then also goes into blinking green because of the gateway’s state. Resetting the Argon gateway gets it back to breathing cyan almost immediately. I have left the Argon blinking green for 2 days, and it never recovered on its own; it needed a press of reset, after which it went to breathing cyan almost immediately. It has done this at least twice on RC27. These nodes are simply inserted in the stock breadboard, powered over USB from a wall charger, with a 500mAh LiPo attached to the LiPo connector, no external sensors or wiring, and left running the heartbeat code. I cannot correlate the drops in connectivity to anything environmental, WiFi drops, etc.; they seem random. I wouldn’t put it past Verizon FIOS to have a network outage when these drops happen, but that’s just speculation.
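A gateway stuck in blinking green for days is one of the cases the hardware watchdog requests in this thread are about. As an application-level stopgap, firmware can reset the device once it has been offline for too long. A minimal sketch, assuming `loop()` keeps running while the device is trying to reconnect (which may require `SYSTEM_THREAD(ENABLED)`); `OfflineWatchdog` and the 10-minute threshold are hypothetical choices, not Device OS features:

```cpp
#include <cassert>
#include <cstdint>

// Tracks how long the device has been continuously offline and reports
// when a recovery action (such as a reset) should be taken.
class OfflineWatchdog {
public:
    explicit OfflineWatchdog(uint32_t limitMs) : limitMs_(limitMs) {
        assert(limitMs > 0);
    }

    // Call periodically with the current uptime in ms and the link state.
    // Returns true once the device has been offline for at least limitMs.
    bool update(uint32_t nowMs, bool connected) {
        if (connected) {
            offline_ = false;
            return false;
        }
        if (!offline_) {
            offline_ = true;
            offlineSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct across a millis() wraparound.
        return static_cast<uint32_t>(nowMs - offlineSinceMs_) >= limitMs_;
    }

private:
    uint32_t limitMs_;
    uint32_t offlineSinceMs_ = 0;
    bool offline_ = false;
};

// Possible use on a device (untested sketch):
//   OfflineWatchdog offlineWd(10 * 60 * 1000);   // 10 minutes, arbitrary
//   void loop() {
//       if (offlineWd.update(millis(), Particle.connected())) System.reset();
//   }
```

This only helps while the application code is still running; a firmware hang is exactly the case a hardware watchdog would cover.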


#15

@ninjatill, I have a 5-node Argon/Xenon mesh running at home. Three nodes are running the latest Marco Polo code with the Argon; another fires a webhook to fetch weather data and then uses Mesh.publish to send the “massaged” data to the last node, which displays it on a Waveshare e-ink display. One of the Marco Polo nodes is 35 ft away from the Argon (with an interior wall in between), and both the Argon and that node have Mesh antennas connected. I haven’t got the data going to Losant as you do (I didn’t have the time to set up Losant).

I often see one node missing a reporting cycle and I suspect that is due to “collisions” on the UDP socket at the receiving Argon. Since UDP doesn’t retry, the data simply gets lost.

One way around this is a short random delay on the node response. Another is for the Argon to publish an ACK message that the node subscribes to. I would recommend this approach to implement no-fail messaging on the mesh. However, it should only be used for no-fail type data. Again, good mesh planning is key here.

I suggest this discussion be moved to another topic to avoid hijacking the original one.


#16
  1. Sleep mode(s), especially for the Boron. I’m impressed so far with the Mesh devices’ low power usage compared to Gen2, but since the Boron is the mobile device of choice and has a power-hungry cellular module, we really need at least one sleep mode that puts that module to sleep and, ideally, wakes it up with either a pin trigger or a cellular ping, just like the Electron 3G.

  2. Update the Particle CLI to manage mesh networks: create, delete, add a node/gateway/repeater, and check status via serial or BLE.


#17

Tools to show the topology of the network. I want to be able to see where my single points of failure are in the mesh.
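In graph terms, the single points of failure in a mesh are its articulation points: nodes whose loss disconnects part of the network. Given a table of which devices have a working link to each other (how that table would be collected from the mesh is left open here), Tarjan’s DFS low-link method finds them in linear time. A C++ sketch with a hypothetical function name:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Returns the nodes whose removal would split the mesh (articulation
// points), using Tarjan's DFS low-link method. adj[i] lists the nodes that
// node i has a working link to; links are treated as undirected.
std::vector<int> singlePointsOfFailure(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> disc(n, -1), low(n, 0);
    std::vector<bool> isCut(n, false);
    int timer = 0;

    // Recursive DFS; fine for mesh-sized graphs (tens of nodes).
    auto dfs = [&](auto&& self, int u, int parent) -> void {
        disc[u] = low[u] = timer++;
        int children = 0;
        for (int v : adj[u]) {
            assert(v >= 0 && v < n);
            if (v == parent) continue;
            if (disc[v] != -1) {
                low[u] = std::min(low[u], disc[v]);  // back edge
            } else {
                ++children;
                self(self, v, u);
                low[u] = std::min(low[u], low[v]);
                // A non-root u is a cut vertex if v's subtree cannot reach
                // above u without passing through u.
                if (parent != -1 && low[v] >= disc[u]) isCut[u] = true;
            }
        }
        // The DFS root is a cut vertex only if it has 2+ DFS children.
        if (parent == -1 && children > 1) isCut[u] = true;
    };

    for (int i = 0; i < n; ++i) {
        if (disc[i] == -1) dfs(dfs, i, -1);
    }

    std::vector<int> result;
    for (int i = 0; i < n; ++i) {
        if (isCut[i]) result.push_back(i);
    }
    return result;
}
```

For example, with links 0–1, 0–2 and 2–3, nodes 0 and 2 are reported: losing either one cuts off part of the mesh. This only considers node failures; single fragile links (bridges) need a slightly different check.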


#18

My personal priorities are for…

  1. Hardware watchdog
  2. 2MB flash memory access
  3. HA networks. I’m likely to have multiple Argons in close proximity and would love for them to programmatically use the local mesh comms to get online in case one of their Ethernet cables gets unplugged

#19

A post was split to a new topic: Xenons having trouble joining Argon Mesh


#20
  1. Mesh topology/debugger tool in the form of a tutorial, so it can be used when setting up the network. It could be an extension of Tinker, but also a project to be loaded by a first-time user. It would help debug setup issues. Examples are the Marco Polo code (@ninjatill) and the Mesh Hello World (@rocksetta).
  2. Fully functional debugging in Workbench… traps, run to, show variables, send values to published functions…
  3. I don’t know if this would fall under the hardware watchdog, but it would be awesome if something like the SparkIntervalTimer functionality (or a subset) were built into Device OS, for Photons and Electrons as well.
    It would simplify coding for applications with IR control, ultrasound range sensors, LoRa…
  4. BLE support. I wish the Gen3 hardware could act as the gateway for my BluzDKs, even without exposing the BluzDKs to the cloud, as long as I can pass data back and forth.