Is Particle Mesh ready for Prime Time?

Hello everyone! I am new to the community, and our first order of Argon/Xenon boards is in the mail. We are currently using the ESP8266 with painlessMesh, and I ran across the “Enterprise Quality mesh” claim that Particle made. After reading all the sales literature, we decided this would be a great improvement over our current platform, which is buggy at best. We ordered a first batch of samples, and I downloaded Workbench and proceeded to read this forum to get a head start on what was required. The more I read, the more I realized that the sales hype was not matched by actual performance. As of December, I noted that many people's results were touch and go at best, and even within the last month I read a post from Particle advising not to deploy commercially yet. What I would like to know is how people are faring with the platform now. Is it still too unreliable to deploy in a commercial environment?

Here are our requirements in more detail, so that you can point us in the right direction. First off, we do not need any cloud access, only repeaters (Xenon) and the master (Argon). We just need to send simple messages from the master via serial to specific slaves, as well as broadcast to all of them. There will be 30 to 100 slaves in a system, fairly closely spaced, mostly within one large room but sometimes spread across a few rooms. Messages cannot be lost! Any program examples that match the way we intend to use the network would also be greatly appreciated.
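As a starting point for the "specific slaves as well as broadcast to all" requirement, here is a minimal sketch of a frame format in plain, host-compilable C++ (no Particle APIs). The layout `[dest][seq][len][payload][crc8]`, the `0xFF` broadcast address and the CRC-8 polynomial `0x07` are all assumptions made for illustration; neither Particle nor painlessMesh defines this format. On real hardware you would hand the encoded bytes to whatever transport you end up using.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical wire format (not a Particle or painlessMesh API):
// [dest][seq][len][payload...][crc8]
// dest 0xFF is reserved here for "broadcast to all slaves".
constexpr uint8_t kBroadcast = 0xFF;

struct Frame {
    uint8_t dest;         // slave id 0..254, or kBroadcast
    uint8_t seq;          // sequence number, useful later for acks/duplicate detection
    std::string payload;  // the simple message text
};

// CRC-8 with polynomial 0x07, computed bit by bit.
uint8_t crc8(const uint8_t* data, size_t len) {
    uint8_t crc = 0x00;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 0x80) ? static_cast<uint8_t>((crc << 1) ^ 0x07)
                               : static_cast<uint8_t>(crc << 1);
        }
    }
    return crc;
}

std::vector<uint8_t> encode(const Frame& f) {
    assert(f.payload.size() <= 255);  // length must fit in one byte
    std::vector<uint8_t> out{f.dest, f.seq, static_cast<uint8_t>(f.payload.size())};
    out.insert(out.end(), f.payload.begin(), f.payload.end());
    out.push_back(crc8(out.data(), out.size()));  // CRC covers header + payload
    return out;
}

// Returns the frame if it is well-formed and its CRC matches, otherwise nullopt.
std::optional<Frame> decode(const std::vector<uint8_t>& bytes) {
    if (bytes.size() < 4) return std::nullopt;
    size_t len = bytes[2];
    if (bytes.size() != 3 + len + 1) return std::nullopt;
    if (crc8(bytes.data(), bytes.size() - 1) != bytes.back()) return std::nullopt;
    return Frame{bytes[0], bytes[1],
                 std::string(bytes.begin() + 3, bytes.begin() + 3 + len)};
}

// A slave accepts frames addressed to its own id or to everyone.
bool isForMe(const Frame& f, uint8_t myId) {
    return f.dest == myId || f.dest == kBroadcast;
}
```

A CRC only tells a slave that a frame arrived damaged; it does not bring back a frame that never arrived, so "messages cannot be lost" still needs acknowledgements and retries layered on top.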

Obviously there is going to be a huge learning curve to switch platforms, and new prototypes will need to be fabricated as well. In the end, if there is no improvement in reliability, this is obviously the wrong path for us. I thank the community in advance for any and all of your honest advice.

J

It’s a bit difficult to answer your question beyond broad-stroke, meaningless generalities…

The ESP8266 has lots of reliability issues; usually they have to do with lack of memory or threading. But with care, it can be quite reliable.

The ESP32 improves on all of these aspects, and there’s a ton of information and code out there for it. It looked to me like the painless mesh stuff works on it too, but I didn’t look very far.

But it’s really not clear what reliability issues you’re encountering. Any self-reconfiguring mesh will lose messages, sometimes massively, ’cause it may be broken at the moment you send a message. Hell, your message may well be trapped on a node that is just dropping off, perhaps due to a low battery or a crash. So, given your description, you need some reliability layer on top. Given the nature of the beast, you have to think carefully about what level of reliability is cost-effective and what you do when messages just can’t be delivered.
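To make the "reliability layer" idea concrete, here is a hedged sketch of the textbook stop-and-wait scheme (sequence number, ack, bounded retries, duplicate suppression), simulated in plain C++ with a fake lossy link. `Receiver`, `sendReliable` and `Link` are hypothetical names for illustration; nothing here is a Particle or painlessMesh API.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Simulated lossy link: each call is one transmission, returning false if it is lost.
using Link = std::function<bool()>;

struct Receiver {
    // Sequence numbers already handed to the application. A real implementation
    // would track a window per sender, since an 8-bit seq wraps after 256 messages.
    std::set<uint8_t> seen;
    std::vector<std::string> delivered;  // what the application actually received

    // Called for every copy of a message that arrives. Every copy gets acked by
    // the caller, but the payload is delivered only once (retransmits are duplicates).
    void onMessage(uint8_t seq, const std::string& payload) {
        if (seen.insert(seq).second) delivered.push_back(payload);
    }
};

enum class SendResult { Delivered, GaveUp };

// Sends one message, retrying until an ack comes back or maxTries is used up.
// Both the message and its ack travel over the same lossy link.
SendResult sendReliable(uint8_t seq, const std::string& payload, Receiver& rx,
                        Link& link, int maxTries) {
    assert(maxTries > 0);
    for (int attempt = 0; attempt < maxTries; ++attempt) {
        if (!link()) continue;                     // message lost on the way out
        rx.onMessage(seq, payload);                // message reached the receiver
        if (link()) return SendResult::Delivered;  // ack made it back
        // Ack lost: the sender cannot tell this apart from a lost message,
        // so it retransmits on the next attempt.
    }
    return SendResult::GaveUp;  // the application must decide what happens now
}
```

The duplicate check matters because a lost ack makes the sender retransmit a message the receiver already has. The `GaveUp` result is where the "what do you do when messages just can't be delivered" decision lives: log it, raise an alarm, or queue it for later.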

I still see a lot of software churn in the Particle world: the mesh Device OS tree was just merged with the older generation's, and there are very tight constraints around the provisioning and deployment models. It’s not like you can just #define the SSID/PWD of your network in the code, flash N units, and power them up. Also, beyond a couple of small dev network deployments, there is a monthly per-unit cost. You get something for that money, but whether it has value to you is a different question.

There are also other issues, such as power consumption if you have anything running on batteries, and the security levels are different too…

tve, thank you for your insights in this matter. Most of what you brought out has been swirling around my head since I started the project. I may need to fall back to having all remotes communicate with a master and forget about hopping. To begin with, I figured that using mesh would eliminate any distance problems: no node should be more than 25 feet from another, but from end to end the span could be 100+ feet. That’s doable with a few well-placed APs, but I want the user experience to be completely configuration-free. Right now what I have works pretty well, but it falls apart from time to time and does not recover without manual reboots and 10 minutes to settle down. In some environments it does not work at all, even when I use a clean Wi-Fi channel (I even cleared out 3 channels on both sides). I could see eventually working through the issues, but when I read about the Particle stuff and its library compatibility with Arduino, it piqued my interest. On the cost side, my impression was that if I don’t use the cloud there would be no per-unit cost, and these parts would communicate with each other without online registration. Am I wrong about that? None will be on battery at present, so that will not be an issue. Thanks again for your comments. J

Hi,
I think there is still a bit of instability with the mesh nodes, and for that reason few people would recommend deploying anything commercial today.

However, fixes are being made and stability is being worked on, so if I were you I would:

  • start my ramp up with the Particle mesh nodes
  • check this community often (and subscribe to this thread, where new firmware releases are announced)

I believe there are possible workarounds for some of the stability issues you may hit.
Perhaps by the time you are fully ramped up, and hopefully loving the Particle Cloud, all the stability issues will have been addressed.

Cheers,
Gustavo.


Based on feedback in the forums, the devices are certainly more stable now than they were in the December to February range. A large part of how well the mesh works depends on the environment you place the network in, both the layout of the devices and any interfering electrical noise. Reading the forum and the Device OS issues on GitHub regularly will help. Whichever platform you choose, though, the surest way to determine whether it will work for you is probably to set up a typical test network under the worst conditions you might encounter, not the best, IMO.

I have to ask the same question as the OP, @jmayes. While I am not using the mesh component at this stage, I am building a product with a target reach of 18K remote devices. As you all will appreciate, a fleet this size has to be stable and manageable. I have two devices on the bench, testing the initial code block to reach a stable state before we move on, and I have not yet been able to keep a 24-hour session connected without manual intervention. I left this specific device (Xenon + Ethernet FeatherWing) on from 15:13 yesterday until now; 15 minutes ago, all of the errors below appeared on the USB serial port and the device went solid blue. (The app listens on Serial1 for messages from an industrial control panel at 19200 bps, publishes each message to an MQTT broker, outputs a simple activity trail on the USB serial connection, and drives a 2x16 LCD showing the time and comms status.)

0000123504 [app] WARN: Panel communications timeout
got event 536964888 with value 32
got event 536964888 with value 32
got event 536964888 with value 32
0051327229 [comm.dtls] WARN: mbedtls_ssl_write returned ffffffff
0051327237 [comm.protocol] ERROR: Event loop error 3
0051327241 [system] WARN: Communication loop error, closing cloud socket
got event 536964888 with value 32
got event 536964888 with value 64
got event 536964888 with value 64
got event 536964888 with val0051327355 [system] ERROR: Failed to load session data from persistent storage
got event 536964888 with value 32
got event 536964888 with value 32
got event 536964888 with value 32
got event 536964888 with value 32
got event 536964888 with value 32
got event 536964888 with value 32
0051337477 [comm.dtls] ERROR: handshake failed -6800
0051337484 [comm.protocol.handshake] ERROR: handshake failed with code 17
0051337490 [system] WARN: Cloud handshake failed, code=17
got event 536964888 with value 64
got event 536964888005 with value 64
1338550 [system] ERROR: Failed to load session data from persistent storage
0051340450 [comm.protocol] ERROR: Channel failed to send message with error-code <0>
got event 536964888 with value 64
got event 536964888 with value 16384
got event 536964888 with value 32
0051342739 [comm.dtls] WARN: mbedtls_ssl_write returned ffffffff
0051342747 [comm.protocol] ERROR: Channel failed to send message with error-code <20>
0051342754 [comm.protocol] ERROR: Event loop error 20
0051342759 [system] WARN: Communication loop error, closing cloud socket
got event 536964888 with value 32
got event 536964888 with value 64
got event 536964888 with value 64
got event 536964888 w00513ith value 64
42870 [system] ERROR: Failed to load session data from persistent storage
0051344770 [comm.protocol] ERROR: Channel failed to send message with error-code <0>
got event 536964888 with value 64
got event 536964888 with value 16384
got event 536964888 with value 32
0051358671 [comm.dtls] WARN: mbedtls_ssl_write returned ffffffff
0051358681 [comm.protocol] ERROR: Event loop error 3
0051358687 [system] WARN: Communication loop error, closing cloud socket
got event 536964888 with value 64
got event 536964888 with value 64
got event 536964888 with value 32
got event 536900564888 with value 64
1358799 [system] ERROR: Failed to load session data from persistent storage
0051368271 [comm.dtls] ERROR: handshake failed -6800
0051368277 [comm.protocol.handshake] ERROR: handshake failed with code 17
0051368284 [system] WARN: Cloud handshake failed, code=17
got event 536964888 with value 64
got event 536960051369341 [syste4888 with value 6m] ERROR: Failed to load se4
ssion data from persistent storage
0051371239 [comm.protocol] ERROR: Channel failed to send message with error-code <0>
got event 536964888 with value 64
got event 536964888 with value 16384
got event 536964888 with value 32
0051390727 [comm.dtls] WARN: mbedtls_ssl_write returned ffffffff
0051390735 [comm.protocol] ERROR: Event loop error 3
0051390741 [system] WARN: Communication loop error, closing cloud socket
got event 536964888 with value 64
got event 536964888 with value 64
got event 536964888 with005 value 32
got ev1390853 [system] ERROR: ent 536964888 with value 64
Failed to load session data from persistent storage

The first line is from my app:

0000123504 [app] WARN: Panel communications timeout

Everything after that is generated by some subsystem I have no control over. I can understand the [comm.xxx] messages as coming from the cloud connection firmware. My app does not generate any messages such as “got event 53xxxxxxx with value xx”; it only uses Log.info to output the audit messages. I have yet to find an explanation of the “got event” messages anywhere in the docs.

My app has the following in place for debugging:

SYSTEM_THREAD(ENABLED);

/*
  DEBUG STUFF
*/

ApplicationWatchdog wd(30000, System.reset);

ParticleWebLog particleWebLog("log", LOG_LEVEL_WARN, // Logging level for non-application messages
{ 
    { "app", LOG_LEVEL_WARN} // Logging level for application messages
});

SerialLogHandler logHandler(LOG_LEVEL_WARN, // Logging level for non-application messages
{ 
    { "app", LOG_LEVEL_ALL } // Logging level for application messages
});

The device console shows “breathing cyan” when in fact the LED is solid cyan and my app is “frozen”. I have an LCD attached with a simple time-of-day display and a flashing colon, so it is easy to see that it is frozen.

The last vitals are shown below. It’s ironic that the remedy is to reposition the device: it is connected to a 100 Mb/s fibre internet link and is not using mesh functions.

[Screenshot: device vitals]

The defined watchdog has not fired to recover the device, and I am unable to remotely flash the device or bring it back in any way other than manual intervention. So there is clearly still a way to go before this product is ready for prime time. Much as I love Particle and its offerings, this is a showstopper.


Thank you to everyone for your wisdom and comments. In addition to the probable reliability problems, I have indeed found another showstopper: I now understand that the devices cannot be initialized without the cloud and will count against the 100-device limit even if they are used for local mesh only. I don’t know whether they can be unregistered and incur only a one-month fee (when over 100), but either way, not being able to set them up without Particle’s servers is a non-starter for me. I see from their comments in the forums that Particle “may” allow this at some point, but I suspect the business model is more about selling monthly cloud services than just selling parts, which, unless they permit the devices to work without the cloud, will greatly reduce hardware sales. I for one will have to go elsewhere, even if the issues are resolved. For now I have some parts coming; I will do the initial testing and continue to watch the progress, but for this project I must go in another direction. I had high hopes, especially since my existing code could have been dropped in with little modification. Thanks again for all the help, J

Not sure at all, but wouldn't flashing them via USB work in this case?

I agree. I really like that they're open source, but I remain hesitant to trust anything that requires a company cloud (see Works with Nest). They also used to provide the source for the cloud server, but have stopped. There is Brewskey/spark-server on GitHub, an API-compatible open-source server for interacting with devices speaking the spark-protocol, although I haven't tested it myself yet.

There are a few keys burned into the devices that aren't accessible, but otherwise I'm pretty sure you can do whatever you like with them, so you could use Device OS as a template to build your own OpenThread network if you wanted. Some of the hardware is open hardware as well, so you could also build your own devices or try one of the other nRF52840 devices. Even if Particle isn't the right solution for you, an OpenThread mesh still might be.

So, 7 days and a new firmware release later, I am much more confident in the stability of the product. My lab devices have run for 24 hours with no issues and have recovered on their own from cloud and network problems while still reporting correctly.
There is a learning curve (as with any new product; I am also changing platforms for my production product), and my issues have largely stemmed from moving from a single-threaded MCU platform to this one, which (for best performance) runs as a dual-thread RTOS. Another complication I have found is that some of the libraries you can select are not thread-aware and lead you down the rabbit hole now and then. On the whole, things are now looking much more positive, and customer testing will start soon!
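Several lines in the log above are spliced mid-word (for example `got event 536900564888 with value 64` followed by a line starting `1358799 [system] ERROR`), which is the classic symptom of two threads writing to the same serial port without coordinating. The sketch below shows the general fix in plain host C++: hold one mutex for the whole line. `SharedSerial` and `runWriters` are hypothetical names, and `std::mutex` stands in for whatever lock your platform provides; on Device OS I believe the documented equivalent is a `WITH_LOCK(Serial) { ... }` block, but check the threading section of the docs before relying on that.

```cpp
#include <cassert>
#include <mutex>
#include <sstream>
#include <string>
#include <thread>

// Stand-in for a shared serial port that several threads write to.
class SharedSerial {
public:
    // Writes the whole line while holding the lock, so lines never interleave.
    void println(const std::string& line) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (char c : line) out_ << c;  // byte by byte, like a UART driver
        out_ << '\n';
    }
    std::string contents() {
        std::lock_guard<std::mutex> lock(mutex_);
        return out_.str();
    }
private:
    std::mutex mutex_;
    std::ostringstream out_;
};

// Runs two writer threads that each emit `count` copies of their own line.
std::string runWriters(int count) {
    assert(count >= 0);
    SharedSerial serial;
    auto writer = [&serial, count](const std::string& line) {
        for (int i = 0; i < count; ++i) serial.println(line);
    };
    std::thread a(writer, std::string("got event 536964888 with value 64"));
    std::thread b(writer, std::string(
        "0051338550 [system] ERROR: Failed to load session data from persistent storage"));
    a.join();
    b.join();
    return serial.contents();
}
```

Without the lock in `println`, the two threads' characters could interleave much like the log (and on a desktop it would also be a data race). The lock only helps if every writer, including library code, goes through it, which is exactly why libraries that are not thread-aware cause trouble.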
