Latest round of testing - 12/21

Well, I spent about 8 hours today testing the mesh products. The first thing I did was to upgrade one Argon and 4 Xenons from rc26 to rc27. That seemed to go OK: I upgraded the Argon first, and then brought up one Xenon at a time and upgraded each OTA.

After running my standard test code which has the Xenons Mesh.publishing one variable (once a second), and the Argon subscribing and printing out the data, everything started to look positive.

It was at this point that things went very bad and I spent the rest of the day recovering. I’m not sure what I did, but as others have reported the Argon went totally off the deep end. It got to the point that I could put the Argon in listen mode, manually enter the wireless credentials, have it come up normally, and when I rebooted, it immediately went back into listen mode. This lasted for about 2 hours and after trying so many things I can’t even remember, I finally got it working again.

Of course, since it was now a new device, the Xenons would no longer boot. I’m guessing they knew they “belonged” to the Argon, but the Argon didn’t realize it was still the gateway for the mesh network. I ended up unclaiming the Argon and the 4 Xenons and starting over. Then I ran into the “that mesh network already exists” error, so I had to create a new mesh network. As I started moving the Argon and the 4 Xenons to the new mesh network, the device counts on the old mesh network started dropping. As soon as I moved the last Xenon from the old mesh network to the new one, the old mesh network disappeared from the list.

So, once I had a “functional” mesh network again, I started trying to make changes to my program. Of course, every time I upgraded the Argon code it kicked all of the Xenons off the network. After doing this at least a half dozen times, I never had more than 1 of the 4 Xenons come back online automatically. For the others, I had to manually reset each of them.

When I tried to update the Xenons, every attempt timed out. I’m assuming the poor Argon was totally swamped with the 4 Xenons and couldn’t process the OTA request. I ended up powering down the other 3 Xenons and upgrading all 4 devices one at a time. For the record, all devices are using SYSTEM_THREAD(ENABLED).

I also noticed something weird when using the strdup() function. I know that anything String-related is a no-no in Particle land, but I was using it for a short-term test. And yes, I used free() to release the heap space allocated by the function call. However, after running for about 3-5 minutes, the Argon would reliably reboot due to an “out of heap” error message. I’m still not sure why it would run out of heap as long as I freed the memory. I switched to the strdupa() call, which allocates the space on the stack, and that problem went away.
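For anyone comparing notes on the strdup() behavior, here is a minimal sketch in plain C++ (not Particle firmware) of the two approaches described above: a heap copy that is freed on every exit path, and a stack-based copy. The function names `parseWithHeapCopy` and `parseWithStackCopy` and the 64-byte buffer size are hypothetical, and a fixed-size stack buffer stands in for strdupa(), which is a GNU extension. This only illustrates the pairing of strdup() with free(); it does not explain the reboot reported above.

```cpp
#include <cassert>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical handler: copies the payload to the heap, parses it as a
// decimal number, and frees the copy before returning.
// strdup() returns NULL when the heap is exhausted, so that case is
// checked before the copy is used.
bool parseWithHeapCopy(const char* payload, long& out) {
    char* copy = strdup(payload);   // POSIX: malloc() + strcpy()
    if (copy == nullptr) {
        return false;               // nothing was allocated, nothing to free
    }
    out = strtol(copy, nullptr, 10);
    free(copy);                     // exactly one free() per successful strdup()
    return true;
}

// Stack-based alternative: the copy lives in a fixed-size local buffer and
// disappears when the function returns, so the heap is never touched.
// Payloads longer than 63 characters are truncated by snprintf().
long parseWithStackCopy(const char* payload) {
    char copy[64];                  // assumed upper bound on payload length
    snprintf(copy, sizeof(copy), "%s", payload);
    return strtol(copy, nullptr, 10);
}
```

Calling `parseWithHeapCopy("42", value)` stores 42 in `value` and returns true; `parseWithStackCopy("42")` returns 42 directly. On a small device the stack version also avoids heap fragmentation, at the cost of a hard limit on payload length.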

Even when the Argon crashed due to the heap error, I never saw more than 1 of the 4 Xenons come back on their own. They would fast blink green and occasionally blink what appeared to be orange every 10 seconds or so. Once I reset them they all came up fine.

I haven’t seen the SOS7 error yet, but just to be safe I went ahead and put in the waitUntil() line as suggested.

I have to agree with another poster: I haven’t been able to work on anything useful because, as previously stated, half of the time you are recovering from errors and the other half you’re trying to document the findings so somebody can fix the issues. Once you go beyond one or two mesh clients it becomes really difficult to manage the troubleshooting process.


Thanks for the report. Reading about experiences and troubleshooting efforts is helpful. :+1:

Hopefully testing can transition from simple stability to more important issues like range testing and network architecture design.

Your experiences are due to the new mesh kit being “bleeding edge”!

If the research, development and community contribution cycle at this stage is too onerous, just wait for stability to flow through…

Witness the steady and ongoing improvements to the Photon firmware, which is now a stable and well-documented system after much community contribution and Particle staff effort.

Out of interest, as my commitments with the Photon are dominating my time, I have not even opened my boxes of mesh gear and will wait for the dust to settle before doing so.

While it might be tedious and frustrating, and putting the devices aside can bring some ("quick") relief, sticking with it will shorten the time until things are stable.
I'm convinced Particle appreciates the efforts of all the brave and fearless early adopters who put in their time to uncover as many not-yet-found issues as possible, as quickly as possible, and "complain" about them here in the forum or in the form of a well-documented GitHub issue :wink:
