Request: Particle endorsed Semi Automatic, Thread mode test case .ino

@Dave, who should we tag for the firmware sketch?

I’m working on a new firmware release that I was hoping to get out next week. I’d like to ensure it follows the golden, endorsed method, because if a device doesn’t go online it gets expensive quickly to do RMAs.

I’ve already accidentally had two devices that got upgraded past 1.1.1 and were then flashed with the firmware sketch we used back on 1.1.1. The simple result is that the devices are effectively bricked, as they don’t have the magical combination of WiFi.connect() and Particle.connect() that, from our trial and error, anything after 1.1.1 seems to expect.

OK this thread has me intrigued. I’ve been away from the Particle world since 0.8.x was just about becoming a finished thing. At that point it was fairly clear that our application was not very fond of anything later than 0.6.3/4 and amongst other things it was these kinds of modes that had outstanding issues.

My project got revived. Relying on 0.6.3 to maintain support seems dangerous, and there are functions in 0.8/1.2 that are interesting from a diagnostics point of view, as well as helpful increases in publish size.

However, being away that long means all the little new or perceived issues that people encountered along that path have passed me by. It sounds like some of them would apply to the kind of things I was looking at back then (SYSTEM_THREAD(ENABLED), poor cellular signal, etc.). I am aware that when these issues appear in this forum they are often down to poor implementation, but they also arise because some of the sample code historically breaks rules described elsewhere (using String in publish instead of a char array), ignores any kind of mitigation (publishing without first checking that a connection is established), or is simply too simple.
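
As a rough illustration of the two mitigations mentioned above (a fixed char buffer rather than a String, and checking the connection before publishing), here is a minimal sketch. It is plain C++ so it compiles off-device: `cloudConnected` and `cloudPublish` are hypothetical stand-ins for `Particle.connected()` and `Particle.publish()`, and the event name, payload fields and buffer size are invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical stand-ins for Particle.connected() and Particle.publish(),
// passed in as function pointers so the logic compiles off-device.
using ConnectedFn = bool (*)();
using PublishFn = bool (*)(const char *eventName, const char *data);

// Format the payload into a caller-supplied buffer.
// Returns false if the text did not fit (snprintf would have truncated it).
bool formatReading(char *buf, std::size_t len, int tempC, int rssi) {
    assert(buf != nullptr && len > 0);
    int n = std::snprintf(buf, len, "{\"temp\":%d,\"rssi\":%d}", tempC, rssi);
    return n >= 0 && static_cast<std::size_t>(n) < len;
}

// Publish only when the cloud connection is up and the payload fits.
bool publishReading(ConnectedFn cloudConnected, PublishFn cloudPublish,
                    int tempC, int rssi) {
    char payload[64];  // fixed-size buffer, no heap allocation
    if (!cloudConnected()) {
        return false;  // skip rather than publish into a dead connection
    }
    if (!formatReading(payload, sizeof(payload), tempC, rssi)) {
        return false;  // payload too long for the buffer
    }
    return cloudPublish("reading", payload);
}
```

On a device the two function pointers would presumably be replaced by the real Device OS calls; the point is only that the caller gets a clear true/false back instead of assuming the publish went out.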

I agree a more complex example use case that implements more of the advanced features is something many people would find useful.

Hi @mterrill,

I will try to set aside some time this week to write something up we can share and run it past the team. I really like this idea of sharing test cases and Particle Device OS Examples that could really help product creators and power users like yourself.

Thanks,
David

There were some changes after 0.7.3 (? going from memory) that required careful use of waitFor on subscribes, but the major connect changes and bugs came in after 1.1.1. I noticed the RC for 1.5.0 has the fix for WiFi.macAddress(); the sequencing work that made that reliable may have fixed some other edge cases.

I want to move the fleet to 1.4.4, as we have feature updates to our firmware to deploy and all the betas have been on 1.4.x. However, I’ve observed flaky cloud behaviour, so I want to make sure it’s rock solid first. The best way is a validated/endorsed/tested/supported sketch.

Hi @Dave, hope your week is travelling well. I’d really like to ask for the sketch again, as it’s holding us up and we’ve been seeing errant cloud behaviour with publish/subscribe in the setup block.

Let me know how we can get this going as I’ve got a new app version out with IFTTT integration, but don’t have the matching device firmware published as I don’t want to dig myself into a hole with devices not going online. At the moment if we took down the fleet I imagine Particle Support would simply tell me to go fish. I also imagined that the sketch would be readily available and something @rickkas7 or you would be copy/pasting from a repo full of test cases or simply typing out from memory.

@avtolstoy do you happen to have a semi automatic system thread enabled sketch that reliably connects to cloud, publishes and subscribes?

I’m chasing the test case for new firmware, as we’ve seen instances lately where we had to move a Particle.publish() and a Particle.subscribe() out of the setup block (where they were sitting in a 15 second waitFor) into the main loop, guarded by a check on Particle.connected(). My theory is that the cloud is occasionally too slow to register them. I presumed there was a test sketch that was used for CI …
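
For anyone wanting to picture the workaround described above (subscribe and publish from the main loop once the connection is up, rather than in setup), here is a minimal sketch of the sequencing. It is plain C++ so it can be compiled off-device: the three hooks are hypothetical stand-ins for `Particle.connected()`, `Particle.subscribe()` and `Particle.publish()`, and the class name is made up. Subscribing once and re-publishing after each reconnect is an assumption of this sketch, not confirmed Particle guidance.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical hooks standing in for Particle.connected(),
// Particle.subscribe() and Particle.publish().
struct CloudHooks {
    std::function<bool()> connected;
    std::function<bool()> subscribe;     // true if registration succeeded
    std::function<bool()> publishHello;  // true if the publish succeeded
};

class CloudSession {
public:
    explicit CloudSession(CloudHooks hooks) : hooks_(std::move(hooks)) {
        assert(hooks_.connected && hooks_.subscribe && hooks_.publishHello);
    }

    // Call once per loop() pass. Never blocks.
    void tick() {
        if (!hooks_.connected()) {
            announced_ = false;  // publish again after the next reconnect
            return;
        }
        if (!subscribed_) {
            subscribed_ = hooks_.subscribe();  // retried next pass on failure
            return;
        }
        if (!announced_) {
            announced_ = hooks_.publishHello();  // retried next pass on failure
        }
    }

    // True once the subscription is registered and the publish went out.
    bool ready() const { return subscribed_ && announced_; }

private:
    CloudHooks hooks_;
    bool subscribed_ = false;
    bool announced_ = false;
};
```

The subscribe and the publish happen on separate passes through the loop, so a slow registration only delays the publish rather than blocking the application.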

Here is the Shell Code I start with. I have tested it reliably on the Boron 2G/3G.

Interesting code, thanks!

I implemented a similar approach with subscribe/publish in the main loop after checking for .connected().

To me it highlights the need for Particle folk to step up and provide code for how they ensure it connects, publishes and subscribes in the most reliable and non-blocking way. We’re all in the community simply trying to figure out what’s in the black box of the Particle cloud and what combination of firmware commands works best.

@calebatch, thanks for the code! I noticed you have a potential of two consecutive calls to Particle.connect() in setup():

    if (System.resetReason() == RESET_REASON_PANIC || System.resetReason() == RESET_REASON_PIN_RESET || System.resetReason() == RESET_REASON_WATCHDOG) {
        // Power-cycle the modem after an abnormal reset
        Cellular.off();
        delay(200);
        Cellular.on();
        Particle.connect();   // <-- this line can be removed!
    }
    Particle.connect();            // ready to connect to the cloud
    waitUntil(Particle.connected); // loops until connected or the watchdog triggers

Since the second Particle.connect() is unconditional, you can remove the Particle.connect() from the if(...) body as the code will fall through to that connect statement anyway. :wink:

I am behind you all the way here @mterrill!

For example, the fast flashing cyan (i.e. Particle connection dropped), which kicks off at random times and is only solved by a reset, is a worry…

In lieu of Particle input, I am wondering if a COIN project (where “COIN” = Community Of Interest Network) is the way to go here? The group would work together to build a “best practice” minimal application that can be used as a test jig and template.

Might be good if @oddwires could chime in on here

Hi @Dave,

Does Particle do long term test running? e.g. have some sort of test rig setup that tests device-os stability over a long period of time?

You would earn a lot of trust with me if you had a long term test rig setup for each of your “production” releases, then logged issues to the Particle Status page when those devices went offline, had to be reset, etc…

You are correct, I used to have more code in the if statement to ask my server if I should go into safe mode (and tell me it is doing so) or keep running code.

Hi @UMD, a COIN project is a nice idea, and effectively how Particle has operated historically. That obviously isn’t good enough when a) their code stops working and bugs take a year to be resolved (e.g. WiFi.macAddress()) and b) it’s becoming reasonable to assume that there aren’t actually internal test cases already running to validate firmware as part of CI or cloud operation. That’s bad news.

Particle needs to provide their test case code for semi-automatic mode with system threading, with the Particle cloud reliably connecting, subscribing and publishing. A community-maintained version is useless when Particle firmware keeps changing, cloud behaviour changes and bugs are running rampant.

Let’s presume b) is true and the cupboards are empty. We can accept that and move on quickly. A Particle employee should put up their hand and admit as much, then thrash out some code with the gurus in the Particle team and offer it up in a repo for validation.

Let’s tag a few more people. 16 days of radio silence with no code offered is a concern, and it’s apparent it’s not just the crazy Australian who is keen on some official movement.

@zachary CTO
@will community
@avtolstoy as he’s a legend who knows the code well and fixed the most recent 1.4.2
@mdma as the original genius behind all the HAL and low level coding a few leagues beyond my skills

Hi. To get things rolling (it’s awkward that we have to do this…), here’s a COIN-like repo for comment/advancement. I do anticipate we’ll have someone from Particle step up to the plate, but maybe this will help them flesh out what we’re trying to validate.

@mterrill, let’s get this show on the road!

Will review particle-skeleton and contribute when I can and hopefully others will also get behind the project.

I do note that a lack of good sample code seems to be a common problem within the embedded controller industry.

Hey folks,

Good conversation here, and want to make sure that you know the lack of response from Particle has not been avoidance of the conversation – simply a lot going on internally and didn’t get the ping until you tagged me 16 hours ago :smile:

We have been (re-)developing a sample application model that can act as both a common reference app for evaluating device performance across Device OS versions as well as providing an instructional model for customers who want to know how best to implement thread-safe publishing on the Particle platform.

It’s not quite ready to release yet, but already exists and should get there in the next handful of weeks or so given our current team event and travel schedules. @rickkas7 has been driving development. Our intention is absolutely to make it available to customers and to the community.

The list of examples is still being refined, but there will be several. I’ll be sure to include the features in the original post. There will also be examples using cellular in various modes with various sleep modes, with information about power and data usage trade-offs and of course working code.

Great stuff. If you compile my code you’ll observe that the last publish statements in the setup block don’t actually publish. I did the hacky subscribe / publish in the loop because it seems cloud registration timing and reliability have changed. Our beta code using 1.4.x firmware stopped reliably connecting / publishing / subscribing.

While other more rounded examples will certainly help the community, selfishly I’d really love to home in on the core process of Wi-Fi connection / cloud connection / subscribe / publish. That behaviour has changed dramatically since 1.1.x, with Particle.connect() now needing WiFi.connect(), and I presume the recent cloud registration issues are due to timing and changes to the backend.
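
To make the ordering concrete, here is a minimal sketch of a non-blocking Wi-Fi first, cloud second sequence with a timeout on each wait. It reflects only the trial-and-error observation above, not documented Device OS behaviour. It is plain C++ so it compiles off-device: the hooks are hypothetical stand-ins for `WiFi.connect()`, `WiFi.ready()`, `Particle.connect()` and `Particle.connected()`, and the state names and restart-from-scratch policy are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical hooks standing in for WiFi.connect(), WiFi.ready(),
// Particle.connect() and Particle.connected().
struct NetHooks {
    std::function<void()> wifiConnect;
    std::function<bool()> wifiReady;
    std::function<void()> cloudConnect;
    std::function<bool()> cloudConnected;
};

enum class NetState { Start, WaitWifi, WaitCloud, Online };

class Connector {
public:
    Connector(NetHooks hooks, std::uint32_t timeoutMs)
        : hooks_(std::move(hooks)), timeoutMs_(timeoutMs) {
        assert(timeoutMs_ > 0);
    }

    // Call once per loop() pass with the current millis() value. Never blocks.
    NetState step(std::uint32_t nowMs) {
        switch (state_) {
        case NetState::Start:
            hooks_.wifiConnect();
            enter(NetState::WaitWifi, nowMs);
            break;
        case NetState::WaitWifi:
            if (hooks_.wifiReady()) {
                hooks_.cloudConnect();  // only requested once Wi-Fi is up
                enter(NetState::WaitCloud, nowMs);
            } else if (expired(nowMs)) {
                state_ = NetState::Start;  // retry from scratch
            }
            break;
        case NetState::WaitCloud:
            if (hooks_.cloudConnected()) {
                state_ = NetState::Online;
            } else if (expired(nowMs)) {
                state_ = NetState::Start;  // retry from scratch
            }
            break;
        case NetState::Online:
            if (!hooks_.cloudConnected()) {
                state_ = NetState::Start;  // connection dropped, start over
            }
            break;
        }
        return state_;
    }

private:
    void enter(NetState next, std::uint32_t nowMs) {
        state_ = next;
        enteredMs_ = nowMs;
    }
    // Unsigned subtraction stays correct across a millis() rollover.
    bool expired(std::uint32_t nowMs) const {
        return nowMs - enteredMs_ >= timeoutMs_;
    }

    NetHooks hooks_;
    std::uint32_t timeoutMs_;
    NetState state_ = NetState::Start;
    std::uint32_t enteredMs_ = 0;
};
```

Subscribe and publish logic would then only run while `step()` reports `NetState::Online`, which keeps it out of the setup block entirely.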

The code I provided is quite stripped down. It has some necessary weight due to test functions, but aside from that it is quite minimal.

Juggling the sections around to your recommendations or adding recommended recovery / watchdog / waitfor logic should be pretty easy I hope!

For customer context, I’m currently held up by 3 weeks, waiting to release firmware features to my customers after observing the subscribe / publish cloud issues. If they don’t reliably work then we’ll have devices that don’t retrieve their configuration or listen to configurations actively sent to them.