Network status calls and SYSTEM_THREAD issues

TL;DR - Holding the SETUP button for 3 seconds causes my Photon to stop breathing when in manual mode.

With all the recent movement around user-friendly wifi configuration apps, I’ve been looking into how the process can be made even friendlier with feedback and assistance from the firmware (and not just the wifi setup app).

As an example of why I want to do this, consider one shortcoming of the setup process: when incorrect credentials are given, the Photon seems to get jammed, and the only recourse is a reset. The setup app has to guess at what is going on and has no reliable way to know what failed, only some heuristics. Thanks to the awesome and very recent threading support, the firmware should be able not only to tell what happened but also to act on it, giving the user feedback along the way and perhaps even recovering to a good state.

My goal was to create some test firmware to play around with the setup process and see what state changes the firmware could detect to realistically be of assistance in guiding a user through the process.

To that end, I created a small application that monitors all of the connection-related boolean state values from the WiFi and Particle system libraries for changes. This works like a charm for detecting everything up to a cloud connection (assuming valid credentials are present), but when the SETUP button is held down for 3 seconds, the LED never starts blinking blue and in fact stops breathing completely. The last bit of output I see, however, indicates that listening mode has been entered:

I don’t see a Photon-XXXX network show up and the device is no longer connected to the cloud according to “particle list”. This happens whether or not I have SYSTEM_THREAD(ENABLED);. I haven’t tested in AUTOMATIC mode yet.

Does anyone know what the intended behavior of the SETUP button is when used to enter listening mode with SYSTEM_MODE(MANUAL);?
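The state monitor described in this post can be sketched in portable C++ so the change-detection logic can be tried off-device. This is a minimal sketch, not the actual test firmware: the StateMonitor class and its watch()/poll() methods are hypothetical, and each flag is modeled as a named probe. On a Photon, the probes would wrap calls such as WiFi.ready(), WiFi.connecting(), WiFi.listening() and Particle.connected(), with poll() called from loop().

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: watches named boolean probes and reports transitions.
// On a device each probe would wrap a status call such as WiFi.listening().
class StateMonitor {
public:
    struct Change {
        std::string name;
        bool value;  // the new value after the transition
    };

    // Register a probe; its current value becomes the baseline.
    void watch(const std::string& name, std::function<bool()> probe) {
        bool initial = probe();
        probes_.push_back({name, std::move(probe), initial});
    }

    // Poll every probe once and return only the flags whose value changed.
    std::vector<Change> poll() {
        std::vector<Change> changes;
        for (auto& p : probes_) {
            bool now = p.probe();
            if (now != p.last) {
                p.last = now;
                changes.push_back({p.name, now});
            }
        }
        return changes;
    }

private:
    struct Probe {
        std::string name;
        std::function<bool()> probe;
        bool last;
    };
    std::vector<Probe> probes_;
};
```

Because each poll reports only transitions, a hang like the one described above would show up as a last reported change (for example WiFi.listening flipping to true) followed by silence.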

I fixed a bug in my own code that was causing the listening-mode problem above, but now I’m running into an issue that’s definitely threading-related. In my posted code, I’m avoiding calling WiFi.hasCredentials() while attempting to connect to the Particle cloud because I’d previously noticed it getting stuck there. Now it seems that even when a cloud connection is already established, calling WiFi.hasCredentials() causes issues.

The problem exists at two levels:

  1. When Particle.connect() has been called, WiFi.hasCredentials() seems to get stuck.
  2. When avoiding the issue in 1, the Particle cloud connection keeps dropping off so long as WiFi.hasCredentials() is getting called.

Here’s a fixed, minimized version of the code above that exhibits both problems, depending on which lines you comment out. My hunch is that it has to do with the SYSTEM_THREAD_CONTEXT_SYNC_CALL_RESULT used here, which none of the other boolean status calls require:

These relatively minor problems aside, some serious props for your engineering effort to make threading work, @mdma, especially given how much of this code is Particle homebrew.
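One application-side workaround for the two problems listed above is to stop calling WiFi.hasCredentials() on every pass: cache its result, refresh it at most once per interval, and skip the refresh while a connection attempt is in progress, as the code in this post already does. This is a minimal sketch assuming a hypothetical ThrottledFlag helper; on a device the query would be WiFi.hasCredentials(), the busy test WiFi.connecting() (or a flag set around Particle.connect()), and the clock millis(). It only reduces how often the blocking call is made and may ease the symptoms; it does not fix the underlying firmware behavior.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical wrapper around an expensive, blocking status call.
// The underlying query runs at most once per interval and never while
// busy() reports a connection attempt in progress. Until the first
// successful refresh, get() returns false.
class ThrottledFlag {
public:
    ThrottledFlag(std::function<bool()> query,
                  std::function<bool()> busy,
                  std::function<std::uint32_t()> nowMs,
                  std::uint32_t intervalMs)
        : query_(query), busy_(busy), nowMs_(nowMs), intervalMs_(intervalMs) {}

    bool get() {
        std::uint32_t now = nowMs_();
        // Unsigned subtraction stays correct when a millis()-style counter wraps.
        bool due = !hasValue_ || (now - lastRefresh_ >= intervalMs_);
        if (due && !busy_()) {
            cached_ = query_();
            lastRefresh_ = now;
            hasValue_ = true;
        }
        return cached_;
    }

private:
    std::function<bool()> query_;
    std::function<bool()> busy_;
    std::function<std::uint32_t()> nowMs_;
    std::uint32_t intervalMs_;
    std::uint32_t lastRefresh_ = 0;
    bool hasValue_ = false;
    bool cached_ = false;
};
```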


Currently, hasCredentials() is implemented as a synchronous call on the system thread because we want to avoid data races: the credentials test examines shared data in flash, so it needs to be done only on a single thread or guarded with a mutex.
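To illustrate what a synchronous call on the system thread involves, here is a portable C++ model. It sketches the general technique only and is not the firmware's SYSTEM_THREAD_CONTEXT_SYNC_CALL_RESULT implementation; SystemThread and invokeSync are hypothetical names. The application thread queues a task and blocks on a future until the system thread has run it, so the shared data is only ever touched on one thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical model of a "system thread" that owns shared state.
// Other threads never touch that state directly; they ask this thread
// to run a function and wait for the result.
class SystemThread {
public:
    SystemThread() : worker_([this] { run(); }) {}

    ~SystemThread() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Queue f on the system thread and block the caller until it has run.
    template <typename F>
    auto invokeSync(F f) -> decltype(f()) {
        using R = decltype(f());
        // shared_ptr keeps the task alive until the system thread is done with it.
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(f));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.emplace_back([task] { (*task)(); });
        }
        cv_.notify_one();
        return result.get();  // the calling thread waits here
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        while (true) {
            cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
            if (queue_.empty()) return;  // stop requested and nothing left to do
            auto job = std::move(queue_.front());
            queue_.pop_front();
            lock.unlock();
            job();  // executes on the system thread
            lock.lock();
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> queue_;
    bool stop_ = false;
    std::thread worker_;  // declared last so the members above exist before run() starts
};
```

The cost of this design is visible in the last line of invokeSync(): the caller waits for as long as the system thread is busy with other work, so a system thread that is tied up (for example, inside a connection attempt) would stall every such call.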

There are two approaches to making hasCredentials() non-blocking:

  • use a mutex and allow hasCredentials() to execute on any thread. This means the application can execute hasCredentials() as long as the system isn’t also modifying WiFi data (which, most of the time, it isn’t).

  • use a flag that is updated by the system to indicate whether WiFi credentials are present. The main concern here is what value the flag should have before the system has determined whether credentials are present. To avoid that tricky situation, we could force the system to check for WiFi credentials before starting the application thread.
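The second approach could look roughly like the following sketch, in which CredState, CredentialFlag and bootSystem are all hypothetical names. A tri-state atomic flag lets readers distinguish "not checked yet" from "no credentials", and a start-up routine that runs the first check before launching the application means the application never observes the Unknown state.

```cpp
#include <atomic>
#include <cassert>

// Unknown lets a reader tell "not checked yet" apart from "no credentials".
enum class CredState { Unknown, Absent, Present };

class CredentialFlag {
public:
    // Called by the system side after it has examined the stored credentials.
    void publish(bool present) {
        state_.store(present ? CredState::Present : CredState::Absent,
                     std::memory_order_release);
    }

    // Safe to call from any thread and never blocks.
    CredState read() const { return state_.load(std::memory_order_acquire); }

private:
    std::atomic<CredState> state_{CredState::Unknown};
};

// Hypothetical start-up order: do the first real check before the
// application starts, so the application never sees CredState::Unknown.
template <typename CheckFn, typename StartAppFn>
void bootSystem(CredentialFlag& flag, CheckFn checkCredentials, StartAppFn startApp) {
    flag.publish(checkCredentials());
    startApp();
}
```

The trade-off is that the flag is only as fresh as the last publish(), so every system code path that adds or clears credentials would also have to update it.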

Those are my current thoughts on this; I’m open to comments and suggestions!

So the deeper issue is that both threads are accessing non-reentrant flash driver code? It’s not faulting or SOS-ing, but without mutexes it can’t be getting deadlocked either… (this is more to sate my own curiosity)

Approach 1 involving mutexes sounds preferable, as long as calling hasCredentials() will eventually return once the system thread is done modifying that data.
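A minimal sketch of the mutex approach, with a std::string standing in for the credentials stored in flash; CredentialStore and its methods are hypothetical. Readers and the writer take the same lock, so a hasCredentials() call that arrives during a write blocks until the write has finished and then returns, which is the behavior asked for above.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical credential store guarded by a single mutex, so that
// hasCredentials() may be called from any thread.
class CredentialStore {
public:
    bool hasCredentials() const {
        std::lock_guard<std::mutex> lock(m_);
        return !ssid_.empty();
    }

    // System side: hold the lock for the whole (slow) update so readers
    // never see a half-written record; they wait and then see the result.
    void store(const std::string& ssid, std::chrono::milliseconds writeTime) {
        std::lock_guard<std::mutex> lock(m_);
        std::this_thread::sleep_for(writeTime);  // simulates a slow flash write
        ssid_ = ssid;
    }

private:
    mutable std::mutex m_;
    std::string ssid_;
};
```

The lock is held only for the duration of a single read or write, so most hasCredentials() calls would not wait at all.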


I believe I’ve fixed the issues you were seeing - there were two problems:

  1. A busy application thread would prevent the system thread from executing. This was due to a change in thread priorities (itself a workaround for another bug that’s now fixed).

  2. An application that continually made calls to the system, such as calls to WiFi.hasCredentials(), would prevent the system thread from running the background loop. This has also been fixed.
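The post does not describe how the fix for problem 2 works. One generic way to keep a flood of application calls from starving a service thread is to bound how many queued calls each pass of its loop handles, and then always run the background work. The following is a sketch under that assumption, using a hypothetical SystemLoop class modeled single-threaded for clarity; it is not the code in the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of a system loop that cannot be starved: each pass
// services at most maxCallsPerPass queued requests, then always runs the
// background work (cloud housekeeping and the like).
class SystemLoop {
public:
    explicit SystemLoop(std::size_t maxCallsPerPass) : maxCalls_(maxCallsPerPass) {}

    // Application side: queue a request for the system to handle.
    void post(std::function<void()> call) { queue_.push_back(std::move(call)); }

    // One iteration of the system loop.
    void runOnce(const std::function<void()>& background) {
        for (std::size_t i = 0; i < maxCalls_ && !queue_.empty(); ++i) {
            auto call = std::move(queue_.front());
            queue_.pop_front();
            call();
        }
        background();  // runs on every pass, however full the queue is
    }

    std::size_t pending() const { return queue_.size(); }

private:
    std::size_t maxCalls_;
    std::deque<std::function<void()>> queue_;
};
```

The bound trades some latency for queued calls against a guarantee that the background work keeps running; with a limit of 2 and five queued calls, the background step still runs on each of the three passes it takes to drain the queue.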

The code will be released in 0.4.7 next week. Eager followers can build directly from the PR at https://github.com/spark/firmware/pull/699 (feature/multithreading-feedback branch)
