Achieving acceptable functionality (wrt Cloud blocking / reconnection)


#1

Hi.

Well, as luck would have it, our prototype product was recently demo’d at a large tradeshow, where a serious flaw was found. :frowning:

So, background:
My client has an existing non-IoT product: a real-time control system with a button and LCD UI.
We are currently engineering the “premium” next version of that product, whose major upgrade is IoT connectivity. The new product is built on the P1 platform.

When the product was at the tradeshow, away from a development workstation for the first time, it began having pretty bad issues and basically became unresponsive.

Some research on the forums, plus testing to confirm it, shows that I am hitting the well-known issue of the P1 being basically dead while it tries to reconnect to the Cloud. I replicated this quite simply by turning off the WiFi router it was connected to: the button interface went from super responsive to basically not working.

So, I’ve seen plenty of people on the forums and on the GitHub issue tracker discussing workarounds and ideas for getting acceptable functionality in this situation. Most of the suggestions won’t be usable for us, for two reasons:

  1. The device is going to be installed outside, so poor signal strength is not only possible but will be probable; and
  2. Most of the suggestions discuss ways to recover when things have been in this locked state for a period of time, but as a real-time control system where the user is expecting to push a button and have something work immediately, that “period of time” is unacceptable.

So the only idea that looks promising to me was floated in this thread: extract basically all of my control system functionality into a completely separate thread and leave all Particle networking/cloud functionality behind in its own thread.
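A minimal sketch of that split, written in portable C++ so it can run off-device: `std::thread` stands in for the Device OS `Thread` class, which this sketch does not use. The control thread handles the button and only enqueues a report; a separate network thread is the only place a (possibly blocking) publish would happen. `EventQueue`, `controlLoop`, `networkLoop` and `runSplitDemo` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical thread-safe FIFO: the control thread pushes reports and the
// network thread pops them. The lock is held only for the push/pop itself.
class EventQueue {
public:
    void push(std::string msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push_back(std::move(msg));
        }
        cv_.notify_one();
    }

    // Waits for an item; returns std::nullopt once closed and drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !items_.empty() || closed_; });
        if (items_.empty()) return std::nullopt;
        std::string msg = std::move(items_.front());
        items_.pop_front();
        return msg;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::string> items_;
    bool closed_ = false;
};

// Control thread: would read buttons and drive outputs immediately, then
// enqueue a report. It never calls anything network-related.
void controlLoop(EventQueue& queue, int presses) {
    for (int i = 0; i < presses; ++i) {
        queue.push("button:" + std::to_string(i));
    }
    queue.close();
}

// Network thread: the only place a (possibly blocking) publish would go.
// Here "sending" just records the message.
std::vector<std::string> networkLoop(EventQueue& queue) {
    std::vector<std::string> sent;
    while (auto msg = queue.pop()) {
        sent.push_back(*msg);
    }
    return sent;
}

// Runs both threads and returns what the network side "sent".
std::vector<std::string> runSplitDemo(int presses) {
    EventQueue queue;
    std::vector<std::string> sent;
    std::thread network([&] { sent = networkLoop(queue); });
    std::thread control([&] { controlLoop(queue, presses); });
    control.join();
    network.join();
    return sent;
}
```

The point of the queue is that a slow or blocked network side only delays reports; the control thread's button handling never waits on it.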

My concerns with that are that:

  1. I’m already running SYSTEM_THREAD(ENABLED) (albeit there are some Publish calls in my application thread as it stands right now) and my system is still pretty much dead; and
  2. I saw a comment on GitHub made by @ScruffR that indicated that when Particle.connect is running, other threads will have their performance severely affected; and
  3. I saw a comment (that I can’t locate right now) that seemed to imply that “millis” timekeeping might be affected during this blocking behaviour.

So, with all this on-board, does anyone have any comments or advice before I try this?

I have to say that, based on what I’ve presented above, I’m not hopeful that it would work anyhow, and I’m wondering if I need to go to my client and tell them we may need to abandon Particle as a platform before they start placing orders for thousands of P1s.

Thanks if you’ve read this far. I hope you have some words of wisdom.


#2

Particle.publish is likely the source of most of your blocking issues in SYSTEM_THREAD(ENABLED) mode.

That post describes a number of techniques you can use to mitigate the blocking.

However, in order to make the most responsive UI, I’d handle that from a thread, with a few caveats:

  • If you use a shared resource like I2C or SPI, you can use it from a thread, but you need to either never use it from the main thread, or add guards to prevent both your thread and the loop thread from using it at the same time.
  • Serial is fine from a thread.
  • Do not make any Particle calls from your UI thread; otherwise you’ll be back in the situation where blocking occurs.
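The guard in the first caveat can be sketched as below, again in portable C++ with `std::mutex` rather than any Device OS locking primitive. `SharedBus` is a hypothetical stand-in for an I2C/SPI peripheral; the key detail is that the lock is held for a whole multi-step transaction, so the UI thread and the loop thread can never interleave their writes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an I2C/SPI bus used by both the UI thread and
// the loop thread. A "transaction" is several writes that must not interleave.
class SharedBus {
public:
    // Holds the bus lock for the whole transaction, not just each write.
    void transaction(std::uint8_t owner, int writes) {
        std::lock_guard<std::mutex> lock(mutex_);
        for (int i = 0; i < writes; ++i) {
            trace_.push_back(owner);
        }
    }

    // True if every transaction's writes appear as one contiguous run.
    bool contiguous(int writes) const {
        if (writes <= 0) return true;
        std::lock_guard<std::mutex> lock(mutex_);
        const std::size_t n = static_cast<std::size_t>(writes);
        for (std::size_t i = 0; i + n <= trace_.size(); i += n) {
            for (std::size_t j = 1; j < n; ++j) {
                if (trace_[i + j] != trace_[i]) return false;
            }
        }
        return true;
    }

    std::size_t writeCount() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return trace_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::uint8_t> trace_;
};

// Two threads (standing in for the UI thread and loop()) share one bus.
bool runSharedBusDemo(int transactions, int writes) {
    SharedBus bus;
    auto worker = [&bus, transactions, writes](std::uint8_t owner) {
        for (int t = 0; t < transactions; ++t) bus.transaction(owner, writes);
    };
    std::thread ui(worker, std::uint8_t{1});
    std::thread loop(worker, std::uint8_t{2});
    ui.join();
    loop.join();
    return bus.writeCount() == static_cast<std::size_t>(2 * transactions * writes) &&
           bus.contiguous(writes);
}
```

Locking per transaction rather than per write matters for devices where a register address write must be followed immediately by its data.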

This post explains some more of the caveats of using threads:

With SYSTEM_THREAD(ENABLED), your thread should run almost perfectly at 1000 times per second, even while the device is connecting to the cloud.
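A thread that runs about 1000 times per second is, in effect, a loop with a 1 ms period. The portable sketch below shows one way to write such a fixed-rate loop with `std::this_thread::sleep_until` on a steady clock; on the device you would use the RTOS's own delay call instead, which this sketch does not try to reproduce. `runFixedRate` is a hypothetical name.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Calls tick() every `period`, `iterations` times. Each deadline is computed
// from the previous deadline rather than from "now", so the time tick() takes
// does not accumulate as drift (the same idea as FreeRTOS vTaskDelayUntil).
template <typename Tick>
void runFixedRate(std::chrono::milliseconds period, int iterations, Tick tick) {
    auto next = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        tick();          // e.g. scan buttons, update outputs
        next += period;  // absolute deadline for the next pass
        std::this_thread::sleep_until(next);
    }
}
```

If a tick overruns its period, `sleep_until` returns immediately and the loop runs back-to-back until it has caught up with its schedule.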

There are a few situations in 0.7.0 where the networking code blocks. These should be fixed in 0.8.0, which should be released soon.


#3

Hi Rick.

Thanks, as always, for your speedy reply.

So I just removed my Publish calls as a test (:man_facepalming: not sure why I didn’t at least try that test before my post) and from my extremely brief test it did appear that the UI was back to normal.

Thank you for both of those links. I have read through them previously and they are both very informative.

To the best of your knowledge, then, the rest of my concerns about pursuing the separate control-logic thread design are not going to be an issue? I have experience with embedded threading design and usage, so I’m less worried about that than about my code’s interaction with the rest of the P1 system OS.


#4

Subject to all of the restrictions on threads, I think that is the best way to ensure a very responsive UI, even when the device is having difficulty connecting to the cloud.

You can get close by avoiding Particle.publish when not connected to the cloud. That is easier to program, but it still won’t be quite as responsive as using a thread, so it is the better choice only if response time is not critical.
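A minimal sketch of that simpler option: only call publish while connected, and otherwise buffer the event and return immediately. `PublishGate` is hypothetical. On the device the two callbacks would presumably wrap `Particle.connected()` and `Particle.publish()`, but here they are plain `std::function` callbacks so the logic can be tested off-device. The bounded, drop-oldest buffer is an assumed design choice, not something from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical wrapper: forwards to the real publish call only while the
// cloud connection is up; otherwise it buffers (bounded) and returns at once.
class PublishGate {
public:
    PublishGate(std::function<bool()> connected,
                std::function<void(const std::string&)> publish,
                std::size_t capacity)
        : connected_(std::move(connected)),
          publish_(std::move(publish)),
          capacity_(capacity == 0 ? 1 : capacity) {}

    // Called from loop(): never waits for a connection.
    void submit(const std::string& event) {
        if (pending_.size() == capacity_) pending_.pop_front();  // drop oldest
        pending_.push_back(event);
        flush();
    }

    // Sends buffered events, oldest first, but only while connected.
    void flush() {
        while (!pending_.empty() && connected_()) {
            publish_(pending_.front());
            pending_.pop_front();
        }
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::function<bool()> connected_;
    std::function<void(const std::string&)> publish_;
    std::size_t capacity_;
    std::deque<std::string> pending_;
};
```

As noted above, this narrows the window in which a publish can block rather than removing it, which is why the separate thread remains the more responsive design.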

But it sounds like you’re in the first situation to me.


#5

Thanks Rick. With all of the posting on this subject, it has been difficult to determine what is still a current issue and what is historical, but your comments are helpful and a relief.

If I can clarify 2 points:

  1. You say to avoid “Particle calls” from my control thread, are you using the word Particle there to refer to functions from the Particle class/namespace, or just general “network-y” functions; and
  2. If I extract my control code out to its own thread, it seems to me that there is really not any point in keeping SYSTEM_THREAD(ENABLED) anymore, would you agree or am I missing something?

#6

You say to avoid “Particle calls” from my control thread, are you using the word Particle there to refer to functions from the Particle class/namespace, or just general “network-y” functions; and

I meant Particle.publish, Particle.connect, etc.

I would avoid using TCP or UDP from a thread you want to be responsive in 0.7.x. TCP connect is always blocking, so you should avoid it in all system versions; however, in 0.7.x other calls were also inadvertently made blocking on the Photon and P1. This is fixed in 0.8.0, I think.

If I extract my control code out to its own thread, it seems to me that there is really not any point in keeping SYSTEM_THREAD(ENABLED) anymore, would you agree or am I missing something?

It could go either way. I’d probably leave it turned on unless you are encountering a specific problem with it.

Rick


#7

Thanks Rick.