I am trying to write an app for my Photon with a responsive touch interface. I was under the impression (based on the documentation for Spark.connect and Spark.process) that Spark.process was a lightweight call and could safely be called in what, for me, is the inner loop of my UI code.
Unfortunately, I am discovering that Spark.process is, in some circumstances, taking a long time to return, which results in UI stutter. This is undesirable.
If you run this, you can eyeball the amount of time spent in Spark.process (implicitly invoked on each loop iteration) based on the brightness of the onboard LED on D7. What you will see is that while the Photon is going through its Wi-Fi and cloud handshake, ~100% of the time is spent outside of loop. It’s only after the cloud handshake is complete that the percentage of time spent outside of loop comes down to near zero.
I had a quick look at the firmware code, and my best guess is that this is happening due to a number of blocking sends inside Spark_Communication_Loop.
(The behavior is the same in manual mode with an explicit call to Spark.process.)
As far as I understand this, it could change now, if the send calls inside Spark.process were non-blocking instead of blocking. Multi-threading across the board is a bigger problem than Spark.process blocking when it’s not supposed to. (Not that I don’t want both.)
And note that I am not talking about Spark.connect, which is a blocking call and documented as such. It’s Spark.process that is documented to be non-blocking (well, to only block for a few ms) and is behaving otherwise.
I am pretty sure that the behavior that I am observing is orders of magnitude worse than what you are describing, though. I realize that my test code is quite crude (eyeballing LED duty cycle) compared to your measurements, but when I see a solid-on LED with a loop that takes ~100ms (as per my code above), that is telling me that my post-loop Spark.process is running in the ballpark of 50+ ms per call (compared to your 1-15 ms per call).
I guess it’s time to stop being lazy and get some actual measurements here.
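One way to replace the LED eyeballing with numbers is to timestamp the exit and re-entry of loop() and record the gap between them. What follows is only a minimal sketch: `LoopGapStats` and its members are hypothetical names, not part of the Photon firmware, and the timestamps are assumed to come from millis().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a firmware API): records how long the system
// spends outside of loop(), i.e. between one loop() exit and the next entry.
struct LoopGapStats {
    uint32_t last_exit_ms = 0;
    bool has_exit = false;
    uint32_t max_gap_ms = 0;
    uint64_t total_gap_ms = 0;
    uint32_t samples = 0;

    // Call at the top of loop() with the current millis() value.
    void on_loop_enter(uint32_t now_ms) {
        if (!has_exit) return;  // no previous exit to measure against yet
        // Unsigned subtraction stays correct across millis() wraparound.
        uint32_t gap = now_ms - last_exit_ms;
        if (gap > max_gap_ms) max_gap_ms = gap;
        total_gap_ms += gap;
        ++samples;
    }

    // Call at the bottom of loop() with the current millis() value.
    void on_loop_exit(uint32_t now_ms) {
        last_exit_ms = now_ms;
        has_exit = true;
    }

    // Mean time spent outside loop() per iteration, in ms.
    uint32_t mean_gap_ms() const {
        return samples ? static_cast<uint32_t>(total_gap_ms / samples) : 0;
    }

    // Share of a measurement window (in ms) that was spent outside loop().
    uint32_t percent_outside(uint32_t window_ms) const {
        assert(window_ms > 0);
        return static_cast<uint32_t>(total_gap_ms * 100 / window_ms);
    }
};
```

In a sketch, a global `LoopGapStats` would be fed `millis()` as the first and last statements of loop(), with `max_gap_ms` printed over serial every few seconds. With millisecond timestamps, gaps shorter than 1 ms show up as 0, so this is only suited to catching the long stalls discussed here.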
Spark.process() services the Spark Cloud connection when it needs service. Spark.connect() connects to the Spark Cloud if you are not currently connected (i.e. you are managing the connection in semi-automatic or manual mode). When the Cloud is connected it requires service at least every 10 seconds and that code is blocking.
One other point worth mentioning is that delay() can also service the Spark Cloud and therefore take a somewhat variable amount of time. It is designed to interfere as little as possible while still preventing Cloud timeouts, which, as I said, happen after 10 seconds without service.
Although having delay() service the cloud if needed might seem like a strange choice, it actually eliminated a lot of problems since calling delay(30000); was now a Cloud-safe operation.
Yeah, in the absence of threads, the decision to service pending Cloud I/O from delay() makes sense. And nobody should be relying on delay() to provide millisecond accuracy anyway, so if Spark.process introduces jitter of the order of milliseconds per second in calls to delay(), that’s not a big deal.
And to be clear, I am not disagreeing with any decisions about when to service pending cloud I/O. I am only disagreeing with the decision to make the implementation of Spark.process() use blocking sends, because it (in my experience) makes loop() unresponsive to the tune of 100+ ms (give or take; I haven’t yet had the time to measure it precisely).
To give you an idea of how bad this is, I wrote a different app which just PWMs an LED on a digital output pin, and ramps the PWM from 0% to 100% and back down to 0% duty cycle over the span of 2 seconds (so, your basic throbbing/breathing LED indicator with 2-second cycle). This PWM is done completely from loop(), with no calls to delay() — it just loop()s straight through and calls millis() to decide when to increment/decrement PWM duty cycle; which is to say, Spark.process is called as often as humanly possible, thus guaranteeing that the amount of work it has to perform on each call is as small as possible.
And with that code, after Spark.connect has returned (so the only thing that’s happening besides the LED PWMing is the implicit calls to Spark.process), the indicator freezes for seconds at a time as the Photon is connecting to WiFi and cloud.
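The original test app is not shown in the thread, so the following is only a sketch of the kind of millis()-driven ramp described above. `breathing_duty` is a hypothetical name, and the 0..255 output range assumes an 8-bit PWM value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: triangle-wave duty cycle for a breathing LED.
// Ramps 0 -> 255 over the first half of the period and back to 0 over
// the second half, computed purely from the current time.
uint8_t breathing_duty(uint32_t now_ms, uint32_t period_ms = 2000) {
    assert(period_ms >= 2);
    const uint32_t half = period_ms / 2;
    const uint32_t phase = now_ms % period_ms;
    uint32_t ramp = (phase < half) ? phase : (period_ms - phase);
    if (ramp > half) ramp = half;  // only reachable with an odd period
    // 64-bit intermediate avoids overflow for very long periods.
    return static_cast<uint8_t>(static_cast<uint64_t>(ramp) * 255u / half);
}
```

Called from loop() as something like `analogWrite(pin, breathing_duty(millis()))`, the LED brightness depends only on when loop() runs, so any stall between iterations shows up directly as a visible freeze in the ramp.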
Which is why I am complaining about this: given the current unpredictable blocking behavior of Spark.process, and the delays of hundreds of ms that it creates, it is simply impossible to create a quality user experience on a Photon. You can’t reasonably update any kind of quality UI or register any kind of quality input (such as touch input) when your refresh/sampling rate drops below 10 Hz. For some applications, this is completely immaterial: if you’re reading an environment sensor once a minute and dumping it into the cloud, you will never care about this. But if you need a user-facing device, Spark.process really hurts the way it works today.
The problem is that there are not a lot of little chunks of time required to service the cloud, but one big chunk instead as a packet comes in or needs to go out.
So I think things will go better for you if you think of the Spark cloud service as one big chunk of time taken at least every ten seconds. If that means that you don’t go around loop() very often but instead have your own for/while loop inside of the loop() function, you can “schedule” the time at which the cloud is serviced to be convenient for you when you are doing other tasks or UI functions.
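A rough sketch of that scheduling idea, under stated assumptions: the device is in manual mode so the application decides when Spark.process() runs, the 10-second service deadline is the figure quoted in this thread, and `CloudServicePolicy` with all of its fields is a hypothetical name rather than a firmware API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy: decide when the application should hand time to
// the cloud. Service it whenever the UI is idle, and force a service
// shortly before the deadline even if the user is still interacting.
struct CloudServicePolicy {
    uint32_t deadline_ms;    // cloud must be serviced at least this often
    uint32_t margin_ms;      // safety margin subtracted from the deadline
    uint32_t idle_after_ms;  // UI counts as idle after this long without touch

    bool should_service(uint32_t now_ms,
                        uint32_t last_service_ms,
                        uint32_t last_touch_ms) const {
        assert(margin_ms < deadline_ms);
        // Unsigned subtraction stays correct across millis() wraparound.
        const uint32_t since_service = now_ms - last_service_ms;
        const uint32_t since_touch = now_ms - last_touch_ms;
        const bool deadline_near = since_service >= deadline_ms - margin_ms;
        const bool ui_idle = since_touch >= idle_after_ms;
        return deadline_near || ui_idle;
    }
};
```

The inner UI loop would check `should_service(millis(), lastService, lastTouch)` on each pass and call Spark.process() only when it returns true. This moves the stall to a convenient moment but does not shorten it; a forced service near the deadline can still land in the middle of a touch gesture.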
Ok, first of all, I am not disagreeing with your opinion here. What I am saying is that “at least once every ten seconds your UI refresh rate drops below 10 Hz” is pretty much the same as “quality UI is impossible on this device”.
No matter how I slice this, and no matter how hard I try to only call Spark.process around the interactions that my user is having with the device, the only way I see to build a decent experience around this constraint is to simply a. prevent the user from interacting with the touchscreen while I am making any network calls (because touch experience will be bad at 10 Hz) and b. prevent Spark networking code from being called while the user is interacting with the touch screen (by setting the device in manual mode).
Which I can do. I can set the Photon in manual mode, and lock the user out of the touchscreen while I am doing things with the network.
But that’s a pretty crappy user experience, and I am not convinced that it has to be this way.
I get it, right now the code consumes big chunks of time to service the cloud. Why? I looked at the firmware code, and there are blocking sends all over the place in Spark.process. Why? I don’t see anything about that code that indicates it has to be written that way, although I certainly do understand that blocking code is faster to write and easier to read.
However, as far as I understand the architecture here, the choice to use blocking sends inside Spark.process is not dictated by the architecture, and the point I am trying to make is that it’s a choice that’s incompatible with quality in user experience.
So if I am wrong, and the Photon architecture somehow forces the firmware to use blocking sends in Spark.process, then I would like to know about it, so that I can abandon Photon in favor of some architecture that serves my needs better; but if I am right, then I would like to have this behavior acknowledged as faulty, and hopefully see it fixed.
I think the Particle team would like to make cloud functions non-blocking too, but it is a big rock to lift right now and there are lots of other rocks. This was impossible on the Core but much more reasonable on the Photon. If you want to have a discussion with the team about your ideas, I urge you to file a GitHub issue and explain it there. I am sure pull requests are welcome too if you are up for it.
There are a lot of other options you could consider and first on my list would be a dedicated processor for the touch screen like this one, the STMPE610 but a PIC or ATTiny etc. would also work.
I am using a cap touch sensor (an FT6206) on an Adafruit breakout, actually. I’m using it over I2C, so the sensor output is unbuffered, and therefore when Spark.process hits me and drops my loop rate to 10 Hz, my touch sensor quality goes through the floor.
You are right; putting a 2nd processor in here would solve this problem. (It would also solve a bunch of other problems that I am having, like the flaky I2C on the Photon, or the fact that my Photon seems to crash while being woken from sleep by an external interrupt.)
I have seriously considered that — so thanks for the reminder that I should revisit that option. If I can find an I/O protocol that works reliably with Photon firmware (so, not I2C… is SPI robust on current firmware?), I could just drop in a well-behaved MCU with a more mature ecosystem, use that for everything except for network connectivity, and use the Photon only for its WiFi/cloud capability. Which seems like a massive overkill, but it is what it is.
I perfectly see your problem with the "blocking" nature of the cloud stuff happening between loop() iterations or when calling Spark.process(), but I'm not quite sure what you mean with above statement.
AFAIK you won't need to be worried about thread syncing your application code (unless you want to actively adopt it), since (at least to start with) the only two threads will be your app thread and the cloud background thread. Unfortunately I've no idea of the ETA; I'm waiting for it too.
But another question jumps to mind. Will you actually need the cloud functionality or do you just need WiFi (with TCP/UDP)?
This might be a bit speedier.
On the other hand the amount of times you see this issue seems odd. I do seem to have such a problem only when having "bad" WiFi reception.
I suspect you misunderstood me. I meant "I do want both", meaning I want threading and I want Spark.process to be (as documented) a lightweight call.
I do actually use cloud capability; I have two Photons communicating with each other through publish/subscribe.
The reason that I see this issue more than you do is that I am putting the Photon to sleep to minimize power consumption. When the user wakes it (by interacting with the touchscreen), I encounter problems due to Spark.process delays. If your WiFi is solid and you never let your Photon disconnect from the cloud, you will not see this as often.
I obviously focused too much on the term “Multi-threading […] is a bigger problem”
But with MT the need to call Spark.process() and the time loss for reconnect should fall away or become “unnoticeable”, eventually making things behave (more) as the docs already suggest.
I don’t know how close to production you already are, but if you can stick with the Photon as is (with some clunky workarounds) for the time being and can get on with your dev on some other end, your problems might be solved by Particle - but again, unfortunately no ETA yet.
I agree that my tests were done under near ideal network conditions, with a router in the same room, and a reliable high speed ISP connecting me to Particle's servers. I'm sure that the calls to Spark.process() could block for much longer, as you are seeing.
Basically this means to me that you can't use the Spark cloud ecosystem if your application needs to meet some sort of minimum latency requirement. In your case @airbornemint, it is about UI responsiveness, in my case it is about a requirement to respond in so many milliseconds to a complex set of sensor inputs.
Below is my code for doing these measurements. This was meant to be throwaway code, and isn't top quality. Perhaps you can mine something useful out of it.
I completely gave up on using the Particle cloud IDE because of the extremely regrettable decision by Particle to use a library format that is completely incompatible with the existing Arduino library format (which I understand, as Arduino came up with this format after the Particle web IDE was released) and the further regrettable decision to ignore this problem for a year now.
If you want to be the maintainer of a contributed FT6206 Particle library, you can start with the Adafruit library and my changes and then do all the shenanigans to get around Particle’s bad library format choices.
You don’t have to use the RSTN (reset) and INT (interrupt) pins, but you may want to do it anyway, depending on how you are using the sensor.
In that case, I’d like to encourage better transparency in the form of following up on GitHub issues that are actively being worked on (minimally, to assign them to a milestone). Thanks!