Spark.process() blocks for extended periods of time, contrary to documentation

airbornemint · July 27, 2015, 10:13pm

I am trying to write an app for my Photon with a responsive touch interface. I was under the impression (based on the documentation for Spark.connect and Spark.process) that Spark.process was a lightweight call and could safely be called in what, for me, is the inner loop of my UI code.

Unfortunately, I am discovering that Spark.process is, in some circumstances, taking a long time to return, which results in UI stutter. This is undesirable.

I narrowed this down to the following test code:

SYSTEM_MODE(SEMI_AUTOMATIC);

void setup() {
    pinMode(D7, OUTPUT);
    digitalWrite(D7, LOW);
    Spark.connect();
}

void loop() {
    digitalWrite(D7, LOW);
    delay(100);
    digitalWrite(D7, HIGH);
}

If you run this, you can eyeball the amount of time spent in Spark.process (implicitly invoked on each loop iteration) based on the brightness of the onboard LED on D7. What you will see is that while the Photon is going through its Wi-Fi and cloud handshake, ~100% of the time is spent outside of loop. It’s only after the cloud handshake is complete that the percentage of time spent outside of loop comes down to near zero.

I had a quick look at the firmware code, and my best guess is that this is happening due to a number of blocking sends inside Spark_Communication_Loop.

(The behavior is the same in manual mode with an explicit call to Spark.process.)

kennethlimcp · July 28, 2015, 12:01am

This is correct and will only change when multi-threading is introduced for the Photon. For now, a connection to the Wifi/Cloud is a blocking call.

airbornemint · July 28, 2015, 12:07am

As far as I understand this, it could change now, if the send calls inside Spark.process were non-blocking instead of blocking. Multi-threading across the board is a bigger problem than Spark.process blocking when it’s not supposed to. (Not that I don’t want both.)

And note that I am not talking about Spark.connect, which is a blocking call and documented as such. It’s Spark.process that is documented to be non-blocking (well, to only block for a few ms) and is behaving otherwise.
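
To make the blocking vs. non-blocking distinction concrete, here is a minimal sketch in plain C++. It is not the Particle firmware code: `ChunkedSender`, `pump`, and the `write_some` callback are hypothetical names invented for illustration. The idea is that the unsent remainder lives in a small state object, and each call pushes at most a bounded number of bytes before returning to the caller.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical sketch, NOT the Particle firmware API.
// write_some(data, len) stands in for a non-blocking socket write: it reports
// how many bytes it accepted right now (possibly 0) and never waits.
using WriteSome = std::function<std::size_t(const std::uint8_t*, std::size_t)>;

class ChunkedSender {
public:
    explicit ChunkedSender(std::size_t maxBytesPerCall)
        : maxBytesPerCall_(maxBytesPerCall) {}

    // Queue a message; nothing is transmitted yet.
    void queue(const std::vector<std::uint8_t>& msg) {
        pending_.insert(pending_.end(), msg.begin(), msg.end());
    }

    // Do a bounded amount of work, then return.
    // Returns true when nothing is left to send.
    bool pump(const WriteSome& write_some) {
        if (offset_ < pending_.size()) {
            const std::size_t want =
                std::min(maxBytesPerCall_, pending_.size() - offset_);
            // Clamp in case the callback claims more than it was offered.
            offset_ += std::min(want, write_some(pending_.data() + offset_, want));
        }
        if (offset_ == pending_.size()) {
            pending_.clear();
            offset_ = 0;
            return true;
        }
        return false;
    }

private:
    std::size_t maxBytesPerCall_;
    std::vector<std::uint8_t> pending_;
    std::size_t offset_ = 0;
};
```

With this shape, a 10-byte message and a 4-byte budget take three `pump` calls to drain, and the caller regains control between each one. Whether the real firmware could be restructured this way is exactly the open question in this thread.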

rvnash · July 28, 2015, 1:01am

See my post here for some numbers on this.

airbornemint · July 28, 2015, 1:33am

This is interesting — thanks for sharing it.

I am pretty sure that the behavior that I am observing is orders of magnitude worse than what you are describing, though. I realize that my test code is quite crude (eyeballing LED duty cycle) compared to your measurements, but when I see a solid-on LED with a loop that takes ~100ms (as per my code above), that is telling me that my post-loop Spark.process is running in the ballpark of 50+ ms per call (compared to your 1-15 ms per call).

I guess it’s time to stop being lazy and get some actual measurements here.

bko · July 28, 2015, 2:18am

Spark.process() services the Spark Cloud connection when it needs service. Spark.connect() connects to the Spark Cloud if you are not currently connected (i.e. you are managing the connection in semi-automatic or manual mode). When the Cloud is connected it requires service at least every 10 seconds and that code is blocking.

One other point worth mentioning is that delay() can also service the Spark Cloud and therefore take a somewhat variable amount of time. It is designed to interfere as little as possible while still preventing Cloud timeouts, which, as I said, happen after 10 seconds without service.

Although having delay() service the cloud if needed might seem like a strange choice, it actually eliminated a lot of problems since calling delay(30000); was now a Cloud-safe operation.
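
As a rough illustration of that idea (not the firmware's actual delay() implementation, which I have not reproduced here), a delay loop can hand time to a background task at a fixed interval while it waits. `delayWithService`, `now`, and `service` are hypothetical names; `now` stands in for millis() and `service` for whatever services the cloud connection.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical illustration only; the real delay() may work differently.
// now():     current time in milliseconds (stand-in for millis()).
// service(): stand-in for servicing the cloud connection.
// Returns the number of times service() was invoked during the wait.
inline int delayWithService(std::uint32_t ms,
                            std::uint32_t serviceEveryMs,
                            const std::function<std::uint32_t()>& now,
                            const std::function<void()>& service) {
    const std::uint32_t start = now();
    std::uint32_t lastService = start;
    int calls = 0;
    // Unsigned subtraction keeps both comparisons correct across wraparound.
    while (now() - start < ms) {
        if (now() - lastService >= serviceEveryMs) {
            service();            // may itself take a variable amount of time
            lastService = now();
            ++calls;
        }
    }
    return calls;
}
```

The point of the sketch is only that a long wait no longer starves the connection; the cost is that the wait can overrun by however long the last `service()` call takes.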

airbornemint · July 28, 2015, 2:43am

Yeah, in the absence of threads, the decision to service pending Cloud I/O from delay() makes sense. And nobody should be relying on delay() to provide millisecond accuracy anyway, so if Spark.process introduces jitter on the order of milliseconds per second in calls to delay(), that’s not a big deal.

And to be clear, I am not disagreeing with any decisions about when to service pending cloud I/O. I am only disagreeing with the decision to make the implementation of Spark.process() use blocking sends, because it (in my experience) makes loop() unresponsive to the tune of 100+ ms (give or take; I haven’t yet had the time to measure it precisely).

To give you an idea of how bad this is, I wrote a different app which just PWMs an LED on a digital output pin, and ramps the PWM from 0% to 100% and back down to 0% duty cycle over the span of 2 seconds (so, your basic throbbing/breathing LED indicator with 2-second cycle). This PWM is done completely from loop(), with no calls to delay() — it just loop()s straight through and calls millis() to decide when to increment/decrement PWM duty cycle; which is to say, Spark.process is called as often as humanly possible, thus guaranteeing that the amount of work it has to perform on each call is as small as possible.

And with that code, after Spark.connect has returned (so the only thing that’s happening besides the LED PWMing is the implicit calls to Spark.process), the indicator freezes for seconds at a time as the Photon is connecting to WiFi and cloud.
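
For anyone who wants to reproduce this, the duty-cycle calculation can be sketched as below. This is not my original test code, just a minimal reconstruction of the same idea; `breathingDuty` is a made-up helper name, and the 0..255 output range is an assumption.

```cpp
#include <cstdint>

// Triangle-wave brightness for a "breathing" LED.
// nowMs:    current time in ms (e.g. the value returned by millis()).
// periodMs: full cycle length; 2000 matches the 2-second cycle described
//           above. Must be at least 2 and small enough that ramp * 255
//           fits in 32 bits.
// Returns a duty value in 0..255: 0 at the start of a cycle, 255 at the midpoint.
inline std::uint8_t breathingDuty(std::uint32_t nowMs,
                                  std::uint32_t periodMs = 2000) {
    const std::uint32_t half = periodMs / 2;
    const std::uint32_t phase = nowMs % periodMs;
    std::uint32_t ramp = (phase < half) ? phase : (periodMs - phase);
    if (ramp > half) ramp = half;  // only possible for odd periods
    return static_cast<std::uint8_t>((ramp * 255u) / half);
}
```

Called on every pass through loop() with the current millis() value, this gives a smooth ramp as long as loop() keeps running. Any stall between iterations shows up directly as a frozen brightness level, which is what makes it a useful visual test.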

Which is why I am complaining about this: given the current unpredictable blocking behavior of Spark.process, and the delays of hundreds of milliseconds it creates, it is simply impossible to create a quality user experience on a Photon. You can’t reasonably update any kind of quality UI or register any kind of quality input (such as touch input) when your refresh/sampling rate drops below 10 Hz. For some applications, this is completely immaterial: if you’re reading an environment sensor once a minute and dumping it into the cloud, you will never care about this. But if you need a user-facing device, Spark.process really hurts the way it works today.

bko · July 28, 2015, 2:57am

Hi @airbornemint

The problem is that there are not a lot of little chunks of time required to service the cloud, but one big chunk instead as a packet comes in or needs to go out.

So I think things will go better for you if you think of the Spark cloud service as one big chunk of time taken at least every ten seconds. If that means that you don’t go around loop() very often but instead have your own for/while loop inside of the loop() function, you can “schedule” the time at which the cloud is serviced to be convenient for you when you are doing other tasks or UI functions.

Just my opinion from my experience.
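
One way to picture that kind of scheduling is a small policy object that decides, on each pass of your own inner loop, whether now is an acceptable moment to service the cloud. Everything here is an illustrative assumption: the name `CloudServicePolicy`, the 1-second idle interval, and the 8-second hard deadline (chosen only to leave margin under the 10 seconds mentioned above).

```cpp
#include <cstdint>

// Hypothetical scheduling policy; names and thresholds are assumptions.
struct CloudServicePolicy {
    std::uint32_t idleIntervalMs = 1000;  // while the UI is idle, service this often
    std::uint32_t hardDeadlineMs = 8000;  // margin below the ~10 s service window

    bool shouldService(std::uint32_t nowMs,
                       std::uint32_t lastServiceMs,
                       bool uiBusy) const {
        // Unsigned subtraction stays correct when the ms counter wraps.
        const std::uint32_t elapsed = nowMs - lastServiceMs;
        if (elapsed >= hardDeadlineMs) return true;   // cannot postpone any longer
        return !uiBusy && elapsed >= idleIntervalMs;  // otherwise only while idle
    }
};
```

In MANUAL mode this would gate an explicit Spark.process() call: skip it while the user is touching the screen, and take the hit either when the UI goes idle or when the deadline forces it.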

airbornemint · July 28, 2015, 3:13am

Ok, first of all, I am not disagreeing with your opinion here. What I am saying is that “at least once every ten seconds your UI refresh rate drops below 10 Hz” is pretty much the same as “quality UI is impossible on this device”.

No matter how I slice this, and no matter how hard I try to only call Spark.process around the interactions that my user is having with the device, the only way I see to build a decent experience around this constraint is to simply a. prevent the user from interacting with the touchscreen while I am making any network calls (because touch experience will be bad at 10 Hz) and b. prevent Spark networking code from being called while the user is interacting with the touch screen (by setting the device in manual mode).

Which I can do. I can set the Photon in manual mode, and lock the user out of the touchscreen while I am doing things with the network.

But that’s a pretty crappy user experience, and I am not convinced that it has to be this way.

I get it, right now the code consumes big chunks of time to service the cloud. Why? I looked at the firmware code, and there are blocking sends all over the place in Spark.process. Why? I don’t see anything about that code that indicates it has to be written that way, although I certainly do understand that blocking code is faster to write and easier to read.

However, as far as I understand the architecture here, the choice to use blocking sends inside Spark.process is not dictated by the architecture, and the point I am trying to make is that it’s a choice that’s incompatible with quality in user experience.

So if I am wrong, and the Photon architecture somehow forces the firmware to use blocking sends in Spark.process, then I would like to know about it, so that I can abandon Photon in favor of some architecture that serves my needs better; but if I am right, then I would like to have this behavior acknowledged as faulty, and hopefully see it fixed.

bko · July 28, 2015, 3:37am

I think the Particle team would like to make cloud functions non-blocking too, but it is a big rock to lift right now and there are lots of other rocks. This was impossible on the Core but much more reasonable on the Photon. If you want to have a discussion with the team about your ideas, I urge you to file a GitHub issue and explain it there. I am sure pull requests are welcome too if you are up for it.

There are a lot of other options you could consider and first on my list would be a dedicated processor for the touch screen like this one, the STMPE610 but a PIC or ATTiny etc. would also work.

airbornemint · July 28, 2015, 4:10am

I am using a cap touch sensor (an FT6206) on an Adafruit breakout, actually. I’m using it over I2C, so the sensor output is unbuffered, and therefore when Spark.process hits me and drops my loop rate to 10 Hz, my touch sensor quality goes through the floor.

You are right; putting a 2nd processor in here would solve this problem. (It would also solve a bunch of other problems that I am having, like the flaky I2C on the Photon, or the fact that my Photon seems to crash while being woken from sleep by an external interrupt.)

I have seriously considered that — so thanks for the reminder that I should revisit that option. If I can find an I/O protocol that works reliably with Photon firmware (so, not I2C… is SPI robust on current firmware?), I could just drop in a well-behaved MCU with a more mature ecosystem, use that for everything except for network connectivity, and use the Photon only for its WiFi/cloud capability. Which seems like a massive overkill, but it is what it is.

ScruffR · July 28, 2015, 7:56am

I perfectly see your problem with the "blocking" nature of the cloud stuff happening between loop() iterations or when calling Spark.process(), but I'm not quite sure what you mean with above statement.
AFAIK you won't need to be worried about thread syncing your application code (unless you want to actively adopt it), since (at least to start with) the only two threads will be your app thread and the cloud background thread. Unfortunately I have no idea of the ETA; I'm waiting for it too.

But another question jumps to mind. Will you actually need the cloud functionality or do you just need WiFi (with TCP/UDP)?
This might be a bit speedier.

On the other hand the amount of times you see this issue seems odd. I do seem to have such a problem only when having "bad" WiFi reception.

airbornemint · July 28, 2015, 8:23am

I suspect you misunderstood me. I meant "I do want both", meaning I want threading and I want Spark.process to be (as documented) a lightweight call.

I do actually use cloud capability; I have two Photons communicating with each other through publish/subscribe.

The reason that I see this issue more than you do is that I am putting the Photon to sleep to minimize power consumption. When the user wakes it (by interacting with the touchscreen), I encounter problems due to Spark.process delays. If your WiFi is solid and you never let your Photon disconnect from the cloud, you will not see this as often.

ScruffR · July 28, 2015, 8:55am

OK, I see now - sorry for my ignorance

I obviously focused too much on the term “Multi-threading […] is a bigger problem”

But with MT the need to call Spark.process() and the time loss for reconnect should fall away or become “unnoticeable”, eventually making things behave (more) as the docs already suggest.

I don’t know how close to production you already are, but if you can stick with the Photon as is (with some clunky workarounds) for the time being and can get on with your dev on some other end, your problems might be solved by Particle - but again, unfortunately no ETA yet.

rvnash · July 28, 2015, 12:27pm

I agree that my tests were done under near ideal network conditions, with a router in the same room, and a reliable high speed ISP connecting me to Particle's servers. I'm sure that the calls to Spark.process() could block for much longer, as you are seeing.

Basically this means to me that you can't use the Spark cloud ecosystem if your application needs to meet some sort of minimum latency requirement. In your case @airbornemint, it is about UI responsiveness, in my case it is about a requirement to respond in so many milliseconds to a complex set of sensor inputs.

Below is my code for doing these measurements. This was meant to be throwaway code, and isn't top quality. Perhaps you can mine something useful out of it.

// This #include statement was automatically added by the Particle IDE.
#include "elapsedMillis.h"

uint32_t returnTime = 0;
uint32_t enterTime;
uint32_t last;
uint32_t worst;
uint32_t skipOnModeChange;
uint64_t total;
uint64_t loopCount;
uint32_t sparkProcessStart;
uint32_t sparkProcessEnd;
int variableAdded;

elapsedMillis sinceLastPrint;
elapsedMillis connectAfter;

#define MYMODE AUTOMATIC
//#define MYMODE SEMI_AUTOMATIC
//#define MYMODE MANUAL

SYSTEM_MODE(MYMODE);

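// Cloud function handler; registered below under the name "brew".
int brewCoffee(String command)
{
    Serial.println("\n\nGot brew coffee\n\n");
    return 0;  // the function is declared int, so it must return a value
}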


void setup() {
    Serial.begin(9600);
    Serial.println("Starting test");
    worst = 0;
    skipOnModeChange = 100;  // Skip the first 100 samples
    loopCount = 0;
    sinceLastPrint = 0;
    variableAdded = 0;
}

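// Print a 64-bit value as two decimal parts: the digits above 10^9, then the low 9 digits.
void printlonglong(uint64_t v)
{
  uint64_t xx = v/1000000000ULL;
  long low = (long)(v - xx*1000000000ULL);

  if (xx > 0) {
    Serial.print((long)xx);
    // Zero-pad the low part to 9 digits so 1000000005 does not print as "15".
    for (long p = 100000000; p > 1 && low < p; p /= 10) Serial.print('0');
  }
  Serial.print(low);
}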

void loop() {
    enterTime = DWT->CYCCNT;

    loopCount++;
    
    if (skipOnModeChange == 0) {
        last = enterTime - returnTime;
        if (last > worst) {
            worst = last;
            Serial.print("New worst at loop count: ");
            printlonglong(loopCount);
            Serial.println();
            Serial.print("worst: ");
            Serial.println((double)worst / (double)120, 2);
        }
    } else {
        skipOnModeChange--;
    }
    
    if (sinceLastPrint >= 1000) {
        Serial.print("Loop Count: ");
        printlonglong(loopCount);
        Serial.println();
        Serial.print("Last us: ");
        Serial.println((double)last / (double)120, 2);
        Serial.print("worst: ");
        Serial.println((double)worst / (double)120, 2);
        if (MYMODE == MANUAL) {
            Serial.print("Spark.process: ");
            Serial.println((double)(sparkProcessEnd - sparkProcessStart) / (double)120, 2);
        }
        sinceLastPrint = 0;
        
    }
    
    if (connectAfter > 10000) {
        if (!WiFi.ready()) {
            Serial.print("\n\nConnecting WiFi at loop count: ");
            printlonglong(loopCount);
            Serial.println();
            WiFi.connect();
            worst = 0;
            skipOnModeChange = 100;
        }
    }
    
    if (connectAfter > 20000) {
        if (!Spark.connected()) {
            Serial.print("\n\nConnecting Spark at loop count: ");
            printlonglong(loopCount);
            Serial.println();
            Spark.connect();
            worst = 0;
            skipOnModeChange = 100;
        }
    }
    
    if (connectAfter > 30000) {
        if (!variableAdded && Spark.connected()) {
            Serial.print("\n\nAdding a variable!: ");
            printlonglong(loopCount);
            Serial.println();
            Spark.variable("enterTime", &enterTime, INT);
            
            Spark.function("brew", brewCoffee);

            variableAdded = 1;
        }
    }
    
    if (MYMODE == MANUAL) {
        if (Spark.connected()) {
            sparkProcessStart = DWT->CYCCNT;
            Spark.process();
            sparkProcessEnd = DWT->CYCCNT;
        }
    }
        

    returnTime = DWT->CYCCNT;
}
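
A note on the arithmetic in the listing above, sketched as two small helpers in plain C++. The helper names are hypothetical, and the 120 cycles per microsecond figure assumes a 120 MHz core clock, which is what the divisor of 120 in the listing implies.

```cpp
#include <cstdint>

// Elapsed CPU cycles between two readings of a free-running 32-bit cycle
// counter. Unsigned subtraction gives the right answer even if the counter
// wrapped once between the two readings.
inline std::uint32_t elapsedCycles(std::uint32_t start, std::uint32_t end) {
    return end - start;
}

// Convert cycles to microseconds. The default of 120 cycles per microsecond
// assumes a 120 MHz core clock, matching the divisor used in the listing.
inline double cyclesToMicros(std::uint32_t cycles, double cyclesPerMicro = 120.0) {
    return static_cast<double>(cycles) / cyclesPerMicro;
}
```

Under that 120 MHz assumption a 32-bit counter wraps roughly every 35.8 seconds (2^32 / 120,000,000), so a single interval longer than that cannot be measured this way; intervals in the millisecond range discussed in this thread are well within reach.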

ScruffR · October 20, 2015, 6:30pm

Since the last post in this thread things have moved on considerably.

If your use case involves leaving WiFi range regularly you’d need to choose a better suited SYSTEM_MODE than the default AUTOMATIC.

wesner0019 · October 31, 2016, 3:38pm

Hi @airbornemint can you by chance share your ported FT6206 library? I’m looking to start experimenting with this screen.

Also do you know if the reset and interrupt pins need to be used?

Thanks!

airbornemint · November 1, 2016, 12:29am

Yes and no.

I completely gave up on using the Particle cloud IDE because of the extremely regrettable decision by Particle to use a library format that is completely incompatible with the existing Arduino library format (which I understand, as Arduino came up with this format after the Particle web IDE was released) and the further regrettable decision to ignore this problem for a year now.

But since you asked, I put my (not very elaborate) changes to the existing Adafruit FT6206 library up as a patch file.

If you want to be the maintainer of a contributed FT6206 Particle library, you can start with the Adafruit library and my changes and then do all the shenanigans to get around Particle’s bad library format choices.

You don’t have to use the RSTN (reset) and INT (interrupt) pins, but you may want to do it anyway, depending on how you are using the sensor.

peekay123 · November 1, 2016, 2:12am

@airbornemint, Particle is very close to releasing Libraries 2.0 which will be compatible with the new Arduino library format. Stay tuned!

airbornemint · November 1, 2016, 2:37am

In that case, I’d like to encourage better transparency in the form of following up on GitHub issues that are actively being worked on (minimally, to assign them to a milestone). Thanks!
