Photon/P1 loop()/user code blocking in SEMI_AUTOMATIC mode

Hi folks,

I am running into an issue where user code is blocked while the Photon and the P1 try to reconnect to the Particle cloud after the connection drops. I have read through many discussion threads about this issue but have so far had no luck finding a definitive solution.

The Photon change log published a while ago seemed to suggest that the new Photon firmware would support multithreading. Does that mean we could run the background connection code and user code in parallel? Has this been implemented in firmware release 0.4.3, @mdma? (AWESOME release btw!) I didn’t see it in the GitHub documentation.

I have tried to use the SparkIntervalTimer library but it only seems to work on the Core.
@ScruffR, you seem to be the domain expert on this issue. Do you know if there is a clear workaround? I have a routine() in my loop() that I would like to execute once every few seconds, which is not possible while the connection to the cloud is lost. My current solution is to go completely off-cloud by calling Spark.disconnect() as soon as I detect !Spark.connected(). It would be nice to keep trying to reconnect to the cloud while still being able to run the routine code in loop().

Thanks to everyone in advance for the discussion!

3 Likes

I'm not 100% sure, but I got the impression that @peekay123 will have SparkIntervalTimer up and running soon after the release of 0.4.3 for application code via cloud build (which is scheduled for this week).
Once this library, which is crucial for lots of people, is up and running on the Photon too, you'll have less of a headache, and I think you should wait for that.

If you can't wait, you'd have to go through the bother of setting up your own timers and interrupts - definitely an interesting task, but tedious :wink:

BTW: I'm far from an expert on this topic, I just seem to be the "loudest" :sunglasses:

1 Like

@ScruffR, @bing1106, I’ll be working on porting (ALL!) my libraries starting tonight. The updated SparkIntervalTimer will be an interim release that will work with both the Core and the Photon. Testing so far has shown an interrupt latency that limits the minimum timer value when using microseconds. The lower limit seems to be about 5 µs, below which interrupts get missed. When @mdma gets back, I’ll be looking for the cause of the delay.

Ultimately, I am looking at creating an entirely new library which supports all timer and timer/channel modes.

4 Likes

@bing1106, I’m not sure if you actually tested how long application code is blocked during connection attempts.
But if you want to get a feel for how the Photon behaves in the scenario you outlined, have a play with this code:

SYSTEM_MODE(SEMI_AUTOMATIC);

const uint32_t waitConnect = 5000; // at least 5 sec between reconnect attempts

uint32_t msConnect;
bool firstAttempt;

void setup() 
{
    pinMode(D7, OUTPUT);
    WiFi.connect();
    firstAttempt = false;
}

void loop() 
{
    if (firstAttempt && Spark.connected())
    {
        firstAttempt = false;
    }
    else if(!firstAttempt  && !Spark.connected())
    {
        Spark.connect();
        firstAttempt = true;
        msConnect = millis();
    }

    digitalWrite(D7, millis() & 0x80); // blink every 256ms
}

When looking at the onboard LED, you’ll see that it does not block permanently until a cloud connection is reestablished, but only for up to 5 sec (on my device, at least).
If your code can cope with this kind of blocking period, you can even get away without any interrupts.

1 Like

I can totally wait! I have kept my code pretty simple in the .ino and moved most of the core functions to separate cpp and header files for this reason, so when I get around to implementing the fix it won't be a mess :grinning:

@ScruffR, I actually went through many of your code suggestions and learned a lot! Thank you for that!

1 Like

@peekay123, I look forward to that! Thank you so much for doing this! Is there a place where I could get notified? (Maybe you have a newsletter?)

I actually tried something similar on the Photon and P1 and got results similar to yours. When the connection is lost, the user code gets to run only once or twice a minute. It usually runs for 2-3 seconds and then freezes (which is not good) while the Photon goes back to reconnecting. I want to use the Photon as a general hardware controller. If I connect control switches/buttons to the board, I would want them to work whenever I press them, even if the cloud connection is lost. So I guess I won't be getting away with the blocking period :sob:

BTW, very cool way to blink an LED!!

1 Like

That's odd. In my test I got the application code running 10 to 20 times a minute for about 2-3 sec.
And if you don't always try to reconnect immediately, you can improve the response time too.
And if you "code-wire" your buttons/switches via attachInterrupt(), you can rely on catching every trigger (inside your ISR you could even cancel connection attempts).

If you like that blink, try this one :wink:

    digitalWrite(D7, !(millis() & 0xA0)); 
    // or 
    digitalWrite(D7, millis() & 0xC0); 
    // or 
    digitalWrite(D7, (millis() >> 2) & 0x88); 
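
For anyone wondering what these masks actually do, here is a minimal host-side sketch in plain desktop C++ (not Photon firmware) that counts how long each pattern would hold the LED on. The names `blinkA` to `blinkD` and `onTime` are made up for illustration; the expressions inside them are the ones posted in this thread, with `ms` standing in for `millis()`.

```cpp
#include <cstdint>

// Each function returns true when digitalWrite() would receive a non-zero
// value (LED HIGH) for the given millisecond count.
bool blinkA(uint32_t ms) { return (ms & 0x80) != 0; }         // millis() & 0x80
bool blinkB(uint32_t ms) { return !(ms & 0xA0); }             // !(millis() & 0xA0)
bool blinkC(uint32_t ms) { return (ms & 0xC0) != 0; }         // millis() & 0xC0
bool blinkD(uint32_t ms) { return ((ms >> 2) & 0x88) != 0; }  // (millis() >> 2) & 0x88

// Count how many milliseconds within [0, window) the LED would be on.
uint32_t onTime(bool (*pattern)(uint32_t), uint32_t window) {
    uint32_t on = 0;
    for (uint32_t ms = 0; ms < window; ++ms) {
        if (pattern(ms)) {
            ++on;
        }
    }
    return on;
}
```

By this count, `millis() & 0x80` is on for 128 of every 256 ms, `!(millis() & 0xA0)` for 64, `millis() & 0xC0` for 192, and the shifted `0x88` mask for 768 of every 1024 ms.
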
2 Likes

Wow, the blinking codes are pretty cool! I wonder if there is a whole list out there :smile: - :pensive: - :smile: - :pensive: (HIGH-LOW-HIGH-LOW is less elegant than a single line of :wink:)

Here is an interesting find:
The user code blocking behavior while Photon tries to reconnect actually also depends on the Photons themselves. I have three and they all give somewhat different behaviors. One Photon runs the code once a minute, the second runs 3 times a minute, and the third one runs 7 times in a minute, with each execution lasting 2-3 seconds.

I think what I’ll do is, as you said, just use attachInterrupt(). I imagine pseudocode like this (pardon the syntax, I will of course need to test this later today):

SYSTEM_MODE(SEMI_AUTOMATIC);

void setup(){
     initialize(); // user application
     attachInterrupt(pin, connect, CHANGE);
     Spark.connect();
     delay(t); // 2 seconds should be enough
}

void loop(){
     if(!Spark.connected()){
         if(flag not set){
             Spark.disconnect();
             set flag;
             start timer; // when the time is up, try to reconnect
         }
         if(timer time reached){
              pin goes high;
         }
     }
}

void connect(){
     pin goes low;
     flag reset;
     Spark.connect();
     delay(2000);
}

Maybe this would work? Feel free to point out any obvious mistakes :slight_smile: Thanks!

If using switches/buttons I’d advise against the use of CHANGE since you might experience problems when the pin is floating.
So I’d go for RISING in conjunction with the appropriate pinMode(pin, INPUT_PULLDOWN) or FALLING with INPUT_PULLUP.

The different behaviour of your Photons is odd again.
Have you got system firmware 0.4.3rc2 on all of your Photons?
Could it be that you’ve got a lot of different SSIDs stored on the slowest Photon and only a few on the quickest?
On my test Photon I’ve only got one network registered.

1 Like

@bing1106, I would not do a delay(2000) in an interrupt service routine!

3 Likes

I will give this a try and update it tonight. Thanks for the suggestions!

@peekay123 @ScruffR,

I have tested the code below. The good news is that the user code in the loop (in this case blinking the LED) always runs, and the ISR is executed whenever it is time to reconnect after the Photon is disconnected from the cloud. The bad news is that it only works sometimes. While the code lets me run my user code almost all the time, with the exception of 2-3 seconds per minute while the Photon is disconnected from the Particle cloud, it seems a bit unstable. @ScruffR, I actually DFU-flashed all my Photons and P1s with firmware release 0.4.3 when I received them, so things should be up to date.

I noticed that when the Photon drops from the cloud, it goes into either flashing green or breathing white. I think in the breathing white case the Photon still fires the ISR every minute it is disconnected but never reconnects to the cloud, even though Wi-Fi is working. Do you see any obvious bugs regarding Wi-Fi control? Would I need to give Spark.connect() more time?

Sorry for all the print statements, that’s how I make sure things ran. Thanks in advance for any help.

SYSTEM_MODE(SEMI_AUTOMATIC);

int pin = D1;
int trigger = D2; // D2 (trigger) is connected to D1 (pin);
volatile bool flag = FALSE; // volatile: also written from the connect() ISR
unsigned int minute = 0; 

void setup(){
    Serial.begin(9600);
    pinMode(pin, INPUT_PULLDOWN);
    pinMode(D7, OUTPUT);
    pinMode(trigger, OUTPUT);
    digitalWrite(trigger, LOW);
    Time.zone(-7);
    attachInterrupt(pin, connect, RISING);
    Spark.connect();
    delay(3000);
}

void loop(){
    
    digitalWrite(D7, (millis() >> 2) & 0x88);
    
    if(!Spark.connected()){
        if(!flag){
            Spark.disconnect();
            WiFi.off();
            Serial.println("Connection lost!");
            flag = TRUE;
            Serial.println("Flag set!");
            minute = Time.minute() + 1; // try to reconnect after a minute;
            Serial.print("Will reconnect again at (add one minute to this time): "); Serial.println(Time.timeStr());
        }
        if(flag && (Time.minute() == minute)){ // time to reconnect
            Serial.print("Time to reconnect: ");
            Serial.println(Time.timeStr());
            Serial.println("Trigger fires!\n");
            digitalWrite(trigger, HIGH);
        }
    }
}

void connect(){
    digitalWrite(trigger, LOW);
    flag = FALSE;
    Serial.println("Reset trigger to low and flag to FALSE.");
    if (!Spark.connected()) {
        Spark.connect();
        Serial.println("Trying to connect...\n");
    }
}

Just some side notes:

The line minute = Time.minute() + 1; doesn't do what its comment states. Imagine you do this at 9:40:59 or at 9:59:05.
For a time offset I'd either use millis() or add seconds to Time.now(). I also would not use == but >=.
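
To make the two failure cases concrete, here is a minimal host-side sketch in plain C++ (not firmware). All function names are hypothetical; `nowEpoch` stands in for the value of `Time.now()` and `nowMs` for `millis()`. At 9:59:05 the minute target becomes 60, which `Time.minute()` (0 to 59) never returns, and at 9:40:59 the target minute arrives after one second instead of sixty.

```cpp
#include <cstdint>

// Minute-of-hour target as computed in the sketch above (Time.minute() + 1).
// Time.minute() only returns 0..59, so a target of 60 can never match.
int minuteTarget(int minuteNow) {
    return minuteNow + 1;
}

// Epoch-seconds deadline in the style of Time.now() + seconds.
long reconnectDeadline(long nowEpoch, long delaySeconds) {
    return nowEpoch + delaySeconds;
}

// Compare with >= so a loop() pass that arrives late still triggers.
bool deadlineReached(long nowEpoch, long deadline) {
    return nowEpoch >= deadline;
}

// millis()-style alternative: compare the elapsed time, not absolute values.
// Unsigned subtraction stays correct when the 32-bit counter rolls over.
bool intervalElapsed(uint32_t nowMs, uint32_t startMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - startMs) >= intervalMs;
}
```

On the device, this would amount to storing `Time.now() + 60` (or the current `millis()`) when the connection drops and testing it with `>=` on each pass through loop().
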

Do you really want to switch WiFi off (e.g. for power saving) or do you rather want to simulate loss of connection?
If the latter, I'd actually cut the AP (easiest with a mobile phone AP - easily switched on and off with one tap).

I'd avoid Serial.print() inside an ISR. Rather set some "debug" flags and do the printing in loop().
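
A minimal sketch of that flag pattern, written as host-side C++ so it can be tried without a board. The names `reconnectRequested`, `connectIsr`, `loopOnce` and `logLines` are hypothetical, and the vector merely stands in for `Serial.println()`. The same idea keeps `delay()` and `Spark.connect()` out of the ISR as well.

```cpp
#include <string>
#include <vector>

// Set by the ISR, consumed by loop(). volatile because it is shared
// between interrupt context and the main code.
volatile bool reconnectRequested = false;

// Stand-in for Serial.println(); on the device this would be the serial port.
std::vector<std::string> logLines;

// ISR body: record the event and return immediately.
// No printing, no delay(), no connection calls in here.
void connectIsr() {
    reconnectRequested = true;
}

// One pass through loop(): do the slow work outside interrupt context.
void loopOnce() {
    if (reconnectRequested) {
        reconnectRequested = false;
        logLines.push_back("Trying to connect...");
        // On the device, Spark.connect() would be called here.
    }
}
```
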

I guess your trigger pin is only for testing. If you can set the pin in code, you'd not need to use interrupts, since you can just run the ISR yourself.

I'm also not sure how these two pieces play together:

void loop(){
  ... 
  if(!Spark.connected()){
    if(!flag){
      Spark.disconnect();
      ...
    }
    ...
  }
}

void connect(){
  digitalWrite(trigger, LOW);
  flag = FALSE;
  ...
  if (!Spark.connected()) {
    Spark.connect();
  }
}  

Doesn't the disconnect immediately hit after the ISR returns?
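
One possible way around that race, sketched here as host-side C++ under assumptions, is to track a separate "connecting" state so that a still-unconnected cloud does not trigger another disconnect right after a connect request. The `Reconnector` type, its states and the two timeout values are all hypothetical; on the device, `Action::Disconnect` would map to `Spark.disconnect()`, `Action::Connect` to `Spark.connect()`, and `nowMs` to `millis()`.

```cpp
#include <cstdint>

enum class LinkState { Online, Backoff, Connecting };
enum class Action { None, Disconnect, Connect };

struct Reconnector {
    // setup() is assumed to have requested a connection already.
    LinkState state = LinkState::Connecting;
    uint32_t since = 0;          // millis() value when the current state began
    uint32_t backoffMs = 60000;  // wait between attempts (assumed value)
    uint32_t attemptMs = 10000;  // time allowed per attempt (assumed value)

    // Call once per loop() pass; returns what the caller should do now.
    Action step(bool cloudConnected, uint32_t nowMs) {
        switch (state) {
        case LinkState::Online:
            if (!cloudConnected) {           // connection dropped: go off-cloud
                state = LinkState::Backoff;
                since = nowMs;
                return Action::Disconnect;
            }
            break;
        case LinkState::Backoff:
            if (nowMs - since >= backoffMs) {  // time to try again
                state = LinkState::Connecting;
                since = nowMs;
                return Action::Connect;
            }
            break;
        case LinkState::Connecting:
            if (cloudConnected) {
                state = LinkState::Online;
            } else if (nowMs - since >= attemptMs) {  // attempt timed out
                state = LinkState::Backoff;
                since = nowMs;
                return Action::Disconnect;
            }
            break;                           // otherwise keep waiting
        }
        return Action::None;
    }
};
```

The point of the extra state is that `!cloudConnected` only leads to a disconnect when the device was previously online or when an attempt has run out of time, not on the first loop() pass after the connect request.
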

2 Likes

@ScruffR, the immediate disconnection explains why the reconnect is so flaky. My girlfriend is :rage: with me turning the Wi-Fi on and off, so I will try out your suggestions tomorrow when she is not home. I have to turn the router on and off so many times that I actually hooked up its power jack to a 433 MHz radio relay.

Good call on the offset comment! I use that code sometimes for quick and dirty results so I am aware of the problem. I will probably switch to (Time.now() + seconds) as you suggested. It is funny how Time.now() can come out to be such a huge number!!

Stay tuned for more test results tomorrow! :smile:

Time.now() gives you the UNIX epoch time, which is the number of seconds since 1 January 1970 - so it’s bound to be big :wink:

For the sake of peace@home - try a separate AP :wink: maybe your mobile supports tethering :sunglasses:

1 Like

@ScruffR, @peekay123, my understanding was that the connection process would run in the HAL so that user code can run without being blocked by it. Is this still the plan but not yet implemented?

Thanks

@wesner0019, I think you meant it would run in another thread which is still planned though I don’t have a timeline. @mdma will establish the target for that implementation when he gets back.

1 Like

@peekay123, sorry, that’s what I meant. Thanks for the update. Do you know if WiFi.connect() is blocking too, or just Spark.connect()?

@wesner0019, HAL is the hardware abstraction layer, which helps with porting from one hardware platform to another without the high-level programmer needing to know too much about it, but it hasn’t really got a lot to do with multithreading.
Multithreading will come some time in the future.


I should have checked before I pressed “Post” - Paul was quicker again - put the brakes on to allow normal humans to answer, too :wink:

1 Like

@wesner0019, for the time being everything is serviced in the one and only thread on the Particle devices, so all of these actions have some impact on the code flow, but neither of the calls you’re talking about is actually blocking (in the sense of not returning until connected) in SEMI_AUTOMATIC or MANUAL mode.

If you have a fiddle with my blinky code from post #4 (add some of the otherwise implicitly performed steps like WiFi.on() and WiFi.connect() explicitly, and alter the blink pattern depending on your current action), you can get a feel for the impact.