OTA systematically crashes Photon - Easy to reproduce

Hi There!

Back in July, when I was taking my first steps on the platform, I noticed an SOS pattern when sending an
OTA update to my Photon.
A few months and hundreds of programming hours later, here I am again, and I hope that this
time the issue can be fixed.

Code:

PRODUCT_ID(xxxx); // your product id here
PRODUCT_VERSION(x); // your code version here

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(SEMI_AUTOMATIC);

STARTUP(WiFi.selectAntenna(ANT_EXTERNAL));

int led1 = D7;

void setup() {
  pinMode(led1, OUTPUT);
}

void loop() {

  Particle.connect();
  waitUntil(Particle.connected);

  while (true) {
    digitalWrite(led1, HIGH);

    delay(1000);

    digitalWrite(led1, LOW);

    delay(1000);
  }
}

The code above is a simple example that demonstrates how to crash the Photon
(stack overflow) by sending it an OTA update. Any attempt to OTA update this code will brick your Photon, leaving it in an endless loop (trying to update - crash - user code runs - trying to update…). You need physical access to the device to restore functionality.

For the OTA update to be performed without crashing the stack you need to insert a

delay(2000);

in your code. That delay does the trick, but it is obviously not a satisfactory solution, and it is not always
possible to pause the code for that long.

Another “fix” is to use

    SYSTEM_THREAD(DISABLED);

but this severely penalizes the execution time of any

Particle.function

that you have defined in your code (a call is almost instantaneous with system threading enabled and can take
up to 8 seconds with it disabled).

I think it should be possible to manage the OTA update like an ISR: flag the need
to update, pause the user code, save the stack to memory, perform the update, restore
the stack, and resume the user code.

Thank you for your time.

Would any of these help you achieve what you are after?
They are coming with 0.6.0 (currently available for testing in 0.6.0-rc.2):

https://prerelease-docs.particle.io/reference/firmware/photon/#ota-updates
https://prerelease-docs.particle.io/reference/firmware/photon/#system-events-reference

You could also use SINGLE_THREADED_BLOCK() to protect some portions of your code against crashes due to race conditions, instead of completely disabling threading.
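
As a rough sketch of how the OTA update controls linked above might be used: the block below keeps updates disabled while a critical piece of work runs and re-enables them afterwards. `UpdateGate` and `criticalWork()` are hypothetical names made up for this illustration. `System.disableUpdates()`, `System.enableUpdates()` and `System.updatesPending()` are the calls described in the 0.6.0 prerelease docs. The device-specific part is guarded by the `PARTICLE` macro, which is assumed to be defined by the Particle build, so the helper itself can also be compiled off-device. This only defers an update; it has not been shown to prevent the SOS described above.

```cpp
// Hypothetical helper, not a Particle API: counts nested critical sections
// so that updates are only re-enabled when the outermost section ends.
struct UpdateGate {
    int depth = 0;   // number of critical sections currently active

    // Returns true when the caller should disable OTA updates
    // (i.e. this is the outermost section being entered).
    bool enter() {
        return depth++ == 0;
    }

    // Returns true when the caller should re-enable OTA updates
    // (i.e. the outermost section has just ended).
    bool leave() {
        if (depth == 0) {
            return false;   // unbalanced call, nothing to undo
        }
        return --depth == 0;
    }
};

#if defined(PARTICLE)   // assumption: defined when building for a Particle device
#include "Particle.h"

SYSTEM_THREAD(ENABLED);

UpdateGate gate;

// Hypothetical stand-in for work that must not be interrupted by an update.
void criticalWork() {
    if (gate.enter()) {
        System.disableUpdates();
    }

    // application work goes here

    if (gate.leave()) {
        System.enableUpdates();
    }
}

void setup() {
}

void loop() {
    criticalWork();

    if (System.updatesPending()) {
        // An update was held back while updates were disabled. Updates are
        // enabled again at this point; how the pending update is then
        // delivered depends on the system firmware version.
    }
}
#endif
```

The counter is a design choice for the case where one protected routine calls another: only the outermost `enter()`/`leave()` pair touches the update setting.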

Hi @ScruffR, and thank you for your reply.
I tested System.updatesPending() a while ago without luck (the updates were never detected).
I will give it another chance.

Code like the one in the example cannot be run inside SINGLE_THREADED_BLOCK(), as that
effectively disables threading. The problem is that you cannot know in which part of the executing
code the OTA update will “land”.

Sorry to insist, but I worked hard to provide a single piece of code that can easily be used
to demonstrate a crash. It would be great to have someone take a look at this.
If we are hitting some device limitation here, then we will need updated documentation; if not, then a bigger
(user-definable) stack, a function to determine the stack usage… among other things.

Thank you again.

Maybe @rickkas7 or @BDub can comment on that.

I think the problem is that you have an infinite loop inside of loop() and you are depending on the delay() function to service the Particle cloud in the background, but in threaded mode I don’t think that works. If you wrote your loop differently and let the loop() function return and get called again, I think it would work. I think you could also call Particle.process() in your infinite loop and it would work.
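
A minimal sketch of the first suggestion, letting loop() return instead of spinning in while (true), might look like the following. `Blinker` is a hypothetical helper invented for this illustration (it is not part of the Particle API); it decides from a millisecond timestamp when the LED should toggle, so nothing ever blocks. The firmware part is guarded by the `PARTICLE` macro, which is assumed to be defined by the Particle build. Whether this restructuring actually avoids the SOS on OTA has not been verified here.

```cpp
#include <cstdint>

// Hypothetical helper, not a Particle API: toggles a flag every intervalMs
// milliseconds without blocking the caller.
struct Blinker {
    uint32_t intervalMs;     // time between toggles
    uint32_t lastToggleMs;   // timestamp of the previous toggle
    bool     ledOn;          // current LED state

    // Returns true if the LED state changed on this call.
    bool update(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the millisecond counter wraps.
        if (nowMs - lastToggleMs < intervalMs) {
            return false;
        }
        lastToggleMs = nowMs;
        ledOn = !ledOn;
        return true;
    }
};

#if defined(PARTICLE)   // assumption: defined when building for a Particle device
#include "Particle.h"

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(SEMI_AUTOMATIC);

const int led1 = D7;
Blinker blinker = {1000, 0, false};

void setup() {
    pinMode(led1, OUTPUT);
    Particle.connect();   // request the connection once; do not wait for it here
}

void loop() {
    // No while (true) and no delay(): loop() returns on every pass.
    if (blinker.update(millis())) {
        digitalWrite(led1, blinker.ledOn ? HIGH : LOW);
    }
}
#endif
```

If the original while (true) structure has to stay, the second suggestion amounts to adding a Particle.process() call inside that loop instead.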

Perhaps @mdma can comment on the design of delay() with threading?


Hi!
Any news from Particle about this issue?

Thank you!