Main loop delay best practice + OTA flash

Hi Particle Community!

I’m using a Particle Electron and have had problems flashing OTA. It only worked in safe mode, although my firmware otherwise runs very well with no issues. After some time I figured out that it sometimes works in normal mode with a longer delay in the main loop (2 s), but unfortunately not always.
So what do you think: could too short a delay in loop() really cause issues with flashing OTA?
What would be best practice for firmware that is as fast as possible but still stable? I mean the lowest delay value in loop() that still gives really stable functionality.
Could any other code cause the OTA flash issues?

It would be great to get some feedback and experience on this.

Thanks a lot!
Regards,
McSanz

Usually OTA problems come from overly long delays or blocking code.
If you have neither of these, you will hardly ever see OTA problems.

If you can show your code, we might spot the culprit.

Also have you tried SYSTEM_THREAD(ENABLED)?

@ScruffR, another cause of OTA problems is interrupt starvation, i.e. having an ISR serviced too often and taking too much time away from other processes like OTA.
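A common way to keep an ISR from starving everything else is to let it only set a flag and do the real work from loop(). Here is a minimal sketch of that pattern in plain C++ so it can be tried off-device. The names onSpiInterrupt, serviceSpiEvent and spiDataReady are made up for illustration; on the device the handler would be registered with attachInterrupt() and serviceSpiEvent() would be called from loop().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Set by the ISR, consumed from loop(). std::atomic is used so the sketch is
// valid on any platform; firmware code often uses a volatile flag instead.
static std::atomic<bool> spiDataReady{false};
static std::uint32_t handledEvents = 0;

// Hypothetical ISR: record that something happened and return immediately.
void onSpiInterrupt() {
    spiDataReady.store(true);
}

// Hypothetical helper called from loop(): the slow work happens here,
// outside interrupt context. Returns true if an event was handled.
bool serviceSpiEvent() {
    if (!spiDataReady.exchange(false)) {
        return false;  // nothing pending
    }
    ++handledEvents;   // stand-in for the real processing, e.g. publishing
    return true;
}
```

Note that the flag only records "at least one event since the last check", so several interrupts in quick succession are handled as one. If every event matters, a counter or a small ring buffer would be needed instead.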

Thank you very much for your feedback!

Since I develop for a company, I unfortunately cannot post the complete code. This is a pity for me…

Essentially, the Particle connects to an MQTT broker and listens on an SPI interface. If something is received via SPI, the Particle publishes it via MQTT. If something is received via MQTT, digital IOs are set. So if no events occur, I see absolutely nothing blocking in my code apart from a delay() in the main loop. I don't need this delay myself. But as far as I know, microcontrollers should always have a certain delay in the main loop, right? Hence my question about a sensible value for the Particle, or a brief hint on how I can roughly determine one.

There is also a timer that calls a function when it expires. Could this timer have an effect?

I have not tried SYSTEM_THREAD(ENABLED) yet because, according to the documentation, the system thread on the Electron is still in beta: https://docs.particle.io/reference/firmware/electron/#system-thread

I would be happy to receive feedback again!

loop() is just a function that's called from within the main() loop, so you won't need to "care" about a minimum delay (if such a demand should really exist - I doubt it tho').

Ok, thank you for this information. So I have no delay in my code now.

I also tried SYSTEM_THREAD(ENABLED) now. If I add this at the top of my code, right after the includes, the Particle cannot connect to the cloud. It blinks cyan (the “really fast blinking”) all the time.

Without SYSTEM_THREAD(ENABLED) the firmware works fine. OTA is also possible for a short time after startup/reset. Once the Particle has been running a bit longer, OTA no longer works.

If you can’t share your code, you can still dismantle the code and put it back together with the most likely offending bits first, and then add more and more features till things break - this way you can backtrack what the actual cause is.

As said above, there are several likely causes, and the addition of delay() will just add a “regular” Particle.process() call for every accumulated second of waiting time in non-threaded mode.

Hello!
I’m sorry for being late. I was pretty busy lately.

I have now reduced the code to an absolute minimum:

#include "application.h"
#include "cellular_hal.h"

// product settings
PRODUCT_ID(0000); // of course i use my real product id in my tests
PRODUCT_VERSION(1);

// set own cellular credentials
STARTUP(cellular_credentials_set("a1.net", "ppp@a1plus.at", "ppp", NULL));

void setup() {

}

void loop() {

}

But I still have the same problem: the OTA flash only works for a short time after a reset (maybe 1-2 minutes).
I first tested with system firmware 0.6.1 and did an upgrade to 0.6.2 now. Still the same behavior.
Do you think maybe the third party SIM could cause my problems or do you have any other ideas?

I’d say the absolute minimum code for a 3rd party SIM needs to feature a Particle.keepAlive(<ServiceProvidersRequirement>); line in setup() or in the STARTUP() macro.

I did some tests with Particle.keepAlive(30); now. It is much, much better, so I think that was the main problem. Unfortunately, after about one or two days OTA flashing still stops working. I will try to make some further firmware improvements, such as replacing a software timer with a comparison against millis(), etc.
Otherwise I think I will reset the Electron, or at least the modem, once a day or something else…
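For reference, a comparison against millis() usually looks something like the sketch below. It is plain C++ so it can be tested off-device: the current time is passed in as a parameter, and on the Electron that value would simply be millis(). The IntervalTimer struct and its member names are hypothetical, not part of the Particle API.

```cpp
#include <cassert>
#include <cstdint>

// Non-blocking interval check. `now` is whatever millis() returns on the device.
struct IntervalTimer {
    std::uint32_t intervalMs;  // how often the action should run
    std::uint32_t lastRunMs;   // timestamp of the last run

    // Returns true at most once per interval. The unsigned subtraction keeps
    // the comparison correct when the millisecond counter wraps around.
    bool due(std::uint32_t now) {
        if (static_cast<std::uint32_t>(now - lastRunMs) < intervalMs) {
            return false;
        }
        lastRunMs = now;
        return true;
    }
};
```

In loop() this would be used roughly as `if (timer.due(millis())) { ... }`, so nothing blocks while waiting. The same check could also serve for a once-a-day reset, with the interval set to 24 hours in milliseconds.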

I guess that's with some other code than the minimal sketch above.
If you use String a lot, I'd put my bet on that as possible cause.

Sorry for the very late feedback and thanks again for the great support. I would like to pick this topic up again, because my Electron is still not really stable.

Yes, I have quite a few strings in use, but unfortunately I see little way to get rid of them. What’s the problem with the strings anyway? Does the firmware hang completely after a certain time? If yes, why? Is the RAM full? What can I do about it? Regular resets? Or are there better options?

I would be very happy to receive new feedback. Thank you very much!

Yes, “over-use” of String objects will cause heap fragmentation which - over time - can cause system hangs or even crashes. For stability, crashes would actually be preferable, as they’d clear the heap and give your code a fresh start, while hangs keep your code from doing its work.

So regular resets (including deep-sleep-induced ones) would be a possible workaround.

Particle is aware of the issue and is currently investigating ways to prevent heap fragmentation at the root - no ETA tho’
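One way to cut down on String use is to format messages into a fixed-size char buffer with snprintf(), which does not touch the heap. Below is a minimal sketch in plain C++. The function name formatPayload and the JSON-like payload layout are assumptions made up for illustration, not taken from the firmware discussed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats a hypothetical MQTT payload into a caller-supplied buffer, so no
// heap allocation takes place. Returns false if the text did not fit, in
// which case the buffer holds a truncated but still null-terminated string.
bool formatPayload(char* out, std::size_t outSize, int channel, int value) {
    int n = std::snprintf(out, outSize, "{\"ch\":%d,\"val\":%d}", channel, value);
    return n >= 0 && static_cast<std::size_t>(n) < outSize;
}
```

The buffer can be a local array or a static one that is reused for every message, so the memory footprint stays constant no matter how long the device runs.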

Thank you for the info @ScruffR.

Currently, the firmware is using FreeRTOS 8.2.2; maybe a newer version would improve this heap issue.

I came across @rickkas7’s Electron sample during my research. Many thanks to him for his great work!

Especially his Smart Reboot function and the Application Watchdog helped me a lot. In addition, I have also removed all “debug code” (Serial) from the firmware. So far with great results. Let’s see how stable my Electrons will be in the next few days, weeks and months.
With these measures in place, especially the Application Watchdog, I’ve now removed the periodic restarts.

Regarding OTA flash, my current approach is to connect to the Particle Cloud only when a flash is necessary. This is triggered via my own MQTT broker, to which the Electron is permanently subscribed and through which the rest of the communication required for the application takes place. This also works very well so far. More details in this thread.

Nevertheless, I will try to further optimize my code and reduce String objects as far as possible.