DeviceOS 3.2.0 crashing during OTA from 3.1.0

I’m OTA’ing from DeviceOS 3.1.0 to 3.2.0, and we use System.enableUpdates()/System.disableUpdates() in our firmware (see this issue: Calling System.disableUpdates() ends in SOS Hard fault (DeviceOS 3.2.0)). After the following OTA sequence, the LED is constant white with no user code running (and no watchdog triggering).

First device logs

0001202148 [comm.ota] INFO: Received UpdateStart request
0001202148 [comm.ota] INFO: File size: 42073
0001202148 [comm.ota] INFO: Chunk size: 512
0001202150 [comm.ota] INFO: File checksum: 30086c33166b64b583c6cc63a8a0e11dba618a548d22e21df6908edc5461bfdd
0001202150 [comm.ota] INFO: Starting firmware update
0001202712 [comm.ota] INFO: Start offset: 0
0001202712 [comm.ota] INFO: Chunk count: 83
0001208550 [comm.ota] INFO: Received UpdateFinish request
0001208551 [comm.ota] INFO: Validating firmware update
0001209187 [comm.ota] INFO: Update time: 7039
0001209188 [comm.ota] INFO: Transfer time: 5189
0001209188 [comm.ota] INFO: Processing time: 1432
0001209188 [comm.ota] INFO: Chunks received: 83
0001209190 [comm.ota] INFO: Chunk ACKs sent: 49
0001209190 [comm.ota] INFO: Duplicate chunks: 0
0001209190 [comm.ota] INFO: Out-of-order chunks: 10
0001209191 [comm.ota] INFO: Applying firmware update
0001209966 [system] INFO: Cloud: disconnecting

After cloud disconnect, full white LED.

Edit: I’ve tested it on two devices, with the same issue on both. Removing power and reconnecting it does not seem to affect the state. The device boots directly into an open serial connection with a constant white LED.

Second device logs:

0001653536 [comm.ota] INFO: Received UpdateStart request
0001653537 [comm.ota] INFO: File size: 387296
0001653537 [comm.ota] INFO: Chunk size: 512
0001653537 [comm.ota] INFO: File checksum: b2e41d322b30056dcdeece5a5052ddf746ca35f5f79ba83a645471d3c5c611d7
0001653539 [comm.ota] INFO: Starting firmware update
0001658324 [comm.ota] INFO: Start offset: 0
0001658325 [comm.ota] INFO: Chunk count: 757
0001698094 [comm.ota] INFO: Received UpdateFinish request
0001698094 [comm.ota] INFO: Validating firmware update
0001703094 [comm.protocol] INFO: Received DESCRIBE request; flags: 0x04
0001703122 [comm.protocol] INFO: Posting 'M' describe message
0001703446 [comm.ota] INFO: Update time: 49910
0001703446 [comm.ota] INFO: Transfer time: 39127
0001703448 [comm.ota] INFO: Processing time: 13354
0001703448 [comm.ota] INFO: Chunks received: 771
0001703448 [comm.ota] INFO: Chunk ACKs sent: 463
0001703449 [comm.ota] INFO: Duplicate chunks: 14
0001703450 [comm.ota] INFO: Out-of-order chunks: 66
0001703450 [comm.ota] INFO: Applying firmware update
0001709214 [system] INFO: Cloud: disconnecting


Serial connection closed.  Attempting to reconnect...
Serial monitor opened successfully:

Updated

I can flash the units manually (using USB). I rolled one of the devices back to an earlier firmware version compiled with DeviceOS 3.1.0. However, when updating from FW v.1 on DeviceOS 3.1.0 to FW v.2 on DeviceOS 3.1.0, the issue still persists. So I am unsure whether it is due to DeviceOS or to something in the firmware.

Could it be that the firmware is taking a long time (or blocking, or crashing) before running the setup() function?
This can happen if the constructor of a class that is instantiated before setup() takes a long time to finish.

While troubleshooting my issue, I got a lot of solid white and even blinking white too.
I could not pinpoint the reason for the blinking white yet.

But I can tell you how to “artificially create” a solid white.

Imagine there is a MyClass instance declared globally, before setup():

MyClass _myClass;

void setup()
{
  // normal setup code
}

void loop()
{
  // normal loop code
}

Now, make the constructor of MyClass take a loooong time:

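MyClass::MyClass()
{
  // volatile keeps the compiler from optimizing the busy loop away;
  // a 64-bit counter is used because the limit does not fit in a 32-bit int
  volatile uint64_t i = 1;
  while (i <= 100000000000ULL) {
    i = i + 1;
  }
}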

And there you get a solid white LED. It seems to me this is something like the “pre-boot” phase of DeviceOS.

For some esoteric reason, the delay() function does not work here. It seems to be ignored.

Back to your code: could something blocking have been added? Something that takes a long time, or that in some situations can stop the code from going any further?

Do you have System.disableUpdates() called before setup() runs (for example, in the constructor of a class that gets instantiated before setup())?

Cheers

Hello @gusgonnet,

That is actually a very solid point, thank you for elaborating on it in detail. As far as I can see, I don’t have anything blocking in my class constructor, but let me see if I can refactor it to only set pinModes and drive the required pins low, so it doesn’t take a long time to execute. Everything else I can move into an initialization function called during setup(). Let me try it out and I’ll get back to you!

While this has changed from time to time between Device OS versions, it's best not to call pinMode() in a constructor, because (in some versions) these functions themselves rely on other objects already being instantiated, which may not be the case.

See here some old hint in that direction

Then a few years later the same issue reappeared (e.g. here pietteTech_DHT library does not work with latest Particle OS (redux) - #4 by ScruffR)

You can also try SYSTEM_THREAD(ENABLED).

I can confirm in my case I'm always using that, still getting the issue.

Yes, and I suspect a call to System.disableUpdates(); is in the same position. Perhaps it cannot be called before setup() is invoked, but I could not find any such limitation stated in the docs.

Thanks for the extra perspective and tips!

The limitations regarding constructor execution order are not Particle-specific but a general C++ thing, hence I'd not expect each and every C++ fundamental to be documented over and over for each and every function :wink:
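
To illustrate the general C++ point, here is a small sketch that can be compiled on a desktop rather than on a device. The names `bootLog()`, `EarlyObject` and `setupStandIn()` are made up for this example and are not part of any Particle API. It only shows that a global object's constructor has already run by the time your "setup" code executes, and that a function-local static is one way to make sure something exists before a global constructor uses it.

```cpp
#include <string>
#include <vector>

// Function-local static: created on first use, so it is safe to call from
// any global constructor regardless of initialization order.
static std::vector<std::string>& bootLog() {
    static std::vector<std::string> log;
    return log;
}

// Hypothetical class whose constructor records when it runs.
struct EarlyObject {
    EarlyObject() { bootLog().push_back("global constructor"); }
};

// Global instance: its constructor runs during static initialization,
// before main() on a host (and before setup() in firmware).
EarlyObject early;

// Hypothetical stand-in for the firmware's setup().
void setupStandIn() { bootLog().push_back("setup"); }
```

If the constructor had relied on some other global object or on system state that is only ready later, it would have run too early to see it, which is the situation described in this thread.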

You definitely should not call System.disableUpdates() any time before setup().

You should not call anything that sets or gets system flags from a STARTUP() macro or the constructor of a global object. Basically, assume that everything is unsafe from STARTUP() or a global constructor unless otherwise stated. You can call pinMode() and digitalWrite(), but I would hesitate to do much more.

Hmm, can you clarify?
Wasn't e.g. System.enableFeature(FEATURE_RETAINED_MEMORY); one of the system flags that should have been set in STARTUP()? (I well remember official docs to state so in the past :wink: )
When has that changed?

With the given ambiguity (and historical back and forth) I'll contradict my own assertion about C++ fundamentals above and would suggest some kind of "global footnote" on functions whether they are safe or not to be called in STARTUP() and/or global constructors.

The feature flags enabled with System.enableFeature() like FEATURE_RETAINED_MEMORY are different than the system flags, like System.disableUpdates(). The feature flags can be set from STARTUP or a global constructor.

So I guess it is true that there are a few other things that are safe, but you should still assume that most things are not safe from STARTUP or a global constructor.

Update from my end. It seemed like moving logic out of the class constructor fixed the issue at hand. I did not have any System.enableUpdates or similar in the constructor, but mostly pinModes and digitalWrite, alongside some Serial.begin/Wire.begin logic. I refactored most of this into an initialization function for the class and now devices are updating as expected. I do not have any system flags or similar in the constructor or STARTUP.
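
For anyone landing here later, the refactor can be sketched roughly like this. It is a minimal, desktop-compilable illustration and not my actual firmware: the class name `PumpDriver`, its `begin()` method and the `hwConfigurePin()` stand-in are all hypothetical. On a device, the stand-in would be replaced by the real pinMode()/digitalWrite()/Serial.begin()/Wire.begin() calls.

```cpp
#include <cstdint>

// Hypothetical stand-in for hardware setup such as pinMode()/digitalWrite();
// it only counts calls so the sketch can be compiled and run off-device.
static int g_hwCalls = 0;
static void hwConfigurePin(uint8_t /*pin*/) { ++g_hwCalls; }

class PumpDriver {
public:
    // The constructor only stores configuration. It touches no hardware and
    // makes no system calls, so it does nothing risky before setup().
    explicit PumpDriver(uint8_t pin) : pin_(pin), ready_(false) {}

    // All real initialization is deferred to begin(), which is intended to
    // be called from setup().
    void begin() {
        if (ready_) return;       // calling begin() a second time is a no-op
        hwConfigurePin(pin_);
        ready_ = true;
    }

    bool isReady() const { return ready_; }

private:
    uint8_t pin_;
    bool ready_;
};

// Global instance: constructed before setup(), but only stores the pin number.
PumpDriver pump(7);
```

In the firmware, setup() would then call pump.begin() before anything else uses the object.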

Huge thanks goes out to @gusgonnet for pinpointing it so quickly!
