Verify a "Good" Firmware Update - User Code

I have a bug in my user code that causes I2C bus faults during setup(), and the P2 goes into SOS before it can connect to the internet to be recovered. I have essentially pushed firmware that boot-loops a number of devices I don't have access to. Oops!

My question is about the "Automatic Rollback" behavior the devices use when flashing firmware updates, and whether there's any way I can use that behavior to check for a "good" update/boot and roll back or go to safe mode if not.

I fully understand that, as far as the P2 is concerned, it DID complete a successful OTA update and user code is in fact running, so the P2 did nothing wrong here; I just happen to be a bad programmer, ha.

In the past I've worked with SBCs that had an A/B firmware updater, where the user firmware had to set a "success" flag. After switching to the B slot, if that flag hadn't been set, the next boot, panic, or crash would make the updater revert to the non-broken A slot. On a normal push, once the user firmware reached some arbitrary point, it marked the boot as "successful" and that firmware became the new A slot. If that makes sense.

Does any feature like this exist in Device OS? It seems like a complicated box to open, so I understand if not, but it can't hurt to ask.

I can roughly mimic this behavior in user code during setup by checking a flag in EEPROM, but that still relies on user code, which is exactly the problem I'm trying to get around.

Thanks for any assistance!


This is not a feature of Device OS, but you could add a feature to your user firmware fairly easily.

You store a flag and a counter in EEPROM or the file system. On Gen 3 and earlier they could live in retained memory, but that won't work reliably for this purpose on Gen 4 (P2, Photon 2, M-SoM).

You clear the successful-boot flag at the very beginning of setup(), and set it later from loop(), for example 3 minutes after boot. You can check that with millis(), which resets to 0 on every system reset. When you set the successful-boot flag, also clear the counter.

At boot, check the flag before clearing it: if it is already clear, the previous boot never reached the success point, so you increment the counter. If that counter exceeds 2 or 3, you are in a rolling reboot, so you call System.enterSafeMode(); the device then goes into safe mode, where it does not run your firmware but can still be flashed OTA.
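A minimal sketch of that flag-and-counter logic, written as portable C++ so it can be exercised off-device. The names `BootGuard`, `onSetup`, `onLoop`, `kMaxFailedBoots` and `kGoodBootAfterMs` are hypothetical; on a device you would read and write the record with `EEPROM.get()`/`EEPROM.put()`, pass in `millis()`, and call `System.enterSafeMode()` when `onSetup()` returns true. It assumes the record was initialized (flag set, count 0) before first use; blank storage on real hardware needs its own validity check.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical record persisted between boots (EEPROM or a file on a device).
struct BootGuard {
    bool bootMarkedGood;  // set from loop() once this boot has run long enough
    uint8_t failedBoots;  // consecutive boots that never got marked good
};

// Assumed values: the reply suggests "2 or 3" failures and about 3 minutes.
constexpr uint8_t kMaxFailedBoots = 3;
constexpr unsigned long kGoodBootAfterMs = 3UL * 60UL * 1000UL;

// Call at the very start of setup(), then persist the record immediately so
// the updated count survives a crash later in setup(). Returns true when the
// device should stop running user code (e.g. call System.enterSafeMode()).
bool onSetup(BootGuard &g) {
    // Check the flag *before* clearing it: if it is still clear, the previous
    // boot never reached the success point.
    if (!g.bootMarkedGood && g.failedBoots < UINT8_MAX) {
        g.failedBoots++;
    }
    g.bootMarkedGood = false;  // this boot must earn the flag again
    return g.failedBoots > kMaxFailedBoots;
}

// Call from loop() with millis(), which restarts at 0 on every reset.
// Returns true when the record changed and should be written back.
bool onLoop(BootGuard &g, unsigned long nowMs) {
    if (!g.bootMarkedGood && nowMs >= kGoodBootAfterMs) {
        g.bootMarkedGood = true;
        g.failedBoots = 0;  // a good boot clears the failure count
        return true;
    }
    return false;
}
```

With a threshold of 3, the fifth consecutive boot that never reaches the grace period trips safe mode. One consequence of keeping the count across firmware versions: firmware flashed while the device sits in safe mode inherits the old count and trips the threshold again unless something resets the counter for the new version.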

That's fair; it was worth asking whether there was a mechanism to force safe mode that's totally separate from userland.

Related to userspace implementation:

I implemented something like the above, but I wanted the counter to reset to 0 whenever an update is applied, to give the new firmware its 2-3 attempts to get running. I got it working with your trick of storing the counter in a struct along with a "version" member tied to the firmware version number: if the version in EEPROM differs from the compiled-in version, reset the counter to 0.
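A minimal sketch of that version check, again as portable C++. `BootGuardV`, `resetIfNewFirmware` and `kFirmwareVersion` are hypothetical names; in product firmware the version could be the same number given to `PRODUCT_VERSION()`. It assumes erased storage reads back as all `0xFF` bytes, so `0xFFFF` is reserved and blank storage is treated like a version change.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical version constant compiled into the firmware.
constexpr uint16_t kFirmwareVersion = 7;

// Boot-guard record with the extra "version" member.
struct BootGuardV {
    uint16_t version;     // firmware version that last wrote this record
    bool bootMarkedGood;
    uint8_t failedBoots;
};

// Call right after reading the record from storage, before the failure check
// in setup(). A version mismatch means this is the first boot of new firmware
// (or the storage is blank), so the counter is reset and the new firmware gets
// its full set of attempts. Returns true when the record was reset and should
// be written back.
bool resetIfNewFirmware(BootGuardV &g, uint16_t compiledVersion) {
    // Assumed: blank storage reads as 0xFF bytes, so 0xFFFF is never a real version.
    assert(compiledVersion != 0xFFFF);
    if (g.version == compiledVersion) {
        return false;
    }
    g.version = compiledVersion;
    g.bootMarkedGood = true;  // no failed boot recorded for this version yet
    g.failedBoots = 0;
    return true;
}
```

A side benefit of this design: the return value is a "first boot of this firmware version" signal that user code controls directly, independent of the reset reason.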

This seems to work well, but for my own knowledge: is there a way to get from the system that "this is the first boot after an update was successfully applied"? Would System.resetReason() == RESET_REASON_UPDATE work for that?