OTA update fails in Argon

We have a product based on the Argon board. We added a mechanism so that an OTA update runs only when the end user chooses to: we call System.disableUpdates() in setup() unless a special flag has been set in EEPROM.
When an update is pending, we let the end user know. If they decide to update, we set a flag in EEPROM and call System.reset(). At that point everything looks OK: the Argon reboots (we clear the EEPROM flag and do not call disableUpdates()), connects to the cloud, the LED starts flashing magenta, and the console shows:

spark/flash/status started

After a while (*) it shows:

spark/flash/status success

Then the device reboots (by itself, since we are running an almost empty loop in this case), but it comes up with the old firmware.

(*) The time it takes for the device to report a successful flash is sometimes very short (2 seconds), even though the program is about 240 KB.

I’m using Device OS 4.0.2.

Any idea what could be causing this behavior? It looks like the update is failing, so the device keeps the old version, but there is no indication of the failure or its cause.

Hey-

Are you getting any error messages? Is this limited to one Argon or does it happen on every board?

The problem on my board went away somehow. I can now lock it to any of the ~5 firmware versions that I have deployed for that product, and it updates with no issues. I know there is at least one more Argon board in that product group (out of a total of about 10 devices) with the same problem, but I do not have access to it at the moment.
There were no error messages. As I mentioned, the Particle Console even showed the message

spark/flash/status success

I will try to get access to that other board to run some more tests, as well as testing other boards in the group. I’ll report back here.

That is really weird- please let me know when you get access to that board.

So, several months later I'm back with this issue. We are moving to a P2-based product, and we are seeing the same issue, only more often.

Sometimes after flashing a few different versions over USB (when developing and testing), a device will recover and start taking OTA updates for a while and then fail again.

Like I mentioned before, on startup we check for a flag from an internal (Flash) config file, and depending on that flag, we disable (or not) the updates with System.disableUpdates(). If the flag indicates that we have to update firmware, we do not call disableUpdates() and most of the code that we have in the main loop does not execute (we skip it). This way, we create an "Update Mode" where our product does not perform any function, but just waits for the update to complete.
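
A minimal sketch of the startup gate described above, written as plain C++ so the decision logic can be exercised off-device. The names UpdateHooks and startupGate are hypothetical, and clearing the flag on the update boot is an assumption carried over from the Argon setup; on a device, the hooks would wrap the config-file access and System.disableUpdates().

```cpp
#include <functional>

// Hypothetical hooks (not part of Device OS) that isolate the decision
// logic from the hardware-specific calls.
struct UpdateHooks {
    std::function<bool()> readUpdateFlag;   // true when the user asked for an update
    std::function<void()> clearUpdateFlag;  // one-shot flag: the next boot is a normal one
    std::function<void()> disableUpdates;   // on a device: []{ System.disableUpdates(); }
};

// Called once at startup. Returns true when the device should stay in
// "Update Mode" (skip the application work in loop() and wait for the OTA).
bool startupGate(const UpdateHooks& hooks) {
    if (hooks.readUpdateFlag()) {
        hooks.clearUpdateFlag();
        return true;             // updates are left enabled
    }
    hooks.disableUpdates();      // normal run: hold updates until the user opts in
    return false;
}
```

Keeping the gate in one function makes it easy to confirm that disableUpdates() is never reached on an update boot, which rules out the gating logic itself when an OTA reports success but the old firmware keeps running.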

In this case, the LED blinks magenta (as it should while it is updating), and in the console I see
spark/flash/status started
and a few seconds later
spark/flash/status success

The P2 reboots, but it starts up with the old version. It shows that it has an update pending, and if we enable the update on the unit (either by our normal method or by Force Enable OTA), the cycle repeats. This time, the blinking magenta during the "update" is just a blip (less than 0.5 seconds), and then the device reboots, again into the original version. I still get spark/flash/status started and then spark/flash/status success, but they are at most 1 second apart (according to the timestamps).

So it looks like the firmware downloads, it somehow verifies OK (I'm assuming that's what "success" means), but then upon reboot, some other check fails and it decides to boot to the older version. When doing it again, it doesn't look like it's even trying to download, but checks something (a hash?) and decides that it is OK ("success") and reboots. But again, after rebooting it seems to ignore the newly downloaded image and starts the old one.

If I comment out the disableUpdates() line, the device gets stuck in a boot loop (it reboots almost as soon as it connects to WiFi), similar to what is described here:
https://community.particle.io/t/stuck-in-ota-update-safe-mode-boot-loop/32632/4

In that post, the problem mysteriously fixed itself (as I have occasionally seen happen too). Because this product has several design changes, I expect that some code tweaking will be necessary, and therefore I will have to push OTA updates. Needless to say, I need this to work.

Thanks
MGL

This seems to be related to the flash file system being full. We log some information to it, and apparently when the flash file system is full, OTA updates fail (even though they are supposed to use a different partition). Does this make sense?

The binaries are stored in a different location (OTA sectors). However, it's possible that the problem occurs when the system updates other information on the file system after the update.

We definitely recommend leaving empty space in the file system so that file updates do not fail, and also because a nearly full file system can lead to premature flash wear.
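
One way to leave that headroom is to cap the size of the log file. The sketch below assumes the logging goes through POSIX-style file calls (stat, open, write, rename, unlink); the function name appendCapped, the paths, and the size limit are hypothetical. It keeps at most one backup, so the log uses roughly twice maxBytes at worst.

```cpp
#include <cstddef>
#include <cstdio>      // rename
#include <cstring>     // strlen
#include <fcntl.h>     // open, O_* flags
#include <sys/stat.h>  // stat
#include <unistd.h>    // write, close, unlink

// Appends `line` to the log at `path`. If the append would push the file past
// `maxBytes`, the current log is first moved to `backupPath`, replacing any
// previous backup. Returns true when the whole line was written.
bool appendCapped(const char* path, const char* backupPath,
                  const char* line, off_t maxBytes) {
    const size_t len = std::strlen(line);

    struct stat st;
    if (stat(path, &st) == 0 && st.st_size + static_cast<off_t>(len) > maxBytes) {
        unlink(backupPath);  // result ignored: the backup may not exist yet
        if (rename(path, backupPath) != 0) {
            return false;
        }
    }

    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) {
        return false;
    }
    ssize_t written = write(fd, line, len);
    close(fd);
    return written == static_cast<ssize_t>(len);
}
```

A call such as appendCapped("/usr/app.log", "/usr/app.log.1", "boot ok\n", 64 * 1024) would then bound the log at about 128 KB; both the paths and the limit are placeholders to size against the actual file system.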