I am trying to push a new firmware version out to some of my units before a larger update. I am using the same firmware version number during development, let’s say v24. When I’ve made a new change, I roll my test units back to v23 OTA, compile using particle compile b5som --target 5.3.0, delete the previous version 24 firmware on the product page, and then upload the new bin file that I just compiled.
When I OTA my devices, for some reason they seem to run the old version 24 instead of the new version 24, since the logging I’ve implemented doesn’t work. I’ve tried flashing one of the test devices manually using the compiled binary and running particle flash --usb <file_name_of_new_v24>, which works. But when I follow the same procedure using the OTA process of the platform, it seems to download a previous version.
In short, my question is whether OTA binary files are cached when I delete and re-upload firmware using the same version number.
I don’t have an answer but was curious… what’s the rationale for re-using the same version number vs incrementing the version number? What’s the problem you are trying to solve or avoid by not just incrementing it?
It’s a strict numerical comparison between the integer stored in the PRODUCT_VERSION in the firmware on the device and the version number in the cloud. If they match, even if the firmware is different, the device will not be upgraded.
You really should just increment the number each time. If you don’t want to use a bad version you uploaded, delete it after you upload the new one.
The exception to this is the development flow. During software development, mark the device as a development device and claim it to your account. Increment the PRODUCT_VERSION to one higher than the latest version in the product. You can repeatedly flash this code directly from Workbench or the Web IDE without uploading it to the console. Then once it’s working, upload it to the console.
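To make that concrete, the number being compared is the integer set with the PRODUCT_VERSION() macro in the firmware source. A minimal sketch, assuming a recent Device OS where only the version (not the product ID) is set in firmware:

```cpp
#include "Particle.h"

// The cloud compares this integer against the firmware versions uploaded in
// the console. If the device already reports 24, a re-uploaded binary that is
// also marked as version 24 will not be sent OTA, even if the code differs.
PRODUCT_VERSION(24);

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup() {
}

void loop() {
}
```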
Just so I understand correctly - let’s say I downgrade from v24 to v23 over-the-air, delete the v24 firmware on the platform, re-upload a new v24, and then OTA flash that new v24 to the device - shouldn’t it be running the new firmware, since it downgraded already?
The reason I don’t increment every time is that the firmware version is tied to our internal systems and referenced in changelogs, roadmapping, etc. Given how many changes and updates we deploy, the number would climb quite quickly compared to a semantic versioning scheme.
I missed that. Since you downgraded first, the device should get the new version.
How did you downgrade? By lock and flash? Then it should work. Of course you need to unlock the device again to get the latest release.
If you did the downgrade manually, I suspect something is out of sync between what the cloud believes the device should have and what it has.
Are you using intelligent OTA? If so, the device should get the update immediately after you release the new version to the fleet (or device groups), unless you’ve disabled updates on-device.
If you are not using intelligent OTA, the update won’t occur until the next cloud connection.
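For reference, “disabled updates on-device” means the firmware has called the System update controls. A rough sketch of the usual pattern (safeToUpdate() is a placeholder for your own application logic, not a Device OS call):

```cpp
#include "Particle.h"

SerialLogHandler logHandler(LOG_LEVEL_INFO);

// Placeholder for application-specific logic deciding when an OTA update may
// safely interrupt the device (e.g. not in the middle of a measurement).
bool safeToUpdate() {
    return true;
}

void setup() {
    // Defer OTA updates until the application says it is safe.
    System.disableUpdates();
}

void loop() {
    // With intelligent OTA, a pending release is flagged here even while
    // updates are disabled; re-enabling updates lets it proceed.
    if (System.updatesPending() && safeToUpdate()) {
        System.enableUpdates();
    }
}
```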
I downgraded by a lock and flash on each specific device, and then waited for it to downgrade (confirming through events such as flash success and the firmware version number changing on the device page).
I double-checked by hooking one of the nearby test units up to my PC and watching the serial data - I could see that it was running the older v24 code, not the new v24 code that I had OTA-updated.
Flashing the same binary I uploaded, but over USB, did work, however. I retried the same process a couple of times.
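For anyone doing the same comparison: logging a compile-time stamp at startup makes it unambiguous which build is actually running when the version number is reused. A small sketch (the stamp is just the compiler’s __DATE__/__TIME__, nothing Particle-specific):

```cpp
#include "Particle.h"

PRODUCT_VERSION(24);

SerialLogHandler logHandler(LOG_LEVEL_INFO);

void setup() {
    // __DATE__ and __TIME__ change on every compile, so two different binaries
    // that both claim version 24 are still distinguishable over serial.
    Log.info("Product version 24, built %s %s", __DATE__, __TIME__);
}

void loop() {
}
```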
I figured out the problem. Yes, the firmware binaries have been cached since February 2023. Before that, retrieving each firmware binary from the database created a lot of load when very large fleets did an intelligent firmware update.
Do you know what the TTL on the cache is? Thanks for checking it out - this is the first time I’ve run into the problem, so something must have changed since our last big fleet deployment. It would be nice to validate the cache against a checksum or the binary filename when doing a hit/miss check.
Just checking in on this, as we’ve run into the problem a couple of times - how long is the time-to-live on the cache? Then we can plan around it; I assume it’s at least 12 hours?
The TTL is 24 hours. However, engineering is investigating invalidating the cache when you delete a firmware version, which would be a better solution.
Turns out we were so close with our cache busting! We busted the list of available firmware for an update, so we wouldn’t keep trying to send down deleted firmware, but didn’t bust the actual binary cache, so if you replaced it, well… you know.
I’ve deployed fixes to our infrastructure; please let us know how things work out for you!