[Solved] Red hard fault SOS if you upgrade 1.2.1 and have incorrect wifi details


#1

Update: Upgrade any 1.2.1+ boards to 1.4.4 as a priority to avoid this critical bug.
Run npm install -g particle-cli to get particle-cli 1.53.0 or later, then run particle update!

Old post:
** WARNING, particularly to commercial customers **

Friendly heads up to anyone like me who got tape and reels over the past few months and they came with 1.2.1 from factory.

If you upgrade from 1.2.1 (possibly others!) to 1.4.2 or 1.4.3 (the “latest and greatest”) and then supply wifi credentials that do not match any network in range, the device will red SOS.

particle identify; # validate 1.2.1, maybe others!
particle update;
particle identify; # validate 1.4.2
particle serial wifi --file yourwificredentialsthatdontmatchanything.json

This means if you provision at factory, then ship to a customer who does not have matching details, it will red SOS within 9 seconds of being turned on. Fun!
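
If you are sorting a batch by the version that particle identify reports, a check like the one below may help. It is only a sketch: the helper names are made up for illustration, and it assumes that everything from 1.2.1 up to, but not including, 1.4.4 should be upgraded, which is how I read the update note at the top.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <tuple>

using Version = std::tuple<int, int, int>;

// Hypothetical helper: parse "major.minor.patch" into a comparable tuple.
// Returns false for anything else, including suffixed versions like "1.4.4-rc.1".
bool parseVersion(const std::string& text, Version& out) {
    int major = 0, minor = 0, patch = 0;
    char trailing = 0;
    // The extra %c only matches if something follows the patch number.
    if (std::sscanf(text.c_str(), "%d.%d.%d%c", &major, &minor, &patch, &trailing) != 3) {
        return false;
    }
    out = std::make_tuple(major, minor, patch);
    return true;
}

// Assumption: 1.2.1 <= version < 1.4.4 means the board should be upgraded.
// Unparseable versions return false and should be checked by hand.
bool needsUpgrade(const std::string& version) {
    Version v;
    if (!parseVersion(version, v)) {
        return false;
    }
    return v >= std::make_tuple(1, 2, 1) && v < std::make_tuple(1, 4, 4);
}
```

Tuples compare field by field, so 1.4.3 sorts below 1.4.4 without any string tricks.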

The team have been sitting on it for a fortnight: https://github.com/particle-iot/device-os/issues/1968

They should have been emailing and calling folk, but there isn’t even a fix as yet.

The way we’ve been remediating 500 units of ours is to run a wifi network that matches our factory credentials. We have a script that watches for devices coming online with the specific version used for this batch at the factory, and then flashes a new version (so that the script isn’t triggered again later on). We then call a cloud function in the firmware that does a WiFi.clearCredentials(). About 25% of the boards don’t go online at all and need manual button pressing to clear the creds; we then issue wifi creds over serial and wait for them to go through the process above. Even worse, we’ve got about 40 of the 500 that no longer respond at all, over either DFU or serial.
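
The triage described above boils down to two questions per board. A minimal sketch, with type and enumerator names invented for illustration (this is not our actual script):

```cpp
#include <cassert>

// Hypothetical summary of what we know about one board on the bench.
struct BoardStatus {
    bool cameOnline;        // joined the factory-matching wifi network
    bool respondsToSerial;  // answers over serial (or DFU)
};

enum class Step {
    FlashAndClearViaCloud,             // flash new version, call the clear-credentials cloud function
    ButtonClearThenSerialCredentials,  // clear creds with the button, send wifi creds over serial
    SetAside                           // no remedy found so far
};

// Picks the next remediation step for a board.
Step nextStep(const BoardStatus& board) {
    if (board.cameOnline) {
        return Step::FlashAndClearViaCloud;
    }
    if (board.respondsToSerial) {
        return Step::ButtonClearThenSerialCredentials;
    }
    return Step::SetAside;
}
```

A board that takes the button-and-serial path should then come online and go through the first path on its next pass.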

Hope you haven’t sold too many of those devices to folk for Black Friday…


#2

@ScruffR, that bundle of fun with 1.4.2 is a lot bigger than originally thought.


#3

@mterrill Thanks for the heads up. We clear the factory wifi details before shipping and have not started using 1.4.2 or 1.4.3 in production. I don’t understand what you mean by WiFi network matching: do you preload WiFi credentials for customers, or are you referring to the factory WiFi credentials? In any case, if credentials get changed in the future by the customer, or the device is out of range of a stored AP, then instead of handling this it just blows up? Great!!


#4

I was referring to the factory wifi credentials, which we normally leave on the device so that every unit goes through the exact same startup process (green blinking for a minute, then goes to dark blue / setup) for user experience consistency.

I’m honestly not sure about future usage and edge cases. Our testing has shown that it’s ok once you clear wifi creds after the upgrade. There could be scenarios that cause issues later, but touch wood we’re ok.

They’re working on a fix: https://github.com/particle-iot/device-os/pull/1976


#5

Overview

We have confirmed that the SOS is caused by an incorrectly initialized AP/credentials list (0xff instead of the expected 0x00) in the WICED-specific DCT area on Photons and P1s manufactured with 1.2.1+.

In the absence of sanity checks in both DeviceOS and WICED (Broadcom/Cypress), the empty 0xff-filled entries are treated as valid and cause the crash after being passed to the WICED stack. The problem shows up readily after a connection to the last valid configured access point fails (e.g. because it is not in view, or the passphrase is invalid) and the device attempts to go further down the list of stored credentials.
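
To see why a 0xff-filled slot can pass for a real entry, consider the host-side sketch below. The struct, the slot count, and the "length is non-zero" test are all simplifications assumed for illustration; the real WICED DCT layout and checks are different.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for one stored access point.
struct ApSlot {
    std::uint8_t ssidLength;  // 0 is meant to signal "slot unused"
    std::uint8_t ssid[32];
};

constexpr std::size_t kSlots = 5;  // assumption: illustrative slot count
using SlotList = std::array<ApSlot, kSlots>;

// Builds a list where every byte has the same value, e.g. 0x00 for a
// properly initialized area or 0xff for erased flash.
SlotList makeList(std::uint8_t fill) {
    SlotList list;
    std::memset(list.data(), fill, sizeof(list));
    return list;
}

// Naive test with no sanity check: any non-zero length counts as configured.
bool looksConfigured(const ApSlot& slot) {
    return slot.ssidLength != 0;
}

// Number of slots a connect loop would walk through with the naive test.
std::size_t slotsToTry(const SlotList& list) {
    std::size_t count = 0;
    for (const auto& slot : list) {
        if (looksConfigured(slot)) {
            ++count;
        }
    }
    return count;
}
```

With a 0x00 fill the loop has nothing to try. With a 0xff fill every slot reports an SSID length of 255, well past the 32 bytes the field can hold, and each one would be handed on as if it were a real network.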

We are still looking into why the DCT was not initialized correctly by DeviceOS on first startup through the appropriate system flags after we moved things around in the factory image in order to fit the latest DeviceOS releases (#1887). However, we have already identified that the factory DCT image has this problem; it should have been correctly initialized in the first place, without resorting to runtime initialization.

Solution

We have a long-term solution #1976 which fixes both problems:

  1. We are adding sanity checks when loading the stored WiFi credentials.
  2. We are zero-initializing the AP/credentials list in the factory DCT image. While this shouldn’t have been required and that area was supposed to be initialized by DeviceOS, we are resolving this problem this way, while still investigating the original causes.
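
As an illustration of the first point, a sanity check on load could look roughly like this. It is a sketch under assumptions, not the code in #1976: the struct is hypothetical, and the limits used are the 32-byte 802.11 SSID maximum and an assumed 64-character key maximum.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one stored entry: only the length fields.
struct StoredAp {
    std::uint8_t ssidLength;
    std::uint8_t keyLength;
};

constexpr std::uint8_t kMaxSsidLength = 32;  // SSID limit in 802.11
constexpr std::uint8_t kMaxKeyLength = 64;   // assumption: longest key accepted

// An entry is plausible when it has an SSID and both lengths are in range.
bool isPlausible(const StoredAp& ap) {
    return ap.ssidLength >= 1 && ap.ssidLength <= kMaxSsidLength &&
           ap.keyLength <= kMaxKeyLength;
}

// Returns only the entries that pass the check, in their stored order.
std::vector<StoredAp> loadValid(const std::vector<StoredAp>& stored) {
    std::vector<StoredAp> valid;
    for (const auto& ap : stored) {
        if (isPlausible(ap)) {
            valid.push_back(ap);
        }
    }
    return valid;
}
```

An entry read from a 0xff-filled area has both lengths at 255, so it is dropped before anything reaches the network stack.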

These changes will be included in the nearest upcoming DeviceOS release as soon as possible. The DCT image in the manufacturing release has already been updated.

Workarounds

  1. This problem can be avoided by clearing WiFi credentials once on devices fresh out of the box, either with the MODE/SETUP button or, if clearing with a button is not possible, by running an application with STARTUP(WiFi.clearCredentials()).
  2. If credentials need to be kept, the following code snippet may be used to fix up the stored credentials list on startup in the application. The credentials list will only be modified if there are 0xff-filled entries present.
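
The snippet itself does not appear in this copy of the thread. The block below is not that snippet and does not touch the DCT; it is a host-side sketch of the behavior described in the second workaround, with a made-up entry size, slot count, and function names.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical entry size and slot count; the real DCT layout differs.
constexpr std::size_t kEntrySize = 100;
constexpr std::size_t kSlots = 5;
using Entry = std::array<std::uint8_t, kEntrySize>;
using CredentialList = std::array<Entry, kSlots>;

// True when every byte of the entry is 0xff, i.e. it looks like erased flash.
bool isErased(const Entry& entry) {
    return std::all_of(entry.begin(), entry.end(),
                       [](std::uint8_t b) { return b == 0xff; });
}

// Zeroes only the 0xff-filled entries and returns how many were changed.
// A return value of 0 means the list was left untouched.
std::size_t fixupList(CredentialList& list) {
    std::size_t changed = 0;
    for (auto& entry : list) {
        if (isErased(entry)) {
            entry.fill(0x00);
            ++changed;
        }
    }
    return changed;
}
```

Entries holding real credentials are never all 0xff, so they are left alone, and running the fixup a second time changes nothing.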

Unfortunately, at the moment there is no easy way to resolve this without using an application, manually clearing the credentials, or using the currently unreleased DeviceOS feature branch #1976. We are researching a couple of options, but we are running into technical difficulties because we are unable to write into the WICED-specific DCT using e.g. DFU.

We’d like to apologize to any customers affected by this issue and are going to make a separate announcement with the progress updates and additional measures to be taken to prevent such issues from happening going forward.


#6

Cheers! Have updated to particle-cli 1.53.0 and 1.4.4 device-os.

@Dave, we have validated that 1.4.4 fixes virgin 1.2.1 PCBs with no manual or WiFi.clearCredentials() intervention on our part.

We’ll flash the remaining ~80 virgin boards with this combo.

We’re satisfied that the other 400 will be ok after manual intervention, but we’ll schedule a fleet upgrade after the Christmas cooking season (a lot of people don’t cook often, then pull out their Smartfire for major holidays, so support-wise it’s easier to do it outside the holidays).


#7

Thanks for the code snippet @avtolstoy. Just a quick question regarding the include files in that code snippet. How do I get them into my project???

Getting this error:

In file included from ../hal/src/photon/dct_hal.h:12:0,
                 from src/LargeGrainFather.cpp:11:
../hal/src/photon/wiced/platform/include/platform_dct.h:430:37: fatal error: ../../utilities/crc/crc.h: No such file or directory
 #include "../../utilities/crc/crc.h"
                                     ^
compilation terminated.

#8

Replied on GitHub: https://github.com/particle-iot/device-os/issues/1968#issuecomment-572355510