Updating from DeviceOS 1.4.4 to 1.5.0-rc.1 -> SOS Panic 1

@TrikkStar @UMD Thanks for providing additional info.

I think we are dealing with two separate problems here.

@TrikkStar I have reproduced your issue. It is related to our changes to improve thread safety for SPI between system and application on Gen 3; however, some of those changes affected Gen 2 devices as well. The key problem on Gen 2 devices is the undefined order in which global C++ constructors are called (the STARTUP() macro creates an object in the global namespace whose constructor contains the code in the macro). As a result, the application-specific mutex for the SPI class is not yet initialized at the point pinMode() is called in your STARTUP() macro, and it needs to be for the safety checks on a pin configuration to succeed.

We’ll fix this in 1.5.0-rc.2. As a temporary workaround, please move any code that interacts with GPIO out of the STARTUP() macro, for example into setup().
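
To make the mechanism concrete, here is a minimal desktop C++ sketch of it. It is not Device OS code: `MODEL_STARTUP`, `fakePinMode()`, `systemInit()` and `g_spiMutexReady` are hypothetical stand-ins invented for illustration, and the explicit `systemInit()` call is a simplification of the real, unordered global constructors. It only shows that code wrapped in a global constructor can run before the mutex exists, while the same call made from `setup()` arrives after it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Records the order of events so the sequence can be inspected afterwards.
std::vector<std::string>& eventLog() {
    static std::vector<std::string> log;  // function-local static: built on first use
    return log;
}

// Hypothetical flag standing in for "the SPI mutex has been created".
bool g_spiMutexReady = false;

// Stand-in for pinMode(): notes whether the call arrived before the mutex existed.
void fakePinMode(int pin) {
    (void)pin;
    eventLog().push_back(g_spiMutexReady ? "pinMode: ok" : "pinMode: mutex not ready");
}

// Model of the mechanism described above: the macro wraps its argument in the
// constructor of a global object, so the code runs during static initialization.
#define MODEL_STARTUP(code)          \
    struct ModelStartup {            \
        ModelStartup() { code; }     \
    };                               \
    ModelStartup g_modelStartup

MODEL_STARTUP(fakePinMode(7));  // problematic: runs before systemInit()

// Stand-in for the point at which the system creates the mutex.
void systemInit() { g_spiMutexReady = true; }

// Workaround: do the GPIO work in setup(), which runs after systemInit().
void setup() { fakePinMode(7); }
```

In this model the first log entry is written before `systemInit()` has run, which corresponds to the failing `pinMode()` call; the entry written from `setup()` sees the mutex as ready.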

@UMD, I will still need a bit more information about your application to figure this out, unfortunately. It’s also somewhat surprising that according to serial inspect you are running an application built for 1.4.4, even though supposedly your device went through a normal OTA update. If your device is experiencing hard faults in this state (1.4.4 application, 1.5.0-rc.1 system parts), we have something that broke backwards compatibility. I would appreciate it if you could provide some minimal application that reproduces the behavior.

@avtolstoy,

Not sure what is going on with the app built for 1.4.4; that would have been from my testing. Starting again to ensure that we are on the same page. Apologies for this.

FYI: here is my startup:

// ----------------------------
// Startup code that runs early
// ----------------------------

STARTUP(
        Keyboard.begin();           // Allows the HID device attach for the first time after boot with *both* Serial and Keyboard
        Serial1.begin(115200L);     // 2017-10-25 To stop, or at least reduce, spurious output upon start up
                                    // High baud rate seems to help too.

        // If EVT_II is using GPIO P1S6 for KEYPAD_ROW_1, must add
        // But we are not, but leave it in anyway
        System.disableFeature(FEATURE_WIFI_POWERSAVE_CLOCK);

        // Refer https://community.particle.io/t/application-softap-http-pages-issue/22499/4
        // Be sure to initialize the softAP pages in a STARTUP() macro so they
        // are setup *before* the device connects to the internet.
        //
        // If it is initialized in the setup() method, then SoftAP pages
        // won’t be available until the device has connected to the cloud.
        softap_set_application_page_handler(myPage, nullptr);
        );

Here is what I am doing right now on the errant device:

Serial inspect

Platform: 8 - P1
Modules
  Bootloader module #0 - version 400, main location, 16384 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
  System module #1 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 207
  System module #2 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #1 - version 1500
      Bootloader module #0 - version 400
  User module #1 - version 6, main location, 131072 bytes max size
    UUID: FF158A...
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 1406
  • Device is now at a different location (ie different Access Point), was breathing white (expected)
  • Could not enter Listening mode… (bad)
  • Put it into Safe Mode
  • Could enter Listening mode (good)
  • Programmed in the new Access Point credentials
  • Safe Mode could connect with Particle - witnessed connection in Particle console (good)
  • Reset
  • Program started ok (good)
  • Device would NOT connect with access point, breathing white (bad) <-- Is this a clue?
  • Put the device back to Safe Mode so that I could OTA
  • Problem - flashing green - so could not connect with access point (but it did before)… (bad)
  • Could enter Listening mode (good)
  • Programmed in the new Access Point credentials again
  • Safe Mode could connect with Particle - witnessed connection in Particle console (good)
  • Did not reset this time as suspect that the program is doing something bad to the WiFi credentials
  • Compiled my app to target 1.5.0-rc.1
  • OTA
  • Immediate SOS #1 (bad)
  • Reset to Safe Mode
  • Entered Listening Mode
  • Serial Inspect now shows:
Platform: 8 - P1
Modules
  Bootloader module #0 - version 400, main location, 16384 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
  System module #1 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 207
  System module #2 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #1 - version 1500
      Bootloader module #0 - version 400
  User module #1 - version 6, main location, 131072 bytes max size
    UUID: 13BB56...
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 1500
  • Reset (so as to go back to the application)
  • Immediate SOS #1 (bad)
  • Recompiled my app targeting v1.4.4
  • Reset to Safe Mode
  • App now works ok.
  • Back to safe mode
  • Back to listening mode
    Serial inspect
Platform: 8 - P1
Modules
  Bootloader module #0 - version 400, main location, 16384 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
  System module #1 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 207
  System module #2 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #1 - version 1500
      Bootloader module #0 - version 400
  User module #1 - version 6, main location, 131072 bytes max size
    UUID: 05B40464D3B67D8FAF37C21203099EA8CBFA38E35E08B241DAC9D11366172F73
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 1406

Conclusion

My app compiled for 1.5.0-rc.1 will SOS #1 with DeviceOS: 1.5.0-rc.1.
My app compiled for 1.4.4 will work okay with DeviceOS: 1.5.0-rc.1.
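
One way to compare the inspect dumps above is to pull the version number out of each line and check the user module’s dependency against the installed system part. This is a rough desktop C++ sketch: `moduleVersion()` and `dependencyMet()` are hypothetical helpers, and the rule that a dependency is met when the installed version is at least the required one is an assumption, not something confirmed in this thread.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the number that follows "version" in one line of serial inspect
// output, or -1 if the line carries no version (hypothetical helper).
int moduleVersion(const std::string& line) {
    static const std::regex pattern(R"(version\s+(\d+))");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return std::stoi(match[1].str());
    }
    return -1;
}

// Assumption: a dependency is satisfied when the installed module version is
// greater than or equal to the version the dependent module asks for.
bool dependencyMet(int installedVersion, int requiredVersion) {
    return installedVersion >= requiredVersion;
}
```

With the numbers above, both builds (dependency 1406 and dependency 1500) are satisfied by system part 1500 under that assumption, which agrees with the Dependencies: PASS lines, so the panic does not look like a plain dependency mismatch.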

Hope this amount of detail is helpful.

I will now work on experiments with the app source code to see if I can get it to start okay when targeting 1.5.0-rc.1…

@avtolstoy, I commented out the following via #ifdef blocks:

  • STARTUP() - made no difference…
  • setup() - made no difference…
  • loop() - made no difference…

You mentioned SPI - this is used by the OLED display and SDCARD interfaces.

I will see if I can reproduce the fault using minimal code.

@avtolstoy as I could not reproduce the fault, am thinking that it is best that I give you the following two binaries via PM (if you are still curious):

  • app compiled for 1.5.0-rc.1 which SOS #1 with DeviceOS: 1.5.0-rc.1.
  • app compiled for 1.4.4 which works okay with DeviceOS: 1.5.0-rc.1.

Of course my app won’t do much for you as all the I/O is different, but it should allow you to see the immediate panic fault for yourself. Hopefully you can apply some debugging tools to see what is going on.

Will send the code in a couple of hours.

PS - Am going to try updating the bootloader from version 400 to 501 as per “Particle Photon Flash successful but nothing happening” and will report back on this too.

@UMD, I had this happen on an upgrade of an Electron 3G. For some reason, everything but the bootloader seemed to update. When I flashed the new bootloader and tinker via USB (the device is local) everything worked as expected.

HOWEVER, when I flashed my app compiled to v1.5.0-rc.1 it went straight to hard fault. Like you, when I flashed the SAME code but compiled to v1.4.4, it worked just fine. The code in this case is @rickkas7’s AssistNow example.

@peekay123, this is why we all need to get stuck into this issue - “breaking” upgrades need to be understood and squashed asap otherwise we could be stuck at a particular version!

The evidence (app does not even start) could well point to the compiler when it is targeting 1.5.0-rc.1…

Here is the list from Particle Device OS Updates Thread of the compiler/environment changes:

  • Enables C++14 chrono string literals for wiring APIs #1709
  • [Gen 3] Implements persistent antenna selection ( Mesh.selectAntenna() ) #1933
  • GCC 8 support #1971
  • Implements EnumFlags class for bitwise operations on C++ enum classes #1978
  • Adds missing platforms to manifest.json #1959
  • Prevents expansion of EXTRA_CFLAGS variable #2012
  • Fixes a call to objdump when the compiler is not in PATH #1961
  • [gcc] Ensure GPIO pinmap is initialized on first use #1963

Is it possible that one or more of these changes has an undesirable side effect?

Have just PM’d my two binaries to @avtolstoy for research.

@peekay123 I don’t know how complicated “AssistNow” is (nor what it does), but it strikes me that we may have an opportunity with its source code to work out the cause of the issue…

Updated bootloader from version 400 to 501:

Platform: 8 - P1
Modules
  Bootloader module #0 - version 501, main location, 16384 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
  System module #1 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 207
  System module #2 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #1 - version 1500
      Bootloader module #0 - version 400
  User module #1 - version 6, main location, 131072 bytes max size
    UUID: 4962D89....
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 1500

Issue remains…

I’ve moved all my code out of STARTUP and am still encountering the same issue. I tried an OTA update and that made no difference compared with DFU.
I see mentions of SPI and should inform you that I’m using both SPI buses on my P1 for I/O (SPI has an ePaper display and SPI1 an SD card).

Attempting to flash the bootloader off github got me the following:

particle flash --usb p1-bootloader@1.5.0-rc.1+lto.bin
Error writing firmware: unknown module function 2, use --force to override

I ran an OTA flash of Tinker targeting 1.5.0-rc1 and got the following from an inspect:

particle flash *** tinker --target 1.5.0-rc1  
? Which type of device? P1
attempting to flash firmware to your device ***
Flash device OK:  Update started

Flash success!

particle serial inspect
Platform: 8 - P1
Modules
  Bootloader module #0 - version 400, main location, 16384 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
  System module #1 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 207
  System module #2 - version 1500, main location, 262144 bytes max size
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #1 - version 1500
      Bootloader module #0 - version 400
  User module #1 - version 3, main location, 131072 bytes max size
    UUID: ***
    Integrity: PASS
    Address Range: PASS
    Platform: PASS
    Dependencies: PASS
      System module #2 - version 6

@TrikkStar, my experiments revealed that the bootloader version was not the issue.

I think we are circling the same SPI issue brought up by @avtolstoy because my kit also uses the same SPI bus to interface to both the

  • OLED display, and,
  • SDCARD

We should now chase this theory. Unfortunately, for me to “unhook” the display and SDCARD interfaces from my very complicated app is a nightmare.

Wondering if you are able to unhook these two interfaces from your application without too much pain? My guess (hope) is that it will work after this…

@peekay123, are you using any SPI interfaces in your errant app by any chance?

@UMD I don’t believe @rickkas7’s GPS library or the asset tracker v2 uses SPI.

It is the SPI mutex issue causing the immediate SOS+1 fault on 1.5.0-rc.1.

It’s the LIS3DH accelerometer that’s causing it.

panic_@0x0803b9a2 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/services/src/panic.c:68)
prvGetRegistersFromStack@0x08026baa (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/hal/src/stm32f2xx/core_hal_stm32f2xx.c:104)
<signal handler called>@0xfffffff9 (Unknown Source:0)
uxListRemove@0x08045a6c (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/third_party/freertos/freertos/FreeRTOS/Source/list.c:183)
xTaskRemoveFromEventList@0x08046980 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/third_party/freertos/freertos/FreeRTOS/Source/tasks.c:3114)
xQueueSemaphoreTake@0x08045f72 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/third_party/freertos/freertos/FreeRTOS/Source/queue.c:1479)
os_mutex_lock@0x0802be30 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/hal/src/stm32f2xx/concurrent_hal.cpp:387)
Mutex::lock@0x08025662 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/wiring/inc/spark_wiring_thread.h:246)
SPIClass::lock@0x080256a4 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/wiring/inc/spark_wiring_spi.h:246)
SPIClass::begin@0x080256a4 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/wiring/src/spark_wiring_spi.cpp:110)
LIS3DHSPI::LIS3DHSPI@0x080226cc (/Test/lib/LIS3DH/src/LIS3DH.cpp:297)
__static_initialization_and_destruction_0@0x08020c44 (/Test/lib/AssetTrackerRK/src/AssetTrackerRK.cpp:23)
_GLOBAL__sub_I_emptyResponse@0x08020c44 (/Test/lib/AssetTrackerRK/src/AssetTrackerRK.cpp:289)
call_constructors@0x0802b7a0 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/hal/src/electron/newlib_stubs.cpp:48)
LoopFillZerobss@0x080201f6 (/.particle/toolchains/deviceOS/1.5.0-rc.1/firmware-1.5.0-rc.1/build/arm/startup/spark_init.S:6)

@rickkas7, for some reason I thought the LIS3DH was using I2C. I guess this means a library update. The more reason to create the SPI_HAS_TRANSACTION #define in Particle.h and/or Arduino.h IMO.
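
For context, this is roughly how a library would use such a define to decide whether to wrap bus access in transactions. It is a desktop C++ sketch under stated assumptions: `MockSpi` is a hypothetical stand-in for the SPI object (its `beginTransaction()` omits the settings argument a real bus would take), and the macro is defined locally only so the example builds; on a device it would have to come from the platform headers.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumption for this sketch only: the platform header would provide this
// define, following the Arduino convention. It is set here so the example
// builds with a desktop compiler.
#ifndef SPI_HAS_TRANSACTION
#define SPI_HAS_TRANSACTION 1
#endif

// Hypothetical mock of an SPI bus that records the calls made on it.
struct MockSpi {
    std::vector<std::string> calls;
    void beginTransaction() { calls.push_back("beginTransaction"); }
    void endTransaction() { calls.push_back("endTransaction"); }
    unsigned char transfer(unsigned char value) {
        calls.push_back("transfer");
        return value;  // echo back; a real bus returns the byte clocked in
    }
};

// Library-side register read that uses transactions only when the platform
// advertises support for them through the macro.
unsigned char readRegister(MockSpi& spi, unsigned char reg) {
#ifdef SPI_HAS_TRANSACTION
    spi.beginTransaction();
#endif
    spi.transfer(reg);                         // send the register address
    unsigned char value = spi.transfer(0x00);  // clock out the reply
#ifdef SPI_HAS_TRANSACTION
    spi.endTransaction();
#endif
    return value;
}
```

Without the define, the same library silently falls back to the unguarded path, which is why the missing macro matters to library authors.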

It’s actually because I called SPI.begin() from the constructor, which shouldn’t be done. I’ll release new versions of the libraries to fix this, and also enable SPI transactions.

Good to hear that for your library, but it is about time to address this long-standing issue about the Arduino SPI_HAS_TRANSACTION switch mentioned here:
https://github.com/particle-iot/device-os/issues/1668

Over a year should have been plenty of time to add that single line of code to the device OS repo.

Excuse the sentiment, but I have lost count of the issues I reported that never got a single official response.
Indifference has been creeping in on my side for quite some time now.

I released version 0.3.2 of AssetTrackerRK that fixes the SOS+1 at startup with Device OS 1.5.0.

It contains LIS3DH 0.2.5 which actually has the fix; the problem was that SPI.begin() should not be called from a global object constructor. It also uses SPI transactions for compatibility with the Ethernet FeatherWing.
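
The general shape of that kind of fix can be sketched on a desktop compiler as follows. This is not the actual LIS3DH change: `MockBus` and `SensorDriver` are hypothetical names, and the sketch only illustrates a constructor that stores configuration while a separate `begin()`, called from `setup()`, does the bus initialization.

```cpp
#include <cassert>

// Hypothetical mock bus: begin() is only valid once the system is up.
struct MockBus {
    bool systemReady = false;
    int beginCalls = 0;
    int earlyBeginCalls = 0;  // begin() calls made before the system was ready
    void begin() {
        ++beginCalls;
        if (!systemReady) {
            ++earlyBeginCalls;
        }
    }
};

MockBus g_bus;

// Hypothetical driver: the constructor stores configuration only, so it is
// safe to create as a global object; bus setup is deferred to begin().
class SensorDriver {
public:
    SensorDriver(MockBus& bus, int csPin) : bus_(bus), csPin_(csPin), started_(false) {}

    // Intended to be called from setup(); repeated calls are harmless.
    void begin() {
        if (!started_) {
            bus_.begin();
            started_ = true;
        }
    }

    bool started() const { return started_; }
    int csPin() const { return csPin_; }

private:
    MockBus& bus_;
    int csPin_;
    bool started_;
};

SensorDriver g_sensor(g_bus, 2);  // global object: constructor touches no hardware

void setup() { g_sensor.begin(); }
```

The point of the design is that nothing order-sensitive happens during static initialization; the only bus call is made once `setup()` runs.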

@rickkas7, I can confirm that it is now working.

This is excellent news then - we have at least one root cause of the panic, i.e. SPI access.

To close this case, it looks like I may need to update from SDFat 0.0.7 to 1.0.16 as per @ScruffR’s comments: DeviceOS 1.5.0-rc.1 - SoftAP using SDCARD - problem (assumption being that 0.0.7 may not be using SPI TRANSACTION or has SPI.begin() in the constructor???).

I note that I had tried quite some time ago to use the updated lib but came to grief for some reason.

As I had already modified Adafruit’s SSD1306 lib for SPI TRANSACTION usage, this should be sweet as it is.

Will report back with results…

Results are in:

First - removed SD CARD routines from my app - problem remains
Secondly - removed DISPLAY routines from my app - problem remains

Nothing else uses the SPI interface…

Conclusion

In my case, the issue is not related to SPI usage (as suspected by @avtolstoy here: Updating from DeviceOS 1.4.4 to 1.5.0-rc.1 -> SOS Panic 1)

Hmmmm… investigations to continue…