Determine SoS or Panic reset reason programmatically

When my device goes into SoS mode (it blinks the SoS signal and then blinks a code of 1 to 14 red flashes indicating the reason for the SoS), is there a way to programmatically read or save that code before/after the device resets?

For example, if the red blink was due to Out of heap memory as described on the red blink SoS UI page, and therefore caused 8 red blinks to be displayed on the device, is there some way that I can save the number 8 to be used later on a subsequent runtime?

It doesn't seem to be a documented feature of Particle OS, however I suspected there might be something "under the hood" here?

Hey Tom,

Can you use System.resetReason() and System.resetReasonData() - for after-reset analysis?

Example:

    if (System.resetReason() == RESET_REASON_PANIC) {
        uint32_t data = System.resetReasonData();
        if (data == 8) { // Out of heap memory
            // your actions here
        }
    }

You may know this already, so please disregard if you do.

Checking the reset reason after reboot is the correct way. Your firmware can't be called before the reset, because the device is in an unpredictable state once the panic handler has been called.

That makes sense, but does the Panic handler set any values that can be read on subsequent runtime?

For example, if I call System.resetReasonData() after a PANIC reset, will I get the number value 8 after a "out of heap memory" reset?

Yes, the resetReason() will be RESET_REASON_PANIC and the resetReasonData() is the panic code such as 8 for OutOfHeap.
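
As a rough illustration of that pairing, here is a minimal sketch in portable C++ that can be tried off-device. The helper describe_last_reset() and the names kResetReasonPanic and kPanicNames are made up for this example. The value 130 is taken from the "PANIC (130)" lines in the log output later in this thread, and the code names come from the red flash SOS table. On a device you would presumably pass in System.resetReason() and System.resetReasonData().

```cpp
#include <cstdint>
#include <string>

// Assumption: 130 is the numeric value logged for RESET_REASON_PANIC
// in the output shown later in this thread.
constexpr int kResetReasonPanic = 130;

// Panic code names for codes 1 through 14 (red flash SOS table).
static const char* const kPanicNames[] = {
    "Hard fault", "Non-maskable interrupt fault", "Memory Manager fault",
    "Bus fault", "Usage fault", "Invalid length", "Exit",
    "Out of heap memory", "SPI over-run", "Assertion failure",
    "Invalid case", "Pure virtual call", "Stack overflow", "Heap error"};

// Hypothetical helper: turns a (reason, data) pair into a short description.
std::string describe_last_reset(int reason, uint32_t data) {
    if (reason != kResetReasonPanic) {
        return "not a panic";
    }
    if (data >= 1 && data <= 14) {
        // Codes are 1-based, the table is 0-based.
        return std::string("panic: ") + kPanicNames[data - 1];
    }
    return "panic: unknown code " + std::to_string(data);
}
```

The reason is checked first because the data value is only treated as a panic code when the reset reason is a panic.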

OK, good to know. I would humbly suggest this get added to the resetReasonData() documentation and to the red flash SoS documentation.

I'm trying to unit test if reset reason 8 will be shown after an out of heap memory SoS reset using the following code, however I can't seem to force Particle OS to have an "out of heap memory" error:

void cause_sos_reset(uint32_t cause)
{
   myLog.warn("Causing SOS reset with cause %s (%u)", SoS_reason_name(cause), cause);
   Serial.flush();
   if(cause == SOS_REASON_HARD_FAULT){
      volatile int* ptr = NULL;
      *ptr = 0;
   }
   //Cause an out of heap memory reset on the device
   if(cause == SOS_REASON_OUT_OF_HEAP_MEMORY){
      while(System.freeMemory() > 1024){
         char* ptr = (char*)malloc(100);
         if(!ptr){
            myLog.error("Failed to allocate more memory");
            
            break;
         }
      };
      //Now try to force particle OS to throw out of heap error
      myLog.info("System.freeMemory() = %u", System.freeMemory());
      myLog.info("transferring to Particle.process() loop");
      Serial.flush();
      Particle.connect();
      while(1){
         Particle.process();
         if(Particle.connected()){
            Particle.publish("out of heap");
         }
      }; 
   }
}

Which gives the following output. The last bit loops forever: the Particle.process() calls never result in a SoS reset when I try to connect to the Particle servers with only 1008 bytes of heap.

How can I force an "out of heap" SoS reset for testing purposes?

0000000925 [app.main] WARN: Causing SOS reset with cause Out of heap memory (8)
0000000937 [app.main] INFO: System.freeMemory() = 1008
0000000938 [app.main] INFO: transferring to Particle.process() loop
0000001046 [system.nm] INFO: State changed: DISABLED -> IFACE_DOWN
0000001054 [system.nm] INFO: State changed: IFACE_DOWN -> IFACE_REQUEST_UP
0000001063 [net.ifapi] INFO: Netif wl3 state UP
0000001064 [net.esp32ncp] TRACE: NCP event 3
0000001065 [net.esp32ncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0000001065 [system.nm] INFO: State changed: IFACE_REQUEST_UP -> IFACE_UP
0000001070 [system.nm] TRACE: Interface 4 power state changed: POWERING_UP
0000002370 [ncp.esp32.at] TRACE: > AT
0000002371 [ncp.esp32.at] TRACE: < OK
0000003372 [ncp.esp32.client] TRACE: NCP ready to accept AT commands
0000003372 [ncp.esp32.at] TRACE: > AT+MVER
0000003374 [ncp.esp32.at] TRACE: < 5
0000003374 [ncp.esp32.at] TRACE: < OK
0000003374 [ncp.esp32.at] TRACE: > AT+GETMAC=0
0000003376 [ncp.esp32.at] TRACE: < +GETMAC: "e8:9f:6d:ec:0b:dc"
0000003377 [ncp.esp32.at] TRACE: < OK
0000003379 [ncp.esp32.at] TRACE: > AT+CMUX=0
0000003385 [ncp.esp32.at] TRACE: < OK
0000003386 [ncp.esp32.mux] INFO: Starting GSM07.10 muxer
0000003386 [ncp.esp32.client] ERROR: Failed to perform early initialization
0000003387 [net.esp32ncp] TRACE: NCP event 3
0000003387 [net.esp32ncp] TRACE: NCP power state changed: IF_POWER_STATE_DOWN
0000003388 [system.nm] TRACE: Interface 4 power state changed: DOWN
0000003388 [net.esp32ncp] ERROR: Failed to initialize wifi NCP client: -210
0000003488 [net.esp32ncp] TRACE: NCP event 3
0000003488 [net.esp32ncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0000003489 [system.nm] TRACE: Interface 4 power state changed: POWERING_UP

It will be hard to get an out of heap memory SOS, because memory allocations that can't be fulfilled normally return 0 and fail silently instead of causing an SOS. I just looked, and indeed the only time you get an OutOfHeap SOS is if you have an out of memory handler and an out of memory error occurs within that handler.
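
To make that rule concrete, here is a toy model in portable C++. It is not Device OS code and does not use any Particle API. ToyHeap and all of its members are hypothetical names invented for this sketch. It only encodes the behavior described above: a failed allocation reports failure quietly, and a "panic" is recorded only when an allocation fails while the out of memory handler is running.

```cpp
#include <cstddef>
#include <functional>
#include <utility>

// Toy model only: mimics the rule described in the thread, nothing more.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t capacity) : free_(capacity) {}

    // Register a handler that runs when an allocation cannot be met.
    void on_out_of_memory(std::function<void(ToyHeap&)> handler) {
        handler_ = std::move(handler);
    }

    // Returns true on success, false on failure (like malloc returning 0).
    bool allocate(std::size_t bytes) {
        if (bytes <= free_) {
            free_ -= bytes;
            return true;
        }
        if (in_handler_) {
            panicked_ = true;  // failure inside the handler: the panic case
            return false;
        }
        if (handler_) {
            in_handler_ = true;
            handler_(*this);
            in_handler_ = false;
        }
        return false;  // ordinary failure: silent, no panic
    }

    bool panicked() const { return panicked_; }

private:
    std::size_t free_;
    std::function<void(ToyHeap&)> handler_;
    bool in_handler_ = false;
    bool panicked_ = false;
};
```

In this model, exhausting the heap with no handler, or with a handler that does not allocate, never records a panic. That matches the test result above, where the loop kept running with about 1 KB of heap free.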

I also added documentation on System.resetReason() specifying what happens after a system panic, and added links between that, the LED status page, and the last_reset cloud event documentation.

OK makes sense.

I have successfully unit tested Hard fault and stack overflow and confirmed that the resetReasonData() does indeed match the SoS spec.

The code is below for reference, in case anyone is interested in playing around with this. Just include reset_logic.h and call unit_test_panic_reset_reasons() as the first line of code in setup().

reset_logic.h

#ifndef RESET_LOGIC_H
#define RESET_LOGIC_H

#include <stdint.h>

//From https://docs.particle.io/troubleshooting/led/#red-flash-sos
typedef enum sos_reasons_t{
    SOS_REASON_HARD_FAULT = 1,
    SOS_REASON_NMI_FAULT = 2,
    SOS_REASON_MEMORY_MANAGER_FAULT = 3,
    SOS_REASON_BUS_FAULT = 4,
    SOS_REASON_USAGE_FAULT = 5,
    SOS_REASON_INVALID_LENGTH = 6,
    SOS_REASON_EXIT = 7,
    SOS_REASON_OUT_OF_HEAP_MEMORY = 8,
    SOS_REASON_SPI_OVERRUN = 9,
    SOS_REASON_ASSERTION_FAILURE = 10,
    SOS_REASON_INVALID_CASE = 11,
    SOS_REASON_PURE_VIRTUAL_CALL = 12,
    SOS_REASON_STACK_OVERFLOW = 13,
    SOS_REASON_HEAP_ERROR = 14
}sos_reasons_t;


//Helper functions
const char* reset_reason_name(int reason);
const char* SoS_reason_name(uint32_t data);
//Unit testing functions
void cause_sos_reset(uint32_t cause);
void unit_test_panic_reset_reasons();


#endif

reset_logic.cpp

#include "Particle.h"
#include "reset_logic.h"

#define PANIC_MAGIC 0xDEADBEEF

static Logger myLog("app.reset");

//Unit testing variables
typedef struct panic_unit_testing_t{
    uint32_t start_magic;
    uint32_t reset_count;
    int expected_reset_reason;
    uint32_t expected_reset_data;
    uint32_t end_magic;
}panic_unit_testing_t;

retained panic_unit_testing_t panic_tests;


//--------------------------------------------------------------------------------------------------
void recursive_stack_overflow_test(uint32_t sSize)
{
    //Define some variables on the stack so that it will overflow eventually!
    uint32_t stackVars[sSize];
    delay(1);
    for(uint32_t i = 0; i < sSize; i++){
        stackVars[i] = 1 + millis();
    }
    recursive_stack_overflow_test(sSize * 2);
}

//--------------------------------------------------------------------------------------------------
void cause_sos_reset(uint32_t cause)
{
   myLog.warn("Causing SOS reset with cause %s (%u)", SoS_reason_name(cause), cause);
   Serial.flush();
   delay(1000);
   if(cause == SOS_REASON_HARD_FAULT){
      volatile int* ptr = NULL;
      *ptr = 0;
   }
   //Cause a stack overflow
   if(cause == SOS_REASON_STACK_OVERFLOW){
      recursive_stack_overflow_test(1);
   }

   //Cause an out of heap memory reset on the device
   if(cause == SOS_REASON_OUT_OF_HEAP_MEMORY){
    //Not implemented: per the forum post linked below, this panic only occurs when an
    //out of memory error happens inside an out of memory handler.
    //https://community.particle.io/t/determine-sos-or-panic-reset-reason-programmatically/68985/8?u=jaza_tom
   }
   //Should not make it to here. Print occasional info and loop forever
   while (true) {
      myLog.warn("cause_sos_reset(%u) failed to cause an SoS reset!", cause);
      delay(1000);
   };
}

//--------------------------------------------------------------------------------------------------
void unit_test_panic_reset_reasons()
{
    Serial.begin(230400);
    waitFor(Serial.isConnected, 5000);
    delay(1000);
    myLog.info("UNIT_TEST_PANIC_RESET_RESETREASON");
    int rReason = System.resetReason();
    uint32_t rData = System.resetReasonData();
    myLog.info("Reset reason: %s (%d)", reset_reason_name(rReason), rReason);
    myLog.info("Reset data: %s (%u)", SoS_reason_name(rData), rData);
    myLog.info("Start magic = %08lx  |  End Magic = %08lx", panic_tests.start_magic, panic_tests.end_magic);
    //Do we need to initialize the retained unit testing variables?
    //Re-Initialize if the SRAM just isn't initialized yet or if the test has run its course previously
    if(
        panic_tests.start_magic != PANIC_MAGIC
        || rReason != RESET_REASON_PANIC
        || panic_tests.start_magic == panic_tests.end_magic
    ){
        myLog.info("resetting retained panic_tests vars");
        memset(&panic_tests, 0, sizeof(panic_tests));
        panic_tests.start_magic = PANIC_MAGIC;
    }
    //If we have previously saved expected reset reason, evaluate it!
    else{
        panic_tests.reset_count++;
        myLog.info("Reset count: %u", panic_tests.reset_count);
        myLog.info("Expected reset reason: %s (%d)", reset_reason_name(panic_tests.expected_reset_reason), panic_tests.expected_reset_reason);
        myLog.info("Expected reset data: %s (%u)", SoS_reason_name(panic_tests.expected_reset_data), panic_tests.expected_reset_data);
        if(panic_tests.expected_reset_reason != rReason){
            myLog.error("Reset reason mismatch");
        }
        if(panic_tests.expected_reset_data != rData){
            myLog.error("Reset data mismatch");
        }
    }
    //Don't trigger more than 4 panic resets (reset_count is 0 through 3 when we get here)
    if(panic_tests.reset_count < 4){
        panic_tests.expected_reset_reason = RESET_REASON_PANIC;
        //Change up the cause of the PANIC reset
        if(rReason == RESET_REASON_PANIC){
            if(rData == SOS_REASON_HARD_FAULT){
                panic_tests.expected_reset_data = SOS_REASON_STACK_OVERFLOW;
            }
            // else if(rData == SOS_REASON_STACK_OVERFLOW){
            //     panic_tests.expected_reset_data = SOS_REASON_OUT_OF_HEAP_MEMORY;
            // }
            else{
                panic_tests.expected_reset_data = SOS_REASON_HARD_FAULT;
            }
        }
        else{
            panic_tests.expected_reset_data = SOS_REASON_HARD_FAULT;
        }
        cause_sos_reset(panic_tests.expected_reset_data);
    }
    myLog.info("SoS reset retention unit testing complete!");
    panic_tests.end_magic = PANIC_MAGIC;
    while(1){
        Particle.process();
    };
}



const char* reset_reason_name(int code)
{
    switch (code){
        case RESET_REASON_NONE: return "NONE";
        case RESET_REASON_UNKNOWN: return "UNKNOWN";
        case RESET_REASON_PIN_RESET: return "PIN_RESET";
        case RESET_REASON_POWER_MANAGEMENT: return "POWER_MANAGEMENT";
        case RESET_REASON_POWER_DOWN: return "POWER_DOWN";
        case RESET_REASON_POWER_BROWNOUT: return "POWER_BROWNOUT";
        case RESET_REASON_WATCHDOG: return "WATCHDOG";
        case RESET_REASON_UPDATE: return "UPDATE";
        case RESET_REASON_UPDATE_ERROR: return "UPDATE_ERROR";
        case RESET_REASON_UPDATE_TIMEOUT: return "UPDATE_TIMEOUT";
        case RESET_REASON_FACTORY_RESET: return "FACTORY_RESET";
        case RESET_REASON_SAFE_MODE: return "SAFE_MODE";
        case RESET_REASON_DFU_MODE: return "DFU_MODE";
        case RESET_REASON_PANIC: return "PANIC";
        case RESET_REASON_USER: return "USER";
        case RESET_REASON_CONFIG_UPDATE: return "CONFIG_UPDATE";
    }
    return "??";
}


//From https://docs.particle.io/troubleshooting/led/#red-flash-sos
const char* SoS_reason_name(uint32_t data){
    switch (data){
        case 1: return "Hard fault";
        case 2: return "Non-maskable interrupt fault";
        case 3: return "Memory Manager fault";
        case 4: return "Bus fault";
        case 5: return "Usage fault";
        case 6: return "Invalid length";
        case 7: return "Exit";
        case 8: return "Out of heap memory";
        case 9: return "SPI over-run";
        case 10: return "Assertion failure";
        case 11: return "Invalid case";
        case 12: return "Pure virtual call";
        case 13: return "Stack overflow";
        case 14: return "Heap error";
    }
    return "?";
}

Output:

0000001776 [app.unit_testing] INFO: UNIT_TEST_PANIC_RESET_RESETREASON
0000001776 [app.unit_testing] INFO: Reset reason: DFU_MODE (120)
0000001777 [app.unit_testing] INFO: Reset data: ? (0)
0000001778 [app.unit_testing] INFO: Start magic = deadbeef  |  End Magic = deadbeef
0000001778 [app.unit_testing] INFO: resetting retained panic_tests vars
0000001779 [app.unit_testing] WARN: Causing SOS reset with cause Hard fault (1)
0000001776 [app.unit_testing] INFO: UNIT_TEST_PANIC_RESET_RESETREASON
0000001777 [app.unit_testing] INFO: Reset reason: PANIC (130)
0000001777 [app.unit_testing] INFO: Reset data: Hard fault (1)
0000001778 [app.unit_testing] INFO: Start magic = deadbeef  |  End Magic = 00000000
0000001778 [app.unit_testing] INFO: Reset count: 1
0000001779 [app.unit_testing] INFO: Expected reset reason: PANIC (130)
0000001779 [app.unit_testing] INFO: Expected reset data: Hard fault (1)
0000001780 [app.unit_testing] WARN: Causing SOS reset with cause Stack overflow (13)
0000001791 [app.unit_testing] INFO: UNIT_TEST_PANIC_RESET_RESETREASON
0000001791 [app.unit_testing] INFO: Reset reason: PANIC (130)
0000001792 [app.unit_testing] INFO: Reset data: Stack overflow (13)
0000001792 [app.unit_testing] INFO: Start magic = deadbeef  |  End Magic = 00000000
0000001793 [app.unit_testing] INFO: Reset count: 2
0000001793 [app.unit_testing] INFO: Expected reset reason: PANIC (130)
0000001794 [app.unit_testing] INFO: Expected reset data: Stack overflow (13)
0000001794 [app.unit_testing] WARN: Causing SOS reset with cause Hard fault (1)
0000001774 [app.unit_testing] INFO: UNIT_TEST_PANIC_RESET_RESETREASON
0000001775 [app.unit_testing] INFO: Reset reason: PANIC (130)
0000001775 [app.unit_testing] INFO: Reset data: Hard fault (1)
0000001776 [app.unit_testing] INFO: Start magic = deadbeef  |  End Magic = 00000000
0000001776 [app.unit_testing] INFO: Reset count: 3
0000001777 [app.unit_testing] INFO: Expected reset reason: PANIC (130)
0000001777 [app.unit_testing] INFO: Expected reset data: Hard fault (1)
0000001778 [app.unit_testing] WARN: Causing SOS reset with cause Stack overflow (13)
0000001789 [app.unit_testing] INFO: UNIT_TEST_PANIC_RESET_RESETREASON
0000001789 [app.unit_testing] INFO: Reset reason: PANIC (130)
0000001789 [app.unit_testing] INFO: Reset data: Stack overflow (13)
0000001790 [app.unit_testing] INFO: Start magic = deadbeef  |  End Magic = 00000000
0000001791 [app.unit_testing] INFO: Reset count: 4
0000001791 [app.unit_testing] INFO: Expected reset reason: PANIC (130)
0000001792 [app.unit_testing] INFO: Expected reset data: Stack overflow (13)
0000001792 [app.unit_testing] INFO: SoS reset retention unit testing complete!
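
For anyone who wants to reason about the start/end magic check without a device, here is a minimal off-device sketch of the same condition used in unit_test_panic_reset_reasons(). TestState and needs_reinit() are hypothetical names for this example. The struct is a trimmed stand-in for panic_unit_testing_t, and the boolean argument stands in for comparing System.resetReason() against RESET_REASON_PANIC.

```cpp
#include <cstdint>

constexpr uint32_t kPanicMagic = 0xDEADBEEF;

// Trimmed stand-in for the retained panic_unit_testing_t struct.
struct TestState {
    uint32_t start_magic;
    uint32_t reset_count;
    uint32_t end_magic;
};

// Hypothetical helper: true when the retained state should be wiped
// and the test sequence started over.
bool needs_reinit(const TestState& s, bool last_reset_was_panic) {
    return s.start_magic != kPanicMagic   // retained RAM was never initialized
        || !last_reset_was_panic          // last reset was not one of our panics
        || s.start_magic == s.end_magic;  // a previous run already finished
}
```

This matches the first block of the log above: the device came up from DFU mode with both magic values set to deadbeef, so the retained variables were reset before the first hard fault was triggered.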