I am seeing some hard faults in my firmware and just got a stack overflow. I have just refactored our app to run in singletons. I came across this comment
My app is crashing in my Bluetooth singleton, which is actually running from within another singleton that acts as my FSM, which is in turn called from my main .cpp file. One weird thing is that after the stack overflow panic, if I press the reset button, the device immediately gets another stack overflow panic. Sometimes it takes multiple resets before it will restart normally.
Could nesting one singleton inside another cause an issue here? In general, is nesting function calls an issue when it comes to stack overflow?
Not enough information to know. Ideally a finite state machine should rarely encounter stack issues because each state should be a small discrete operation that returns immediately, so there's little chance for stack growth.
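As a rough illustration of "small discrete operation that returns immediately", here is a minimal sketch of a state machine stepped once per pass of loop(). The state names and the `linkUp` input are hypothetical and not taken from the original poster's code.

```cpp
#include <cstdint>

// Hypothetical states for illustration; a real FSM will differ.
enum class State : uint8_t { Idle, Connecting, Connected };

struct Fsm {
    State state = State::Idle;
    uint32_t ticks = 0;   // number of steps spent in Connecting

    // Intended to be called once per pass of loop(). Each case does a small
    // piece of work and returns, so stack depth does not grow over time.
    void step(bool linkUp) {
        switch (state) {
        case State::Idle:
            state = State::Connecting;
            ticks = 0;
            break;
        case State::Connecting:
            ++ticks;
            if (linkUp) state = State::Connected;
            break;
        case State::Connected:
            if (!linkUp) state = State::Idle;
            break;
        }
    }
};
```

The point of the design is that no state handler calls another handler or blocks; transitions only change `state`, and the next call to `step()` picks up from there.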
The main application loop thread has a 6Kbyte stack and cannot be changed. When you enter loop the stack is nearly empty, and when you return from loop, it's the same as it was when you entered.
Whenever you allocate a local variable (that's not static) in a function or method, it's added to the stack, and removed when the function returns. For things like int the size is the size of the variable (4 bytes, in that case). For arrays, it's the size of the array, for example char buf[256] is 256-ish bytes. There can be some padding, in particular if the size of the variable is not a multiple of 4 bytes. For objects created as a local variable (not with new, and not static), the storage occupied by the object itself is added to the stack. It's usually buffers or objects that contain buffers that cause stack overflow.
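To make the "objects that contain buffers" case concrete, here is a small sketch. The `Message` type and both function names are made up for illustration. The first function puts the whole object on the stack; the second keeps it in static storage, at the cost of not being reentrant.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical message type: the embedded buffer dominates its size.
struct Message {
    char payload[512];
    int length;
};

static_assert(sizeof(Message) >= sizeof(char[512]) + sizeof(int),
              "the object is at least as large as its members");

// Copies text into msg, truncating if needed, and returns the stored length.
static int fill(Message& msg, const char* text) {
    std::size_t n = std::strlen(text);
    if (n >= sizeof(msg.payload)) n = sizeof(msg.payload) - 1;
    std::memcpy(msg.payload, text, n);
    msg.payload[n] = '\0';
    msg.length = static_cast<int>(n);
    return msg.length;
}

// Each call needs at least sizeof(Message) bytes of stack for msg.
int buildOnStack(const char* text) {
    Message msg;
    return fill(msg, text);
}

// Same work, but msg lives in static storage instead of on the stack.
// Trade-off: not reentrant and not safe to call from two threads at once.
int buildInStatic(const char* text) {
    static Message msg;
    return fill(msg, text);
}
```

If `buildOnStack()` were called from a thread with a 2K stack, that single local would already use a quarter of it.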
If you have deeply nested functions, remember that the stack use adds up for every nested function up to loop, with some overhead for each nested call plus local variables.
Less common, but still possible in C++, is recursion. If a function calls itself, the stack grows on every call by the return address, saved registers, and that call's local variables. The exception is tail recursion, where the recursive call is the last thing in the function and the compiler may optimize out the stack growth, though that optimization is not guaranteed.
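A small sketch of the difference, using a toy summation rather than anything from the original code: the recursive version keeps one frame alive per level, while the loop uses a single frame regardless of the input.

```cpp
#include <cstdint>

// Recursive: every pending call keeps its own frame on the stack, so stack
// use grows with n. This is not a tail call, because the addition happens
// after the recursive call returns.
uint32_t sumRecursive(uint32_t n) {
    if (n == 0) return 0;
    return n + sumRecursive(n - 1);
}

// Iterative: one frame no matter how large n is.
uint32_t sumIterative(uint32_t n) {
    uint32_t total = 0;
    for (; n > 0; --n) total += n;
    return total;
}
```

On a small fixed stack, rewriting recursion as a loop (or bounding the depth explicitly) is usually the safer choice.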
Since you are playing with BLE, you might be using arrays and pointers.
You can look into how your code is using them, since reading (or writing) one byte after the end of an array can cause a hard fault.
Also, BLE data does not have a null terminator; it's all raw bytes, so some C string functions (strlen, strcpy and friends) might misbehave if one does not pay attention.
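One way to handle this, sketched here with a hypothetical helper name, is to always carry the length alongside the data and add the terminator yourself, clamping to the destination size:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copies a raw payload into a char buffer and appends the terminator that
// the payload itself does not carry. Returns the number of bytes copied,
// which is less than len if the destination is too small.
std::size_t copyPayload(const uint8_t* data, std::size_t len,
                        char* out, std::size_t outSize) {
    if (outSize == 0) return 0;
    std::size_t n = (len < outSize - 1) ? len : outSize - 1;  // room for '\0'
    std::memcpy(out, data, n);
    out[n] = '\0';
    return n;
}
```

Note that a payload can legitimately contain zero bytes, in which case string functions will still stop early; for binary data it is safer to keep using the returned length.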
Pointers are a... source of fun. Good luck!
I know that PublishQueuePosixRK uses the internal flash file system. Does this library take memory away from the stack or heap? Is it easier to hit a stack overflow if the file queue or ram queue is large while using this library?
There shouldn't be any significant stack use. And each thread has its own stack, so the publishing thread is completely separate from the main application thread, software timer thread, any other thread stack.
If you use software timers (Timer class) make sure to examine that stack usage closely. Software timers run out of a thread with a 2K stack instead of the 6K stack in the main application thread, so it's much easier to overflow the stack in a timer callback.
After getting the stack overflow the SOS panic LED sequence should display, then the device should reboot. Depending on the reason for the panic it will either clear itself, or reboot again. There's no way to change that behavior if the device panics before user firmware starts to run.
It's not clear what could be causing the behavior you are seeing. The only other thing that I can think of, and it's sort of a long shot, is a problem with something occurring in a global object constructor. The initialization order of globally constructed objects in different source files is not specified by the C++ standard, and a constructor that depends on another global can cause a panic before user firmware runs. However it should not behave differently after pressing the reset button, so that doesn't fit perfectly either.
Ok thanks a million. I'll keep an eye out, if I get any more information I'll update here.
If it was the global object constructor, could this potentially result in a Stack Overflow panic? It just showed the SOS sequence and then 13 blinks, then it repeats the SOS and the 13 blinks then resets and rinse repeat until the reset is pressed.
It could. I have seen this happen. Usually there was a variable that I thought was initialized but was not, or the initialization was not happening in the order I thought it was, or in the order it had in the past.
In your firmware you can increment a counter (stored somewhere that survives a reset) if the last reset was a panic, and then after a number of consecutive panics put your device into safe mode.
However, if the firmware has global classes instantiating before setup() and that's where the panic happens, the device will be toast nonetheless before reaching this code.
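The counting logic described above could look roughly like this. It is only the decision part, written as plain C++ so it can be tested off-device; the function name and the threshold of 3 are made up.

```cpp
#include <cstdint>

// Hypothetical threshold: how many panics in a row are tolerated.
constexpr uint8_t kMaxPanics = 3;

// Pure decision logic. On a device, panicCount would have to live in storage
// that survives a reset, and lastResetWasPanic would come from the platform's
// reset-reason API; both are assumptions of this sketch.
bool updateAndCheck(uint8_t& panicCount, bool lastResetWasPanic) {
    if (lastResetWasPanic) {
        if (panicCount < UINT8_MAX) ++panicCount;   // saturate, do not wrap
    } else {
        panicCount = 0;                             // a clean boot clears the streak
    }
    return panicCount >= kMaxPanics;                // true: caller should go to safe mode
}
```

On Device OS I would expect the inputs to come from a retained variable and System.resetReason() (with the reset info feature enabled), and the action to be System.enterSafeMode(), but check the current docs for your platform before relying on that.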
Global classes before setup() are easy to use but can be nasty, and I find that's a great reason to use singletons that instantiate in setup().
This might be an idea. Would I do this in setup(), do you think? I don't instantiate anything before setup() except for an LED mirror. All instantiation is done, as you say, via singletons that are set up within my setup() function.
@StngBo, it is good practice to avoid complex (and possibly dependent) constructors and use a begin() member function that does all the work, which you call from setup(). This avoids problems with unpredictable constructor ordering and gives you full control over initialization order. Generally, I only set class private variables (if needed) passed as part of the singleton constructor call.
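A minimal sketch of that pattern, with a hypothetical class name and a made-up parameter: the constructor does nothing beyond default member values, and begin() does the real work when setup() calls it.

```cpp
// Hypothetical class and members, for illustration only.
class BleManager {
public:
    // The instance is created on the first call, not during global
    // construction, so it cannot run before setup() unless something
    // calls instance() from a global constructor.
    static BleManager& instance() {
        static BleManager inst;
        return inst;
    }

    // All real initialization happens here, called explicitly from setup().
    // Calling it again is harmless.
    bool begin(int advertisingIntervalMs) {
        if (started_) return true;
        intervalMs_ = advertisingIntervalMs;
        started_ = true;
        return true;
    }

    bool started() const { return started_; }
    int intervalMs() const { return intervalMs_; }

    BleManager(const BleManager&) = delete;
    BleManager& operator=(const BleManager&) = delete;

private:
    BleManager() = default;   // trivial: no hardware access, no dependencies
    bool started_ = false;
    int intervalMs_ = 0;
};
```

In setup() this would be used as `BleManager::instance().begin(100);`, and because you write those calls yourself, the order in which singletons initialize is whatever order you list them in.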