I have noticed that most of my Xenons reset several times a day and show this in the Particle console:
“spark/device/last_reset panic, assert_failed”
I have reduced the SW to the bare minimum that still causes panics. It does nothing but publish and sleep.
Removing the sleep removes the problem, and so, it seems, does removing the publish. But this is pretty standard stuff and should work without problems.
This appears on Xenons, but I am not using any Xenon-specific features in the test program. Of course the device is part of the mesh in order to publish, and I have one Argon in the mesh as the gateway to the internet.
I have also noticed that there are actually more resets than the console shows; the panic text is not always shown.
This behaves the same way with Device OS 1.4.4 and 1.5.0.
Is this a known problem? How can I debug it?
The code (could be even simpler than this):
unsigned long waitEndTime = 0;
enum State { wait, measure, ready } state = wait;

void setup()
{
    waitEndTime = millis() + 30*1000;
}

void loop()
{
    if (state == wait)
    {
        // Wait a while in beginning to allow cloud flash after reset.
        // After this the SW is online so briefly that cloud flash is not possible
        if (millis() > waitEndTime) {
            state = measure;
        }
    }
    else if (state == measure)
    {
        Particle.publish("test", "testing...", PRIVATE);
        state = ready;
    }
    else if (state == ready)
    {
        System.sleep(D2, RISING, 10);
        state = measure;
    }
}
OK, thanks. The goal is of course to wait until connected. I thought that publish would take care of that.
I will update the code so that it loops until connected returns true and only then publishes. Let's see what impact that has. BTW, this is not mentioned in the documentation.
(Yes, switch-case is usually used, I just left it this way since it showed the problem, changing and testing the change takes about one day.)
I added a test around the publish so that it publishes only when connected and only then moves to the next state:
if (Particle.connected()) {
    Particle.publish("test", "testing...", PRIVATE);
    state = ready;
}
Unfortunately, testing for connected did not have any impact. Still panics.
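For anyone who wants to check the state logic itself away from the hardware, here is a minimal sketch of the connection-gated state machine from the posts above, written as a switch-case. It is an illustration only: `FakeCloud`, `Machine`, and `step` are hypothetical stand-ins, not Particle API, and the sleep call is reduced to a counter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the cloud connection; not the Particle API.
struct FakeCloud {
    bool connected = false;
    std::vector<std::string> published;
    void publish(const std::string& name) { published.push_back(name); }
};

enum class State { Wait, Measure, Ready };

struct Machine {
    State state = State::Wait;
    uint32_t waitEndMs;
    int sleeps = 0;  // counts how often the sleep step ran

    explicit Machine(uint32_t nowMs) : waitEndMs(nowMs + 30 * 1000) {}

    // One pass of loop(); nowMs plays the role of millis().
    void step(uint32_t nowMs, FakeCloud& cloud) {
        switch (state) {
        case State::Wait:
            if (nowMs > waitEndMs) state = State::Measure;
            break;
        case State::Measure:
            // Publish only while connected; otherwise stay in this state.
            if (cloud.connected) {
                cloud.publish("test");
                state = State::Ready;
            }
            break;
        case State::Ready:
            ++sleeps;  // the device code calls System.sleep(D2, RISING, 10) here
            state = State::Measure;
            break;
        }
    }
};

// Usage example: drive the machine with a fake clock.
inline void demoGatedPublish() {
    FakeCloud cloud;
    Machine m(0);
    m.step(30001, cloud);               // initial wait is over
    m.step(30002, cloud);               // offline: nothing is published
    assert(cloud.published.empty());
    assert(m.state == State::Measure);
    cloud.connected = true;
    m.step(30003, cloud);               // online: publish, then go to Ready
    assert(cloud.published.size() == 1);
    m.step(30004, cloud);               // "sleep", then back to Measure
    assert(m.sleeps == 1);
}
```

The sketch only shows that the gating logic behaves as intended; it says nothing about the panic, which still occurs on the device with this logic in place.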
Any other ideas? Is no one else seeing this? It should happen with any Xenon that goes to sleep and then publishes. I have multiple Xenons running different SW, and all of those that follow this sleep/publish pattern panic too.
I’m still figuring it out myself. @meshmesh Are you running DeviceOS 1.4.4 or 1.5? 1.5 has an entirely revamped sleep configurator that is a little tricky to figure out. You can’t use the old method of calling sleep any more.
One little note: Hibernate is not deep sleep, and Stop is standby - I might have that wrong too, so take this all with a grain of salt.
@no1089, I am using 1.5.0. The “old” sleep seems to work fine (most of the time), but Sleep 2.0 does not. I have another thread specifically about Sleep 2.0: Sleep 2.0 in 1.5.0 not working as expected. It may be better to discuss the Sleep API there.
I think I finally cracked this!
I debugged the problem and found out that the notify event queue (m_ntf_queue in device-os/third_party/openthread/openthread/radio/nrf_802154_swi.c) was full of NTF_TYPE_RECEIVE_FAILED events with error code NRF_802154_RX_ERROR_INVALID_DEST_ADDR. So other mesh devices were sending more frames than could fit into the event queue at the time. I also found out that this happens when another Xenon comes out of sleep. I have six other Xenons running, and three of them execute periodic sleep with different intervals (5 and 10 min, 1 hour). This caused a seemingly random pattern of panics.
When testing the sleep, I had one extra Xenon with a 5-second sleep interval, so the problem manifested almost immediately.
So if someone is running the test program without any other device executing sleep sequence, the problem does not appear.
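To make the failure mode concrete, here is a small off-device simulation of a fixed-capacity notification queue that receives a burst of events while nothing is draining it. This is a sketch under assumptions, not the driver code: `NotifyQueue`, `Ntf`, `floodWhileWaking`, and the capacity of 4 are all hypothetical, and only the event name mirrors the one found in the debugger.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical event kinds; the names only mirror those seen in the debugger.
enum class Ntf { Received, ReceiveFailedInvalidDestAddr };

// Fixed-capacity ring buffer, filled by the "radio" side and drained by the
// "main loop" side. The capacity is an assumption chosen for the demo.
template <std::size_t N>
class NotifyQueue {
public:
    // Returns false when the queue is full. On the device, a full queue is
    // what the debugging above points to as the source of the panic.
    bool push(Ntf e) {
        if (count_ == N) return false;
        slots_[(head_ + count_) % N] = e;
        ++count_;
        return true;
    }
    bool pop(Ntf& out) {
        if (count_ == 0) return false;
        out = slots_[head_];
        head_ = (head_ + 1) % N;
        --count_;
        return true;
    }
    std::size_t size() const { return count_; }

private:
    std::array<Ntf, N> slots_{};
    std::size_t head_ = 0;
    std::size_t count_ = 0;
};

// Pushes `burst` events for foreign addresses without draining the queue,
// as would happen while the consumer is not yet running.
// Returns how many events did not fit.
template <std::size_t N>
std::size_t floodWhileWaking(NotifyQueue<N>& q, std::size_t burst) {
    std::size_t dropped = 0;
    for (std::size_t i = 0; i < burst; ++i) {
        if (!q.push(Ntf::ReceiveFailedInvalidDestAddr)) ++dropped;
    }
    return dropped;
}

// Usage example: ten events into four slots leaves six with nowhere to go.
inline void demoOverflow() {
    NotifyQueue<4> q;
    std::size_t dropped = floodWhileWaking(q, 10);
    assert(q.size() == 4);
    assert(dropped == 6);
}
```

In the simulation the overflow is just counted; the point is only that a short burst from a neighbour can exceed a small queue when the consumer is not running.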
I have now fixed this by ignoring these events. They come from frames that are not meant for this device, so they would end up being more or less ignored anyway. I added this in device-os/third_party/openthread/openthread/radio/nrf_802154_core.c:
/** Notify MAC layer that receive procedure failed. */
static void receive_failed_notify(nrf_802154_rx_error_t error)
{
    if (error == NRF_802154_RX_ERROR_INVALID_DEST_ADDR)
    {
        return; // this frame was not meant for us, ignore it
    }
    ...
The event then never gets inserted into the notify queue.
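The effect of that early return can be illustrated off-device with a toy model. This is a sketch under assumptions and not the driver itself: `BoundedQueue`, `RxError`, `receiveFailedNotify`, and the `filterForeign` switch are hypothetical names introduced only for the comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical subset of receive errors; only InvalidDestAddr mirrors the post.
enum class RxError { Other, InvalidDestAddr };

// Toy bounded queue that records whether it ever ran out of room.
struct BoundedQueue {
    std::size_t capacity;
    std::vector<RxError> pending;
    bool overflowed = false;

    void enqueue(RxError e) {
        if (pending.size() == capacity) {
            overflowed = true;  // on the device this is the fatal case
            return;
        }
        pending.push_back(e);
    }
};

// Mirrors the early return in the patch: errors for frames addressed to
// other devices are dropped before they reach the queue.
inline void receiveFailedNotify(BoundedQueue& q, RxError e, bool filterForeign) {
    if (filterForeign && e == RxError::InvalidDestAddr) return;
    q.enqueue(e);
}

// Usage example: the same burst with and without the filter.
inline void demoFilter() {
    BoundedQueue unfiltered{4};
    BoundedQueue filtered{4};
    for (int i = 0; i < 10; ++i) {
        receiveFailedNotify(unfiltered, RxError::InvalidDestAddr, false);
        receiveFailedNotify(filtered, RxError::InvalidDestAddr, true);
    }
    assert(unfiltered.overflowed);
    assert(!filtered.overflowed);
    assert(filtered.pending.empty());

    // Other errors still pass through the filter unchanged.
    receiveFailedNotify(filtered, RxError::Other, true);
    assert(filtered.pending.size() == 1);
}
```

One consequence of filtering at this point is that nothing above the radio layer ever sees these errors. Whether any upper layer relies on them is not something this sketch checks.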
This problem is probably related to receiving events from another device while this device is still waking up from sleep. It occurs more easily with tight sleep loops like the one in the example, but it occurs on all my Xenons that have a sleep cycle, regardless of how the SW is structured.