Device lockup question

I’ve got a little temperature monitor device sitting in my backyard keeping track of temp/humidity for me, along with talking to a lightning sensor over SPI. The temp device is an SHT31x type weatherproof sensor. I use it to send MQTT locally to my house and to send MQTT to AdafruitIO for external visualization. This setup normally works great and is very consistent.

My problem is that it locks up solid after running for between 1 and 5 days (roughly), and the lockups seem to be getting a little more frequent. The device itself is sealed up in a weatherproof container, so I don’t believe it’s the environment.

It reports data using the MQTT library and the AdafruitIO library. As a debug aid, I’m tracking System.freeMemory(), RSSI, and System.uptime() every 5 minutes to help me figure out if there is a pattern (nothing yet). I also enabled a SW watchdog, which does not help, as I think the HW is just borked when this happens. I’m not running out of memory: the last time it froze, I still had 40k available. I don’t allocate memory other than JSON, and that’s limited to 512 bytes max and is local to functions.

It’s not just networking that I lose. If I go out and take the enclosure apart, the device just shows a solid light blue LED, like it hung in the middle of its happy breathing.

I found this example of the ApplicationWatchdog. Did I miss something?

ApplicationWatchdog wd(60000, System.reset);

What kind of SW bug could cause this behavior, where the SW watchdog doesn’t trap and the LED stops breathing? Is it possible the Photon itself is just failing? I would be super happy if it just reset when this happened. It only reports data once a minute, so reboots would go unnoticed as long as they weren’t constant. I can replace the Photon if it might be a HW problem. Honestly, I am going to try that this weekend, along with some more code reviews and local testing on a second device. I don’t have enough sensors to completely duplicate the setup though, so I can’t do this at my desk.

The challenge is that I’m going to shelter in place in Maryland in a few weeks, which is quite a distance from my house, and I would like the weather data if I can get it. I won’t be able to reset it remotely as it’s just plugged into the wall outlet on our patio and since it’s hung, I can’t send it messages.

This is often caused by the use of String or other dynamic memory stuff.
One way to get around this is to regularly reset the device (e.g. using deep sleep between periods of action) or, better, replace String with good ol’ C strings (aka char arrays).
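As a rough illustration of the char array approach, here is a minimal sketch that builds a message in a fixed-size buffer with snprintf instead of concatenating String objects. The helper name formatReading, the JSON field names, and the buffer size are all made up for this example, and it uses only standard C++ so it can be tried on a desktop first. It is not the original poster's code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats one reading into a caller-supplied buffer.
// No heap allocation happens here; the buffer can live on the stack or be static.
// Returns true only if the whole message fit (i.e. nothing was truncated).
bool formatReading(char *out, std::size_t outLen, double tempC, double humidity) {
    assert(out != nullptr && outLen > 0);  // caller must supply a real buffer

    // snprintf never writes more than outLen bytes and always null-terminates.
    int n = std::snprintf(out, outLen,
                          "{\"temp\":%.1f,\"humidity\":%.1f}", tempC, humidity);

    // n is the length the full message needs; n >= outLen means truncation.
    return n > 0 && static_cast<std::size_t>(n) < outLen;
}
```

A caller would declare something like char payload[64]; once, call formatReading(payload, sizeof(payload), t, h), and pass payload to whichever publish call accepts a const char*, skipping the send if the function returns false.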

Heap fragmentation would typically not show up with System.freeMemory() as the memory is free but not available in consecutive chunks of required size.
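To make that point concrete, here is a toy model, not a description of the Photon's actual allocator: a pretend heap of 8 equal blocks where every other block is still in use. All names here (ToyHeap, totalFree, largestFreeRun, fragmentedHeap) are invented for the illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy model only: a "heap" of 8 equal-sized blocks, true = block in use.
constexpr std::size_t kBlocks = 8;
using ToyHeap = std::array<bool, kBlocks>;

// What a "free memory" counter would report: all free blocks added up.
std::size_t totalFree(const ToyHeap &heap) {
    std::size_t count = 0;
    for (bool used : heap) {
        if (!used) ++count;
    }
    return count;
}

// What an allocation actually needs: the longest run of adjacent free blocks.
std::size_t largestFreeRun(const ToyHeap &heap) {
    std::size_t best = 0;
    std::size_t run = 0;
    for (bool used : heap) {
        run = used ? 0 : run + 1;
        if (run > best) best = run;
    }
    return best;
}

// Builds the fragmented case: blocks 0, 2, 4, 6 in use, the rest free.
ToyHeap fragmentedHeap() {
    ToyHeap heap{};  // all blocks start out free
    for (std::size_t i = 0; i < kBlocks; i += 2) {
        heap[i] = true;
    }
    assert(totalFree(heap) == kBlocks / 2);  // sanity check: half is free
    return heap;
}
```

In this model totalFree reports 4 free blocks, yet largestFreeRun is 1, so a request for 2 adjacent blocks would fail even though the free total looks healthy.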

For the Application Watchdog to trigger, the device must not be deadlocked, yet it must be hung hard enough to not run the network/cloud tasks. For these reasons the Application Watchdog is of very limited use IMO.

Thanks, I was afraid of that. I’ll post back next week if pulling out String helps.

Is heap fragmentation a thing on the Photon? FreeRTOS Heap 4 should not be affected, but I don’t know which heap is in use here.

I only use String for MQTT messaging, and do no other heap allocation in functions or globally. I took out all of the Strings completely, so we’ll see if it stays up longer now. I expect @ScruffR is correct, since it makes sense, though it seems odd that the time to failure was so wildly inconsistent.

OK, so it’s been running great for a few weeks now. I’ve been pushing out the auto reboot a bit and it has been rock stable. Thanks for the help, not using String with MQTT seems to have fixed it.
