Hard fault when using MQTT, tracked to a TCPClient::connected() call

I have a bit of a head scratcher. I would think it’s a memory issue on my end, except that everything I do seems to point toward the TCPClient lib.

My code starts hard faulting after several loops, always at the moment the MQTT lib checks whether the TCPClient is connected. I’ve turned off compiler optimizations for the relevant functions, so it’s very clearly happening in the call to TCPClient::connected(). And of course that’s the point where it disappears into the bowels of the OS.

It makes no difference whether I run 0.7.0 or 0.8.0-rc.11; it still fails at the same spot.

However, if I start moving memory around, e.g. moving local variables to global scope, then by magic things start working again. This very strongly points toward a memory issue, but I can’t for the life of me figure it out.

Has anyone seen anything similar? I know I’ve read several reports of TCPClient instability, but it’s all too convenient to blame an outside lib. I haven’t installed local compiler tools. Is there a way to download the ELF as part of the Particle CLI compile command?

I have used both mqtt and mqtt-tls with rock solid connections for months at a time (interrupted only by me doing resets and other goofy stuff) running on 0.6.3. Seems to me that TCPClient is OK, at least in 0.6.3.

0.7 uses a lot more RAM (0.8 improves a bit, but not by much), which reduces the Particle system to running not much more than (imho) thermometer applications. Hence 0.6.3 only for me.

Your issue does feel like you are stepping on some variables. Do you know how much free RAM you’ve got left? Running out of stack space, perhaps? Can you run it on 0.6.3?

Hadn’t even thought of downgrading, I should give that a look.

70kB of free RAM, although the FreeRTOS application task itself might be running low (6144 bytes statically allocated at boot time). I don’t remember seeing an API call to get the free memory, and xPortGetMinimumEverFreeHeapSize() isn’t directly available so I’m a little blind on that front.
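One common way to get that visibility is stack painting: fill the task stack with a known byte before the task runs, then later count how many bytes at the far end still hold it. FreeRTOS does this with a 0xA5 fill and reports the result through uxTaskGetStackHighWaterMark(); whether that call is reachable from Particle user code is not established in this thread. The sketch below is only a desktop simulation of the idea, and paintStack, simulateUsage, and stackHeadroom are hypothetical names, not Device OS or FreeRTOS functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fill byte used to "paint" unused stack. FreeRTOS uses 0xA5 for this purpose.
constexpr std::uint8_t kPaint = 0xA5;

// Paint the whole region before it is handed to a task as its stack.
void paintStack(std::vector<std::uint8_t>& stack) {
    for (auto& b : stack) {
        b = kPaint;
    }
}

// Simulate a task that touched `bytes` of a descending stack:
// usage starts at the highest index and grows toward index 0.
void simulateUsage(std::vector<std::uint8_t>& stack, std::size_t bytes) {
    assert(bytes <= stack.size());
    for (std::size_t i = 0; i < bytes; ++i) {
        stack[stack.size() - 1 - i] = 0x00;
    }
}

// Count the bytes at the low end that still hold the paint pattern.
// That count is the smallest headroom the stack has had so far.
std::size_t stackHeadroom(const std::vector<std::uint8_t>& stack) {
    std::size_t untouched = 0;
    while (untouched < stack.size() && stack[untouched] == kPaint) {
        ++untouched;
    }
    return untouched;
}
```

With a 6144-byte region and 6000 bytes touched, stackHeadroom reports 144. A real stack value that happens to equal the fill byte can make the reading slightly optimistic, so treat the number as an estimate.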

How about this?

That’s unfortunately for the system heap, not the application task heap. If a FreeRTOS heap is exhausted, it will trigger a hard fault in much the same way as if the system heap is exhausted.

I wasn’t aware that there was a distinction for the heap.
I know of dedicated stacks but only one heap.

Back in my Tau Labs/OpenPilot days, we were so short on task memory that we would have major discussions whenever any one of us needed an extra byte or two of stack space for one of the FreeRTOS tasks. (E.g. https://github.com/TauLabs/TauLabs/commit/0e392be46ac0aeaa0492cfde53c79abadf3b206a)

The documentation for FreeRTOS heap memory management is here, https://docs.aws.amazon.com/freertos-kernel/latest/dg/heap-management.html, although reading through it I see FreeRTOS has clearly advanced since v7. I’d have to dive into Particle’s OS to understand how it configures FreeRTOS. It’s a very flexible system, and what we did for the autopilot doesn’t necessarily map one-to-one to Particle.
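To make the two-heap distinction concrete, here is a minimal sketch of a fixed-size pool with its own accounting, loosely in the spirit of the allocate-only heap_1 scheme described in the FreeRTOS docs. StaticPool and its methods are hypothetical names, and nothing here reflects how Particle actually configures FreeRTOS. The point is only that two pools can run out independently, so a healthy free-memory figure for one says nothing about the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical allocate-only pool over a fixed array (never frees).
template <std::size_t N>
class StaticPool {
public:
    // Returns nullptr when the request does not fit. The caller must check;
    // using the null pointer unchecked is what typically ends in a fault.
    void* allocate(std::size_t bytes) {
        // Round the request up to a multiple of 8 to keep blocks aligned.
        const std::size_t rounded = (bytes + 7u) & ~static_cast<std::size_t>(7u);
        if (rounded < bytes || rounded > N - used_) {
            return nullptr;  // overflowed while rounding, or pool exhausted
        }
        void* p = &storage_[used_];
        used_ += rounded;
        return p;
    }

    // Bytes still available in this pool only.
    std::size_t freeBytes() const { return N - used_; }

private:
    alignas(8) std::uint8_t storage_[N] = {};
    std::size_t used_ = 0;
};
```

For example, a StaticPool<6144> that has handed out 6000 bytes refuses a further 200-byte request, while a separate StaticPool<71680> still reports all 71680 bytes free.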

@kubark42 Did you ever solve this problem? I’m having exactly the same issue. It’s something memory-related: if I move things around it starts working, otherwise constant hard faults. This is with Device OS 2.0.0-rc2 (also tried 1.5.x) and the latest MQTT libs (actually running under UbidotsMQTT).
Frustrating, as I can’t seem to get deep enough to find the fix.

Never, no. MQTT v5 is far more interesting, so I never went back to dig into Particle’s MQTT v4 stuff.