TLDR: It’s a problem with overall RAM usage. MQTT-TLS needs roughly 35KB of free RAM to get through a connection.
Yeah, good call, that does seem to be the case. Here are the memory logs across a bunch of scenarios. It’s not about SYSTEM_THREAD, just about RAM usage.
All with char* certificates (not const char*), so stored in RAM:
SYSTEM_THREAD(DISABLED):
SystemFreeMemory before client.enableTls: 59672
SystemFreeMemory after client.enableTls: 38032
SystemFreeMemory before client.connect: 37976
SystemFreeMemory after client.connect: 31296
WARNING | 18295 | 1542296283 | MQTT Successfully Reconnected
SYSTEM_THREAD(ENABLED), but with a bunch of buffers commented out, no threads created:
SystemFreeMemory before client.enableTls: 54920
SystemFreeMemory after client.enableTls: 33272
SystemFreeMemory before client.connect: 33224
SystemFreeMemory after client.connect: 26512
WARNING | 18295 | 1542296032 | MQTT Successfully Reconnected
SYSTEM_THREAD(ENABLED), with all my normal buffers, no threads created:
SystemFreeMemory before client.enableTls: 48664
SystemFreeMemory after client.enableTls: 27024
SystemFreeMemory before client.connect: 26976
SystemFreeMemory after client.connect: 20408
WARNING | 18295 | 1542296521 | MQTT Successfully Reconnected
SYSTEM_THREAD(ENABLED), with all my normal buffers, 3 threads created:
SystemFreeMemory before client.enableTls: 38096
SystemFreeMemory after client.enableTls: 16456
SystemFreeMemory before client.connect: 16392
SystemFreeMemory after client.connect: 36672
ERROR | 15674 | 1542295991 | MQTT Reconnection Unsuccessful
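For anyone who wants to redo the arithmetic on the four scenarios above, here is a rough sketch that just subtracts the logged readings. The struct and function names are made up for illustration and are not part of the MQTT-TLS library; the numbers are copied straight from the logs.

```cpp
#include <cassert>
#include <cstdint>

// Free-heap readings (bytes) for one scenario, copied from the logs above.
// HeapSample is a hypothetical helper type, not part of any library.
struct HeapSample {
    int32_t beforeEnableTls;
    int32_t afterEnableTls;
    int32_t beforeConnect;
    int32_t afterConnect;
};

// Heap taken by client.enableTls().
inline int32_t tlsSetupCost(const HeapSample& s) {
    assert(s.beforeEnableTls >= s.afterEnableTls);  // enableTls should only allocate
    return s.beforeEnableTls - s.afterEnableTls;
}

// Heap still held once client.connect() returns.
// A negative value means memory was released (the failed connection).
inline int32_t connectRetained(const HeapSample& s) {
    return s.beforeConnect - s.afterConnect;
}

// The four scenarios, in the order they appear above.
static const HeapSample kScenarios[] = {
    {59672, 38032, 37976, 31296},  // SYSTEM_THREAD(DISABLED)
    {54920, 33272, 33224, 26512},  // ENABLED, buffers commented out
    {48664, 27024, 26976, 20408},  // ENABLED, normal buffers, no threads
    {38096, 16456, 16392, 36672},  // ENABLED, normal buffers, 3 threads (failed)
};
```

Run over these samples, client.enableTls costs about 21.6KB every time (21640 or 21648 bytes), and a successful client.connect keeps roughly another 6.6KB to 6.7KB. In the failed case the free memory goes back up by 20280 bytes after client.connect returns.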
Digging into the library, here is the free memory at each step of the (unsuccessful) SSL handshake:
DEBUG | 3168 | 1542296885 | Connecting to network...
DEBUG | 4168 | 1542296886 | Connected to network
DEBUG | 5168 | 1542296888 | Connecting to Particle Cloud...
DEBUG | 6168 | 1542296889 | Connected to Particle Cloud
DEBUG | 10070 | 1542296893 | Initializing SD Card...
DEBUG | 10090 | 1542296893 | SD Card Detected
DEBUG | 10090 | 1542296893 | Starting SDcardProcessingThread...
DEBUG | 10091 | 1542296893 | Starting CANbusProcessing input thread...
DEBUG | 10091 | 1542296893 | Beginning CAN Bus...
CAN | Beginning NodeID Scan...
RESET | 10093 | 1542296893 | Last reset with system code: (0) Information is not available
SystemFreeMemory before client.enableTls: 38088
SystemFreeMemory after client.enableTls: 16448
DEBUG | 10268 | 1542296893 | Connection regained, attempting to reconnect to MQTT
SystemFreeMemory before client.connect: 16384
CAN | NodeID Scan Complete. Discovered NodeIDs:
CAN | No Nodes Discovered
SystemFreeMemory for ssl->state 0: 16320
SystemFreeMemory for ssl->state 1: 16320
SystemFreeMemory for ssl->state 2: 16320
...(repeat x12)...
SystemFreeMemory for ssl->state 2: 16320
SystemFreeMemory for ssl->state 3: 16320
SystemFreeMemory for ssl->state 4: 6280
SystemFreeMemory for ssl->state 5: 6280
SystemFreeMemory for ssl->state 6: 6280
SystemFreeMemory for ssl->state 7: 6280
SystemFreeMemory for ssl->state 8: 6280
SystemFreeMemory for ssl->state 9: 5744
SystemFreeMemory for ssl->state 9: 9192
SystemFreeMemory after client.connect: 36664
WARNING | 15350 | 1542296898 | TIMING | checkConnection: client.connect() took 5081ms
ERROR | 15351 | 1542296898 | MQTT Reconnection Unsuccessful
DEBUG | 15354 | 1542296898 | System free memory is: 36632
DEBUG | 15381 | 1542296898 | Connection regained, attempting to reconnect to MQTT
SystemFreeMemory before client.connect: 36664
Here is the freeMemory output for a (successful) ssl handshake:
DEBUG | 3168 | 1542297193 | Connecting to network...
DEBUG | 4168 | 1542297194 | Connected to network
DEBUG | 5168 | 1542297195 | Connecting to Particle Cloud...
DEBUG | 7168 | 1542297197 | Connected to Particle Cloud
RESET | 10070 | 1542297200 | Last reset with system code: (0) Information is not available
WARNING | 10071 | 1542297200 | SD Card not detected or not functioning, using RAM buffer only
SystemFreeMemory before client.enableTls: 48784
SystemFreeMemory after client.enableTls: 27144
DEBUG | 10237 | 1542297200 | Connection regained, attempting to reconnect to MQTT
SystemFreeMemory before client.connect: 27080
SystemFreeMemory for ssl->state 0: 27024
SystemFreeMemory for ssl->state 1: 27024
SystemFreeMemory for ssl->state 2: 27024
...(repeat x12)...
SystemFreeMemory for ssl->state 2: 27024
SystemFreeMemory for ssl->state 3: 27024
SystemFreeMemory for ssl->state 4: 17000
SystemFreeMemory for ssl->state 5: 17000
SystemFreeMemory for ssl->state 6: 17000
SystemFreeMemory for ssl->state 7: 17000
SystemFreeMemory for ssl->state 8: 17000
SystemFreeMemory for ssl->state 9: 16464
SystemFreeMemory for ssl->state 10: 13472
SystemFreeMemory for ssl->state 11: 13472
SystemFreeMemory for ssl->state 12: 13472
...(repeat x12)...
SystemFreeMemory for ssl->state 12: 13472
SystemFreeMemory for ssl->state 13: 13472
SystemFreeMemory for ssl->state 14: 13472
SystemFreeMemory for ssl->state 15: 13472
SystemFreeMemory after client.connect: 20432
WARNING | 18421 | 1542297208 | TIMING | checkConnection: client.connect() took 8183ms
WARNING | 18432 | 1542297208 | MQTT Successfully Reconnected
DEBUG | 18435 | 1542297208 | System free memory is: 20392
WARNING | 18696 | 1542297209 | MQTT reconnected, was disconnected for: 8.384000 seconds; signal RSSI: -75dB; Qual: 19/49
DEBUG | 18697 | 1542297209 | Beginning one-time initialization for MQTT network functionality
DEBUG | 19147 | 1542297209 | Going to OPERATIONAL state...
DEBUG | 19147 | 1542297209 | New publishing state: 1 from state: 0
DEBUG | 33645 | 1542297224 | System free memory is: 20392
So essentially, from before TLS is enabled to a successful connection, you need about 35KB of free RAM at the peak of the handshake (48784 before client.enableTls, down to a low of 13472 from ssl->state 10 onward), plus a margin of at least 10KB. So it looks like I should be shooting for at least 45KB of free RAM before starting MQTT. Without TLS it only takes up around 7KB of runtime RAM (for my max packet size). That’s a ton of extra memory, but I suppose it goes with the territory for TLS. Hopefully this helps other folks plan when using this library.
Looks like I’ll just have to really tighten up RAM usage in other parts of my code to be confident of connecting consistently.
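Here is a minimal sketch of how that estimate could be turned into a check before starting TLS. It takes the peak from the successful trace above and adds the 10KB margin. The function names, the constant names, and the margin itself are my own assumptions for illustration, not anything the library provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Peak heap consumed: free heap at the start minus the lowest reading seen afterwards.
inline uint32_t peakUsage(uint32_t startFree, const uint32_t* readings, std::size_t count) {
    assert(count > 0);
    const uint32_t lowest = *std::min_element(readings, readings + count);
    assert(lowest <= startFree);
    return startFree - lowest;
}

// Free heap before client.enableTls in the successful run.
static const uint32_t kStartFree = 48784;

// Distinct free-heap readings from the successful handshake trace, in log order.
static const uint32_t kSuccessTrace[] = {27144, 27080, 27024, 17000, 16464, 13472, 20432};

// Assumption: 10KB of headroom on top of the measured peak.
constexpr uint32_t kMarginBytes = 10u * 1024u;

// True if the given free heap covers the measured peak plus the margin.
inline bool enoughHeapForTls(uint32_t freeBytes, uint32_t peakBytes) {
    return freeBytes >= peakBytes + kMarginBytes;
}

// On the device, the first argument would come from System.freeMemory(),
// checked right before calling client.enableTls and client.connect.
```

With these readings the peak works out to 35312 bytes, so the threshold is 45552 bytes. The 48664-byte scenario passes the check and the 38096-byte scenario (the one that failed) does not.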