Memory leak in Gen3 network handling

boron-3g

#1

I isolated my issue in a simple program on a Boron 3G running Device OS 1.2.1 that does only this:

  • Send a UDP message on boot
  • Send a UDP message every hour reporting the free system memory

The messages look like this:

Jul 23 16:55:07 Booted. Device OS 1.2.1, Reset 140 0
Jul 23 17:54:42 uptime 1h, mem 64448
[…]
Jul 23 23:54:40 uptime 7h, mem 64448
Jul 24 00:54:41 uptime 8h, mem 58520
[…]
Jul 24 07:54:41 uptime 15h, mem 58520
Jul 24 08:54:40 uptime 16h, mem 53192
[…]
Jul 24 14:54:40 uptime 22h, mem 53192
Jul 24 15:54:40 uptime 23h, mem 47872
Jul 24 16:54:41 uptime 24h, mem 37288

I removed messages with no memory change. This continues until only about 15 KB of memory is left. Then the device either reboots or freezes while breathing cyan (no pings; in another test, an LED blinked from loop() stopped blinking). The memory always drops by (multiples of) ~5 KB. I ran this code several times on different devices: sometimes it ran without problems for multiple days, sometimes it crashed after one day.
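As a worked check of the "multiples of ~5 KB" observation, the hourly readings above can simply be differenced. This is a minimal host-side C++ sketch, not Device OS code; `memoryDrops` and `stepsOf` are hypothetical helper names, and the 5300-byte step size is an assumed value read off the logs:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: given successive System.freeMemory() readings
// (as reported by the hourly heartbeat), return the size of every drop.
// Readings that stay the same or increase are ignored.
std::vector<uint32_t> memoryDrops(const std::vector<uint32_t>& samples) {
    std::vector<uint32_t> drops;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i] < samples[i - 1]) {
            drops.push_back(samples[i - 1] - samples[i]);
        }
    }
    return drops;
}

// Express a drop as a whole number of steps of an assumed step size,
// rounding to the nearest step.
uint32_t stepsOf(uint32_t drop, uint32_t stepBytes) {
    return (drop + stepBytes / 2) / stepBytes;
}
```

With the readings 64448, 58520, 53192, 47872 and 37288, the drops are 5928, 5328, 5320 and 10584 bytes. With a 5300-byte step the first three round to one step and the last to two, which would be consistent with two leak events within the same hour.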

Why I believe this is a bug in the Device OS code:

  • Before I had reduced it to this minimal reproduction, I logged errors using the Papertrail log handler, and every time before the free memory dropped, there was this error message:

    [system] ERROR: Failed to load session data from persistent storage

    There was no memory drop without this message and no message without memory drop.

  • I see this message at every boot (which is understandable). Seeing it during normal operation suggests temporary network problems, which can happen but must not leak memory.

  • Someone else reported the same behavior, with memory dropping by the same ~5 KB when cycling the modem: Memory leak in Argon doing WiFi.off() / WiFi.on() and Boron doing Cellular.off() / Cellular.on()
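The correlation argued in the list above (no memory drop without the session-data error, and no error without a drop) can be written down as a check over observation windows. This is a hedged host-side sketch: the `Interval`, `Correlation` and `correlate` names are hypothetical, and it assumes the two flags per window have already been extracted from the Papertrail log and the freeMemory() readings:

```cpp
#include <cstddef>
#include <vector>

// One observation window, e.g. one hourly heartbeat interval.
struct Interval {
    bool sawSessionError;  // "[system] ERROR: Failed to load session data ..." was logged
    bool memoryDropped;    // free memory is lower than in the previous window
};

struct Correlation {
    std::size_t both = 0;              // error and drop in the same window
    std::size_t dropWithoutError = 0;  // drop with no error logged
    std::size_t errorWithoutDrop = 0;  // error logged but memory unchanged
};

// The claim "no drop without the message and no message without a drop"
// holds for a log exactly when both mismatch counters stay at zero.
Correlation correlate(const std::vector<Interval>& intervals) {
    Correlation c;
    for (const Interval& iv : intervals) {
        if (iv.memoryDropped && iv.sawSessionError) {
            ++c.both;
        } else if (iv.memoryDropped) {
            ++c.dropWithoutError;
        } else if (iv.sawSessionError) {
            ++c.errorWithoutDrop;
        }
    }
    return c;
}
```

Note that a window-level check like this cannot distinguish one leak from two in the same window; the size of the drop has to be looked at separately.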

Thank you in advance for any help on this. I don’t know where to continue searching for the source of this bug.

This is my code:

SYSTEM_THREAD(ENABLED);

UDP udp;
uint16_t udpPort = xxxxx;        // syslog server port (redacted)
const char *host = "xxxxx.com";  // syslog server host (redacted)

void setup() { }

// Sends one syslog-style UDP packet. The server address and the local
// UDP socket are set up lazily on first use.
void sendUdpMessage(String msg) {
  static int inited = 0;
  static IPAddress address;

  // Resolve the server once; give up for now if DNS fails
  if(!address) {
    address = Cellular.resolve(host);
    if(!address) return;
  }

  // Bind the local UDP port once; retry on the next call if it fails
  if(!inited) {
    uint8_t udpBeginStatus = udp.begin(8888);
    if(udpBeginStatus != 0) inited = 1;
    else return;
  }

  // RFC 5424 layout: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
  String time = Time.format(Time.now(), TIME_FORMAT_ISO8601_FULL);
  String packet = String::format("<22>1 %s %s %s - - - %s", time.c_str(), System.deviceID().c_str(), "jay-0", msg.c_str());
  int ret = udp.sendPacket(packet, packet.length(), address, udpPort);

  // Force re-initialization of the socket if sending failed
  if(ret < 1) inited = 0;
}

void loop() {
  /* Heartbeat: report uptime and free memory once per hour */
  static system_tick_t lastHeartbeat = 0;
  static int uptime = 0;
  static int bootMessageSent = 0;

  if(millis() - lastHeartbeat >= 3600000) {
    lastHeartbeat = millis();
    uptime++;
    if(Particle.connected()) sendUdpMessage(String::format("uptime %ih, mem %lu", uptime, System.freeMemory()));
  }

  /* Boot message: sent once, as soon as the cloud connection is up */
  if(!bootMessageSent && Particle.connected()) {
    sendUdpMessage(String::format("Booted. Device OS %s, Reset %i %lu", System.version().c_str(), System.resetReason(), System.resetReasonData()));
    bootMessageSent = 1;
  }
}
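Two details of the test program can be checked with a desktop compiler: the heartbeat's `millis() - lastHeartbeat >= 3600000` comparison and the `<22>` prefix of the syslog-style packet. This is a hedged sketch; `intervalElapsed`, `SyslogPri` and `decodePri` are hypothetical helpers, and it assumes `millis()` returns a 32-bit `system_tick_t` as on Device OS:

```cpp
#include <cstdint>

// millis() is a 32-bit counter that wraps after about 49.7 days.
// Computing "now - last" in unsigned 32-bit arithmetic stays correct
// across the wrap, so the hourly heartbeat keeps firing after it.
bool intervalElapsed(uint32_t now, uint32_t last, uint32_t interval) {
    return static_cast<uint32_t>(now - last) >= interval;
}

// RFC 5424 PRI value: PRI = facility * 8 + severity.
struct SyslogPri {
    unsigned facility;
    unsigned severity;
};

SyslogPri decodePri(unsigned pri) {
    return SyslogPri{pri / 8, pri % 8};
}
```

For example, `decodePri(22)` yields facility 2 (mail system) and severity 6 (informational) in the RFC 5424 tables. The unsigned subtraction is the usual idiom here; comparing `millis() >= lastHeartbeat + 3600000` instead would misbehave around the wrap.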

#2

Thanks for posting! I’m hoping that @avtolstoy or @cheong can attempt to replicate and get an issue filed in GitHub.


#3

Thank you!
Let me know if I can do any additional tests or help in other ways, I would really like to get this issue solved.


#4

Thanks for reporting the issue. This has been resolved in https://github.com/particle-iot/device-os/pull/1862 and the fix will be included in the upcoming 1.3.1-rc.1 release.


#5

I can confirm this fix solves my issue. The memory leak is gone.


#6

As I discovered later, what triggered the memory leak was a hardware issue. I'm describing it here to help anyone who runs into the same problem in the future:

I was powering the Boron via the VUSB pin with a powerful (3 A) supply (always a stable 5 V, verified with a scope), but I hadn't increased the PMIC input current limit. The default is 500 mA, which is not enough for the current peaks of the Boron 2G/3G. More details: Correct power supply for Boron 2G/3G without LiPo battery
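Raising the limit means picking one of the charger's fixed input-current steps. The sketch below assumes the Boron's PMIC is a bq24195, whose input current limit can be set to 100, 150, 500, 900, 1200, 1500, 2000 or 3000 mA; `inputLimitForSupply` is a hypothetical helper that picks the highest step a given supply can deliver. On the device, the value would then be passed to the Device OS `PMIC` class (e.g. its `setInputCurrentLimit()` method); please check the exact call against the reference for your Device OS version.

```cpp
#include <cstdint>

// Input current limit steps of the bq24195 (assumed to be the Boron's PMIC), in mA.
constexpr uint16_t kInputLimitStepsMa[] = {100, 150, 500, 900, 1200, 1500, 2000, 3000};

// Default limit mentioned in the post; too low for the 2G/3G modem peaks.
constexpr uint16_t kDefaultInputLimitMa = 500;

// Hypothetical helper: highest supported step not exceeding the supply
// rating. Returns 0 if the supply is rated below the lowest step.
uint16_t inputLimitForSupply(uint16_t supplyRatingMa) {
    uint16_t best = 0;
    for (uint16_t step : kInputLimitStepsMa) {
        if (step <= supplyRatingMa) {
            best = step;
        }
    }
    return best;
}
```

For the 3 A supply described above this picks 3000 mA instead of the 500 mA default; a more conservative value below the supply rating is equally valid as long as it covers the modem's peaks.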

This led to voltage drops on VSYS, which made the modem unresponsive. Device OS then restarted the modem, which leaked memory, as also observed in the thread linked above and confirmed here:

As this occurs infrequently and depends on cellular signal conditions, it is hard to debug. To reproduce it, I connected an electronic load to VSYS and pulled current (~650 mA) until the voltage dropped to 3 V for about 300 ms. This makes the modem unresponsive but is not enough to make the Boron restart. The logs (level WARN) look like this:

0000063889 [gsm0710muxer] ERROR: The other end has not replied to keep alives (TESTs) 5 times, considering muxed connection dead
0000070480 [app] INFO: network_status_connecting
0000070629 [comm.protocol] ERROR: Event loop error 3
0000070632 [system] WARN: Communication loop error, closing cloud socket
0000083129 [app] INFO: network_status_connected
0000083137 [system] ERROR: Failed to load session data from persistent storage
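The leading numbers in these log lines appear to be milliseconds since boot, so the modem outage can be read off directly: the muxer gave up at 63889 ms and the network was connected again at 83129 ms, about 19.2 s later. A small host-side parser could automate this when scanning longer logs. This is a sketch only; `LogLine` and `parseLogLine` are hypothetical names, and the line format is inferred from the excerpt above:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

struct LogLine {
    uint32_t ms;           // leading timestamp, e.g. 0000063889
    std::string category;  // text inside [...], e.g. gsm0710muxer
    std::string level;     // e.g. ERROR, WARN, INFO
    std::string message;   // everything after "LEVEL: "
};

// Parses lines shaped like "0000063889 [gsm0710muxer] ERROR: text".
// Returns false if the line does not have that shape.
bool parseLogLine(const std::string& line, LogLine& out) {
    std::size_t open = line.find(" [");
    if (open == std::string::npos) return false;
    std::size_t close = line.find("] ", open);
    if (close == std::string::npos) return false;
    std::size_t colon = line.find(": ", close);
    if (colon == std::string::npos) return false;
    try {
        out.ms = static_cast<uint32_t>(std::stoul(line.substr(0, open)));
    } catch (...) {
        return false;  // timestamp missing or not numeric
    }
    out.category = line.substr(open + 2, close - open - 2);
    out.level = line.substr(close + 2, colon - close - 2);
    out.message = line.substr(colon + 2);
    return true;
}
```

Searching for a `gsm0710muxer` ERROR line followed by `network_status_connected` and subtracting the two `ms` values gives the outage length for each modem restart.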

#7

This should also be resolved in 1.3.1-rc.1: https://github.com/particle-iot/device-os/pull/1846