Particle Tracker One freezes after a period of time using custom firmware

Hello,

I have modified the Tracker Edge firmware to batch together 10 seconds worth of location data and publish using a CloudEvent publish as recommended in the documentation. I am running DeviceOS 6.3.0.

To ensure I am only collecting location data when the vehicle is moving, I have enabled Motion Sensitivity and set Maximum location update frequency to 10 in the Fleet Config. I use the regLocGenCallback to set a flag, so that my code in the main loop can do what it needs to do without blocking.

To make sure I'm not wasting data ops, I have commented out location_publish(); around line 1120 in tracker_location.cpp. This prevents the normal loc publish from happening. This is the only edit I've made outside of main.cpp.

My code works well for a while. However, I'm finding that if I leave my device for a few hours, it freezes and becomes unreachable via the cloud console. The GPS lock LED remains on, and the cloud LED remains breathing, but the device doesn't publish or respond to anything.

I think I have a slow-acting bug in my code, or I'm blocking an essential process. The only way to get the device responsive again is to open the case up and press the reset key on the PCB.

Does anyone have any ideas where I might have a bug? Everything compiles fine and runs well for up to a few hours before freezing. Code below:

 #include "Particle.h"
 #include "tracker_config.h"
 #include "tracker.h"
 #include "bmi160.h" // Add the IMU
 #include "tracker_cellular.h"

 SYSTEM_MODE(SEMI_AUTOMATIC);
 
 #if TRACKER_PRODUCT_NEEDED
 PRODUCT_ID(TRACKER_PRODUCT_ID);
 #endif // TRACKER_PRODUCT_NEEDED
 PRODUCT_VERSION(1);
 
 STARTUP(
     Tracker::startup();
 );
 
 SerialLogHandler logHandler(115200, LOG_LEVEL_TRACE, {
     { "app.gps.nmea", LOG_LEVEL_INFO },
     { "app.gps.ubx",  LOG_LEVEL_INFO },
     { "ncp.at", LOG_LEVEL_INFO },
     { "net.ppp.client", LOG_LEVEL_INFO },
 });


// Define a cache size
const int MAX_SAMPLES = 10;

// Define a cache index
int cacheIndex  = 0;
int cacheSize   = 1024;
int timer       = 0;

bool MOVING = FALSE;
bool CACHED = FALSE;

// Create a 1024 byte cache
char cache[1024];

// Set up a global JSON Writer
JSONBufferWriter json(cache, sizeof(cache) -1);

// Forward declarations
void locationCallback(JSONWriter &writer, LocationPoint &point, const void *context);
void cacheLocation(JSONBufferWriter &jsonWriter);

void setup() {
    Tracker::instance().location.regLocGenCallback(locationCallback);
    Tracker::instance().init();
}
 
void loop() {

    Tracker::instance().loop();

    // If we're moving
    if (MOVING) {

        // If we're caching for the first time
        if (cacheIndex == 0) {

            // Clear the cache
            memset(cache, 0, sizeof(cache));

            // Re-init the JSON Writer
            json = JSONBufferWriter(cache, sizeof(cache) -1);

            // Call the cache location function
            cacheLocation(json);

            // Increment the cache index
            cacheIndex++;

            // Set the timer to now
            timer = millis();

        } else if (millis() - timer >= 1000) {

            // Call the cache location function
            cacheLocation(json);

            // Increment the cache index
            cacheIndex++;

            // Set the timer to now
            timer = millis();

        }

    }


    // If the cache is full
    if (cacheIndex >= MAX_SAMPLES) {

        CloudEvent event;

        event.clear();
        event.name("loc-cache");
        event.data(cache);

        Particle.publish(event);

        // Reset the cache index
        cacheIndex = 0;

        // Reset the flags
        MOVING = FALSE;
        CACHED = FALSE;

    }

}
 
 // New function
 void locationCallback(JSONWriter &writer, LocationPoint &point, const void *context) {
 
    // Set the moving flag true
    MOVING = TRUE;
     
 }

 void cacheLocation(JSONBufferWriter &jsonWriter) {

    // Create placeholder location point object
    LocationPoint gpsLock;

    // Retrieve current location
    Tracker::instance().locationService.getLocation(gpsLock);

    // Set GPS lock flag
    bool locked = gpsLock.locked;

    // If this is the first entry into the cache
    if (cacheIndex == 0) { 

        // Create main object for entire publish
        jsonWriter.beginObject(); 

        // Create status object
        jsonWriter.name("health").beginObject();

        // Add time
        jsonWriter.name("time").value((unsigned int) gpsLock.epochTime);

        // Add cellular signal strength
        CellularSignal signal;
        if(!TrackerCellular::instance().getSignal(signal)) {

            jsonWriter.name("cell").value(signal.getStrength(), 1);

        }

        // Add battery charge state
        bool batFlag = FALSE;
        int batState = System.batteryState();
        if ( batState == BATTERY_STATE_NOT_CHARGING    ) { jsonWriter.name("chrg").value(1); batFlag = TRUE; } else
        if ( batState == BATTERY_STATE_CHARGING        ) { jsonWriter.name("chrg").value(2); batFlag = TRUE; } else
        if ( batState == BATTERY_STATE_DISCHARGING     ) { jsonWriter.name("chrg").value(3); batFlag = TRUE; } else
        if ( batState == BATTERY_STATE_CHARGED         ) { jsonWriter.name("chrg").value(4); batFlag = TRUE; };

        // If battery is in a good state to continue
        if (batFlag == TRUE) {
      
            // Get the battery charge %
            float bat = System.batteryCharge();

            // Add battery charge %
            if(bat >= 0 && bat <= 100) { jsonWriter.name("batt").value((int) bat); };

        }
        
        // Add SoM temperature
        jsonWriter.name("temp").value(get_temperature(), 1);

        // Close the status object
        jsonWriter.endObject();


    };

    if (locked) {
        
        // We have GPS lock, so set CACHED flag to TRUE
        CACHED = TRUE;

        // Set the name of this location sample to the index
        String name = "";
        name += cacheIndex;

        // Start a new object for this location sample
        jsonWriter.name(name).beginObject();

        // Write the GPS coordinates
        jsonWriter.name("lat").value(gpsLock.latitude, 5);
        jsonWriter.name("lon").value(gpsLock.longitude, 5);
        jsonWriter.name("alt").value((int) gpsLock.altitude);

        // Write the speed and heading
        jsonWriter.name("h").value((unsigned int) gpsLock.heading);
        jsonWriter.name("s").value(gpsLock.speed, 2);

        // Get the accelerometer data
        Bmi160Accelerometer data;
        int ret = BMI160.getAccelerometer(data);
        
        // Write the accelerometer data
        if (ret == SYSTEM_ERROR_NONE) {
            jsonWriter.name("x").value(data.x,2);
            jsonWriter.name("y").value(data.y,2);
            jsonWriter.name("z").value(data.z,2);
        }

        // End the object
        jsonWriter.endObject();

    }

    // If we've collected enough samples, close the JSON object
    if (cacheIndex == (MAX_SAMPLES-1)) { jsonWriter.endObject(); };

 }

I don't think CloudEvent is intended to be used that way, as a stack allocated variable. The new publish API is asynchronous, and the intention is that you allocate CloudEvent as a global variable (or member variable), and use that to see when the publish completes. There are a bunch of examples on this page.

Thank you @rickkas7, such a simple mistake to make. I've made this global and tested for a day and a half; looks like the bug has gone.

Thanks for your help.

Hi again. Unfortunately, having run the devices for a couple of days and thinking the problem was solved, the devices froze again. This time I had deployed the code, so now I have frozen devices in the field which is a pain, but I'll find some way of fixing that later.

For now, I need to find a fix for my code. I've followed a bunch of suggestions to fix the code, too many to list, and broken things out into small functions so I can isolate the problem. I have also added a System.freeMemory() value to my publishes, which confirms I have a memory leak. I have been plotting the free memory every time I try a code fix. It looks like it's proportional to the size of the message I am publishing:

I have refactored my code here. I've tried hard to isolate what might be causing the leak, but I'm at a loss. I'm thinking it must be related to how I'm using CloudEvent publish. I've also noted that the device becomes unresponsive before the memory totally runs out, meaning my out-of-memory reset code, which is evaluated at the start of every loop, never triggers. Code below.

#include "Particle.h"
#include "tracker_config.h"
#include "tracker.h"
#include "bmi160.h" // Add the IMU
#include "tracker_cellular.h"

SYSTEM_MODE(SEMI_AUTOMATIC);

#if TRACKER_PRODUCT_NEEDED
PRODUCT_ID(TRACKER_PRODUCT_ID);
#endif // TRACKER_PRODUCT_NEEDED
PRODUCT_VERSION(3);

STARTUP( Tracker::startup(); );

SerialLogHandler logHandler(115200, LOG_LEVEL_TRACE, {
    { "app.gps.nmea", LOG_LEVEL_INFO },
    { "app.gps.ubx",  LOG_LEVEL_INFO },
    { "ncp.at", LOG_LEVEL_INFO },
    { "net.ppp.client", LOG_LEVEL_INFO },
});

// Define a cache size
const int   MAX_SAMPLES     = 10;

// Define a cache index
int         cacheIteration  = 0;
char        cache[1024];

// Memory handler variable
int         outOfMemory = -1;

// Define our time variables
uint64_t    timer   = 0;
uint64_t    now     = 0;

int delta = 0;

// Define flags
bool MOVING         = FALSE;
bool CACHED         = FALSE;
bool PUBLISH        = FALSE;
bool RECENT_PUBLISH = FALSE;

// TEMP
int ticker = 0;

// Define our globals
CloudEvent          event;
LocationPoint       gpsLock;
JSONBufferWriter    json(cache, sizeof(cache) -1);
CellularSignal      signal;
Bmi160Accelerometer acclData;


// Forward declarations
void outOfMemoryHandler(system_event_t event, int param);
void locationCallback(JSONWriter &writer, LocationPoint &point, const void *context);
void cacheLocation(JSONBufferWriter &jsonWriter);
void startCache(JSONBufferWriter &jsonWriter);
void endCache(JSONBufferWriter &jsonWriter);
void addHealthData(JSONBufferWriter &jsonWriter);
void recordTime(JSONBufferWriter &jsonWriter);
void recordCell(JSONBufferWriter &jsonWriter);
void recordBatt(JSONBufferWriter &jsonWriter);
void recordTemp(JSONBufferWriter &jsonWriter);
void recordMem(JSONBufferWriter &jsonWriter);
void addLocationData(JSONBufferWriter &jsonWriter);
void publishCache();
void clearEvent();
void clearCache();

void setup() {

    System.on(out_of_memory, outOfMemoryHandler);

    Tracker::instance().init();
    Tracker::instance().location.regLocGenCallback(locationCallback);

}
 
void loop() {

    Tracker::instance().loop();

    // MEMORY SAFETY
    // ==============================

    if (outOfMemory >= 0) { System.reset(); }               // If we're out of memory, trigger an auto reset
    if (System.millis() >= 86400000) { System.reset(); }    // If 24 hours have elapsed, trigger an auto reset


    // PROCESS PUBLISH REQUESTS
    // ==============================

    // Block until we've published. Note this doesn't wait for ACK, only waits for send.
    // We're doing it this way because we'll mess up the cache if we move on and overwrite it before the original is sent.
    while (PUBLISH) {

        // If there is an event still sending, then we're backed up.
        // We need to stop, wait, and clear before we can continue.
        if (event.isSending()) {
            
            // Wait here for it to send
            waitForNot(event.isSending, 60000);

            // Clear the event out of caution
            clearEvent();

        }

        // If we're connected and not already sending something then...
        if (Particle.connected() && !event.isSending()) { 
            
            if (RECENT_PUBLISH) { clearEvent(); } // If we have pending feedback, clear to continue

            publishCache(); // Publish!

            clearCache();   // Clear cache!
        
        }

    };


    // CHECK RECENT PUBLISH PROGRESS
    // ==============================

    if (RECENT_PUBLISH) {

        // If the event has published OK, we're done with it! Clear!
        // If the event has not published OK, we'll scrap it! Clear!
        if (event.isSent() || !event.isOk()) { clearEvent(); }
        
    }


    // CACHE NEXT LOCATION SAMPLE
    // ==============================

    // If we're moving
    if (MOVING) {
        
        now = System.millis();  // Record the current time
        
        // If we're caching for the first time, or 1 sec has passed since last cache
        if ((cacheIteration == 0) || (now - timer >= 1000)) {

            cacheLocation(json);    // Call the cache location function
          
            timer = now;            // Set the timer to now

        }

    }

}

void outOfMemoryHandler(system_event_t event, int param) {
    outOfMemory = param;
}

void locationCallback(JSONWriter &writer, LocationPoint &point, const void *context) {

    // Set the moving flag true
    MOVING = TRUE;

}

void cacheLocation(JSONBufferWriter &jsonWriter) {

    // First call!
    // ============================
    if (cacheIteration == 0) { 

        startCache(jsonWriter);         // Open the cache object
        addHealthData(jsonWriter);      // Add health data
    
    }; 

    // Mid call!
    // ================================

    if (cacheIteration < MAX_SAMPLES) {

        addLocationData(jsonWriter);    // Add a location sample (will skip if GPS has no lock)
        cacheIteration++;               // Increment the counter by one

    }

    // Final call!
    // ================================

    if (cacheIteration == MAX_SAMPLES) {
        
        // Close the cache object
        endCache(jsonWriter);

        // If we recorded some data then set the publish flag true, else clear the cache
        if (CACHED)  { PUBLISH = TRUE; } else { clearCache(); };

        // We've come to the end of recording due to motion. Reset the flag.
        MOVING = FALSE;

    }

}

void startCache(JSONBufferWriter &jsonWriter) {

    // Create main object for entire publish
    jsonWriter.beginObject(); 

}

void endCache(JSONBufferWriter &jsonWriter) {

    // End main object for entire publish
    jsonWriter.endObject(); 
    
    // Add null terminator - https://docs.particle.io/reference/device-os/api/json/jsonbufferwriter/
    jsonWriter.buffer()[std::min(jsonWriter.bufferSize(), jsonWriter.dataSize())] = 0;

}

void addHealthData(JSONBufferWriter &jsonWriter) {

        // Create status object
        jsonWriter.name("health").beginObject();

        // Record health data into object
        recordTime(jsonWriter);
        recordCell(jsonWriter);
        recordBatt(jsonWriter);
        recordTemp(jsonWriter);
        recordMem(jsonWriter);

        // Close the status object
        jsonWriter.endObject();

}

void recordTime(JSONBufferWriter &jsonWriter) {

    // Retrieve current location
    Tracker::instance().locationService.getLocation(gpsLock);

    // Add time
    jsonWriter.name("time").value((unsigned int) gpsLock.epochTime);

}

void recordCell(JSONBufferWriter &jsonWriter) {

    if(!TrackerCellular::instance().getSignal(signal)) {

        jsonWriter.name("cell").value(signal.getStrength(), 1);

    }

}

void recordBatt(JSONBufferWriter &jsonWriter) {

    // Add battery charge state
    bool batFlag = FALSE;
    int batState = System.batteryState();
    if ( batState == BATTERY_STATE_NOT_CHARGING    ) { jsonWriter.name("chrg").value(1); batFlag = TRUE; } else
    if ( batState == BATTERY_STATE_CHARGING        ) { jsonWriter.name("chrg").value(2); batFlag = TRUE; } else
    if ( batState == BATTERY_STATE_DISCHARGING     ) { jsonWriter.name("chrg").value(3); batFlag = TRUE; } else
    if ( batState == BATTERY_STATE_CHARGED         ) { jsonWriter.name("chrg").value(4); batFlag = TRUE; };

    // If battery is in a good state to continue
    if (batFlag == TRUE) {
    
        // Get the battery charge %
        float bat = System.batteryCharge();

        // Add battery charge %
        if(bat >= 0 && bat <= 100) { jsonWriter.name("batt").value((int) bat); };

    }

}

void recordTemp(JSONBufferWriter &jsonWriter) {

    // Add SoM temperature
    jsonWriter.name("temp").value(get_temperature(), 1);

}

void recordMem(JSONBufferWriter &jsonWriter) {

    // Add free memory
    jsonWriter.name("mem").value((unsigned int) System.freeMemory());
    //jsonWriter.name("mem").value((int) delta); // Send the memory delta

}

void addLocationData(JSONBufferWriter &jsonWriter) {

    // Retrieve current location
    Tracker::instance().locationService.getLocation(gpsLock);

    // Set GPS lock flag
    int locked = gpsLock.locked;

    if (locked == 1) {
        
        if (jsonWriter.dataSize() >= (sizeof(cache) - 100)) {
        
            /*

            If we have less than 100 bytes left, we're at risk of overflow. Better to break than be sorry.
            Maxed out message below is 932B. A maxed out sample is 94B:
            
            {
            "health":{"time":1742485755,"cell":999.9,"chrg":4,"batt":999,"temp":-99.9},
            "0":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "1":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "2":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "3":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "4":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "5":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "6":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "7":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "8":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99},
            "9":{"lat":-999.99999,"lon":-999.99999,"alt":9999,"h":360,"s":-99.99,"x":-99.99,"y":-99.99,"z":-99.99}
            }

            */

            // Increment cache index before break
            cacheIteration++;
            
            // Break to prevent overflow
            return;

        }

        // We have GPS lock, so set CACHED flag to TRUE
        CACHED = TRUE;

        // Set the name of this location sample to the index
        char name = '0' + cacheIteration;

        // Start a new object for this location sample
        jsonWriter.name(&name, 1).beginObject();

        // Write the GPS coordinates
        jsonWriter.name("lat").value(gpsLock.latitude, 5);
        jsonWriter.name("lon").value(gpsLock.longitude, 5);
        jsonWriter.name("alt").value((int) gpsLock.altitude);

        // Write the speed and heading
        jsonWriter.name("h").value((unsigned int) gpsLock.heading);
        jsonWriter.name("s").value(gpsLock.speed, 2);

        // If we get accelerometer data back
        if (BMI160.getAccelerometer(acclData) == SYSTEM_ERROR_NONE) {
            jsonWriter.name("x").value(acclData.x,2);
            jsonWriter.name("y").value(acclData.y,2);
            jsonWriter.name("z").value(acclData.z,2);
        }

        // End the object
        jsonWriter.endObject();

    }

}

void publishCache() {

    // Name the event
    event.name("loc-cache");

    // Add the cache data to the event
    event.data(cache);
    
    //delta = System.freeMemory();

    // Publish the event
    Particle.publish(event);

    // Reset flags
    RECENT_PUBLISH  = TRUE;     // Set recent publish to true!
    PUBLISH         = FALSE;    // Clear publish flag to continue!

}

void clearEvent() {

    // Clear the event so we can start afresh
    event.clear();

    // Reset the clear event flag
    RECENT_PUBLISH = FALSE;

}

void clearCache() {

    // Clear the cache
    //memset(cache, 0, sizeof(cache));

    // Re-init the JSON Writer
    //json = JSONBufferWriter(cache, sizeof(cache) -1);

    // Json buffer fix?
    //====================================================

    // Explicitly destroy json before reinitializing
    json.~JSONBufferWriter();

    memset(cache, 0 , sizeof(cache));

    // Reconstruct json in place using placement new
    new (&json) JSONBufferWriter(cache, sizeof(cache)-1);

    //====================================================

    // Reset the cache index
    cacheIteration = 0;

    // Reset the flags
    CACHED = FALSE;

}

What I would do is declare the JSON buffer writer as a pointer to a heap-allocated object:

JSONBufferWriter *json = nullptr;

When you reinit the buffer writer, do it like this:

  // Re-init the JSON Writer
  if (json) {
     delete json;
     json = nullptr;
  }
  json = new JSONBufferWriter(cache, sizeof(cache) -1);

And you'll need to change any json. to json-> since it's a pointer instead of an object reference.

I'm pretty sure the leak is related to the buffer writer, which isn't really intended to be reused like that.

Hi @rickkas7, thanks for your help. I changed the code to work as you described, but haven't seen a difference in the memory leak; please see the graph below:

I agree; I think the leak is to do with the buffer writer. Am I doing something fundamentally wrong? I've tried to follow the guidelines. The main difference between my code and the examples is that I need the buffer to be persistent so I can write to it across multiple loops.

Should I maybe be using something other than CloudEvent to publish?

Also, not sure if related, but the first publish always has the wrong GPS time. It says it's some time in 2018.

It's not obvious what is leaking memory. I'd try removing things until it stops leaking. For example, turn off the health data recording and see if that changes anything.

I've tried turning off the location part of the publish. This reduces the leak significantly but not completely, which leads me to believe the leak is related to the size of the message. I can't turn the health data publish off because that's how I'm reading the free memory. I'm using Tracker Ones rather than an eval board, so I don't have access to debug.

Hi, to prevent this, I recommend looking into using the hardware watchdog.

https://docs.particle.io/reference/device-os/api/watchdog-hardware/

Thanks @gusgonnet, I will absolutely look at implementing this.

@rickkas7 To figure out the location of the memory leak, I've gone back to basics and commented out all the code in my loop, then added the non-blocking publish code from the "Typed and extended publish" page of the Particle reference.

Running the example resulted in no leak. I then changed it to use my jsonBufferWriter object instead of set object, and this again showed no leak.

When I start calling some of my other functions to add data to the json, I start getting small but recordable leaks. My code doesn't do a lot, just gets location data from the location service, and adds the values to the message.

This makes me think that the way I'm interacting with location service is wrong. I currently use:

// Defined at the top of the code as a global
LocationPoint       gpsLock;

...

// Retrieve current location
Tracker::instance().locationService.getLocation(gpsLock);
 
...

// Write the GPS coordinates
json->name("lat").value(gpsLock.latitude, 5);
json->name("lon").value(gpsLock.longitude, 5);
json->name("alt").value((int) gpsLock.altitude);

I do this every time I add health data or a location sample, so a small leak like I'm observing now for a single call would add up to a big leak once I'm calling this 10x per publish. Am I accessing location data incorrectly?

Thank you for isolating it. I believe that is the problem.

Since you use the gpsLock object immediately to generate the JSON, it does not need to live after the function returns, so it can be a local variable.

The problem is that getLocation() appends to a Vector of sources used for GNSS, but since you reuse the object, it's never cleared. The vector of sources just gets longer and longer each time you call it.

By making it a stack allocated object within the function it will start out fresh for every call.
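
To illustrate the mechanism described above, here is a minimal host-side sketch in plain C++. It does not use the real Tracker Edge classes: FakeLocationPoint, fakeGetLocation, sourcesAfterReuse and sourcesWithLocal are hypothetical stand-ins, and the only behavior modelled is the one described in this thread, namely that each fill appends to a vector inside the object instead of replacing it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for LocationPoint. Only the growable list of
// sources is modelled; the real class has more fields.
struct FakeLocationPoint {
    double latitude = 0.0;
    double longitude = 0.0;
    std::vector<int> sources;  // appended to on every fill
};

// Hypothetical stand-in for getLocation(): scalar fields are overwritten,
// but the sources vector only ever grows.
void fakeGetLocation(FakeLocationPoint &point) {
    point.latitude = 51.5;
    point.longitude = -0.12;
    point.sources.push_back(1);
}

// Leaking pattern: one long-lived object (playing the role of the global
// gpsLock) is passed to every call, so its vector keeps growing.
std::size_t sourcesAfterReuse(int calls) {
    assert(calls >= 0);
    FakeLocationPoint shared;
    for (int i = 0; i < calls; ++i) {
        fakeGetLocation(shared);
    }
    return shared.sources.size();
}

// Fixed pattern: a fresh object per call. Its vector is destroyed at the
// end of each iteration, so nothing accumulates.
std::size_t sourcesWithLocal(int calls) {
    assert(calls >= 0);
    std::size_t last = 0;
    for (int i = 0; i < calls; ++i) {
        FakeLocationPoint fresh;
        fakeGetLocation(fresh);
        last = fresh.sources.size();
    }
    return last;
}
```

In this model, ten calls on the reused object leave ten entries in the vector, while the per-call object never holds more than one. In the firmware, the equivalent change would be declaring LocationPoint gpsLock; inside each function that calls getLocation(), rather than as a global.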

Thank you so much, the leak has now stopped. I've put all my code back in place and everything is working as expected, and free memory stays completely stable.

Thanks a lot for your help and patience. I'm not a programmer by trade (I'm a vehicle design engineer), so I'm a bit out of my depth with all this, but with a solution that now works, this project is in a good place.

Cheers!
