Memory Management

Ever since Device OS v1.0.1 I have had an insufficient free memory issue. I know the stack was reduced by 1K in one of the changes, but having spent a while using retained RAM and reducing RAM usage, I am a bit stuck on how to avoid the problems that arise when there is insufficient memory. Typically this is when the WAP goes out of range or is switched off and the device is hunting and retrying the connection.

Any advice on how to reduce program memory usage, and on tools/techniques to help refactor the code and identify where the most memory is used? Should I employ more static inline functions, or just inline functions, to avoid the stack cost of function calls?

The tips that I have for reducing flash, RAM, and stack usage are here:

https://docs.particle.io/support/particle-devices-faq/code-size-tips/


Could these two observations be related?

I did see this post (Argon rather than a Photon). I checked the memory usage: after restart it is 39472 (it used to be 40480 with 0.8.0-RC.12), then after a sleep/wake cycle it drops to 38128. I am using normal (stop) sleep with a Photon, and I am pretty certain it didn’t drop this much before v1.0.1.

Thanks - I think I asked about this a while back and you produced this excellent tutorial! I have done pretty much all I can. Flash isn’t the limiting factor in this application; it is RAM (stack). I guess it is worth another look at retained RAM to ensure I am fully using that extra 3K. What I was also asking for is any analysis tool to help with refactoring: something to work out the depth of function calls, and therefore whether using static inline void function() would save a bit more of the precious 6K stack, which I am guessing is the issue rather than the heap?

Every time Device OS is updated, it uses more RAM (in general). It’s the cost of extra features. If you don’t need the new features or specific stability fixes and are RAM constrained, you may want to avoid upgrading. I’ve stayed at v0.6.4/v0.6.3 for that and for general stability reasons.

Try compiling with Particle Workbench’s local compile tools, grab the *.elf file from the “target/” directory, and upload it to this ELF file analyzer to see what is using your statically / globally declared RAM.

For dynamically used RAM, identify all heap memory usage (malloc, new operator, some Strings) and do some worst case calculations. All other dynamic memory usage should usually be inside of your stack which is pre-allocated as far as the free memory calculations are concerned (IIRC, might be wrong), so as long as you aren’t getting Stack Overflow issues that shouldn’t be doing anything bad to you.

And of course follow all of those general guidelines above.

You say you are running out of “free memory”, but you are also mentioning the limitations of the stack. Which one of those is actually your problem? What specific errors or reset codes do you get? 38KB of memory is plenty of free memory for anything I can imagine in normal usage. If you have that much free memory but are running out of stack, just move some variables to the global / static space, or use the heap.
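As a rough illustration of moving a buffer off the stack, here is a minimal sketch in plain C++ (so it can be compiled off-device). The function name format_status_line and the 512-byte size are hypothetical and not taken from the application discussed in this thread.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical size; pick whatever the largest message really needs.
constexpr std::size_t kLineBufSize = 512;

// Statically allocated: this is counted at link time (and so shows up in an
// ELF analysis) instead of consuming stack each time the function is called.
static char g_line_buf[kLineBufSize];

// Formats into the shared static buffer and returns a pointer to it.
// Not reentrant: a second call overwrites the previous result, so do not
// call it from more than one thread at a time.
const char* format_status_line(int sensor_id, int value) {
    std::snprintf(g_line_buf, sizeof(g_line_buf),
                  "sensor=%d value=%d", sensor_id, value);
    return g_line_buf;
}
```

The trade-off is that the 512 bytes are reserved permanently, so this only helps when stack, not total RAM, is the constraint.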

@justicefreed_amper Thanks for the .elf file analyser - that was the tool I had in mind - I will try that.

I am glad for you that staying on 0.6.4 is an option. WPA Enterprise support in 0.7.0 was the must-have feature for me, then the Wi-Fi stability issues meant moving to 0.8.0-RC.x, and the next GA release for the Photon has been 1.0.1.

The reason I am looking into increasing free memory is that experience has shown that odd issues seem to disappear once this is increased.

Your question about where the problem is (stack or heap) is spot on. The truthful answer is that I don’t know for sure. The problem manifests as an application hang, which gets ‘caught’ by the application watchdog, which then collects some data (such as free memory) and restarts the application. Free memory at this point is low. I have found that circa 23000 is a level below which odd things happen. This only happens when the WAP for the stored Wi-Fi credentials is not available (say the WAP has been turned off or the device moved). If these devices were only connected to sensors the restart would not be a problem. Unfortunately they have a screen, and the hang generally happens while a user is interacting with the device, so during SPI bus activity to the TFT screen and reads from the SD card. So not good.

Hmm, something hanging doesn’t normally suggest memory issues, though anything is possible. Has the device entered listening mode by chance? I’ve noticed that takes a lot of RAM.

Try to identify where your program is executing when it hangs. I used to keep a variable that I would update with a number representing each unique position in program execution. I then log that out from the watchdog, pass it in as the system reset reason data, and read that value on bootup to know where it hung. That should tell you a lot about what might be suspect.
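A minimal sketch of that checkpoint idea, written as portable C++ so it can be tried off-device. The checkpoint names and the helper functions mark() and checkpoint_for_reset() are hypothetical. On a Particle device the returned value could, as an assumption to verify against your Device OS version, be handed to System.reset(data) from the watchdog handler and read back after boot with System.resetReasonData().

```cpp
#include <stdint.h>

// Hypothetical checkpoint IDs, one per region of interest.
enum Checkpoint : uint32_t {
    CP_NONE        = 0,
    CP_LOOP_START  = 1,
    CP_DRAW_SCREEN = 2,
    CP_READ_SD     = 3,
    CP_PUBLISH     = 4,
};

// Last checkpoint reached. volatile because the watchdog handler reads it
// from a different execution context than the code that writes it.
static volatile uint32_t g_last_checkpoint = CP_NONE;

// Call at the start of each region you want to be able to identify.
inline void mark(Checkpoint cp) { g_last_checkpoint = cp; }

// Call from the watchdog handler: returns the value to pass along as
// reset-reason data (assumption: the platform lets you attach a 32-bit value).
uint32_t checkpoint_for_reset() { return g_last_checkpoint; }
```

Starting with a few coarse checkpoints and subdividing the region that shows up most often keeps the instrumentation small.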

And yeah, I wanted to offer WPA Enterprise as well, but given the stability issues and the need to use a TLS implementation of MQTT (aka lots of RAM needed), it is no dice until the Argon is ready.

This is exactly what I have done. The problem I have found is in placing enough position updates: I go broad, then narrow down, then find the hang is happening somewhere else!

Listening or rather WiFi setup (using SoftAP) requires free memory of 38000 before starting otherwise it "hangs" in my experience.
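The two thresholds mentioned in this thread (about 38000 bytes before starting SoftAP setup, and roughly 23000 bytes as the level below which odd things happen) could be checked before starting a memory-hungry operation. This is a sketch only: the numbers are one user's observations for one application rather than documented limits, and classify_free_memory is a hypothetical helper. On a device the argument would come from System.freeMemory().

```cpp
#include <stdint.h>

// Observed values from this thread, not documented Device OS limits.
constexpr uint32_t kMinFreeForSoftAP    = 38000;
constexpr uint32_t kMinFreeForNormalUse = 23000;

enum class MemoryState { Low, OkForNormalUse, OkForSetup };

// free_bytes: current free memory, e.g. the result of System.freeMemory().
MemoryState classify_free_memory(uint32_t free_bytes) {
    if (free_bytes >= kMinFreeForSoftAP)    return MemoryState::OkForSetup;
    if (free_bytes >= kMinFreeForNormalUse) return MemoryState::OkForNormalUse;
    return MemoryState::Low;
}
```

Logging the classification at a few points (after boot, after wake, before Wi-Fi setup) would also show where the drops occur.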

As in the spot where it hangs changes? Are you multi-threaded?

Yes - SYSTEM_THREAD(ENABLED) but not using or creating threads myself.

I have suspected that Particle.connected() sometimes returns a false result, saying a device is cloud connected when it is not. Any calls to Particle.publish() will block if not cloud connected so this is important to be 100%.

I am using the Google Maps locator integration and I have suspected that the publish from there might be a cause of the hang for the reason above. The publish happens a variable amount of time after the initial call from setup(), which matches what I see. I have now tried wrapping the Particle.connected() check in a WiFi.ready() check. So far so good, although the hang does seem to be a bit random.
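The guard described above can be sketched as a small helper that only publishes when both the Wi-Fi layer and the cloud layer report ready. The helper publish_if_connected is hypothetical. It takes callables so it compiles off-device; on a Photon they would presumably be small lambdas wrapping WiFi.ready(), Particle.connected() and the Particle.publish() call.

```cpp
// Calls publish() only if both readiness checks pass.
// Returns false when the publish was skipped, otherwise the publish result.
template <typename WifiReady, typename CloudConnected, typename Publish>
bool publish_if_connected(WifiReady wifi_ready,
                          CloudConnected cloud_connected,
                          Publish publish) {
    // Check the lower layer first: if Wi-Fi is down, the cloud flag
    // may be stale and should not be trusted.
    if (!wifi_ready()) return false;
    if (!cloud_connected()) return false;
    return publish();
}
```

This narrows the window in which a stale connected state can cause a blocking publish, but it does not close it: the connection can still drop between the check and the call.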

Ahh, yeah that makes sense as a possibility. The other thing you can do to (somewhat) double-check your connectivity is to read your network’s RSSI. I do this to catch an otherwise uncaught bad connection state I discovered with the Electron (it may also be applicable to the Photon, but you’ll have to check the source code for which values can be returned).

Just a note, at least for Electrons, this is actually making a call to the modem itself, so you don’t want to constantly spam this. Thus, I just check it regularly. In your case, perhaps you can just check right before you publish, if you aren’t publishing very frequently.

#if Wiring_Cellular
inline void particle_startup() { cellular_credentials_set("hologram", "", "", NULL); }
// Cell indicator config
extern CellularSignal  sig;
extern int             strength;
#endif

// Note: Resets and RESET_REASON_BAD_CONNECTION_STATE are application-defined
// helpers that are not shown here.

int rssi = 0;  // init as "network not ready"
uint32_t signal_strength_last_time = 0;
const uint32_t signal_strength_update_period = 500;  // ms

void update_signal_strength() {
#if Wiring_Cellular
    sig = Cellular.RSSI();
    // signal_ok reflects the PREVIOUS reading: false if it was already bad
    bool signal_ok = true;
    if (rssi >= 0) signal_ok = false;
    // rssi is 0 if network not ready, 1 is error getting value (never returned),
    // 2 is if modem returns 0 rssi aka really bad service
    rssi = sig.rssi;
    if (rssi >= 0 && Cellular.ready() && !signal_ok) {
        // this is a problematic state where the system firmware doesn't
        //    realize that the modem is disconnected, just going to restart
        Resets.reset_now(RESET_REASON_BAD_CONNECTION_STATE, true);
    }
#elif Wiring_WiFi
    rssi = WiFi.RSSI();
    if (rssi >= 0 && WiFi.ready()) {
        // this is a problematic state where the system firmware doesn't
        //    realize that the modem is disconnected, just going to restart
        Resets.reset_now(RESET_REASON_BAD_CONNECTION_STATE, true);
    }
#endif

    signal_strength_last_time = millis();
}

// somewhere in loop()

    if ((millis() - signal_strength_last_time) > signal_strength_update_period) update_signal_strength();

In my case I also wait until this bad state has been present for two checks in a row, to avoid acting on a transient state that recovers immediately. For a one-shot check you can replicate this by checking, delaying, and checking again.
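A minimal, device-independent sketch of the "bad for N checks in a row" idea. The class BadStateDebouncer is hypothetical and is not part of the code posted above. On a device, the bad flag would be something like (rssi >= 0 && WiFi.ready()).

```cpp
#include <stdint.h>

// Counts consecutive "bad" observations and reports when a required
// number of them have been seen back to back.
class BadStateDebouncer {
public:
    explicit BadStateDebouncer(uint8_t required) : required_(required) {}

    // Feed one observation. Returns true once the bad state has been
    // observed `required` times in a row; any good reading resets the count.
    bool update(bool bad) {
        if (!bad) {
            count_ = 0;
            return false;
        }
        if (count_ < required_) ++count_;  // saturate to avoid overflow
        return count_ >= required_;
    }

private:
    uint8_t required_;
    uint8_t count_ = 0;
};
```

With required set to 2 and a 500 ms check period, a bad state has to persist across two consecutive checks, roughly half a second apart, before a reset is triggered.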

Of course, test in your use case. I haven’t super thoroughly vetted this for Photons (it should be safe, just not sure if this issue exists with Photons or not).

Edit: especially if you have blocking code in your loop, it may be worth calling the below before checking if Particle.connected() == true if you can afford the processing time:

delay(1000);   // yield to system thread
Particle.process();    // fulfill any application thread cloud stuff that's waiting
if (Particle.connected()) {
   ...
}

After much experimentation, it appears that the application hang was caused by using the SPI bus at the same time that the system thread was trying to reconnect to a WAP that was not there. It would have been very difficult to pick through each of the TFT display and SdFat libraries to figure out where there was a timing issue or something that left a busy flag set, so I tried reducing the speed of the SPI bus from FULL to HALF to see whether slower operation avoided the issue. I have had this running on test for over a week now and it appears to solve the problem. It is disappointing that I can’t paint the TFT at full SPI bus speed, but the difference is not noticeable.

Yeah with SYSTEM_THREAD(ENABLED); unfortunately the SPI bus shares resources with communication with the WiFi chip and there can be some blocking or undefined behavior. I found that when I ran the SPI portion of my operations within a SINGLE_THREADED_BLOCK() {} that it resolved my issue while allowing for full speed operation. Probably worth giving it a try if you can stomach the small delays in your other code.

see my post on the issue
