Firmware on Photon stuck / freezes with flashing cyan light

My firmware, running on a Photon, seems to get stuck / freeze with a blinking cyan light after running for anywhere from an hour to 10 hours. I’m running the firmware in SYSTEM_MODE(SEMI_AUTOMATIC). I suspect it may be due to a poor internet connection, but it does not recover when I move the Photon into an area with better WiFi. I have to reset to recover, as it appears to be frozen indefinitely. Is this sort of behavior typical in any situation? If so, what remedies are available? Should I drop to SYSTEM_MODE(MANUAL)? My code base is large, so I can’t really post it, but since it’s in SEMI_AUTOMATIC I’m generally doing nothing to manage WiFi.

Heap fragmentation can lead to this behaviour.
Especially since it’s happening at unpredictable times, this would be my first point of investigation.

@ScruffR Thanks for responding. Let’s assume for a second that heap fragmentation is the issue. That leads me to two questions. First, why is a blinking cyan light the result of heap fragmentation? Second, what are the most common culprits of heap fragmentation?

Thanks!

I should also ask, I've read some older threads like this one:

And they suggest:

If the Cloud disconnects unintentionally, the Core will continue to try to reconnect to the Cloud and will block execution of user code when attempting to connect to the Cloud.

He goes on to say, to test:

For the unintentional WiFi disconnects, I find it's easiest to use a uFL Core and disconnect the antenna. To disable the Cloud, you have to do something like create a firewall setting in your router that prevents your core from seeing the internet.

But that is hard for me to do in my current situation, where I don't have access to my router. This was written in 2014, though. Could this also be something that would cause it to freeze?

Second part first: String objects store their content on the heap, and growing strings causes relocation and hence small, hardly reusable fragments on the heap. Heavy use of String is therefore the most common cause of fragmentation, and many people got rid of their problems just by substituting String with char[] strings.

Now for the cyan blinking: when a cloud connection is set up, several objects are needed for the communication, and these need buffers and other data that will probably be stored on the heap. If there is not enough unfragmented space, these objects can't be constructed. Since the system deals gracefully with allocation failures, you won't see the typical SOS panic that would happen in a lot of less forgivingly written code.

I'm not saying this is the cause of the issue you're seeing, but it is one you might not think of and could easily cure.
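One cheap way to gather evidence for or against the heap theory is to keep an eye on free memory over time. The sketch below is only an illustration: `HeapLowWater` is a made-up name, and it simply remembers the lowest free-heap figure it has been given. Note that total free memory does not measure fragmentation directly, but a steadily sinking low-water mark before the cyan blinking starts would be a useful hint.

```cpp
#include <cstdint>

// Hypothetical helper: remembers the lowest free-heap reading seen so far.
struct HeapLowWater {
    std::uint32_t lowest = UINT32_MAX;

    // Feed in the current free-heap reading.
    // Returns true when the reading sets a new minimum.
    bool update(std::uint32_t freeBytes) {
        if (freeBytes < lowest) {
            lowest = freeBytes;
            return true;
        }
        return false;
    }
};
```

On a Photon the reading would come from `System.freeMemory()`, e.g. calling `tracker.update(System.freeMemory())` from `loop()` and logging over serial whenever it returns true.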

BTW, flashing cyan is not considered frozen firmware. A steady RGB LED would indicate a frozen system.
You can also use SYSTEM_THREAD(ENABLED) to keep your own code in control whether or not the system can connect.
As you said, this was 2014 and on a Spark Core - many things have changed since (provided you've also kept up with the regular system updates).

Thanks @ScruffR. I’ll take a look at using char arrays instead of Strings. Do you generally avoid the String class for this reason when working on embedded projects?

Regarding SYSTEM_THREAD(ENABLED), we are actually using this, which is why I was confused that it was giving the appearance of freezing. Not sure if knowing that gives you more or less confidence that it is heap fragmentation causing the blinking cyan.

Exactly that.

Looking at much of the code posted on this forum, I came to the conclusion that this is the most common reason for multiple "unexplainable" SOS crashes or connectivity issues.
Hence it's the first suggestion you'll get on here, to remove it from the list of usual suspects.

Since you are running multithreaded, your code can check for prolonged reconnection issues and take countermeasures like rebooting the device.
Other causes could be related to your actual network, but are less simple to pinpoint and solve.
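The "check for prolonged reconnection issues and reboot" idea could look roughly like the sketch below. This is an assumption about how one might structure it, not code from this thread: `ReconnectWatchdog` is a made-up class, and the timeout value is up to you. It only contains the decision logic, so the connection state and the millisecond tick are passed in from outside.

```cpp
#include <cstdint>

// Hypothetical helper: decides when a disconnect has lasted long enough
// to justify a reboot. It does not touch any hardware itself.
class ReconnectWatchdog {
public:
    explicit ReconnectWatchdog(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call regularly with the current connection state and a millisecond tick.
    // Returns true once the device has been disconnected for at least timeoutMs.
    bool shouldReset(bool connected, std::uint32_t nowMs) {
        if (connected) {
            disconnected_ = false;  // any successful connection clears the timer
            return false;
        }
        if (!disconnected_) {
            disconnected_ = true;   // first call after losing the connection
            sinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the tick counter wraps around.
        return (nowMs - sinceMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t sinceMs_ = 0;
    bool disconnected_ = false;
};
```

With SYSTEM_THREAD(ENABLED), `loop()` keeps running while the system tries to reconnect, so there you could do something like `if (wd.shouldReset(Particle.connected(), millis())) System.reset();` with, say, a 10 minute timeout (that figure is just an example).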

More and more, I think our issue falls into the ‘less simple to pinpoint and solve’ category, unfortunately. It appears that the issue happens more when WiFi is spotty than when it isn’t… We’ve also replaced almost all the Strings in our project with char arrays, and the issue seems as persistent as ever in spotty WiFi situations. Why the system freezes despite the system thread being enabled is beyond me.