Memory still drops… although it does appear to recover after some time… it’s a bit odd.
On my test photon with ONLY this code, the photon stays active despite the memory ups and downs. I'm not sure what else in my code on the other units could be causing the lock-up, but it appears to be linked in some way (maybe…)
There’s nothing too unusual in my full code.
There are some temperature readings from DS18xxx sensors, a ten-second log via httpclient to a Flask instance (local network) and a one-minute log to the Particle Cloud. There is also a watchdog that resets the photon if the main loop hasn't run for 15 seconds.
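The 15-second watchdog boils down to a timestamp check. Here is a minimal host-side sketch of that logic in plain C++, assuming a 32-bit millisecond counter like millis(). LoopWatchdog, kick() and expired() are hypothetical names, not the PhotonWdgs API. On the device, expired() would have to be polled from something that still runs when loop() is stuck (for example a timer interrupt), which is also an assumption here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical software-watchdog logic (NOT the PhotonWdgs API).
// loop() calls kick() on every pass; something independent of loop()
// calls expired() and triggers a reset once loop() has gone quiet.
class LoopWatchdog {
public:
    explicit LoopWatchdog(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);
    }

    // Record that the main loop is still alive.
    void kick(std::uint32_t nowMs) { lastKickMs_ = nowMs; }

    // True once more than timeoutMs has passed since the last kick().
    // Unsigned subtraction keeps this correct when the counter wraps.
    bool expired(std::uint32_t nowMs) const {
        return static_cast<std::uint32_t>(nowMs - lastKickMs_) > timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastKickMs_ = 0;
};
```

Doing the comparison as an unsigned subtraction means the check keeps working when a 32-bit millisecond counter wraps around after roughly 49.7 days.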
This behaviour is not too surprising. Previously freed space is "reclaimed" asynchronously, whenever there is enough idle time, and since traversing the heap map to calculate free memory is time-consuming, that calculation is not done continuously either.
The asynchronous behaviour comes from how function call requests are handled: the memory for each request is allocated by the application thread and then freed by the system thread once it has pulled the request from the queue and executed it. While the system thread is blocked (e.g. waiting for WiFi to connect), it isn't servicing requests. There is a fixed limit on the number of outstanding messages, so the application thread cannot keep pushing function call requests onto the system thread's queue indefinitely; once the limit is reached, it has to wait.
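A toy model may help picture this. The sketch below is standard C++ and assumes nothing about Particle's real internals: RequestQueue, push(), tryPush() and serviceOne() are hypothetical names. Each queued request owns a heap buffer that the producer (application side) allocates and the consumer (system side) frees only when it services the request. While nothing is serviced, bytes in use climb; once the bound is reached, a blocking push() stalls its caller, while tryPush() simply reports failure.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <vector>

// Toy model (not Particle's implementation) of a bounded request queue
// between an application thread and a system thread.
class RequestQueue {
public:
    explicit RequestQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Application side: blocks while the queue is full.
    void push(std::size_t payloadBytes) {
        std::unique_lock<std::mutex> lock(m_);
        notFull_.wait(lock, [this] { return q_.size() < capacity_; });
        enqueueLocked(payloadBytes);
    }

    // Application side, non-blocking: returns false instead of waiting.
    bool tryPush(std::size_t payloadBytes) {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.size() >= capacity_) return false;
        enqueueLocked(payloadBytes);
        return true;
    }

    // System side: "execute" one request and free its buffer.
    bool serviceOne() {
        std::lock_guard<std::mutex> lock(m_);
        if (q_.empty()) return false;
        bytesInUse_ -= q_.front().size();
        q_.pop_front();
        notFull_.notify_one();  // wake a producer blocked in push()
        return true;
    }

    std::size_t bytesInUse() const {
        std::lock_guard<std::mutex> lock(m_);
        return bytesInUse_;
    }

private:
    void enqueueLocked(std::size_t payloadBytes) {
        q_.emplace_back(payloadBytes);  // allocated by the producer
        bytesInUse_ += payloadBytes;
    }

    mutable std::mutex m_;
    std::condition_variable notFull_;
    std::deque<std::vector<char>> q_;
    std::size_t capacity_;
    std::size_t bytesInUse_ = 0;
};
```

In this model the "memory drop" is just the buffers of requests nobody has serviced yet, and it recovers as soon as the consumer runs again.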
I don’t think it will block vital operations - the system is blocked once wifi goes down but the application thread will keep running, so long as you don’t keep pushing requests to the queue. To have code run completely independently from the system, don’t call any system APIs.
It shouldn’t be necessary to call WiFi.connect() and similar functions if you call Particle.connect() in setup. The system will then endeavor to keep wifi and the cloud connected without any prompting from the application.
Hey guys, I tried this code, rebooted the router 10 times, and each time the photon recovered. I saw free memory drop from about 60,000 bytes to 58,000 bytes, and it eventually recovered that memory when the WiFi was restored. When I left it disconnected for a longer period, you could see the application thread slow down as it blocked waiting to push messages to the queue, which weren't being delivered, but WiFi still recovered.
If anyone experiencing this issue could provide a small app and steps to reproduce it that would be a huge step towards us being able to address it.
It shouldn't be necessary to call WiFi.connect() and similar functions if you call Particle.connect() in setup.
So I came in this morning and ripped all the code out of my checkConnectionStatus() function and just left:
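if (!Particle.connected()) {
    // Only issue Particle.connect() once per outage; after that the
    // system keeps trying to reconnect on its own.
    if (!cloudConnecting) {
        Serial.println("Connecting to cloud!");
        Status::SetDeviceStatus(DEVICE_CLOUD_CONNECTING);
        Serial.println("Particle.connect()");
        Particle.connect();
        cloudConnecting = true;
    }
} else {
    // Connected: report the transition once, then clear the flag.
    if (cloudConnecting) {
        Serial.println("Connected to cloud!");
    }
    cloudConnecting = false;
}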
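The intent of this snippet is to call Particle.connect() once per outage and print a message when the connection comes back. A minimal sketch, assuming that intent, pulls the logic into a pure function so the transitions can be checked off-device. CloudAction and cloudStep() are hypothetical names; on the photon, StartConnect would correspond to the Particle.connect() branch and ReportConnected to the "Connected to cloud!" message.

```cpp
#include <cassert>

// What one call of the connection check should do (hypothetical names).
enum class CloudAction { None, StartConnect, ReportConnected };

// cloudConnecting is the flag kept between calls, as in the snippet above.
CloudAction cloudStep(bool connected, bool& cloudConnecting) {
    if (!connected) {
        if (!cloudConnecting) {
            cloudConnecting = true;        // start one connection attempt
            return CloudAction::StartConnect;
        }
        return CloudAction::None;          // attempt already in progress
    }
    const bool wasConnecting = cloudConnecting;
    cloudConnecting = false;               // connected: clear the flag
    return wasConnecting ? CloudAction::ReportConnected : CloudAction::None;
}
```

Repeated calls while offline return None after the first, which is the property that keeps the application from hammering the system thread with connect requests.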
I just cycled the WiFi network 15 times (using v0.4.9) and everything seems to be working OK: no blocking of the system thread at all, and the memory seems to recover fine.
I am going to soak it overnight with a script that drops the network a few hundred times.
Following on from the above, I have also stripped out loads of “stay alive” code.
The resulting code has been working brilliantly since last night, even with (purposely) rubbish signal strength. I will continue to monitor it…
My basic code outline is now:
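SYSTEM_THREAD(ENABLED);   // system thread
SYSTEM_MODE(AUTOMATIC);   // automatic mode

const unsigned long INTERVAL_MS = 10000;  // placeholder, e.g. the ten-second local log
unsigned long lastRun = 0;

void loop()
{
    // do stuff that doesn't require connectivity whenever it's required

    if (WiFi.ready()) {
        if (millis() - lastRun >= INTERVAL_MS) {   // time_to_do_something
            lastRun = millis();
            // do stuff that needs local networking
            if (Particle.connected()) {
                // do stuff that needs the Particle Cloud
            } else {
                // cloud not connected: give the system up to 8 s to reconnect
                waitFor(Particle.connected, 8000);
            }
        }
    } else if (!WiFi.connecting()) {
        // WiFi not ready and no connection attempt in progress
        WiFi.connect();
    }
}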
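The branching in this outline can likewise be pulled into a pure function and exercised on a PC. This is a sketch under two assumptions: decide() and LoopAction are hypothetical names standing in for the do_stuff placeholders, and, following the "Wifi not ready" comment, WiFi.connect() is only wanted when WiFi is actually down, not whenever the timer isn't due. Work that needs no connectivity is assumed to run on every pass regardless, so OfflineOnly means "nothing beyond that".

```cpp
#include <cassert>

// What one pass of loop() should do, beyond the offline work (hypothetical).
enum class LoopAction {
    OfflineOnly,    // nothing network-related this pass
    LocalAndCloud,  // local networking, then Particle Cloud work
    LocalThenWait,  // local networking, then wait (up to 8 s) for the cloud
    StartWiFi       // WiFi down and no attempt in progress: WiFi.connect()
};

LoopAction decide(bool wifiReady, bool wifiConnecting,
                  bool cloudConnected, bool timeDue) {
    if (wifiReady) {
        if (!timeDue) return LoopAction::OfflineOnly;
        return cloudConnected ? LoopAction::LocalAndCloud
                              : LoopAction::LocalThenWait;
    }
    // WiFi not ready: only kick the radio if nothing is already trying.
    return wifiConnecting ? LoopAction::OfflineOnly : LoopAction::StartWiFi;
}
```

Keeping the decision separate from the Particle calls makes it easy to check that a WiFi outage never leads to repeated WiFi.connect() calls while an attempt is already under way.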
I also use PhotonWdgs to ensure a hardware reset if things die… but I don't think it's being triggered much now.
As a side note, I am still trying to figure out what exactly caused the lockups, though. If I can get a simple reproduction, I'll post it here.
So I ran this last night: I scripted a DD-WRT access point to take eth1 down (ifconfig eth1 down) for 60 seconds every 10 minutes.
What I found was that the photon reconnected successfully about 13 times, after which it failed to reconnect. The LED was flashing green and my loop() code was running away, printing to screen, but there was no reconnection.
I am going to try to strip it down to a basic application.cpp and upload something that reproduces the problem.
So I stripped it down to a bare application.cpp and ran multiple re-connections as before. In true software bug fashion, it worked perfectly fine.
Colour me confused!
I am slowly putting bits of my application back in to see if I can pinpoint where the behaviour starts again. Is there anything glaringly obvious that might be causing this problem for me?
Nothing comes to mind. I’d like to try this test myself over the coming week. If you could post application code that definitely exhibits the issue I will try to replicate then dive in. (I have the WICED sources so I can see what’s going on in the networking stack.)
Cool - thanks for that! I should be able to share application code with you OK; I'll play for a bit longer here to see if a pattern emerges. I'll have something with you by the start of the week.