[v0.4.8-rc1 / v0.4.9] WiFi Reconnection Issue

OK - I have added:

if (!WiFi.connecting()) { WiFi.connect(); }

Memory still drops… although it does appear to recover after some time… it’s a bit odd.

On my test photon with ONLY this code, the photon stays active despite the memory ups and downs. Not sure what else in my code on the other units could be causing the lock-up, but it appears to be linked in some way (maybe…)

There’s nothing too unusual in my full code.
There are some temperature readings from a DS18xxx, a ten-second log via httpclient to a flask instance (local networking) and a one-minute log to the Particle Cloud. There is also a watchdog to reset the photon if the main loop hasn’t run for 15 seconds.
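
For illustration, a rough sketch of that timing structure might look like the following; readTemperatures(), logToFlask() and the event name are hypothetical stand-ins, not my actual code:

unsigned long lastLocalLog = 0;
unsigned long lastCloudLog = 0;

float readTemperatures() { return 21.5; }           // stand-in for the DS18xxx read
void logToFlask(float tempC) { /* HTTP POST to the local flask instance */ }

void setup() {
}

void loop() {
    float tempC = readTemperatures();

    if (millis() - lastLocalLog >= 10000) {         // ten-second local log
        logToFlask(tempC);
        lastLocalLog = millis();
    }

    if (Particle.connected() && millis() - lastCloudLog >= 60000) {   // one-minute cloud log
        Particle.publish("temperature", String(tempC), 60, PRIVATE);
        lastCloudLog = millis();
    }

    // (The 15 s loop watchdog from the description would be kicked here.)
}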

This behaviour is not too surprising, since the "reclaiming" of previously freed space is done asynchronously whenever there is enough idle time, and since traversing the heap map to calculate free memory is time-consuming, it isn't done continuously either.


OK. Is there any way to trigger the reclaim (or encourage it at least)?
How long would it typically run for?

The asynchronous nature is because the memory for each function call request is allocated by the application thread and then disposed of by the system thread, once the system thread has pulled the request from the queue and executed it. While the system thread is blocked, e.g. waiting for WiFi to connect, it’s not servicing requests. There’s a bounded limit on the number of outstanding messages, so function call requests will not keep being pushed onto the system thread queue without limit.
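
To make the shape of that concrete, here’s a small standard-C++ sketch of the bounded producer/consumer pattern being described - an illustration only, not the actual Device OS code. The producer ("application") allocates each request and blocks once the queue is full; the consumer ("system") only frees them when it gets around to servicing the queue:

#include <chrono>
#include <condition_variable>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

struct Request { int id; };

std::queue<std::unique_ptr<Request>> q;
std::mutex m;
std::condition_variable cv;
const std::size_t kMaxOutstanding = 5;   // bounded limit on outstanding requests

void applicationThread() {
    for (int i = 0; i < 20; ++i) {
        auto req = std::unique_ptr<Request>(new Request{i});  // memory allocated by the "application"
        std::unique_lock<std::mutex> lock(m);
        // Blocks while the queue is full, i.e. while the "system thread" isn't servicing it.
        cv.wait(lock, [] { return q.size() < kMaxOutstanding; });
        q.push(std::move(req));
        cv.notify_all();
    }
}

void systemThread() {
    // Simulate the system thread being blocked for a while, e.g. waiting for WiFi.
    std::this_thread::sleep_for(std::chrono::seconds(2));
    for (int i = 0; i < 20; ++i) {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [] { return !q.empty(); });
        std::unique_ptr<Request> req = std::move(q.front());
        q.pop();
        cv.notify_all();
        // req goes out of scope here: its memory is only freed once the
        // "system thread" gets around to servicing the request.
    }
}

int main() {
    std::thread app(applicationThread);
    std::thread sys(systemThread);
    app.join();
    sys.join();
    return 0;
}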

I’ve not looked into the details of the memory allocator, but I’m certain it operates synchronously - there is no background cleanup thread. https://github.com/32bitmicro/newlib-nano-2/blob/0c5e24765fb745dc7c59f00248680c22357ffd55/newlib/libc/stdlib/mallocr.c


That’s interesting… so, in theory, could overly hassling the photon to connect in fact block vital operations?

I’m trying to figure out how to maintain bomb-proof execution under bad conditions…

I don’t think it will block vital operations - the system thread is blocked once WiFi goes down, but the application thread will keep running, so long as you don’t keep pushing requests onto the queue. To have code run completely independently of the system, don’t call any system APIs.

It shouldn’t be necessary to call WiFi.connect() and similar functions if you call Particle.connect() in setup. The system will then endeavor to keep WiFi and the cloud connected without any prompting from the application.
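
In other words, something like this minimal sketch (illustrative only, not code from this thread):

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

void setup() {
    Particle.connect();   // after this, the system thread manages WiFi/cloud reconnection itself
}

void loop() {
    // Do the application work; guard cloud-dependent calls instead of driving reconnection manually.
    if (Particle.connected()) {
        // Particle.publish(...), etc.
    }
}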

Hey guys, I tried this code, rebooted the router 10 times, and each time the photon recovered. I saw the free memory decrease from ca. 60,000 bytes to 58,000, and it eventually recovered that memory when the WiFi was restored. I left it disconnected for a longer period, and you can see the application thread slow down as it blocks waiting to push messages onto the queue, which aren’t delivered, but WiFi still recovered.

If anyone experiencing this issue could provide a small app and steps to reproduce it that would be a huge step towards us being able to address it.

Hey, I was reading what you said, @mdma:

It shouldn't be necessary to call WiFi.connect() and similar functions if you call Particle.connect() in setup.

So I came in this morning and ripped all the code out of my checkConnectionStatus() function and just left:

if (!Particle.connected()) {
    if (!cloudConnecting) {
        Serial.println("Connecting to cloud!");
        Status::SetDeviceStatus(DEVICE_CLOUD_CONNECTING);
        Serial.println("Particle.connect()");
        Particle.connect();
        cloudConnecting = true;
    }
} else {
    // We are connected; report the transition once and clear the flag.
    if (cloudConnecting) {
        Serial.println("Connected to cloud!");
    }
    cloudConnecting = false;
}

I just ran through 15 on/off cycles of the WiFi network (using v0.4.9) and everything seems to be working OK: no blocking of the system thread at all, and the memory seems to recover fine.

I am going to soak it overnight with a script to drop the network a few hundred times.


So, after the above, I have also stripped out loads of “stay alive” code.

My resulting code now appears to have been working brilliantly since last night, even with (purposely) rubbish signal strength. I will continue to monitor…

My basic pseudo-code now has:

System thread enabled & AUTOMATIC mode

  loop
  {
      do_stuff that doesn't require connectivity, whenever it's required
      if ( WiFi.ready() && time_to_do_something )
      {
          do_stuff that needs local networking
          if ( Particle.connected() )
          {
              do_stuff that needs the Particle Cloud
          } else {                              // Cloud not connected
              waitFor(Particle.connected, 8000)
          }
      } else {                                  // WiFi not ready
          if ( !WiFi.connecting() ) { WiFi.connect() }
      }
  }
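
As a concrete version of that pseudo-code, a hypothetical sketch could look like this; the do...Stuff() helpers and the 10 s interval are placeholders, not my real code:

SYSTEM_THREAD(ENABLED);   // AUTOMATIC system mode is the default

// Hypothetical placeholders for the real work in the pseudo-code above.
void doOfflineStuff()      { /* sensor reads, local state, etc. */ }
void doLocalNetworkStuff() { /* e.g. HTTP log to the local flask instance */ }
void doCloudStuff()        { /* e.g. Particle.publish(...) */ }

unsigned long lastRun = 0;
const unsigned long RUN_INTERVAL_MS = 10000;   // stand-in for "time_to_do_something"

void setup() {
}

void loop() {
    doOfflineStuff();                               // never needs connectivity

    if (WiFi.ready()) {
        if (millis() - lastRun > RUN_INTERVAL_MS) {
            doLocalNetworkStuff();                  // needs local networking only
            if (Particle.connected()) {
                doCloudStuff();                     // needs the Particle Cloud
            } else {
                waitFor(Particle.connected, 8000);  // give the cloud up to 8 s to come back
            }
            lastRun = millis();
        }
    } else {                                        // WiFi not ready
        if (!WiFi.connecting()) {
            WiFi.connect();
        }
    }
}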

I also use PhotonWdgs to ensure a hardware reset if things die… but I don’t think it’s being triggered much now.

As a side note, I am still trying to figure out exactly what issue causes/caused the lock-ups… if I can get a simple reproduction I’ll post it here.


So I ran this last night: I scripted a DD-WRT access point to take eth1 down (ifconfig eth1 down) for 60 seconds every 10 minutes.

What I found was that the photon reconnected successfully about 13 times, after which it failed to reconnect. The light was flashing green and my loop() code was ticking away, printing to the serial output, but there was no reconnection.

I am going to try and strip it down to a basic application.cpp and upload a replication.

It would be great to know if it does this in safe mode also.

Cool, I can run this as well now - I assume I can tell it’s connected in safe mode just via the breathing magenta?

Yep, that’s correct. Breathing magenta means it’s connected to the cloud.

So I stripped it down to a bare application.cpp and ran multiple re-connections as before. In true software bug fashion, it worked perfectly fine.

Colour me confused!

I am slowly putting bits of my application back in to see if I can pinpoint where it starts to show the behaviour again - anything glaringly obvious that might be causing this problem for me?

Nothing comes to mind. I’d like to try this test myself over the coming week. If you could post application code that definitely exhibits the issue, I will try to replicate it and then dive in. (I have the WICED sources, so I can see what’s going on in the networking stack.)

Cool - thanks for that! I should be able to share application code with you OK; I’ll play for another bit here to see if a pattern emerges. I’ll have something with you by the start of the week.

Thanks - no hurry from my side - please take the time you need!

OK… The following code does appear to eventually end up with a lock-up…

SYSTEM_THREAD(ENABLED);

bool debug_serial = true;
unsigned long last_reconnect = 0;
IPAddress remoteIP(192, 168, 2, 1);   // local gateway used as the ping target

void setup() {
    if (debug_serial) {
        Serial.begin(9600);
        delay(250);
        Serial.println("Starting.");
    }
    last_reconnect = millis();
}

void loop() {
    if (WiFi.ready())
    {
        // Connected: once a second, ping the gateway and report free memory and RSSI.
        if (millis() - last_reconnect > 1000)
        {
            int replies = WiFi.ping(remoteIP);
            Serial.print(Time.timeStr());
            Serial.print(" - Connected to WiFi. At least 1s since last check.");
            Serial.printf(" System Memory is: %d, RSSI is: %d. ", System.freeMemory(), WiFi.RSSI());
            if (replies == 5) { Serial.printlnf("Ping OK (%d).", replies); } else { Serial.printlnf("Ping Failed (%d).", replies); }
            last_reconnect = millis();
        }
    }
    else if (millis() - last_reconnect > 1000)
    {
        // Not connected: once a second, report free memory and request a reconnect.
        Serial.print(Time.timeStr());
        Serial.print(" - NOT connected to WiFi. At least 1s since last check.");
        Serial.printlnf(" System Memory is: %d", System.freeMemory());
        //if (!WiFi.connecting()) { WiFi.connect(); }
        WiFi.connect();
        last_reconnect = millis();
    }
}

@specialcircumstances - I have noticed lock-ups when calling WiFi.RSSI() - try your above example with it removed.
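
i.e. something like this in place of the existing status line, with just the RSSI call dropped (assuming that is the only change needed):

Serial.printf(" System Memory is: %d. ", System.freeMemory());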

OK can do.

Is there a bug open for the RSSI thing?