WiFi channel switching stops WiFi connectivity and is only recoverable by reboot (BUG)

I’ll bring that forward again in one of the next sessions with Particle.

And you are right, this was a non-threaded issue, but off the record I had also discussed the “escalation” of that issue to multi-threaded scenarios.
One argument there was that the intended behaviour is for the blocking to occur only during the transition from setup() to loop(), but I have not tested that myself.

I think blocking behavior might be nice for novice users, but for advanced users I think even that doesn’t make much sense.

When an application loses WiFi (channel switch, router going down, etc):

  • it should be able to reconnect without entering setup() again
  • it should be able to perform the application without interruption (temperature control in my case) while trying to reconnect. A simple example is taking a sensor reading every second. When the WiFi goes down, the application could store these values until WiFi is back, but it should still read every second! With a blocking WiFi reconnect, this is not possible.
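
To illustrate the second point, here is a minimal C++ sketch of buffering readings while WiFi is down and flushing them once it returns. ReadingBuffer and sampleOnce are hypothetical names, not Particle APIs, and the `online` flag is assumed to come from something like WiFi.ready(). It is one possible approach, not a tested fix for the blocking issue.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical helper (not part of the Particle API): keeps the most recent
// readings while the network is down so the sampling loop never has to wait.
class ReadingBuffer {
public:
    explicit ReadingBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Store one reading; when the buffer is full, the oldest one is dropped.
    void push(float value) {
        if (capacity_ == 0) return;
        if (pending_.size() == capacity_) pending_.pop_front();
        pending_.push_back(value);
    }

    // Hand over everything collected so far (oldest first) and empty the buffer.
    std::vector<float> drain() {
        std::vector<float> out(pending_.begin(), pending_.end());
        pending_.clear();
        return out;
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::size_t capacity_;
    std::deque<float> pending_;
};

// One pass of the application loop: always sample, only send when online.
// `send` is an assumed callable that transmits a single reading.
template <typename SendFn>
void sampleOnce(ReadingBuffer& buffer, float reading, bool online, SendFn send) {
    buffer.push(reading);
    if (online) {
        for (float v : buffer.drain()) send(v);
    }
}
```

The capacity bounds memory use on a small device: if the outage lasts longer than the buffer can cover, the oldest readings are lost first.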

If the SYSTEM_THREAD is going to block the entire application, what’s the point of having a system thread?

I’m absolutely with you. I think a consistent, fully flexible and well-documented behaviour would be the way to go, and special treatment of the transition from setup() to loop() can’t be called consistent IMO.

When I mentioned that, I didn’t mean that the code would need to go through setup() again in order to connect. I meant that, with the current use of setup() and loop(), any call of WiFi.connect()/Particle.connect() issued in setup() would prevent the flow from entering loop() until the connection is actually established. But when called early in setup(), the rest of setup() should still execute, and when called in loop(), no blocking should happen.
While this would (if it actually worked that way) address some issues, it’s still inconsistent IMO; it would not address all issues and would rather create confusion. So getting the “non-blocking” behaviour back would be desirable.

I set “non-blocking” in double quotes since the extra low level work for the task will inevitably impact the execution of the application thread and hence may cause some delay but attending to application code whenever possible is still better than a hard block IMO.

I have tried removing all calls to connect() from my loop() and just leaving that to the Particle system thread.

I have changed it to a single WiFi.connect(WIFI_CONNECT_SKIP_LISTEN) in setup().

This seems to have resolved the issues with blocking the main loop. In my investigation I found that WiFi.connect() did not block (it is correctly run on the background thread), but it seems that the main loop execution is blocked when it is called for the second time. How this is possible with the check for WiFi.connecting() is unclear to me.

I’m going to do some more testing, but it seems that this is how it should be handled:

  • Call WiFi.connect() just once.
  • Leave it to Particle to reconnect if needed and don’t interfere.

This is a pretty good conclusion for Particle I guess. The system thread seems to work fine too. There might be a bug in connect() or isConnecting() that caused this behavior.

I’m probably not understanding this completely, but would it be possible to ping the wifi router and only call WiFi.connect if the ping succeeds (in MANUAL and with SYSTEM_THREAD)?

Am wondering if the @rickkas7 solution is the way to go for you @Elco:

I was complaining specifically about TCPCLIENT, but the general technique could be useful to your situation too.

That is worth a shot. What I currently have is that the P1 responds to ping, but refuses to handle TCP requests.

Any downsides to having the cloud connection when it is not used at all? What would be the effect of having no internet connection, just local LAN?

Connecting to Particle instead of WiFi only seems to help stability of the TCP server indeed! Thanks @UMD

@Elco, @UMD this is a very interesting finding. @Elco, it would be good if you could report on your stability after some operating hours so we can report back to the Particle folks.

@Elco, thanks should go to @rickkas7 !!

@Elco confirming that, using the technique, you don't need to have a connection with Particle to use TCPClient, just WiFi (of course).

@peekay123, the blocking nature of TCPClient() and the like should be looked at. Perhaps do what Windows does and have async versions of things, e.g. "TCPClientAsync", which incorporate callbacks?

For example, I have this "gotcha" workaround (note this @Elco):

        // This check cures the tcpClient.connected() hanging problem
        if (!WiFi.ready())
        {
            Log.warn("*NOT* WiFi.ready()");
            continue;  // looping
        }
        // We have WiFi
        // -------------
        bOk = tcpClient.connected();            // HANGING problem here if WiFi has been lost

@peekay123, it would be nice to fix issues like excessive timeouts:

        // This call can take up to 5 seconds if the other end is not listening
        // (This may no longer be true post v0.6.2 firmware?)
        bOk = tcpClient.connect(ipAddressServer, nPort);

I ran into a situation where I wanted the TCPClient connect() call to be asynchronous so I wrote an alternate implementation. I haven’t used it in a long time, but I’m pretty sure it should still work.
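
For readers wondering what a callback-style connect could look like, here is a rough C++ sketch of the "TCPClientAsync" idea floated above. It is not the alternate implementation mentioned in this post and not an existing Particle class: AsyncConnect, Progress and the poll function are assumptions made for illustration. The application supplies a progress check that returns immediately plus its own timeout, and calls process() from loop().

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Result of one non-blocking progress check on a pending connection.
enum class Progress { Pending, Connected, Failed };

// Hypothetical wrapper: reports the outcome through a callback instead of
// blocking the caller until the connection succeeds or times out.
class AsyncConnect {
public:
    using PollFn = std::function<Progress()>;     // must return immediately
    using DoneFn = std::function<void(bool ok)>;  // called once per attempt

    // Start tracking an attempt with an application-chosen timeout.
    void begin(PollFn poll, DoneFn done, uint32_t nowMs, uint32_t timeoutMs) {
        poll_ = std::move(poll);
        done_ = std::move(done);
        startMs_ = nowMs;
        timeoutMs_ = timeoutMs;
        active_ = true;
    }

    // Call from loop(); does one progress check and returns.
    void process(uint32_t nowMs) {
        if (!active_) return;
        Progress p = poll_();
        bool timedOut = (nowMs - startMs_) >= timeoutMs_;  // unsigned math survives wrap-around
        if (p == Progress::Pending && !timedOut) return;
        active_ = false;
        DoneFn done = std::move(done_);  // move out so the callback may call begin() again
        if (done) done(p == Progress::Connected);
    }

    bool active() const { return active_; }

private:
    PollFn poll_;
    DoneFn done_;
    uint32_t startMs_ = 0;
    uint32_t timeoutMs_ = 0;
    bool active_ = false;
};
```

The sketch only helps if the underlying progress check really is non-blocking; wrapping a call that itself hangs for seconds would not change anything.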

System seems to be stable now that it is connected to the cloud. Even when the connection is lost, it is restored without problems. I have one test system with terrible reception.

This is what I have (it switches automatically between USB and TCP):

I have updated my findings in this issue:

The problem is still not resolved, 0.8.0-rc.2 doesn’t fix it.

Seems even worse than before, because even though I have

        SYSTEM_MODE(MANUAL);
        SYSTEM_THREAD(ENABLED);

the application is completely blocked when WiFi is lost.
WiFi connectivity handling still seems to be absolutely terrible.

I am getting unhappy about my choice of Particle as a platform for my products.
Is it too much to ask from an IoT platform that:

  • Losing WiFi does not completely block the system
  • WiFi disconnects and channel switches are handled correctly, and when WiFi returns it always reconnects
  • A TCP server run by the device does not get into an unrecoverable error state when WiFi is bad or temporarily lost?

I have given you test code. I reported this bug over 6 months ago. I feel unheard and ignored.

Is it perhaps possible to add some system event hooks to handle WiFi disconnects?
I’m really trying to find a solution because I have a lot of customers affected by this. Why can’t someone at Particle provide some example code that:

  • Handles WiFi connecting/disconnecting without blocking the main thread
  • Handles running a TCP server on the Photon in a robust way
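
As a starting point for the first bullet, here is a small C++ sketch (not official Particle example code) that turns a polled link state into connect/disconnect callbacks. LinkMonitor is a hypothetical helper; the flag passed to update() is assumed to come from something like WiFi.ready(), and the handlers are where an application might, for instance, stop and restart its TCP server.

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical helper: converts a polled "link is up" flag into
// edge-triggered callbacks, so the main loop reacts to changes
// without ever waiting on the WiFi stack.
class LinkMonitor {
public:
    using Handler = std::function<void()>;

    LinkMonitor(Handler onUp, Handler onDown)
        : onUp_(std::move(onUp)), onDown_(std::move(onDown)) {}

    // Call once per loop() pass with the current link state.
    void update(bool linkUp) {
        if (linkUp == lastUp_) return;  // no change, nothing to do
        lastUp_ = linkUp;
        if (linkUp) {
            if (onUp_) onUp_();
        } else {
            if (onDown_) onDown_();
        }
    }

    bool isUp() const { return lastUp_; }

private:
    Handler onUp_;
    Handler onDown_;
    bool lastUp_ = false;  // treated as "down" until the first update says otherwise
};
```

This only detects transitions; whether the system firmware itself blocks while reconnecting is a separate question that the sketch does not solve.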

@Elco and anyone else

I too am having some trouble when my P0 devices go offline. We have 2 issues that seem related to what you are describing. The scenarios are: the router disappears (users switch routers or take the product to a different location) or the internet is intermittent or down but they are still connected to the router. In both cases, the P0 successfully determines that we are offline and we get into our “offline state” which provides some tips on our display of what to do. However, we cannot seem to get OUT of that state even if the router comes back or the internet comes back.

The P0 seems to behave differently in each case: WiFi.ready reports 0.0.0.0 when the WiFi network is not available (which should be correct), but the previous IP address seems to stay if the network cable is unplugged from the router.

One reason we switched to thread-enabled mode was to let our users press buttons on the device and to update the screen for reassurance, ESPECIALLY when offline. However, this seems to have made it difficult for the hub to come BACK online. Do I need to call the WiFi.connect routine again? Will it block my system for large chunks of time and, if so, any suggestions on how to allow users to interact with the device to get more information?

We are trying not to rely too heavily on the Particle cloud, to reduce heavy loads or timeouts, so our conditions for getting back online include: is there internet? And is our server available?
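
The staged conditions described here can be sketched in C++ as follows. Connectivity and classify are hypothetical names, and the two probes are assumed callables (for example a DNS lookup and a request to your own server) that the application would have to supply. Each probe runs only if the previous stage succeeded.

```cpp
#include <cassert>

// How far "online" currently reaches, from nothing up to our own server.
enum class Connectivity { NoLink, LinkOnly, Internet, ServerReachable };

// Classify the current state. `linkUp` is assumed to reflect the router
// connection (e.g. WiFi.ready()); the probes are only called when the
// earlier stages have passed, so no time is spent on them while offline.
template <typename InternetProbe, typename ServerProbe>
Connectivity classify(bool linkUp, InternetProbe internetOk, ServerProbe serverOk) {
    if (!linkUp) return Connectivity::NoLink;
    if (!internetOk()) return Connectivity::LinkOnly;
    if (!serverOk()) return Connectivity::Internet;
    return Connectivity::ServerReachable;
}
```

Keeping LinkOnly separate from NoLink matches the two scenarios above: router gone versus router present but internet down. The probes themselves may take time, so they should be short or rate-limited if called from the main loop.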

Other info: I’m using firmware 0.6.3 since our product is in production. Our shipping system is not using a threaded mode, but we would like to go there to enable more interaction for the user.

Have you tried Particle.connect() periodically when you’re offline (or triggering it from user input)?

This should not block application code because you’re in threaded mode.
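
A periodic retry like this can be throttled so that attempts do not pile up while the network is away. The following is a rough C++ sketch with exponential backoff; RetryTimer is a hypothetical helper, and the timestamps are assumed to come from millis().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decides when the next reconnect attempt is due.
// The first attempt is allowed immediately; each later attempt waits
// twice as long as the previous one, capped at maxMs.
class RetryTimer {
public:
    RetryTimer(uint32_t initialMs, uint32_t maxMs)
        : initialMs_(initialMs), maxMs_(maxMs), intervalMs_(initialMs) {}

    // Returns true when an attempt should be made now.
    bool due(uint32_t nowMs) {
        if (armed_ && (nowMs - lastMs_) < intervalMs_) return false;
        if (armed_) {
            // Double the wait for the following attempt without overflowing.
            intervalMs_ = (intervalMs_ > maxMs_ / 2) ? maxMs_ : intervalMs_ * 2;
        }
        armed_ = true;
        lastMs_ = nowMs;
        return true;
    }

    // Call once the connection is back so the next outage starts fresh.
    void reset() {
        armed_ = false;
        intervalMs_ = initialMs_;
    }

private:
    uint32_t initialMs_;
    uint32_t maxMs_;
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool armed_ = false;
};
```

In loop() this might be used as `if (!Particle.connected() && timer.due(millis())) Particle.connect();`, with `timer.reset()` once the connection is back. Whether the connect call itself stays non-blocking on a given firmware version is exactly what this thread is questioning, so treat that line as an assumption.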

@avtolstoy is working on fixes and made a few in this PR:
https://github.com/particle-iot/firmware/tree/fix/photon-tcp

This improved the situation, but we still see some Photons not getting back on the network.
I was also able to trigger an SOS in certain conditions with the fixes.

I don’t know when Andrey scheduled time to work on it again, but it does not seem to be complete yet.
I have posted my findings in this thread:

But I’m waiting for Andrey for an actual fix. It seems to be deep in the system layer.

Once I have a combination of my own code for disconnect handling and fixes from Particle that works, I’ll post it here.
But I don’t think you’ll get a reliable connection on 0.6.3. Just look at the bugs that were uncovered in that PR.

Just for the record, this seems like correct behavior to me. You are still connected to the router and could in theory access other hosts on the same subnet, you just can't get to the internet.

I know there are problems in this area and fortunately they are being addressed, but I don't think this is one of them.

I can try Particle.connect, but we always feared Particle.connect would slow down our application code because significant resources would be spent trying to connect if no connection was available. This is why I was hoping to use WiFi.ready to indicate when we would be ready to attempt a Particle connection. Also, since we don’t expressly need Particle to be connected to run our application, we don’t want to have to check Particle if we don’t need to. We can try to resolve google.com first before attempting Particle, but will we need to turn the WiFi module on and off?

Would WiFi.connect achieve the same thing and not be blocking in the same way?

It seems odd that the behavior of WiFi.ready differs between the online-to-offline and the offline-to-online transition.

For example:
WiFi.ready returns true when online but turns false when the Photon goes offline.
But when offline, WiFi.ready also returns true if connected to a router?

Do we have to expressly turn off the WiFi module if we lose connectivity and turn it back on expressly when we detect we are offline?

Sorry, I should have started here: Can you describe what you're doing to try to reconnect?