No WiFi blocks loop() - specifically Serial1 reception


#1

This is really an extension to @Elco’s issue documented here:

Am running 0.8.0.rc4 in SEMI_AUTOMATIC_MODE

All is fine if one has WiFi up, with or without Particle (ie intranet only), as I have documented in "Particle.connect() disconnects WiFi - can this be stopped?"

Turning off the Wireless Access Point causes the grief which is the issue of this ticket:

  • It does not want to reconnect to the Access Point when it is turned back on (big problem in itself)
  • more importantly in my use case, every time I get log lines:
    ERROR: wiced_join_ap_specific(), result: 1024
    it blocks loop() and I miss packets of data from the serial port (Serial1)

No doubt I am not the only one to suffer with this! Any suggestions?


WiFi but no Cloud causes SOS
#2

There might be issues open on GitHub that touch on that. Make sure to report your findings there to increase the visibility of these issues.
If you don’t find an issue that fits well enough, file one yourself.

But first, to make sure your problem hasn’t already been addressed and solved, try the latest version (0.8.0-rc.8).


#3

Correction: I said:

  • It does not want to reconnect to the Access Point when it is turned back on (big problem in itself)

Now know that this is due to a misbehaving OpenWRT router. I fixed the DHCP issue with the router and it reconnects as expected.

The Serial1 issue still remains…


#4

Here is what works for me:

Some explanation: I use this NetworkSerialMuxer class to stream from either WiFi or Serial, whichever is available, with a preference for Serial.

To get it stable I had to:

  • Only look for a new TcpClient when the old one has dropped. I think there are some issues with how they are destroyed. TcpServer keeps a reference to the client. I think it should probably only create it and pass it on. Not keep a copy itself.
  • Don’t call Particle.connect() with no WiFi. It will trigger listening mode, unlike (WiFi.connect(WIFI_CONNECT_SKIP_LISTEN)). Listening mode sends messages over serial. This interferes with the application’s use of Serial! Holy shit, I missed this and it totally messed up the application’s Serial reliability.

So now the system accepts one TcpClient at a time and will only start looking for a new one if the client disconnects.
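
The second point in the list above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this post: maintainConnections and FakeNet are hypothetical names, and the logic is written as a template against a small stand-in interface so that it compiles off-device. On a Photon each method would forward to the call named in its comment.

```cpp
// Hypothetical stand-in for the device APIs. On real hardware each method
// would forward to the WiFi.* / Particle.* call named in its comment.
struct FakeNet {
    bool wifiUp = false;       // WiFi.ready()
    bool wifiJoining = false;  // WiFi.connecting()
    bool cloudUp = false;      // Particle.connected()
    int wifiConnectCalls = 0;
    int cloudConnectCalls = 0;

    bool wifiReady() const { return wifiUp; }
    bool wifiConnecting() const { return wifiJoining; }
    bool cloudConnected() const { return cloudUp; }

    // WiFi.connect(WIFI_CONNECT_SKIP_LISTEN)
    void wifiConnectSkipListen() { ++wifiConnectCalls; wifiJoining = true; }

    // Particle.connect()
    void cloudConnect() { ++cloudConnectCalls; }
};

// Only ask for the cloud once WiFi is up, so the cloud connect is never
// issued without WiFi (the case that was reported to trigger listening mode).
template <typename Net>
void maintainConnections(Net& net) {
    if (!net.wifiReady()) {
        if (!net.wifiConnecting()) {
            net.wifiConnectSkipListen();
        }
        return;  // never reach cloudConnect() without WiFi
    }
    if (!net.cloudConnected()) {
        net.cloudConnect();
    }
}
```

The early return is the whole point: as long as WiFi is down, the cloud connect is simply never attempted.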

WiFi has been rock stable in the last version.


#5

@ScruffR - no change with 0.8.0.rc8 - same issue.

@Elco, agree with your strategy to check WiFi.connected() before calling Particle.connect(), this has worked well for me too!

I will have to digest your PiLink code and see if this will help. Note that my application is looking at serial and is also a TCP Client, not a TCP Server, so will have to see if your code can assist with this because your code is a TCP Server.

To reiterate, the issue happens when I purposely turn off the WiFi router, ie no WiFi. Is your (yet to be looked at) code helpful in this situation?


#6

Isn’t it WiFi.ready()? I’m unaware of a function called WiFi.connected(), or do you mean WiFi.connecting()?


#7

@ScruffR, getting my particles mixed up, if you know what I mean!

I should have said:

(WiFi.ready() && WiFi.localIP())

@Elco, unfortunately I don’t think the code block in your available() method is helpful for my specific situation…


#8

Okay, I didn’t fully understand your issue.

Maybe this is related, from the docs:

Asynchronous system functions do not block the application thread, even when the system thread is busy, so these can be used liberally without causing unexpected delays in the application. (Exception: when more than 20 asynchronous system functions are invoked, but not yet serviced by the application thread, the application will block for 5 seconds while attempting to put the function on the system thread queue.)

So, if you fire too many async functions and saturate the system thread, the main loop will be blocked.
WiFi.hasCredentials() is synchronous, I’m not sure whether this could cause problems. I have gone through so many iterations that it’s hard to remember what caused unreliability in the past. I’m just glad that I found something that works.

But perhaps it’s a good idea to remove the hasCredentials() call, because I think it is superfluous with SKIP_LISTEN.


#9

@Elco, I think us two have been going down the same rabbit holes!

I always use (WiFi.ready() && WiFi.localIP()), have not used WiFi.hasCredentials().

Another thing to note is that I am using Serial1 (ie the physical serial port) and not the USB virtual ports Serial or USBSerial1.

I really wish that interrupt driven serial input was implemented because this would have circumvented the blocking issue!

Your “Asynchronous system functions…” paragraph could well be the lead that I am looking for, because I am suffering from this pattern:

good
good
good
bad
bad
wait some time...
good
good
bad
bad

ie it looks like the system clogs up after some activity (which points to your theory) or it could be a regular thing that is causing the blocks.

I love a chase!


#10

:slight_smile:

Try rate limiting your system thread calls, ie how often you try to reconnect to WiFi or the TCP server. My guess is that it is indeed caused by overloading the system thread.
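
A rate limiter for this can be very small. The following is a minimal sketch under my own assumptions: Throttle and due() are hypothetical names, and the timestamp is passed in as a parameter (it would be millis() on the device) so the class can be compiled and checked off-device.

```cpp
#include <cstdint>

// Hypothetical helper: due() returns true at most once per interval.
class Throttle {
public:
    explicit Throttle(uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // nowMs would be millis() on the device. Unsigned subtraction keeps the
    // comparison correct when the 32-bit millisecond counter rolls over.
    bool due(uint32_t nowMs) {
        if (started_ && (nowMs - lastMs_) < intervalMs_) {
            return false;  // too soon since the last accepted call
        }
        started_ = true;
        lastMs_ = nowMs;
        return true;
    }

private:
    uint32_t intervalMs_;
    uint32_t lastMs_ = 0;
    bool started_ = false;
};

// Possible use inside loop() (device code, shown as a comment only):
//   static Throttle wifiCheck(200);
//   if (wifiCheck.due(millis())) { /* WiFi / TCP reconnect logic here */ }
```

The first call always passes, and everything inside the guarded block then runs at most once per interval no matter how fast loop() spins.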

I have found that WiFi.ready() is enough since the bug fixes in 0.8.0.
I just confirmed that removing WiFi.hasCredentials in my code prevents a 4 second block on WiFi loss.


#11

@Elco,
(just seen your post, will address under separate cover).

It looks like you call WiFi.connect() whenever you have loss of WiFi:

if (!WiFi.ready() && WiFi.hasCredentials()) {
    if (!WiFi.connecting()) {
        WiFi.connect(WIFI_CONNECT_SKIP_LISTEN);
    }
}

I don’t do that because of what @rickkas7 commented in his strategy:

// The WiFi.connect() call never times out, it will keep trying forever

Comment?


#12

The !WiFi.connecting() is important here, because I think otherwise you would indeed overflow the system thread.

It might run for a long time on the system thread, but my main concern is that the user thread is not blocked.

I also have this fail safe in place:

Do a full disconnect and try again.
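
A fail-safe of this kind might look roughly like the sketch below. To be clear, this is not the code referred to above, just one way to express "disconnect fully and retry": ReconnectWatchdog, FakeWifi and the timeout value are all hypothetical, and on a device the three stand-in methods would map to WiFi.ready(), WiFi.connecting(), WiFi.connect(WIFI_CONNECT_SKIP_LISTEN) and WiFi.disconnect().

```cpp
#include <cstdint>

// Hypothetical stand-in for the WiFi object so the logic compiles off-device.
struct FakeWifi {
    bool up = false;
    bool joining = false;
    int connects = 0;
    int disconnects = 0;

    bool ready() const { return up; }
    bool connecting() const { return joining; }
    void connect() { ++connects; joining = true; }
    void disconnect() { ++disconnects; joining = false; up = false; }
};

// If WiFi has been down for longer than timeoutMs, drop the pending connect
// attempt completely and start a fresh one.
template <typename Wifi>
class ReconnectWatchdog {
public:
    explicit ReconnectWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    void poll(Wifi& wifi, uint32_t nowMs) {
        if (wifi.ready()) {
            tracking_ = false;  // link is fine, nothing to time
            return;
        }
        if (!tracking_) {
            tracking_ = true;   // start timing this outage
            downSince_ = nowMs;
        }
        if (nowMs - downSince_ >= timeoutMs_) {
            wifi.disconnect();  // full disconnect ...
            downSince_ = nowMs; // ... and restart the timer
        }
        if (!wifi.connecting()) {
            wifi.connect();     // issue a single new connect attempt
        }
    }

private:
    uint32_t timeoutMs_;
    uint32_t downSince_ = 0;
    bool tracking_ = false;
};
```

Because connect() is only issued when no attempt is in flight, polling this every loop iteration does not pile up requests for the system thread.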


#13

@Elco, good to see that I have saved you a four second blockage, how good is that?!

Reviewing my code:

(A) I find that I am performing the following every loop iteration:

     if (!(WiFi.ready() && WiFi.localIP()))

(B) and this every 200 ms:

     if (!Particle.connected())

My guess is that (A) is the culprit! Going to reduce the checking to every 200 ms now. Let me know if you think I should do something different.


#14

Perhaps the localIP() function waits for a connect attempt?

WiFi.ready was unreliable in the past, but I think it should be better now, due to the network fixes that followed from my bug reports recently.


#15

@Elco,

Got it, I missed the WiFi.disconnect() (hence the need to WiFi.connect() again). Nice move and symmetry - I am using a similar strategy with Particle.connect() and disconnect(), but not WiFi… onto it.


#16

@Elco, I have neatened up the WiFi strategy code and moved it out to a function which is now only called every 200 ms, so as to avoid possibly overwhelming the system thread.

During this change, I added some extra logging lines which have shown that loop() is NOT being blocked as I had first thought. This is good news (but embarrassing)!

Still left with the tricky situation of Serial1 (ie physical port) not receiving characters reliably. Which got me thinking… am now pretty sure that I have a hardware issue… I added test code which initiated a command with the device in play which elicits a serial response on a regular basis - no issue found, with or without WiFi…

I will confirm the hardware issue some time next week with my trusty logic analyser, to check the theory that the serial traffic itself is ok.

Apologies to both you and @ScruffR for this wild goose chase…

Case closed.


#17

Great, thanks for the update.
Depending on the baud rate and data rate you are using, you have to ensure that you are reading the serial port often enough. The serial buffer is small and easily overrun.

I have used this code as a quick and dirty USB to RS485 transceiver:

I can run it at 256000 baud rate.

If I print all of the output to a python terminal though, python cannot keep up and the data is lost there. It’s the printing to terminal that’s too slow. You have to make sure that you empty the buffers regularly. If they fill up, data is discarded. Even the USB buffer of the desktop is tiny.
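
To put a number on "often enough", here is a small sketch. The 64-byte receive buffer used in the note below is an assumption on my part (check the figure for your platform), and microsToFill, drain and FakeStream are hypothetical names. drain() only relies on available() and read(), which Serial1 provides, and is written as a template so it can be exercised off-device.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Microseconds for a continuous stream to fill a receive buffer.
// bitsPerByte is 10 for 8N1 framing (start bit + 8 data bits + stop bit).
constexpr uint32_t microsToFill(uint32_t baud, uint32_t bufferBytes,
                                uint32_t bitsPerByte = 10) {
    return static_cast<uint32_t>(
        (static_cast<uint64_t>(bufferBytes) * bitsPerByte * 1000000ULL) / baud);
}

// Move everything currently waiting in the port into a larger application
// buffer. Returns the number of bytes moved.
template <typename Stream>
std::size_t drain(Stream& port, std::vector<uint8_t>& out) {
    std::size_t moved = 0;
    while (port.available() > 0) {
        out.push_back(static_cast<uint8_t>(port.read()));
        ++moved;
    }
    return moved;
}

// Hypothetical stand-in for Serial1 with the same available()/read() shape.
struct FakeStream {
    std::vector<uint8_t> pending;
    std::size_t pos = 0;

    int available() const { return static_cast<int>(pending.size() - pos); }
    int read() { return pos < pending.size() ? pending[pos++] : -1; }
};
```

With these assumptions, microsToFill(256000, 64) works out to 2500, ie a continuous stream at 256000 baud would fill a 64-byte buffer in 2.5 ms, so anything that holds up the reading loop for longer than that risks dropped bytes.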


#18

@Elco, I will add some clarity to my “serial reliability” comment: the problem is that no packet data at all is received for a transaction, not that chars go missing within a packet.

My initial thoughts were that Serial1 reception was blocked, but it now looks very much like the transaction (actually an event indication) was simply not happening due to the hardware issue.