TCPClient always fails on first connect attempt after activating WiFi

Hi, I’m running in SEMI_AUTOMATIC mode and finding that after I turn on WiFi and connect to the cloud, the TCPClient will always fail to connect on its first attempt. Subsequent attempts are fine.

I’ve been careful to make sure everything is set up correctly: I wait for WiFi.ready() and Spark.connected(), and it’s even reporting a local IP address before I try to make the connection.

Is this an issue with the TCPClient class?

Here’s my debug output from two cycles of my main loop; note that the first attempt fails:

Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready......... WiFi is ready!
Connecting to cloud.. Cloud connected!
IP Address: 192.168.0.5
Connecting TCPClient...
connection failed


Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready WiFi is ready!
Already on cloud!
IP Address: 192.168.0.5
Connecting TCPClient...
connected

And here’s my sample application that creates these results:

#include "application.h"

unsigned int nextTime = 0;    // Next time to contact the server

SYSTEM_MODE(SEMI_AUTOMATIC);

void setup() {
    Serial.begin(9600);
    delay(2000); // delay for me to connect serial terminal
}

void loop() {
    if (nextTime > millis()) {
        return;
    }

    Serial.println("\n\nTurning on WiFi.");
    WiFi.on();
    
    Serial.println("Connecting WiFi.");
    WiFi.connect();
    Serial.print("Waiting for WiFi to be ready");
    while(!WiFi.ready()) {
        Serial.print(".");
        delay(100);
    }
    Serial.println(" WiFi is ready!");
    
    if(Spark.connected()) {
        Serial.println("Already on cloud!");
    }
    else {
        Serial.print("Connecting to cloud");
        Spark.connect();
        while(!Spark.connected()) {
            Serial.print(".");
            delay(100);
        }
        Serial.println(" Cloud connected!");
    }
    
    Serial.print("IP Address: ");
    IPAddress localAddr = WiFi.localIP();
    byte oct1 = localAddr[0];
    byte oct2 = localAddr[1];
    byte oct3 = localAddr[2];
    byte oct4 = localAddr[3];
    char ipChars[16];  
    sprintf(ipChars, "%d.%d.%d.%d", oct1, oct2, oct3, oct4);
    Serial.println(ipChars);
    
    TCPClient myClient;
    Serial.println("Connecting TCPClient...");

    if (myClient.connect("www.timeapi.org", 80))
    {
        Serial.println("connected");
        myClient.stop(); // close the socket so it isn't left open between cycles
    }
    else
    {
        Serial.println("connection failed");
    }

    //Serial.println("Turning off WiFi gracefully.");
    //Spark.disconnect();
    //WiFi.disconnect();
    //WiFi.off();
    //Serial.println("WiFi is off!");

    nextTime = millis() + 10000;
}

EDIT: Just to be clear, if I were to uncomment the last part of code turning off the WiFi, the TCPClient will fail to connect every time. It’s only after being turned on for a while that it can connect successfully.

I’m so glad you created this thread. I’ve been having a similar issue and I think it has the same cause. I created a thread myself but haven’t found a proper workaround yet, so at least we can team up and get more attention on finding a solution. Can you try something for me, please?

Can you make this modification to your code and see if it succeeds?

....
TCPClient myClient;
Serial.println("Connecting TCPClient...");

delay(5000); //<--- add this line
if (myClient.connect("www.timeapi.org", 80))
...

What I found is that it isn’t the “first” request that fails; rather, any request made within about 5 seconds of connecting to the cloud fails. If it still fails, try a delay of 10 seconds.
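Instead of a blocking delay(5000), loop() could remember when Spark.connected() first became true and only attempt the TCP connection once a settle period has passed, so nothing blocks in the meantime. Below is a minimal, host-testable C++ sketch; `SettleGate` is a hypothetical helper, not part of the Spark API, and the 5000 ms default simply mirrors the figure above rather than any documented requirement. The unsigned subtraction keeps the comparison correct across the 32-bit millisecond counter's wraparound (about every 49.7 days), which a check like `nextTime > millis()` does not.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of the Spark API): remembers when a link came up
// and reports whether it has been up for at least settleMs milliseconds.
struct SettleGate {
    bool up = false;
    uint32_t upSinceMs = 0;

    // Call on every pass through loop() with the current link state and a
    // millis()-style timestamp. Unsigned subtraction keeps the elapsed time
    // correct even if the 32-bit counter wrapped in between.
    bool ready(bool linkUp, uint32_t nowMs, uint32_t settleMs = 5000) {
        if (!linkUp) {
            up = false;  // link dropped: restart the timer when it comes back
            return false;
        }
        if (!up) {
            up = true;
            upSinceMs = nowMs;  // first pass with the link up
        }
        return static_cast<uint32_t>(nowMs - upSinceMs) >= settleMs;
    }
};
```

On the Core the check would sit at the top of loop(), e.g. `if (gate.ready(Spark.connected(), millis())) { /* connect the TCPClient */ }`.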

Let me know your results!

I have seen similar stuff before and I have a theory. Can you try adding this before your first call and report the results:

char hostname[] = "www.timeapi.org";
uint32_t backwardIP = 0;
if (gethostbyname(hostname, strlen(hostname), &backwardIP)>0) {
   Serial.println("Found host by name"); 
} else {
   Serial.println("Host not found");
}
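The address that gethostbyname() writes into backwardIP is never printed above. To compare it with what a desktop resolver returns, here is a minimal sketch that formats it as a dotted quad. It assumes the most significant byte holds the first octet; if the printed address comes out reversed, the byte order is the other way round and the shifts need flipping. `formatIPv4` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 32-bit IPv4 value such as backwardIP as "a.b.c.d".
// ASSUMPTION: bits 31..24 hold the first octet, e.g. 0x17158F14 -> "23.21.143.20".
std::string formatIPv4(uint32_t ip) {
    char buf[16];  // "255.255.255.255" is 15 chars plus the terminator
    std::snprintf(buf, sizeof buf, "%u.%u.%u.%u",
                  static_cast<unsigned>((ip >> 24) & 0xFF),
                  static_cast<unsigned>((ip >> 16) & 0xFF),
                  static_cast<unsigned>((ip >> 8) & 0xFF),
                  static_cast<unsigned>(ip & 0xFF));
    return std::string(buf);
}
```

Inside the success branch this could be used as `Serial.println(formatIPv4(backwardIP).c_str());`.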

Awesome, thanks for the replies guys.

@Iv4n Good point that it is more about time than about the first connection always failing. I added the 5000ms delay as you suggested and found that it helps a bit. I’ve uncommented the “turn off” section at the end of my code so that each attempt starts with the WiFi turned off, and with the 5 second delay I’m getting about a 50% success rate, which is much better than the 0% I was getting before! Here are the logs from the first four connection attempts:

Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready.................... WiFi is ready!
Connecting to cloud. Cloud connected!
IP Address: 192.168.86.45
Connecting TCPClient...
connected
Turning off WiFi gracefully.
WiFi is off!


Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready................. WiFi is ready!
Connecting to cloud... Cloud connected!
IP Address: 192.168.86.45
Connecting TCPClient...
connection failed
Turning off WiFi gracefully.
WiFi is off!


Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready................... WiFi is ready!
Connecting to cloud. Cloud connected!
IP Address: 192.168.86.45
Connecting TCPClient...
connection failed
Turning off WiFi gracefully.
WiFi is off!


Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready.................... WiFi is ready!
Connecting to cloud.......... Cloud connected!
IP Address: 192.168.86.45
Connecting TCPClient...
connected
Turning off WiFi gracefully.
WiFi is off!

I can experiment with longer delays to see if that improves the success rate, but power usage is a concern: I’m currently running off a teeny solar panel and the WiFi draws about 200mA when it’s on, so ideally I’d like a solution that works as quickly as possible. I guess you’re in a similar situation?
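To put numbers on that trade-off: at the roughly 200 mA quoted above, each second the radio stays on costs about 200/3600 ≈ 0.056 mAh, so the settle delay alone adds the following per wake cycle (ignoring the time spent joining the network and the cloud):

```latex
Q = I\,t,\qquad
Q_{5\,\mathrm{s}} = 200\,\mathrm{mA}\times 5\,\mathrm{s} = 1000\,\mathrm{mA\,s}
  = \frac{1000}{3600}\,\mathrm{mAh} \approx 0.28\,\mathrm{mAh},\qquad
Q_{10\,\mathrm{s}} = \frac{2000}{3600}\,\mathrm{mAh} \approx 0.56\,\mathrm{mAh}.
```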

@bko I’ve added the code you suggested and it is successfully finding the host by name every time, even though it isn’t able to open a TCP connection. Here is the output from the program with the code you suggested:

Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready......................... WiFi is ready!
Connecting to cloud..... Cloud connected!
IP Address: 192.168.86.45
Found host by name
Connecting TCPClient...
connection failed
Turning off WiFi gracefully.
WiFi is off!

and here’s the code snippet just to be clear:

Serial.print("IP Address: ");
IPAddress localAddr = WiFi.localIP();
byte oct1 = localAddr[0];
byte oct2 = localAddr[1];
byte oct3 = localAddr[2];
byte oct4 = localAddr[3];
char ipChars[16];  
sprintf(ipChars, "%d.%d.%d.%d", oct1, oct2, oct3, oct4);
Serial.println(ipChars);

char hostname[] = "www.timeapi.org";
uint32_t backwardIP = 0;
if (gethostbyname(hostname, strlen(hostname), &backwardIP)>0) {
   Serial.println("Found host by name"); 
} else {
   Serial.println("Host not found");
}                       

TCPClient myClient;
Serial.println("Connecting TCPClient...");
if (myClient.connect("www.timeapi.org", 80))
{
    Serial.println("connected");
}
else
{
    Serial.println("connection failed");
}

EDIT: I added some more code to check my DNS server, and it is 8.8.8.8. I assume this is normal?

IPAddress dnshost(ip_config.aucDNSServer[3], ip_config.aucDNSServer[2], ip_config.aucDNSServer[1], ip_config.aucDNSServer[0]);
Serial.println(dnshost);

Hi @megabyte

No, a DNS server address of 8.8.8.8 is not normal. Your core is using Google’s DNS service directly rather than going through your router. Normally your router is your DNS host, and the router in turn uses an upstream server such as Google’s. I wonder if some of your DNS requests time out?

How are you connected to the Internet?

Maybe you should try changing to the IP address of your host:

if (myClient.connect(IPAddress(23,21,143,20), 80))

@megabyte I’ve had all manner of inconsistent issues, even while running in AUTOMATIC mode. My core can run from a default configuration or, when prompted, request a configuration from a server running on a machine on the local network. I can always reach the server reliably from another computer, yet my core’s TCPClient often refuses to connect, despite having an active connection to the cloud.

Sometimes a reset or two or three helps, and when I get into a “state” where the core is happy, my TCPClient can connect, disconnect, and repeat many times without issue.

What I have found to be a viable, if not true, workaround is to ping the core from my server. The core does not reply, but some magic seems to happen behind the scenes and the TCPClient is subsequently able to connect. Strangeness.

Thanks for throwing your issues out there, though; it’s nice to know that I’m not alone.

Thanks @trackdork , we’ll get to the bottom of this.

@bko, regarding my internet connection: I’m at work at the moment, and the core is connecting through an AirPort Express wired into our LAN, which uses some kind of SHDSL. My Mac reports my DNS servers as (in order):
8.8.8.8
203.12.160.35
203.12.160.36
The first is Google’s and the latter two are our local ISP’s DNS servers, so it’s not just the core that thinks 8.8.8.8 is the right way to go. I thought a lot of people deliberately use Google as their DNS provider, so do you think this could be a red herring?

I was at home when I started this thread and the problems existed there as well. Tonight I will try again on our home network and see whether the Spark Core still uses Google’s DNS servers, as I hadn’t checked that before today.

I do wonder if our DNS requests are timing out, though I’m not sure how to test this. Pings to 8.8.8.8 take about 6ms on average, but I don’t know if that’s relevant. I heard that the CC3k (CC3000) has a 1 second timeout, and we are in Australia here, so I wonder if that could have something to do with it?

Regarding the code snippet you provided:

if (myClient.connect(IPAddress(23,21,143,20), 80))

I apologise, but I don’t know what you mean by changing my IP address to my “host”. I did try using the exact line you provided above, but the TCPClient still failed to connect. I don’t know what 23.21.143.20 is meant to be; it looks like some Heroku app hosted on Amazon, but I’m interested to know what IP address I should be typing in there.

Hi @megabyte

23.21.143.20 is the address of www.timeapi.org, which is really a Heroku app hosted on Amazon AWS. In the modern Internet, there is not a one-to-one correspondence between IP addresses and hosts or hostnames. Any particular computer can have many IP addresses and names, and conversely a single name can map to many computers with different IP addresses. All of that can change over time as well. By “host” I just meant the web server you are trying to connect to, www.timeapi.org.

I was still thinking DNS was the problem, so I converted to the IP address, but that didn’t help.

Yes, using Google’s DNS server is generally fine but normally your router provides DNS service. When I print the DNS host on my core, if I am given the address 10.0.0.3 from the DHCP server in my router, my gateway and DNS host will be 10.0.0.1, meaning I ask the local router how to find and connect to hosts not on my local subnet. This is all configurable of course, and I don’t know of a reason why your setup shouldn’t work, but it is quite different and is probably a clue.
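A quick way to see which setup a core is in is to check whether the DNS server it was given lies on its own subnet (as 10.0.0.1 does for 10.0.0.3) or outside it (like 8.8.8.8). A minimal sketch, assuming a /24 netmask (255.255.255.0), which is common on home routers but should really be taken from the network's actual configuration; the helper names are hypothetical and addresses are packed with the first octet in the top byte.

```cpp
#include <cassert>
#include <cstdint>

// Packs four octets into one 32-bit value, first octet in the top byte.
constexpr uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t(a) << 24) | (uint32_t(b) << 16) | (uint32_t(c) << 8) | uint32_t(d);
}

// True if both addresses share the same network portion under the given mask.
// ASSUMPTION: a /24 mask by default; real code should use the configured netmask.
bool sameSubnet(uint32_t localIp, uint32_t otherIp,
                uint32_t mask = ipv4(255, 255, 255, 0)) {
    return (localIp & mask) == (otherIp & mask);
}
```

For example, 192.168.86.45 with a DNS server of 8.8.8.8 fails this check. A false result only means DNS queries leave the LAN; it does not by itself mean anything is misconfigured.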

Thanks @bko, that’s strange. I can’t find any reference to timeapi in the reverse lookups for 23.21.143.20, and when I ping www.timeapi.org I get the IP 107.20.253.181. Also, websites like http://ipinfo.info/html/ip_checker.php give a different IP address every time I check.

Anyhow, I’ve changed my source code to try connecting to all of the possible IP addresses I could find for it, and none of them will connect. I’ll give it a try when I get home tonight, and hopefully my home network doesn’t use Google for its DNS :smile:
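Since one hostname can resolve to several addresses, a client can walk a list of candidates and stop at the first one that accepts a connection. A minimal sketch: `tryConnect` is a hypothetical stand-in for a call like `client.connect(IPAddress(...), 80)`, and the two addresses in the usage are just the ones reported in this thread (23.21.143.20 and 107.20.253.181), not an authoritative list for www.timeapi.org.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Returns the index of the first candidate for which tryConnect succeeds,
// or -1 if none does. Addresses are packed with the first octet in the top byte.
int connectToFirstReachable(const std::vector<uint32_t>& candidates,
                            const std::function<bool(uint32_t)>& tryConnect) {
    for (std::size_t i = 0; i < candidates.size(); ++i) {
        if (tryConnect(candidates[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Usage: `connectToFirstReachable({0x17158F14u, 0x6B14FDB5u}, tryConnect)` tries 23.21.143.20 first, then 107.20.253.181. This only organizes the attempts; it does not address whatever is making every attempt fail here.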

Hello guys. Although we haven’t proven that the cause of this issue is the same, I hope I’m helping rather than interfering, as I do believe the issues are related if not identical. I just spent almost the whole night after work running tests, and to be honest the results were so inconsistent that I’m not even sure what useful information I can give. I will try my best.

For the record, my DNS was either 0.0.0.0 or 192.168.1.1 (yep, my router), most of the time 192.168.1.1. Nothing consistently different happened when it was 0.0.0.0; the behavior was pretty much the same in both cases.

@megabyte, you were actually somewhat correct about the “first” request always failing. I have the feeling that delay(5000) just gives the core the opportunity to perform some kind of cleanup, another TCP operation, or something similar. In my tests, a TCP request made RIGHT after either WiFi.ready() or Spark.connected() becomes true fails 100% of the time.

I have split my tests into two stages, and maybe this changes the behavior a little compared with what you are seeing, @megabyte. I’m using SEMI_AUTOMATIC mode (I know you are too), but I’m running some tests right after WiFi.ready() and the same ones again after Spark.connected().

So from my tests:
(1a) Doing this at WiFi.ready():

        bool result = false;
        char hostname[] = "www.timeapi.org";
        uint32_t backwardIP3 = 0;
        do {
            result = (gethostbyname(hostname, strlen(hostname), &backwardIP3)>0);
        } while (!result);
        //put http request code here

This always results in the HTTP request succeeding. For some reason, in all my tests I never needed to call SPARK_WLAN_Loop(); inside the while loop: gethostbyname generally failed once, but it usually succeeded the second time, exited the while loop and performed the HTTP request.

(1b) Doing this at Spark.connected():

        bool result = false;
        char hostname[] = "www.timeapi.org";
        uint32_t backwardIP3 = 0;
        do {
            result = (gethostbyname(hostname, strlen(hostname), &backwardIP3)>0);
            SPARK_WLAN_Loop(); // without this, it sometimes loops forever
        } while (!result);
        //put http request code here

This generally resulted in gethostbyname succeeding and the HTTP request failing. On some occasions gethostbyname failed and fell into an infinite loop; adding SPARK_WLAN_Loop() inside the while loop fixed this, but the HTTP request still failed.
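The `do { ... } while (!result);` loops above spin forever if the lookup never succeeds. A minimal sketch of a bounded version: `attempt` stands in for the gethostbyname() call and `service` for whatever keeps the network stack running between tries (SPARK_WLAN_Loop() in the code above). `retryBounded` is a hypothetical helper, and the limit of 5 attempts is an arbitrary assumption.

```cpp
#include <cassert>
#include <functional>

// Calls attempt() up to maxAttempts times, running service() between failed tries.
// Returns the number of attempts used on success, or 0 if every attempt failed.
int retryBounded(const std::function<bool()>& attempt,
                 const std::function<void()>& service,
                 int maxAttempts = 5) {
    for (int i = 1; i <= maxAttempts; ++i) {
        if (attempt()) {
            return i;
        }
        if (i < maxAttempts) {
            service();  // e.g. SPARK_WLAN_Loop() on the Core
        }
    }
    return 0;
}
```

On the Core the call might look like `retryBounded([&] { return gethostbyname(hostname, strlen(hostname), &backwardIP3) > 0; }, [] { SPARK_WLAN_Loop(); })`, with some fallback (log it, sleep, try again on the next wake) when it returns 0.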

(2a) Doing this at WiFi.ready():

        bool result = false;
        do {
            result = httpRequest(); // call your HTTP request function here
            SPARK_WLAN_Loop(); // never needed in my tests, but I added it afterwards just in case
        } while (!result);

This would generally fail the first time and succeed the second time; it never failed more than twice in my tests.

(2b) Doing this at Spark.connected():

        bool result = false;
        do {
            result = httpRequest(); // call your HTTP request function here
            SPARK_WLAN_Loop(); // without this, it sometimes loops forever
        } while (!result);

This would generally fail on the first call but succeed on the second. Something I noticed is that the first call takes a very long time to return: anywhere from under 1 second to 8+ seconds, and about 7 seconds most of the time.
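Because the first attempt can block for several seconds and the second usually succeeds, retrying in a tight loop mostly burns radio time. One common alternative, not something tested in this thread, is to wait progressively longer between retries. A minimal sketch of a capped exponential backoff schedule; `backoffDelayMs` is a hypothetical helper, and the 250 ms base and 4000 ms cap are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Delay before retry number `attempt` (1 = first retry): baseMs * 2^(attempt-1),
// clamped to capMs. Doubling stops once the cap is reached, so the value stays bounded.
uint32_t backoffDelayMs(unsigned attempt, uint32_t baseMs = 250, uint32_t capMs = 4000) {
    uint32_t delayMs = baseMs;
    for (unsigned i = 1; i < attempt && delayMs < capMs; ++i) {
        delayMs *= 2;
    }
    return delayMs < capMs ? delayMs : capMs;
}
```

With these defaults the waits run 250, 500, 1000, 2000, then 4000 ms for every later retry; `delay(backoffDelayMs(n))` would sit where the retry loops above call SPARK_WLAN_Loop().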

Here is the code I used for test part 2. Note that I’m using a simplified version of the HTTP request:

SYSTEM_MODE(SEMI_AUTOMATIC);
const char HTTP_API[] = "www.timeapi.org";
const uint32_t DEFAULT_RESPONSE_WAIT_TIME = 1500;
TCPClient client;
bool notificationSent = false;
bool cloudConnected = false;

void setup() {
    Serial.begin(57600);
    WiFi.connect();
}

void loop() {
    if (!cloudConnected && WiFi.ready() && Spark.connected()) {
        Serial.println("WiFi && Cloud connected");
        bool result = false;
        do {
            Serial.print("Starting HTTP request at millis: ");
            Serial.print(millis());
            result = sendValues();
            Serial.print(", request returned: ");
            Serial.print(result);
            Serial.print(" - finished at millis: ");
            Serial.println(millis());
            SPARK_WLAN_Loop();
        } while (!result);
        
        cloudConnected = true;
    }
    if (!notificationSent && WiFi.ready()) {
        notificationSent = true;
        Serial.println("WiFi connected");
        bool result = false;
        do {
            Serial.print("Starting HTTP request at millis: ");
            Serial.print(millis());
            result = sendValues();
            Serial.print(", request returned: ");
            Serial.print(result);
            Serial.print(" - finished at millis: ");
            Serial.println(millis());
            SPARK_WLAN_Loop();
        } while (!result);
        Spark.connect();
    }
}

bool sendValues() 
{
    if (client.connect(HTTP_API, 80))
    {
        bool ret = false;
        client.println(String("GET /utc/now HTTP/1.0"));
        client.println(String("Host: " + String(HTTP_API)));
        client.println("Content-Length: 0");
        client.println();
        
        uint32_t lastRead = millis();
        while (!ret && (millis() - lastRead) < DEFAULT_RESPONSE_WAIT_TIME) {
            while (client.available() > 0) {
                ret = true;
                client.flush();
            }
        }
        client.flush();
        client.stop();
        
        return ret;
    }
    else
    {
        Serial.println("... connect failed");
        return false;
    }
}
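sendValues() counts any received byte as success, so an error page (a 4xx or 5xx response) would also return true. A minimal sketch of checking the HTTP status line instead, assuming the first line of the response has already been read into a string (for example by calling client.read() until '\n'); the parser is plain, host-testable C++, and `parseHttpStatus`/`isSuccess` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Extracts the status code from a line like "HTTP/1.0 200 OK".
// Returns -1 if the line doesn't look like an HTTP status line.
int parseHttpStatus(const std::string& line) {
    if (line.compare(0, 5, "HTTP/") != 0) {
        return -1;
    }
    std::string::size_type sp = line.find(' ');
    if (sp == std::string::npos || sp + 4 > line.size()) {
        return -1;  // no room for a three-digit code after the space
    }
    int code = 0;
    for (std::string::size_type i = sp + 1; i < sp + 4; ++i) {
        if (!std::isdigit(static_cast<unsigned char>(line[i]))) {
            return -1;
        }
        code = code * 10 + (line[i] - '0');
    }
    return code;
}

// Treat only 2xx responses as success.
bool isSuccess(int status) { return status >= 200 && status < 300; }
```

With it, sendValues() could return `isSuccess(parseHttpStatus(firstLine))` instead of `ret`, where `firstLine` is the hypothetical buffer holding the response's first line.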

PS: I would be surprised if anyone else tested this code and saw the HTTP connect succeed on the first call after either WiFi.ready() or Spark.connected().

Hope this works for something!

OK, I just got home and I’m testing on my home network and ISP, where the DNS server is my router as expected. The results are the same: no TCP connection succeeds after connecting to the cloud, whether I use the hostname or the IP address. I just wanted to rule out the 8.8.8.8 Google DNS issue before going much further.

Turning on WiFi.
Connecting WiFi.
Waiting for WiFi to be ready......... WiFi is ready!
Connecting to cloud.. Cloud connected!
IP Address: 192.168.0.12
DNS: 192.168.0.1
Found host by name
Connecting TCPClient...
connection failed
Turning off WiFi gracefully.
WiFi is off!

@Iv4n Thanks for the detailed post. I tried your code and got the following results:

WiFi connected
Starting HTTP request at millis: 15982... connect failed
, request returned: 0 - finished at millis: 15992
Starting HTTP request at millis: 16095, request returned: 0 - finished at millis: 19077
Starting HTTP request at millis: 19081, request returned: 0 - finished at millis: 22007
Starting HTTP request at millis: 22011, request returned: 0 - finished at millis: 24946
Starting HTTP request at millis: 24950, request returned: 0 - finished at millis: 27903
Starting HTTP request at millis: 27908, request returned: 0 - finished at millis: 31151
Starting HTTP request at millis: 31156, request returned: 0 - finished at millis: 34132
Starting HTTP request at millis: 34136, request returned: 0 - finished at millis: 37121
Starting HTTP request at millis: 37125, request returned: 0 - finished at millis: 40060
Starting HTTP request at millis: 40065, request returned: 1 - finished at millis: 43182
WiFi && Cloud connected
Starting HTTP request at millis: 44539... connect failed
, request returned: 0 - finished at millis: 52580
Starting HTTP request at millis: 52587, request returned: 0 - finished at millis: 55615
Starting HTTP request at millis: 55621

…then my core rebooted - gotta look into that. Anyhow, it did fail on the first call as expected, and it did fail once more a little later.

I tried your 1a/1b suggestions after Spark.connected() with no luck. It always resolves the hostname easily but fails to connect the TCPClient. I think your 2a/2b suggestions are probably the best option out there. Just hammer it until it succeeds, lol :smile:

I also went back and tried out your first suggestion of just changing my

delay(5000); // 5 sec

to

delay(10000); // 10 sec

after Spark.connected(), and this improved my success rate from ~50% to ~100%, which I kind of expected. I’ll push on with your 2a/2b suggestions for now. Many thanks for suggesting this!

I wonder if the Spark guys could chime in with an example that demonstrates a successful TCP connection after WiFi.ready() or Spark.connected() without using delays, or if there’s some other flag we can wait for?