Random client.connect() and client.println() failures (non cloud server)

UPDATE:
I’ve now run the test on a live host. It works much better, but there are still problems. I haven’t noticed any long outages, but approximately 10% of the connects still fail, although it usually recovers at the next 5-second interval. Another 5% of the time the connect succeeds but the rest of the communication doesn’t go through. Again, it usually recovers the next time around. So I guess it’s mandatory to monitor the return message to make sure the request completed OK.
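
One way to monitor the return message is to collect the bytes read from the client into a buffer and check the status line before treating the request as done. A minimal sketch in plain C++, so it can be tried off-device; `httpStatusCode` and `requestCompleted` are hypothetical helpers, not part of the Spark firmware:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns the HTTP status code from a raw response,
// or -1 if the text does not start with a well-formed status line.
int httpStatusCode(const std::string& response)
{
    // Only look at the first line, e.g. "HTTP/1.1 200 OK".
    const std::string line = response.substr(0, response.find('\n'));
    if (line.compare(0, 5, "HTTP/") != 0)
        return -1;

    // The three-digit code follows the first space.
    const std::string::size_type space = line.find(' ');
    if (space == std::string::npos || space + 3 >= line.size())
        return -1;

    int code = 0;
    for (int i = 1; i <= 3; ++i) {
        const unsigned char ch = static_cast<unsigned char>(line[space + i]);
        if (!std::isdigit(ch))
            return -1;
        code = code * 10 + (ch - '0');
    }
    return code;
}

// Hypothetical helper: treats only a 200 reply as a completed request.
bool requestCompleted(const std::string& response)
{
    return httpStatusCode(response) == 200;
}
```

An empty buffer (the "connects but won't communicate" case) yields -1, so the caller can retry straight away rather than wait for the next interval.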

Another problem of greater concern is that the Spark will randomly reset itself (I guess that’s what it’s doing). This occurs at random intervals of between 5 and 12 minutes. When it occurs, all communication stops and I see (as best I can tell) 8 red flashes, pause, 1 red flash, pause, 9 red flashes, pause, followed by 1 red flash. This is followed by some rapid green flashes, rapid blue flashes, and then the normal breathing pattern. NOTE: To save you some reading, I patched over this problem by adding delays after each client.println(). It doesn’t give me a warm fuzzy feeling, but it seems to clear the problem up.

Here is the original post, which I will leave as is because it might be a useful data point for anyone trying to do local development (BTW, the reason for local development is that real web hosts will block your IP if you make too many bad requests on their server):

Hi,
The Spark Core appears to be randomly bombing out while sending web requests to a local (non cloud) server.  The program will run fine for long periods of time, then will fail.  Sometimes the device will reset itself, and try to recover, yet will not send out packets.  Then after some time it will correctly connect and communicate with the server for several minutes before failing again. Any thoughts will be appreciated.

Setup: Spark Core (at 192.168.1.108) with built-in antenna, Linksys (Cisco) E2000 router (20’ away), destination server XAMPP (Apache & PHP) running on an XP box connected via Ethernet to the router.

Problem:

  1. Keep getting Connect Fail (i.e. client.connect() returns 0).
  2. Wireshark running on the XP server shows packets coming in from 192.168.1.108 when device is running normally.
  3. Wireshark shows absolutely no packets coming from 192.168.1.108 while I’m getting the Connect Fail messages (Spark is still breathing and sending out serial messages).
  4. Occasionally it will connect but won’t allow further communication (println()s).
  5. Resetting, and/or removing and reapplying power doesn’t help.  Measured power with DVM: VIn = 4.78V, V3.3 = 3.28V. Power viewed with scope looks ok.
  6. Can still be reprogrammed via the Cloud IDE about 80% of the time, even after the failure to connect has gone on for many minutes, so WiFi is working.
  7. Sometimes during all of this, without any intervention, instead of “breathing” it blinks red, then eventually goes back to breathing, outputting to Serial1, but not connecting.

Note: it can run well for 30 minutes or more, and then fail, giving repeated Connect Fail messages for 30 minutes or more before recovering by itself. Totally unrelated to any other network traffic from what I can tell.


#Code:

TCPClient client;
byte serverAddress[] = { 192, 168, 1, 141 };
unsigned long lasttime = 0L;
 
void setup()
{
    Serial1.begin(9600);
    delay(2000);
    Serial1.flush();
    Serial1.println(Network.SSID());
    Serial1.println(Network.gatewayIP());
    Serial1.println(Network.subnetMask());
    Serial1.println(Network.localIP());
}
 
void loop()
    {
    unsigned long time;

    time = millis();
    
    if ((time - lasttime) >= 5000L)
        {
        lasttime += 5000L;

        if (client.connect(serverAddress, 80))
            {
            Serial1.write("(V33 Connect Success)\r\n");
            client.println("GET /farCI3F4/get_test.php?what=cheese&who=mickeymouse HTTP/1.1");
            client.println("Host: 192.168.1.141");
            client.println("Content-Length: 0");
            client.println("Connection: Keep-Alive");
            client.println();
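            // Read the reply for up to 1 second past the scheduled tick.
            while ((millis() - lasttime) < 1000)
                {
                // read() returns an int, with -1 meaning no data is waiting.
                // Keep it as an int: in a char, -1 can become 255 where char
                // is unsigned, and the check below would then always pass.
                int c = client.read();
                if (c >= 0)
                    {
                    Serial1.print((char)c);
                    }
                }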
            Serial1.println();
            client.stop();
            }
        else
            {
            Serial1.write("(V33 Connect Fail) ");
            client.stop();
            }
        }
    }

#get_test.php
on root directory of web server

<?php
$data_array = array(
   'what' => $_GET["what"] ,
   'who' => $_GET["who"]);
echo "Ok ";
?>

#Results through Serial Port 1:

ELLDA
192.168.1.1
255.255.255.0
192.168.1.108

(V33 Connect Success)
HTTP/1.1 200 OK
Date: Fri, 27 Jun 2014 00:21:45 GMT
Server: Apache/2.4.3 (Win32) OpenSSL/1.0.1c PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 3
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

Ok
(V33 Connect Success)

(V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33
 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Con
nect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect
 Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fai
l) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (
V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33
Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Conn
ect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect
Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail
) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V
33 Connect Fail) (V33 Connect Success)
(V33 Connect Success)

(V33 Connect Success)

(V33 Connect Success)  <<== NOTE: It connects, but won't communicate

(V33 Connect Success)

(V33 Connect Success)

(V33 Connect Success)

(V33 Connect Success)

(V33 Connect Success)

(V33 Connect Success)
          HTTP/1.1 200 OK
Date: Fri, 27 Jun 2014 00:25:30 GMT
Server: Apache/2.4.3 (Win32) OpenSSL/1.0.1c PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 3
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

Ok
(V33 Connect Success)
          HTTP/1.1 200 OK
Date: Fri, 27 Jun 2014 01:02:43 GMT
Server: Apache/2.4.3 (Win32) OpenSSL/1.0.1c PHP/5.4.7
X-Powered-By: PHP/5.4.7
Content-Length: 3
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html

Ok

(V33 Connect Success)
.......

ELLDA
192.168.1.1
255.255.255.0
192.168.1.108
(V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33
 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Con
nect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect
 Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fai
l) (V33 Connect Fail) (V33 Connect Fail) (V33 Connect Fail)

Hi @faraday,

Thanks for posting your findings! This isn’t super scientific of me, but I’ve found that adding some small delays after making HTTP requests, before calling client.stop, has been helpful. I might also move your lasttime += 5000L; call to after your client.stop calls, and instead maybe call lasttime = millis();, so your requests will stay closer to every 5 seconds.
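
To illustrate the timing point, here is a small off-device sketch in plain C++ that compares the two ways of advancing the timer. All function names are hypothetical, and attempts are treated as instantaneous, which is a simplification:

```cpp
#include <stdint.h>

// True once at least `interval` ms have passed since `last`.
// Unsigned subtraction keeps this correct across millis() rollover.
bool isDue(uint32_t now, uint32_t last, uint32_t interval)
{
    return static_cast<uint32_t>(now - last) >= interval;
}

// Fixed-rate schedule, as in `lasttime += 5000L`: counts how many attempts
// are already due at time `now`. Requires interval > 0.
unsigned attemptsDueFixedRate(uint32_t now, uint32_t last, uint32_t interval)
{
    unsigned due = 0;
    while (isDue(now, last, interval)) {
        last += interval;
        ++due;
    }
    return due;
}

// Fixed-delay schedule, as in `lasttime = millis()` after client.stop():
// the timer restarts from `now`, so at most one attempt is ever due.
unsigned attemptsDueFixedDelay(uint32_t now, uint32_t last, uint32_t interval)
{
    return isDue(now, last, interval) ? 1u : 0u;
}

// Milliseconds left in a read window of `window` ms that opened at `start`.
uint32_t readWindowLeft(uint32_t now, uint32_t start, uint32_t window)
{
    const uint32_t elapsed = now - start;
    return elapsed < window ? window - elapsed : 0;
}
```

After a 21-second stall with a 5-second interval, `attemptsDueFixedRate(21000, 0, 5000)` reports four attempts due back to back, while the fixed-delay version reports one. The posted loop also measures its 1-second read window from lasttime, not from when the request was sent. If client.connect() and the println() calls together take longer than a second, `readWindowLeft(6500, 5000, 1000)` is 0 and no reply bytes are read even though the connect succeeded. That may contribute to the "connects but won't communicate" lines, though it is only a guess from reading the code.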

We’re also about to release the latest CC3000 patch which we’ve found to be really helpful with stability on certain Wi-Fi networks, so that might help as well. :smile:

Thanks again for posting, the more data / examples we have the easier it is to fix issues. :slight_smile:

Thanks,
David

Hey Dave, thanks for your reply.

I’ve been trying to find an explanation for the blink codes, but haven’t found anything that matches what I’m seeing.

What does the 8 red, pause, 1 red, pause, 9 red, pause 1 red sequence mean?

Is it possible that I’m doing something in my code that’s hanging up some internal process that needs to run?

I like what I see in the Spark so far, but this problem is serious for us. We are a possible customer for 100’s of units BTW.

Hi @faraday

The red LED blink codes are doc’ed here. The general pattern is SOS (fast dot-dot-dot, slower dash-dash-dash, fast dot-dot-dot) followed by a count of red pulses, followed by SOS again.

It sounds like you are seeing a hard fault, which is one red pulse in between the SOS markers. Usually a hard fault is a programming error somewhere, like following a null pointer or allocating too much static memory, but I am not seeing it in your code above. More serial debug statements might help to track it down!


Thanks bko. That was a big help. I replayed my video of the blink sequence. Definitely SOS followed by 1 red, followed immediately by another SOS and 1 red. So hard fault. Funny, I should have known that that was an SOS!

Sorry, my bad! I meant to link in the SOS error code page as well but apparently I missed it. Thanks @bko ! :smile:


Referring to the red blinking “hard fault” situation when talking to a real web host (not XAMPP):

It looks like (thanks Dave) scattering some delay(2) statements after each of the client.println()s made the problem go away. This seems a little bit scary.

All I can figure is that the client.println()s are blocking, and don’t allow a required underlying process to run. And, that contained in the delay() is something that takes care of these needs.

Can anyone substantiate this or offer a better explanation?

Hi @faraday,

Glad that helped! My unscientific guess is there might be a race between closing the socket and finishing any pending writes to the socket (someone please correct me if I’m wrong), and that a delay in there gives the core a little more time to finish talking with the CC3000. Did you try adding one just before the stop, instead of after each line?

If this is indeed the case though, we should open an issue and get it fixed in the firmware, instead of doing work-arounds. Ping for @satishgn / @zachary, who know way more about this than I do.

Thanks!
David

My first try was to put a delay right before I went into the client.read() loop. That by itself SEEMED to fix it. But since I have no idea why this is happening I erred on the safe side and put it after every client.println().

I have 30+ years of C programming behind me, much of it in embedded systems where real debugging can take place. This whole try something and see if it masks the problem approach doesn’t seem quite right to me, yet that’s how things seem to be not just with Spark, but also Imp and RN-171.

Also, while I’m no longer getting the red SOS messages I’m still not getting solid communication with our live server.

Why don’t the client.println()s and other client functions return some type of meaningful status or error codes?
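
For what it’s worth, Arduino’s Print::println() returns the number of bytes written. Whether the Spark TCPClient reports a failed or short write the same way is an assumption here, not something confirmed in this thread. If it does, each line could be checked as it is sent. A minimal off-device sketch, where `FakeClient` is a stand-in invented for illustration and `sendLineChecked` is a hypothetical helper:

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Stand-in for a network client, used only so the sketch runs off-device.
// It accepts at most `capacity` bytes in total, then reports short writes.
struct FakeClient {
    std::size_t capacity;
    std::string sent;

    std::size_t println(const char* line)
    {
        const std::string data = std::string(line) + "\r\n";
        const std::size_t room = capacity - sent.size();
        const std::size_t n = data.size() < room ? data.size() : room;
        sent.append(data, 0, n);
        return n;  // bytes actually accepted
    }
};

// Sends one line and reports whether every byte (text plus CRLF) was accepted.
// Works with any client type whose println() returns a byte count.
template <typename Client>
bool sendLineChecked(Client& client, const char* line)
{
    const std::size_t expected = std::strlen(line) + 2;  // text + "\r\n"
    return client.println(line) == expected;
}
```

With a check like this, the sketch could call client.stop() and retry as soon as a line fails, rather than only finding out when no reply arrives.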
