TCPClient hangs up core

I was inspired by the local communication example, so now I am trying to make a local RESTful API.
I started humbly by creating a web server that returns the message that was sent to it.
Similarly to the other problems (mentioned below), I am running into stability issues.

Here is my code:

TCPServer server = TCPServer(8080);

char lastByteOfAddress[4];
bool on = false; // the state of the led.

void setup()
{  
    server.begin(); // start listening 

    //using api.Spark.io to retrieve the last part of the ip address of the spark
    sprintf(lastByteOfAddress,"%d",int(Network.localIP()[3])); 
    Spark.variable("ipaddress", &lastByteOfAddress, STRING); 
   
    delay(1000);
}

void loop()
{
    TCPClient client = server.available();
    if (client) {
        while (client.available()) {
            client.write(client.read()); // echo each received byte back to the caller
        }
        client.flush();
        client.stop();
    }
}

Calling the web server will usually give a result the first time, but it becomes very unstable on subsequent calls, either returning an empty result or timing out. Eventually the core will not let itself be reflashed, and I have to do a factory reset.

Is this an issue or am I doing something wrong?

I also looked at these similar issues, but neither of them was using the Spark as a TCPServer.

https://community.spark.io/t/after-using-tcpclient-core-becomes-unstable/762
https://community.spark.io/t/problem-with-tcp-socket/926

I’m getting this too! I thought my cores were broken. When I read your post I realised I’m using TCPClient too, and when I tested it last night and got it working, it was with a simple test app, not the one with TCPClient. Once I reflashed the project using TCPClient, the problem re-emerged.

Glad to see others are experiencing it too, I thought I was going insane. :smile:

See previous post

Seeing the other posts and your experiences, there might be truth to the idea that the TCP stack is not as stable as we users expect it to be.

I read your post earlier. It says you retracted it. At the time it didn’t look like a TCPClient issue to me.

The “requiring a factory reset to flash” issue is more general than the “TCPClient instability” issue. In your case you might have had both issues.

For example, I have had the “requiring a factory reset to flash” issue in simple firmware without TCPClient.

The only way out was to either:

  1. restart the core and reflash before the runtime error presents itself. This is only possible when there is enough time between the restart and the occurrence of the exception.
  2. do a factory reset and reconnect the core to the Wi-Fi network.

The common factors in these programs were that

  • there were runtime errors in them (in my case an indexer that went out of bounds)
  • flashing did not work any more after a certain time
  • the RGB LED was still ‘sighing’ the right color (the one for “connected to the cloud”).
  • the issue went away when the code was corrected.

Perhaps somebody knows of a try/catch construct that is easy to use and can be used to make the built-in LED (D7) blink,
or maybe the multicolor LED could indicate a runtime error as an additional state.
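
There is no real try/catch on the Core as far as I know (C++ exceptions appear to be disabled in the firmware build), so the closest I can think of is a plain guard that signals the error on D7. A minimal sketch of that idea; the function and buffer are made up for illustration and are not the actual code that crashed:

// Hypothetical guard around the kind of indexing that caused my runtime error.
// D7 drives the small blue LED on the Core.
int readSample(int samples[], int count, int index)
{
    if (index < 0 || index >= count) {
        pinMode(D7, OUTPUT);
        digitalWrite(D7, HIGH);   // latch the blue LED on to signal "runtime error"
        return 0;                 // return a harmless default instead of crashing
    }
    return samples[index];
}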

Hi,

I had a similar problem, but never to the point where I had to factory reset in order to reflash. Resetting a few times always helped. See my earlier post. From my point of view the TCP stack seems to be unstable. This is pretty weird, because I actually went down to the socket API, which is - apparently - provided by TI. I hope this can be fixed; without this the Spark core is a bit useless…

Hey guys - the issue here is that the CC3000 connect() call blocks, and so if you try to open a TCP socket and it fails, then it blocks for 60 seconds, which basically kills the connection to the Cloud.

We’re working on a fix for this, which will decouple the user application and the Spark code so they don’t block each other. Fix coming in a couple weeks, since it’s unfortunately not a quick solution.


Just created a new thread specifically for this issue:

Hi stevie, your post was included but not shown. I have edited my post to show your post. Good read.

Your post is regarding the more general “requiring a factory reset to flash” issue.
For that issue, I fully support the long-term fix you’re making. That should make a world of difference.

Seeing your reply, it seems that this will not be a fix for the TCPClient instability.

@zach: As the TCP instability issue is only part of the issue (the title covers both the “TCP instability” and the “cannot flash core” issues), maybe the TCP instability discussion should be spun off into its own thread?

Regarding the instability issue, I have some additional information.
I have tried to debug this issue, specifically the long blocking loop.
I had the blue LED alternate between on and off on each iteration of the loop (I added a delay(250)), and it kept on blinking at around 2x a second (meaning 4 loop passes occur within a second).
Mind you, this was both before and after the TCP client was blocked.
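
For reference, the heartbeat I used looks roughly like this (D7 is the onboard blue LED; the TCP handling itself is left out):

bool heartbeat = false;

void setup()
{
    pinMode(D7, OUTPUT);
}

void loop()
{
    heartbeat = !heartbeat;
    digitalWrite(D7, heartbeat ? HIGH : LOW); // toggle the blue LED every pass
    delay(250);                               // gives roughly 4 loop() passes per second

    // ...TCP handling goes here...
}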

I have tried several methods:

  • within the scope of the loop, opening a TCPClient, reading all data until TCPClient.available() <= 0, and closing the client
  • making the TCPClient variable global, opening it in the loop’s first iteration, reading one character on each successive iteration until the last iteration ends the TCPClient (roughly as in the sketch below)
  • and any number of variations between the two.
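
A trimmed-down version of that second variant, as a sketch rather than the exact code I ran:

TCPServer server = TCPServer(8080);
TCPClient client;           // global, so the connection survives across loop() passes
bool readSomething = false;

void loop()
{
    if (!client) {
        client = server.available();     // pick up a new connection, if any
        readSomething = false;
    }
    else if (client.available() > 0) {
        client.write(client.read());     // echo exactly one byte per pass
        readSomething = true;
    }
    else if (readSomething) {
        client.flush();
        client.stop();                   // all data echoed: close and wait for the next caller
    }
}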

All solutions had the same outcome.

Long story short, the issue is not solely caused by the loop portion taking too long.

@roderikv I believe this issue and the one I discuss in the other thread might be one and the same in that server.available() may be blocking, and therefore creates the same connectivity issues that long delays would cause.

@zach, thanks for your reply.

Any suggestions on how to code around this issue?
The code above is the simplest version I could think of, and I cannot get it to work stably.
Or is there no workaround until the fix has been applied?

Hi Zach,

Thanks for your answer. I think we are not talking about the same issue here. Maybe I am wrong, but I think this is an instability of the TCP stack. I actually disabled the cloud completely by commenting out all the code related to the cloud. I even rewrote the SysTick handler so that it only updates the timers. I am not using the TCPClient anymore but went down to pure socket calls, and I still get the same issue.

All that makes me believe that we are not talking about the same issue. I opened a thread with the title “Problem with TCP socket” where I posted my source code. The issue I am seeing happens after the first socket is closed. Then it sometimes gets extremely slow or does not react any more at all. I can provoke that after 2 seconds.

I think Roderik and I are talking about the same - second - issue, here.


I’m encountering strange problems with the TCPClient too. I was originally getting the “couldn’t reflash without factory reset” issue, but now I’m getting strange instability related to TCP.
In my situation, I have the core connecting to a Node server over TCP. Initially it all works fine, but my system needs to be able to handle network outages and re-establish connections. I periodically write a keep-alive character (that’s ignored by the Node server) to the TCPClient. If it returns -1, I know a reconnect is required.

However, once it reconnects, the core gets stuck in a reboot cycle. Each time it successfully connects to the node server, then resets itself, over and over. The only way to stop the cycle is to manually press the reset button (soft reset).

Incidentally, I wrote this on an Arduino first and ported it over. On the Arduino, TCPClient.connected() could reliably tell me if the connection was active (after a write), but on the core it tells me it’s still connected, even after an unsuccessful write, when the connection has clearly dropped.

// centralControl is a TCPClient object
centralControl.write(KEEP_ALIVE);      // periodic keep-alive byte (ignored by the server)
if (!centralControl.connected()) {
    centralControl.stop();
    //Serial.println("Not connected.");
    connect();                         // my own reconnect routine
}

Hi dermotos, I am seeing the same issue when reconnecting. Not the rebooting part though (yet).

When you say it “resets itself”, do you mean that the TCP connection is reset, or the program?
If the core reboots, that could indicate some programming error involving null pointers. I am not altogether convinced that the TCP stack can cause the Spark to reboot. But you never know…

@dermotos, I think we might be in similar positions with our projects.

If you don’t mind, let me brain dump here for a minute and maybe something in my experience will help you guys.

I have the Core set up with a TCPClient and a local machine nearby running a Node.js server. A few things I’ve learned so far about doing this:

  1. Calling client.connected() will return true even if the socket is closed. This is not very helpful in my experience, so I gave up on using it.
  2. Calling client.connect() to a server that can not be reached will cause the core to hang for 60 seconds while it attempts to connect. This long delay breaks the connection to the cloud.
  3. If you try to client.connect() to a server that’s not available over and over in any sort of loop scenario, the core will become unreachable and you will need to factory reset (see the sketch after this list).
  4. Left on their own, sockets from the core only stay open for about 60 seconds.
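
One way around points 2 and 3 that I can think of is to rate-limit the connection attempts so a dead server can never keep the core stuck in connect(). A rough sketch only; the address, port, and 30-second back-off are all assumptions:

TCPClient client;
bool linkUp = false;                     // tracked by hand, since connected() is unreliable (point 1)
unsigned long lastAttempt = 0;
const unsigned long RETRY_MS = 30000;    // arbitrary back-off between attempts

void loop()
{
    if (!linkUp && (millis() - lastAttempt) > RETRY_MS) {
        lastAttempt = millis();
        // One attempt at most every RETRY_MS; the IP and port are made up.
        linkUp = client.connect(IPAddress(192, 168, 1, 100), 5000);
    }

    // ...normal traffic goes here; set linkUp = false when a write fails or the
    // protocol otherwise detects that the connection has dropped...
}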

In an attempt to keep the connection open longer I was also using a keep-alive bit that I transmitted every N seconds from the core to the server. This seemed to work okay for a while, but I noticed that I was actually opening upwards of 8 different sockets (definitely not my intention) and the stability of the sockets became questionable. Some sockets would stop sending data and would have to time out before allowing other data through. I blame this erratic behavior on my general lack of knowledge and experience with TCP, though.

What I’ve done instead (using the Local Communication example as a starting point) is never let the core try to connect to the server on its own, and instead only connect when told to through a public “connect” function from the cloud. This means the core can never brick itself if the server is not available.

Then, on the flip side of this, in the Node server I use the Sparky library to call the public “connect” method as soon as the server is created, and then also on socket.close(). So far this method has left me with a single open socket at any given time, no need for noisy keep-alive requests, and as soon as the socket drops after 60 seconds, it immediately opens right back up again.

What I’m still missing is an interval timer that tries to connect the core after N seconds if the public “connect” call fails. This should allow the connection to be reset regardless of who disconnects first.
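
For what it’s worth, the firmware side of that pattern is small. A sketch of the idea, with a made-up server address and port; the Node side just calls the “connect” function through the cloud API (for example via Sparky) on startup and again on socket close:

TCPClient client;
IPAddress serverAddress(192, 168, 1, 100);   // hypothetical local Node server
const uint16_t serverPort = 5000;            // hypothetical port

// Exposed to the cloud; the server asks the core to connect instead of the core polling.
int connectToServer(String args)
{
    client.stop();                           // drop any half-dead socket first
    return client.connect(serverAddress, serverPort) ? 1 : -1;
}

void setup()
{
    Spark.function("connect", connectToServer);
}

void loop()
{
    while (client.available() > 0) {
        client.read();                       // handle data from the Node server here
    }
}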

And here’s the kicker: I have had almost no firmware flashing issues using this configuration. It seems that if the core has one stable open socket - without a lot of screwing about - the web IDE can locate the core and flash it every time.

Sorry for the novel. I have some messy code on github if anyone’s interested in digging through it.

Thanks for the detailed report; looks like there may be some issues here with our implementation of TCP, and possibly some issues due to the way that the CC3000 API works. I’ll add this to our backlog to look into in more detail.

@Zach: I am interested in getting the TCP connection stable as soon as possible, so I will certainly have a look myself at what’s going on. Something I am not clear about is which part of this was written by Texas Instruments and which part was written by you. I am assuming that the TI parts should be stable - after all, it is a product used in many other products - so having such an obvious dysfunction of the TCP stack seems unbelievable. My guess is that it is something specific to the Spark Core implementation. Any pointers about where the division between the two parts really lies would be helpful. Thanks!

The core-firmware repository is ours, and it references the core-common-lib library, which includes many dependencies, including the CC3000 host driver, which is from Texas Instruments:

Okay, thanks for the info. I will let you know what I find - if anything…

@Zach: So I finally found some time to play around with this. Here are my findings: it seems that the TI socket library has two problems. It cannot cope with buffers above a certain size for reading (and probably for writing too?), and closing a socket too soon after writing is a problem as well.

Details: I have a program which uses the raw socket API to accept a connection. Then it reads from the socket. After reading it writes out a pre-canned answer and then closes the socket.

Initially my buffer for reading was 1024 bytes. That led to problems which manifested themselves as accepting a connection taking longer and longer, until finally nothing was accepted at all.

Unfortunately, setting the buffer to a smaller value (128) did not help by itself. In addition, a delay(10) after writing and before closing the socket was also needed. Neither change alone made any difference; together they fixed the issue I was seeing. After that it was possible to connect to the socket hundreds of times without delay. Great relief…
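
For anyone staying at the TCPClient/TCPServer level instead of the raw socket API, the same two workarounds applied to the echo example from the top of this thread would look roughly like this. A sketch only, I have not verified it at that level:

TCPServer server = TCPServer(8080);
char buffer[128];                        // keep reads well below the problematic size

void setup()
{
    server.begin();
}

void loop()
{
    TCPClient client = server.available();
    if (client) {
        int count = 0;
        while (client.available() > 0 && count < (int)sizeof(buffer)) {
            buffer[count++] = client.read();      // read in small chunks
        }
        client.write((uint8_t *)buffer, count);   // echo back what was received
        delay(10);                                // give the CC3000 a moment before closing
        client.stop();
    }
}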


Awesome, thank you! I think this info will be very helpful in tracking down issues with the TCPClient.

Thanks!