I’m experiencing issues with the TCPClient, and I’m hoping someone here can help me out. My actual goal is audio streaming, but it turns out I cannot get even the basic ‘Google’ example from the docs to work properly! This one here.
The behavior I’m seeing is that the download starts just fine but slows down after a couple of seconds (always at the same number of bytes), turning into a slow crawl of intermittent data bursts with ever-longer pauses in between, until almost no data is received at all. With some source addresses (I did a couple of tests) the core manages to finish the download (taking minutes, as with the Google page); sometimes it just hangs or restarts.
Building in the Web IDE or using my local environment (latest master branches) does not seem to make a difference. To rule out my home network as the cause I also tried using a mobile hotspot, with the same results.
Any ideas what’s going on here? Modifying the basic Google example a bit to output, every 512 bytes, the milliseconds since the start of the download (instead of the page source) illustrates the issue pretty well:
TCPClient client;
byte server[] = { 74, 125, 224, 72 }; // Google
system_tick_t start;

void setup()
{
    // Make sure your Serial Terminal app is closed before powering your Core
    Serial.begin(9600);
    // Now open your Serial Terminal, and hit any key to continue!
    while (!Serial.available()) SPARK_WLAN_Loop();

    Serial.println("connecting...");
    if (client.connect(server, 80))
    {
        Serial.println("connected");
        client.println("GET /search?q=unicorn HTTP/1.0");
        client.println("Host: www.google.com");
        client.println("Content-Length: 0");
        client.println();
        start = millis();
    }
    else
    {
        Serial.println("connection failed");
    }
}

long total = 0;

void loop()
{
    if (client.available()) {
        client.read();
        total++;
        if (total % 512 == 0) {
            Serial.print(millis() - start);
            Serial.print(" - download total: ");
            Serial.println(total);
        }
    }
    if (!client.connected())
    {
        Serial.println();
        Serial.println("disconnecting.");
        client.stop();
        for (;;);
    }
}
As I said, pretty much the basic example from the docs with some timing additions. Outputting the received bytes via Serial and/or taking the time measurements does not seem to influence the stalling.
Interesting! I’m researching a potentially related issue. If you want the greatest throughput, I recommend compiling locally, editing inc/spark_wiring_tcpclient.h and changing the buffer size to something bigger, like 256. Also try making the packet sizes from your source a weird number of bytes, like 255. There’s a glitch in the host driver around packet sizes that are at, or multiples of, 256 bytes, and TCP really only goes up to 1500 anyway, so you want your buffer to be as close to your packet size as possible; at the moment 256/512-byte packets might be problematic.
@Dave: Thanks for your reply. Throughput is not (yet) my primary concern; right now I’d be happy if I could get receiving data to work reliably at all! I already tried playing around with the TCP buffer sizes yesterday, sadly without much success.
I noticed something strange with Serial.read() in one of my programs that did the same thing.
I found that adding a 10 ms delay in the if (Serial.available()) loop fixed it. I’m not sure whether it’s the checking of available() or the read that messes things up, but it fixed the problem.
I tried many different ways so that I didn’t slow the serial transfer down; I tried adding delays every 256 bytes instead of after each byte, thinking it may be a buffering issue, with no luck…
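Roughly where the delay went, in case it helps (just a sketch of my own serial loop, not the TCP example – the 10 ms value was found by trial and error):
void loop()
{
    if (Serial.available())
    {
        char c = Serial.read();   // pull one byte out of the serial buffer
        // ... do whatever the program does with c ...
        delay(10);                // this small pause is what made the stalling go away for me
    }
}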
No problem! It looks like you’re using client.read(), which pulls in one byte at a time. I tried increasing the buffer size to 512 and modified the firmware a bit – how about something like this? (Note: this requires modifying core-firmware/inc/spark_wiring_tcpclient.h and increasing the buffer size.)
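For reference, the buffer-size change amounts to roughly this (the define is TCPCLIENT_BUF_MAX_SIZE; 128 is the stock value, and the exact spot in the header may differ in your checkout):
// core-firmware/inc/spark_wiring_tcpclient.h
// #define TCPCLIENT_BUF_MAX_SIZE 128    // stock value
#define TCPCLIENT_BUF_MAX_SIZE 512       // bigger buffer so client.read(buf, len) can pull larger chunks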
#include "application.h"
#include <math.h>
TCPClient client;
//byte server[] = { 74, 125, 224, 72 }; // Google
system_tick_t start;
uint8_t buffer[511];
uint8_t done = 0;
void setup()
{
// Make sure your Serial Terminal app is closed before powering your Core
Serial.begin(9600);
// Now open your Serial Terminal, and hit any key to continue!
while(!Serial.available()) SPARK_WLAN_Loop();
Serial.println("connecting...");
if (client.connect("google.com", 80))
{
Serial.println("connected");
client.println("GET /search?q=unicorn HTTP/1.0");
client.println("Host: www.google.com");
client.println("Content-Length: 0");
client.println();
delay(250);
start = millis();
}
else
{
Serial.println("connection failed");
}
}
long total = 0;
void loop()
{
if (done) {
return;
}
//read it fast!
int count = 0;
while ((count = client.available()) >= 0) {
for(int i=0;i<511;i++) { buffer[i] = 0; }
//Serial.println(String(count) + " bytes available");
total += client.read(buffer, min(count, 511));
//Serial.println((char*)buffer);
Serial.print(millis() - start);
Serial.print(" - download total: ");
Serial.println(total);
}
if (!client.connected())
{
Serial.println();
Serial.println("disconnecting.");
client.stop();
client.flush();
done = 1;
delay(50);
}
}
You’re definitely right that things slow down. I would want to watch this request with Wireshark, but my guess is that data is coming in too fast, and the delays are incremental backoffs as packets are retransmitted. I might be crazy, but I suspect this is an artifact of the buffers on the radio itself. In any case, I want to research this more… but this should help a little.
Thanks for your help. I tried your code with a TCPCLIENT_BUF_MAX_SIZE of 512 bytes. The stalling still occurs – in fact, it’s even more visible and happens sooner than before:
What’s interesting is that including “spark_disable_cloud.h” (and using the IP byte array instead of the hostname) causes the issue to disappear, resulting in a very regular download which goes through to the end.
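In case anyone wants to reproduce it, the changes amount to roughly this on top of the example above (a sketch, not my exact code):
#include "application.h"
#include "spark_disable_cloud.h"   // build with the cloud connection turned off

TCPClient client;
byte server[] = { 74, 125, 224, 72 };   // Google by IP, so no DNS lookup

void setup()
{
    Serial.begin(9600);
    Serial.println("connecting...");
    client.connect(server, 80);   // request and read logic stay exactly as in the example above
}

void loop()
{
    // unchanged read loop from the example above
}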
Hmm… I tried leaving the cloud on and adding SPARK_WLAN_Loop() between reads, and it didn’t seem to help. Maybe this is something @satishgn could check out?
One thing I notice is that there is no way to tell if the buffer is getting emptied… i.e. client.available() gets to 0, the while loop goes false, the code continues, the client is still connected so the if block gets skipped -> back to loop(), the first if (done) is false, then data is available again so the while loop runs again…
218 seems to be a magic number that the total is incrementing by each time round, even as it slows right down.
I left my core in my room this morning, otherwise I would give it a quick try myself.
Based on some further experiments I did, I’m starting to believe that all this does not relate to the cloud connection directly. I think having the connection enabled is just one way to trigger the stalling.
If the cloud connection is disabled, the download works perfectly, at a speed good enough for what I originally set out to do: audio streaming. Transferring a WAV (8 kHz/8-bit for testing) from a very simple node.js relay works great. This is pretty much the same as downloading bits of HTML from Google, as in the basic example.
However, guess what happens if I don’t just read at full speed and discard the data, but instead put it into a ring buffer which is simultaneously read by an interrupt running at the sample rate of the WAV? Exactly: after about a second or two the pauses start again, with the gaps between data bursts growing exponentially. Which is exactly the behavior seen when ‘just downloading’, but with the cloud connection on.
To me it seems like the problem is triggered when available data is not read immediately, but only in specific cases – for example, adding small delays to the read loop (basic example, cloud disabled) does not result in stalling, just a slower transfer. What I can’t figure out is what exactly the cloud connection/my playback code is doing to trigger the issue.
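To make it concrete, this is roughly the shape of my playback path (heavily simplified – the names, the ring-buffer size and the timer setup are placeholders; sampleISR() is called from a timer interrupt at the WAV sample rate):
#include "application.h"

#define RING_SIZE 1024

TCPClient client;                 // connected to the WAV relay as in the examples above
volatile uint8_t ring[RING_SIZE];
volatile uint16_t head = 0;       // written from loop()
volatile uint16_t tail = 0;       // read from the timer interrupt

void setup()
{
    // connect 'client' to the relay and start the sample-rate timer here (omitted)
}

void loop()
{
    // fill the ring buffer from the TCP stream whenever there is room
    while (client.available() && ((head + 1) % RING_SIZE) != tail)
    {
        ring[head] = client.read();
        head = (head + 1) % RING_SIZE;
    }
}

// called at the WAV sample rate (8 kHz) from a timer interrupt
void sampleISR()
{
    if (tail != head)
    {
        uint8_t sample = ring[tail];
        tail = (tail + 1) % RING_SIZE;
        analogWrite(A0, sample);  // or whatever the output stage actually is
    }
}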
I found this thread via a corresponding GitHub issue.
I think the “problem” here is normal exponential back-off from TCP NAKs at work. If the core cannot take in more TCP data and sends a NAK, the sender backs off as it should. This repeats (sometimes quickly) and the sender’s rate is slowed for the life of that connection.
Other cases of this effect come from the sender using a packet size (MTU) that is larger than the (Ethernet minimum-sized) buffer on the TI CC3000. Using the cloud or any other network service which uses packet buffers on the chip will tend to make this worse.
There are some possible work-arounds like using chunked transfers or closing and reopening the connection, but none of these are very attractive.
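For what it’s worth, the close-and-reopen variant could look something like this sketch, assuming the server honors HTTP Range requests – fetch the data in fixed-size byte ranges, each over its own connection (handleByte() is a placeholder and response-header parsing is omitted):
#include "application.h"

TCPClient client;

void handleByte(uint8_t b);   // placeholder for whatever consumes the data

// Fetch one slice of the file on its own connection, then hang up.
bool fetchChunk(const char* host, const char* path, long offset, long length)
{
    if (!client.connect(host, 80)) return false;
    client.print("GET ");
    client.print(path);
    client.println(" HTTP/1.0");
    client.print("Host: ");
    client.println(host);
    client.print("Range: bytes=");
    client.print(offset);
    client.print("-");
    client.println(offset + length - 1);
    client.println();

    while (client.connected())
    {
        while (client.available()) handleByte(client.read());   // header parsing omitted
    }
    client.stop();
    return true;
}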
Thanks for your post – it helps to explain and confirm my suspicions about the issue I’m facing. I’m seeing a very similar situation with a websocket implementation that works with Spacebrew. This implementation requires extended payloads that go beyond what TCPClient is able to handle. The root of my issue is the websocket sending continuous streams of data to the Spark Core at short intervals, which I suppose is similar to what @mleonh is facing.
Long story short, I had to resort to MANUAL mode and rely on a disconnect/reconnect routine in my Spark code to get around the buffer issue. Running my code with Spark cloud connectivity results in a repeatable hard-fault SOS:
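The rough shape of what I ended up with looks like this (heavily simplified – handleFrameByte() and the reconnect details are placeholders for my actual websocket code):
#include "application.h"

SYSTEM_MODE(MANUAL);     // stay off the cloud unless we explicitly ask for it

TCPClient client;

void handleFrameByte(uint8_t b);   // placeholder for the websocket frame parsing

void setup()
{
    // connect to the Spacebrew server and do the websocket handshake here (omitted)
}

void loop()
{
    if (!client.connected())
    {
        client.stop();
        // ... the disconnect/reconnect routine: reconnect and redo the handshake ...
    }
    while (client.available())
    {
        handleFrameByte(client.read());
    }
}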
Sorry you are having trouble! Can I ask if you are in control of the Spacebrew server? If so, you might be able to set the MTU on the server to a smaller number so that the server packetizes your payload into smaller chunks.
Another idea would be to quickly dump any data you don’t need. For instance, with @BDub’s Facebook-likes pushup man, he just throws away the first 512 bytes quickly because he knows that his good data is not in the first part of the returned data.
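Something along these lines, if it helps (a sketch – 512 is just however much preamble you know you can ignore):
// Throw away the first 'count' bytes of the response as fast as possible.
void discardBytes(TCPClient &client, int count)
{
    int skipped = 0;
    while (skipped < count && client.connected())
    {
        if (client.available())
        {
            client.read();   // drop the byte unprocessed
            skipped++;
        }
    }
}
Then call discardBytes(client, 512); right after the request goes out, before the normal parsing starts.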
Yes, it’s my own server, MTU 1500. Based on the buffer size set in TCPClient (128), does this mean I should start with an MTU of 128 and work in both directions to find the ‘sweet spot’?
It’s certainly worth a try, although I’m averse to reducing MTUs just to get this to work with the Spark – there are other services running on my server that I don’t wish to impact down the line – assuming MTU changes are applied across the entire Ethernet interface (please advise if otherwise!).
If this works, however, it will probably make sense to run my Spacebrew server solo…
I will test it on my local Mac, and then on a Raspberry Pi, first. Feels like I’m really close to fixing this issue. Thanks again.
So an MTU of 1500 should already be OK, I’m afraid. Any way you can dump unneeded data quickly?
Another thing to know is that currently using client.print() causes one packet per character to be sent. This is getting fixed but switching to client.write() can give big improvements.
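For example, instead of one client.println() per header line, you can build the whole request up front and hand it to client.write() in a single call (a sketch, reusing the request from the Google example above):
// Build the complete HTTP request in one buffer and send it with a single write().
const char request[] =
    "GET /search?q=unicorn HTTP/1.0\r\n"
    "Host: www.google.com\r\n"
    "Content-Length: 0\r\n"
    "\r\n";
client.write((const uint8_t*)request, sizeof(request) - 1);   // -1: don't send the trailing '\0'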
The data’s formatted as JSON. I already have a means to discard data that’s not properly terminated with the closing curly braces, but there’s no way for me to bail out of the reads any earlier – except maybe for reads that start with unexpected opening characters.
In any case, that approach makes it impossible for me to handle websockets beyond a single frame size (126 bytes), so it won’t be a long-term fix.
I’m using client.write(), as well as the as-yet-undocumented client.read(buf, len) TCPClient method that @ekbduffy discovered. The problem I’m facing is more with incoming data – client.read(buf, len) causes hard faults at moments where I suspect the encoded length in the websocket header exceeds or doesn’t match the actual unread buffer. However, using client.read() causes inexplicable freezes and disconnections.
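One guard worth trying, in case it’s that mismatch that’s biting you: never ask client.read(buf, len) for more than client.available() reports, regardless of what the frame header claims (payloadLen and buf here are placeholders for whatever your parser uses):
// Clamp every read to what the socket actually has right now.
uint8_t buf[128];
size_t payloadLen = 0;           // decoded from the websocket frame header elsewhere
int want   = min((int)payloadLen, (int)sizeof(buf));
int have   = client.available();
int toRead = min(want, have);
if (toRead > 0)
{
    int got = client.read(buf, toRead);
    // append 'got' bytes to the frame being assembled and wait for the rest
}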
Another option if you can compile locally is to increase the TCPClient buffer size. I think eventually controlling these sizes will be made easier by the Spark team, but right now there are too many trade-offs to be balanced for the differing applications and a relatively small fixed size was selected.