Under some circumstances, my iMac <–> Photon connection stops working. This is a snippet of the code that fails (I removed some of the variable declarations, which are global):
    // pre-setup
    SYSTEM_THREAD(ENABLED);
    SYSTEM_MODE(MANUAL);

    // in setup()
    tcpServer.begin();

    // somewhere in my main loop():
    if ( tcpClient.connected() ) {
        int i = 0;
        while ( tcpClient.available() && i < MAX_TCPIP_COMMAND_LENGTH-1 )
            cStr[i++] = tcpClient.read();
        if ( i <= 0 ) {   // <-- this is happening from time to time
            response = errorResponseToJson(0);
            tcpServer.write((unsigned char *)response, strlen(response));
            tcpClient.stop();
        }
    }
    else {
        tcpClient = tcpServer.available();
    }
When tcpClient.available() returns with zero bytes available, I send an error response to the client. On the client side, I do receive that error message, wait for 10 seconds and create a new connection to the Photon. Generally this works one or two times, but eventually this scheme fails, i.e. my client hangs. When I restart my client program on the Mac, my Photon picks up the comms right away without issue, so it seems that the Photon side is … ok (?)
What should I do to make this rock solid? Should I wait for more than 10 secs? Should I call tcpServer.begin() again? Btw, this condition seems to be happening when I have more traffic on the network (movies, downloads etc.)
Ah yes, meant 0, errors of a frustrated mind. So tcpClient.connected() == true && tcpClient.available()==0 is a valid condition is what the docs are suggesting. … right?
So any suggestions as to why this might be happening sporadically? I know this is a broad question but just some pointers to get me going would be handy. At the moment, my Mac client <–> Photon connection is continuous for hours until it fails, the error that creates this condition is unclear to me. So troubleshooting is costly.
TCP connections can stay open until a timeout occurs, so if your server is not sending data continuously, being connected without having any data to read would be a common situation IMHO.
Another thing that can happen is what’s called a half open connection.
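In a half-open connection, one endpoint still believes the connection is established while the other has gone away without a close ever reaching it, so the surviving side may keep reporting "connected" indefinitely. A common application-level guard is an inactivity timeout. The sketch below is only an illustration under assumptions: IdleWatchdog is a hypothetical helper, and the timeout value is whatever suits the application.

```cpp
#include <cstdint>

// Tracks how long a connection has been silent. Timestamps are milliseconds
// from any monotonic counter; the unsigned subtraction keeps the comparison
// correct when the counter wraps around.
class IdleWatchdog {
public:
    explicit IdleWatchdog(std::uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call when the connection is accepted and whenever bytes are received.
    void feed(std::uint32_t nowMs) { lastRxMs_ = nowMs; }

    // True once no data has arrived for at least the timeout.
    bool expired(std::uint32_t nowMs) const {
        return static_cast<std::uint32_t>(nowMs - lastRxMs_) >= timeoutMs_;
    }

private:
    std::uint32_t timeoutMs_;
    std::uint32_t lastRxMs_ = 0;
};
```

On the device this would presumably be fed with millis() each time bytes are read, and tcpClient.stop() would be called once expired(millis()) returns true, so that the server can accept a fresh connection.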
I am constantly sending data, not at an extreme rate but several messages per second (messages are around 200 bytes long). Neither the Photon nor the Mac client app indicates a network failure prior to the failure mode I am detecting, so I think a half-open connection is not it.
I was lucky enough just now to capture a failure event with Wireshark. All goes well for 1.5 hours or so until the photon asks for a retransmission. The client retransmits the prior message (msg looks ok), but photon device remains unhappy and continues to ask for retransmission 13 more times. Each is followed by a correct retransmission of the original message by the client. However, the 14th time, the client sends an empty message (data segment is empty).
So retransmits are normal but the photon does not accept it - what could cause this? I have a feeling something goes bonkers in the tcp/ip stack on the photon but I am just speculating here. Secondly, is that empty message from the client possibly just a time out?
One last thing to note is that if after this failure, I simply restart the client app, the photon device will pick up comms again, until the next failure happens.
I have removed all code that attempted to destroy and rebuild the TCP/IP connection. Instead I added a simple delay for the case where tcpClient.connected() == true but tcpClient.available() == 0, specifically:
    if ( tcpClient.connected() ) {
        int i = 0;
        while ( tcpClient.available() && i < MAX_TCPIP_COMMAND_LENGTH-1 )
            cStr[i++] = tcpClient.read();
        if ( i <= 0 )
            delay(500);   // nothing to read yet: wait, keep the connection open
    }
    else
        tcpClient = tcpServer.available();
To my surprise, this scheme worked successfully for hours on end, transferring the Photon ‘data base’ several times without failure. I do see the ‘delay(500)’ statement being called from time to time, usually 10-20 times in a row, but it always restores the communications and picks up where it left off.
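If blocking the whole loop for 500 ms ever becomes a problem, the same pause could probably be expressed without delay(). The following is a rough sketch only: PollGate is a hypothetical helper, not part of any library, and it assumes a millisecond counter such as millis() as its time source.

```cpp
#include <cstdint>

// Non-blocking replacement for a fixed pause: hold() starts the pause,
// ready() reports whether polling may resume. The unsigned subtraction
// stays correct when the millisecond counter wraps around.
class PollGate {
public:
    explicit PollGate(std::uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // Start a pause beginning at nowMs.
    void hold(std::uint32_t nowMs) {
        holding_ = true;
        startMs_ = nowMs;
    }

    // True when no pause is active, or when the active pause has elapsed.
    bool ready(std::uint32_t nowMs) {
        if (holding_ &&
            static_cast<std::uint32_t>(nowMs - startMs_) >= intervalMs_) {
            holding_ = false;
        }
        return !holding_;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t startMs_ = 0;
    bool holding_ = false;
};
```

The loop would call hold(millis()) where delay(500) is called now, and skip the read attempt while ready(millis()) is false. This leaves the rest of loop() free to run during the pause.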
Great, fantastic, I like simple fixes, but with my limited low level TCP/IP understanding, I don’t know why this works. I’d hate to put this into production not knowing what beast I just released. I’d appreciate it if anyone can enlighten me.