TCP Server stops accepting connections

With my Photon running firmware 0.6.2, the TCP server stops accepting clients after a few hours.
The only way I have found to get it going again is to restart the Photon.
For now, I am having to restart the Photon from code every hour to ensure the TCP server keeps accepting clients, but this is a problem for other reasons.
I would rather just restart the TCP server, but there does not seem to be any way to do that. Is there?
Hopefully someone has a solution to this issue.
Thanks,
TJ

Looks like I may have solved this issue.

With this code (note the global TCPClient), the TCPServer stops accepting connections after a few hours.

TCPClient webClient;   // <-- declared globally here
TCPServer webServer = TCPServer(80);

void setup() {
    webServer.begin();
    // ...
}

void loop() {
    WebServerService();
}

void WebServerService() {
    webClient = webServer.available();
    if(!webClient){
        return;
    }
    
    delay(100);
    
    String Request;
    char c;
    while (webClient.available())
    {
        c = webClient.read();
        Request+=c;
        delay(1);
    }
    // ...
}

With this code (TCPClient now declared locally inside WebServerService()), it seems to be OK so far after 24 hours.

TCPServer webServer = TCPServer(80);

void setup() {
    webServer.begin();
    // ...
}

void loop() {
    WebServerService();
}

void WebServerService() {
    TCPClient webClient;   // <-- now declared locally inside the function
    webClient = webServer.available();
    if(!webClient){
        return;
    }
    
    delay(100);
    
    String Request;
    char c;
    while (webClient.available())
    {
        c = webClient.read();
        Request+=c;
        delay(1);
    }
    // ...
}

I spoke too soon. TCPServer stopped accepting clients again after about 25 hours.

What does the main status LED do when it fails? I think it is likely that you are fragmenting the heap and can no longer allocate memory for the String object.

Can you switch to a statically allocated (usually declared globally) char array that is large enough to hold your largest client read string?
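A minimal sketch of the statically allocated approach, written as plain C++ so it can be tried off-device. The 512-byte size and the helper names resetRequest(), appendRequestByte() and requestHeaderComplete() are hypothetical; pick a size that fits your largest request. The buffer is allocated once for the life of the program, so reading a request never grows a String on the heap:

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical size: must hold the largest request you expect to receive.
const size_t REQUEST_BUF_SIZE = 512;

// Allocated once, for the life of the program; never touches the heap.
static char requestBuf[REQUEST_BUF_SIZE + 1];  // +1 for the terminating '\0'
static size_t requestLen = 0;

// Call when a new client is accepted.
void resetRequest() {
    requestLen = 0;
    requestBuf[0] = '\0';
}

// Appends one value as returned by TCPClient::read(): a byte 0..255, or -1
// when no data is available. Returns false if the byte had to be dropped
// because the buffer is full.
bool appendRequestByte(int c) {
    if (c < 0) {
        return true;  // no data: nothing to append
    }
    if (requestLen >= REQUEST_BUF_SIZE) {
        return false;  // full: drop the byte rather than grow
    }
    requestBuf[requestLen++] = static_cast<char>(c);
    requestBuf[requestLen] = '\0';  // keep it a valid C string
    return true;
}

// True once the blank line ending the HTTP header ("\r\n\r\n") has arrived.
bool requestHeaderComplete() {
    return std::strstr(requestBuf, "\r\n\r\n") != nullptr;
}
```

On the Photon, the read loop would then become something like `while (webClient.available()) { appendRequestByte(webClient.read()); }`, with resetRequest() called right after a client is accepted. Because the storage is fixed, an oversized request is truncated instead of exhausting memory.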

Hi bko,
The main LED shows the normal online/connected pattern when the server stops accepting clients. All my other code continues to run normally, except that TCPServer.available() no longer returns clients. It starts working again after I restart the Photon.
OK, I will try a global static char array for the client read string.
Thanks for your suggestion,
TJ

Can you provide a rough estimate of how the connection is used? For example: the other side connects to the Photon and sends 100 bytes and disconnects. Or keeps the connection open sending 100 bytes per second to the Photon. Or whatever the conditions are.

I think I’ve tested this before, but I’ll try to run a test that sort of approximates what you’re trying to do.

1 Like

Hi Rickkas,
The other side sends something similar to this very infrequently, maybe once an hour or so.

GET /?EndPoint=LED&Action=On HTTP/1.1
Host: 192.168.0.208:2000
Accept: */*

I don't think the TCP server stops accepting clients right after it accepts the last one. I think it sometimes happens while the server is just idle, waiting for clients. Could it be related to a Wi-Fi disconnect followed by an automatic reconnect? Would the TCP server automatically start listening for clients again in that scenario?

I added a bunch more debug messages to serial and am going to monitor them to see if I can detect when it goes down.

Thanks,
TJ

Oh, if Wi-Fi disconnects, you absolutely have to call TCPServer begin() again. It does not happen automatically, so you typically have to monitor WiFi.ready() to know whether you've disconnected from Wi-Fi and need to re-establish your listeners. This applies to both TCP and UDP. Therefore, you pretty much have to use either SYSTEM_THREAD(ENABLED) or a non-AUTOMATIC system mode; otherwise you can't tell that you've disconnected from Wi-Fi and need to restart your listener.
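Monitoring WiFi.ready() this way amounts to edge detection: you want to act once per down-to-up transition, not on every pass through loop(). A minimal sketch in plain C++ so the logic can be tried off-device; the class WifiEdgeDetector and its Event values are hypothetical helpers, not part of the Particle API:

```cpp
// Hypothetical helper: feed it WiFi.ready() once per loop() and it reports
// the transitions. The listener has to be set up again on each CAME_UP.
class WifiEdgeDetector {
public:
    enum Event { NONE, CAME_UP, WENT_DOWN };

    Event update(bool wifiReady) {
        Event event = NONE;
        if (wifiReady && !wasReady) {
            event = CAME_UP;     // down -> up: call server.begin() again
        } else if (!wifiReady && wasReady) {
            event = WENT_DOWN;   // up -> down: the listener will need re-arming
        }
        wasReady = wifiReady;
        return event;
    }

    bool isUp() const { return wasReady; }

private:
    bool wasReady = false;  // assume Wi-Fi is down at startup
};
```

In loop(), something like `if (detector.update(WiFi.ready()) == WifiEdgeDetector::CAME_UP) { server.begin(); }` re-establishes the listener once per reconnect. As noted above, this only works if loop() keeps running while Wi-Fi is down, i.e. with SYSTEM_THREAD(ENABLED) or a non-AUTOMATIC system mode.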

1 Like

I am already using SYSTEM_MODE(SEMI_AUTOMATIC);

That is what I thought too, so I did this test:

1. Got the Photon running, connected to Wi-Fi and the Particle cloud.
2. Verified the TCP server was accepting clients.
3. Powered down my Wi-Fi router.
4. Verified the Photon showed it was no longer connected.
5. Powered up my Wi-Fi router.
6. The Photon reconnected to Wi-Fi and the Particle cloud.
7. Verified the TCP server was NOT accepting clients.
8. Through serial/code, ran TCPServer.begin().
9. The Photon crashed with SOS.

Eeeeeks!
TJ

I have now tried many different approaches. Running TCPServer.begin() a second time always results in a Photon crash.
Even if I shut down the Wi-Fi connection in code and then re-establish the Wi-Fi and Particle connections, TCPServer.begin() still crashes.
It seems that if the TCP server is listening and Wi-Fi goes down, whether from an external loss of connection or from WiFi.off(), the TCP server cannot be restarted without restarting the Photon.
I really think we are missing some way to “stop” the TCP server and start it again through code. Where is TCPServer.end()?
TJ

This is my server test code. I'm able to turn off the Wi-Fi access point, the Photon goes into blinking green, I plug the access point back in, and the Photon eventually gets back to breathing cyan. I can make an HTTP connection to the server after losing the Wi-Fi.

#include "Particle.h"

SYSTEM_THREAD(ENABLED);

// Pages
// [start a87cffa2-e342-4f2c-9070-72b710d606c3]
// name=/index.html contentType=text/html size=293 modified=2016-11-08 12:50:47
const char fileData0[] = 
"<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
"<!DOCTYPE html>\n"
"<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
"<head>\n"
"<meta http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\" />\n"
"\n"
"<title>Test Page</title>\n"
"\n"
"</head>\n"
"<body>\n"
"\n"
"<div id=\"main\">\n"
"\n"
"<p>Test Page</p>\n"
"\n"
"</div> <!-- main -->\n"
"\n"
"</body>\n"
"</html>\n"
"";

typedef struct {
	const char *name;
	const char *mimeType;
	const uint8_t *data;
	size_t dataSize;
	unsigned long modified;
	bool isBinary;
	int options;
} FileInfo;

const FileInfo fileInfo[] = {
	{"/index.html", "text/html", (const uint8_t *)fileData0, sizeof(fileData0) - 1, 1478627447, false, 0},
	{NULL, NULL, NULL, 0, 0, false, 0}
};
// [end a87cffa2-e342-4f2c-9070-72b710d606c3]

static const char *dateFormatStr = "%a, %d %b %Y %T %z";

enum State {
	FREE_STATE,
	READ_REQUEST_STATE,
	WRITE_HEADER_STATE,
	WRITE_RESPONSE_STATE
};

const int MAX_CLIENTS = 5;
const int LISTEN_PORT = 7123;
const int CLIENT_BUF_SIZE = 1024;
const int MAX_TO_WRITE = 1024;
const unsigned long INACTIVITY_TIMEOUT_MS = 30000;

class ClientConnection {
public:
	ClientConnection();
	virtual ~ClientConnection();

	void loop();
	bool accept();

protected:
	void clear();
	void readRequest();
	void generateResponseHeader();
	void writeResponse();

private:
	uint8_t clientBuf[CLIENT_BUF_SIZE+1];
	State state;
	int clientId;
	TCPClient client;
	int readOffset;
	int writeOffset;
	unsigned long lastUse;
	time_t startTime;

	// Response data
	int responseCode;
	String responseStr;
	const FileInfo *fileToSend;

	const uint8_t *sendBuf;
	size_t sendOffset;
	size_t sendLen;

};


String localIP;
TCPServer server(LISTEN_PORT);
ClientConnection clients[MAX_CLIENTS];
int nextClientId = 1;
bool wifiUp = false;

void setup() {
	Serial.begin(9600);


	// From CLI, use something like:
	// particle get test5 localip
	// to get the IP address of the Photon (replace "test5" with your device name)
	Particle.variable("localip", localIP);
}

void loop() {
	if (WiFi.ready()) {
		if (!wifiUp) {
			Serial.println("wifi up");

			// WiFi.localIP() will return 0.0.0.0 sometimes immediately after WiFi.ready()
			// This shouldn't happen very often, so a 500 millisecond delay won't be a problem.
			delay(500);
			localIP = WiFi.localIP(); // localIP must be a global variable
			Serial.printlnf("server=%s:%d", localIP.c_str(), LISTEN_PORT);

			server.begin();
			wifiUp = true;
		}

	}
	else {
		if (wifiUp) {
			Serial.println("wifi down");
			wifiUp = false;
		}
	}

	// Handle any existing connections
	for(int ii = 0; ii < MAX_CLIENTS; ii++) {
		clients[ii].loop();
	}

	// Accept a new one if there is one waiting (and we have a free client)
	for(int ii = 0; ii < MAX_CLIENTS; ii++) {
		if (clients[ii].accept()) {
			break;
		}
	}

}


ClientConnection::ClientConnection() : state(FREE_STATE) {
	clear();
}

ClientConnection::~ClientConnection() {
}

void ClientConnection::loop() {
	if (state == FREE_STATE) {
		return;
	}

	if (client.connected()) {
		switch(state) {
		case READ_REQUEST_STATE:
			readRequest();
			break;

		case WRITE_HEADER_STATE:
		case WRITE_RESPONSE_STATE:
			writeResponse();
			break;
		}

		if (millis() - lastUse > INACTIVITY_TIMEOUT_MS) {
			Serial.printlnf("%d: inactivity timeout", clientId);
			client.stop();
			clear();
		}
	}
	else {
		Serial.printlnf("%d: client disconnected", clientId);
		client.stop();
		clear();
	}
}

bool ClientConnection::accept() {
	if (state != FREE_STATE) {
		return false;
	}

	client = server.available();
	if (client.connected()) {
		lastUse = millis();
		state = READ_REQUEST_STATE;
		clientId = nextClientId++;
		startTime = Time.now();
		Serial.printlnf("%d: connection accepted", clientId);
	}
	return true;
}

void ClientConnection::clear() {
	lastUse = 0;
	readOffset = 0;
	writeOffset = 0;
	state = FREE_STATE;
	fileToSend = 0;
}

void ClientConnection::readRequest() {
	// Note: client.read returns -1 if there is no data; there is no need to call available(),
	// which basically does the same check as the one inside read().

	size_t toRead = CLIENT_BUF_SIZE - readOffset;
	if (toRead == 0) {
		// Didn't get end of header
		Serial.printlnf("%d: didn't receive end-of-header", clientId);
		client.stop();
		return;
	}

	int count = client.read(&clientBuf[readOffset], toRead);
	if (count > 0) {
		readOffset += count;
		clientBuf[readOffset] = 0;

		if (strstr((const char *)clientBuf, "\015\012\015\012")) {
			// Ignore the actual request and just return the index.html data
			responseCode = 200;
			responseStr = "OK";
			fileToSend = &fileInfo[0];

			Serial.printlnf("%d: sending %s", clientId, fileToSend->name);
			generateResponseHeader();
		}
		lastUse = millis();
	}
}



void ClientConnection::generateResponseHeader() {
	char *dst = (char *)clientBuf;
	char *end = &dst[CLIENT_BUF_SIZE];

	// Generate HTTP response header
	// HTTP/1.0 200 OK
	dst += snprintf(dst, end - dst, "HTTP/1.0 %d %s\r\n", responseCode, responseStr.c_str());

	// Date
	String s = Time.format(Time.now(), dateFormatStr);
	dst += snprintf(dst, end - dst, "Date: %s\r\n", s.c_str());

	if (responseCode == 200 && fileToSend) {
		// Content-Type
		if (fileToSend->mimeType) {
			dst += snprintf(dst, end - dst, "Content-Type: %s\r\n", fileToSend->mimeType);
		}

		// Content-Length. It's good to send it when the length is known, because
		// not setting a content length means keepalive cannot be used.
		// dataSize is a size_t (always >= 0), so cast it to match the %u format.
		dst += snprintf(dst, end - dst, "Content-Length: %u\r\n", (unsigned int) fileToSend->dataSize);

		// Last-Modified
		if (fileToSend->modified != 0) {
			s = Time.format(fileToSend->modified, dateFormatStr);
			dst += snprintf(dst, end - dst, "Last-Modified: %s\r\n", s.c_str());
		}
	}


	// End of header
	dst += snprintf(dst, end - dst, "\r\n");

	// Now send
	sendBuf = clientBuf;
	sendOffset = 0;
	sendLen = dst - (char *)clientBuf;
	state = WRITE_HEADER_STATE;
}


void ClientConnection::writeResponse() {
	if (sendOffset == sendLen) {
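		if (state == WRITE_HEADER_STATE && fileToSend) {
			// Header done; write body now. Switch state so the body is not
			// re-sent from the start once it has been fully written.
			state = WRITE_RESPONSE_STATE;
			sendOffset = 0;
			sendBuf = fileToSend->data;
			sendLen = fileToSend->dataSize;
		}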
		else {
			// Done
			Serial.printlnf("%d: send complete", clientId);
			client.stop();
			return;
		}
	}
	size_t bytesToWrite = sendLen - sendOffset;
	if (bytesToWrite >= MAX_TO_WRITE) {
		bytesToWrite = MAX_TO_WRITE;
	}

	int count = client.write(&sendBuf[sendOffset], bytesToWrite);
	if (count == -16) {
		// Special case on Photon; buffer is full, retry later
	}
	else
	if (count > 0) {
		sendOffset += count;
	}
	else {
		Serial.printlnf("%d: error writing %d", clientId, count);
		client.stop();
	}

}

Here’s the serial log that shows the sequence of events:

1: connection accepted
1: sending /index.html
1: client disconnected
2: connection accepted
2: sending /index.html
2: client disconnected
3: connection accepted
3: sending /index.html
3: client disconnected
4: connection accepted
4: sending /index.html
4: client disconnected
5: connection accepted
wifi down
5: client disconnected
wifi up
server=192.168.2.44:7123
6: connection accepted
6: sending /index.html
6: client disconnected
7: connection accepted
7: sending /index.html
7: client disconnected
8: connection accepted
8: inactivity timeout
2 Likes

Hi Rick,
Thank you much for sharing your code.

Here is what your code does on my Photon:

1. I started the Photon.
2. I sent a TCP request, which was received. The log shows an error, but I believe that's because the sender is not handling the reply properly, which should not matter for this test.
3. I then restarted my Wi-Fi router.
4. The log shows the disconnect, the reconnect, and the TCP server restart.
5. Within a couple of seconds, Wi-Fi is no longer "ready" and NEVER reconnects. The Photon keeps blinking blue forever.

(2017-09-28) (12:01:07.312) 1: connection accepted
(2017-09-28) (12:01:07.312) 1: sending /index.html
(2017-09-28) (12:01:07.331) 1: error writing -18
(2017-09-28) (12:01:07.331) 1: client disconnected
(2017-09-28) (12:01:52.736) wifi down
(2017-09-28) (12:02:31.905) wifi up
(2017-09-28) (12:02:32.405) server=192.168.0.208:2000
(2017-09-28) (12:02:37.030) wifi down

Now I am starting to believe this issue is related to router compatibility.
I now notice that when I restart my router, its 2.4G light comes on for a few seconds, goes off for a few seconds, and then comes on again and stays on. This seems to match what the Photon log shows as well. However, the Photon does not reconnect the second time the 2.4G radio comes up. All my other non-Photon devices (computers, cameras, etc.) reconnect to Wi-Fi correctly after I restart my router.

I am at a loss at what to do at this point except try a different router :frowning:
TJ

1 Like

Trying a different access point might be good to help isolate the problem as your symptoms are strange.

Incidentally, I use TP-LINK TL-WR702 access points for testing. They’re cheap, tiny, and powered by USB. That way I can unplug either the power or the Ethernet to test various failure scenarios without affecting my actual network. They can also be used in hotels and other captive portal situations where the Photon can’t connect as they can do Wi-Fi to Wi-Fi as well as Ethernet to Wi-Fi.

1 Like

Same results with an entirely different Wi-Fi router.
For now, I have decided to call System.reset() whenever WiFi.ready() is false. It's not elegant, but at least it's workable.
I hope Particle will look into what happens to the state of TCPServer when Wi-Fi is lost and then restored. I think there is some weirdness going on there, but I just can't quite put my finger on it. I suspect it's something about how Wi-Fi goes down with certain routers that leaves the Photon in a strange state. I really think there needs to be a way in code to stop and then restart the TCPServer after a Wi-Fi loss.
TJ

Hi bko,
I am now using a static global string for the TCP reply, but it didn't solve the issue.
Thanks,
TJ

Thanks for trying! I think it is better practice on small devices such as these.

It looks like your discussion with @rickkas7 is fruitful, and I would build on his test code. Have you tried calling server.stop(); before trying to bring everything back up again?

The code in TCPServer::stop() closes the underlying TCP connection and marks the socket as invalid.

I wonder if you are somehow running out of TCP sockets in the layer underneath.

1 Like

Huh? server.stop()? I didn't know that existed. It's not in the Particle reference docs.
I see Client.stop but not Server.stop.
TJ

That seems to be an omission in the docs
https://github.com/spark/firmware/blob/develop/wiring/inc/spark_wiring_tcpserver.h

class TCPServer : public Print {
private:
	uint16_t _port;
	network_interface_t _nif;
	sock_handle_t _sock;
	TCPClient _client;

public:
	TCPServer(uint16_t, network_interface_t nif=0);
	~TCPServer() { stop(); }

	TCPClient available();
	virtual bool begin();
	virtual size_t write(uint8_t);
	virtual size_t write(const uint8_t *buf, size_t size);
	void stop();
	using Print::write;
};

(implemented 2 years ago)
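Since the header confirms that stop() exists, bko's suggestion becomes a two-step restart: close the stale listening socket, then open a new one. A minimal sketch; restartListener() is a hypothetical helper, and whether calling stop() first actually avoids the SOS crash TJ reported on 0.6.2 is an assumption this thread has not confirmed:

```cpp
// Hypothetical helper (not part of the Particle API). Server can be any type
// with void stop() and bool begin(), such as TCPServer as declared in
// spark_wiring_tcpserver.h. Assumes (unverified) that stop() before begin()
// is enough to re-arm the listener after a Wi-Fi loss.
template <typename Server>
bool restartListener(Server &server) {
    server.stop();          // close the stale listening socket, if any
    return server.begin();  // open a new listening socket on the same port
}
```

On the Photon it would be called as `restartListener(server);` when WiFi.ready() becomes true again after a disconnect. Taking the server as a template parameter keeps the helper usable with TCPServer while letting the call order be checked off-device with a stand-in type.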

1 Like