Hard Fault after TCP Connection and WiFi loss

Hi there, Particle community,

First of all, I would like to thank Particle and its engineers for the great products. You have all my respect for what you have achieved over the years!

We are using the P1 as the control unit for a product that is connected only to the user's home Wi-Fi network. The user can mirror the device data in an app, which reads the data as a TCP client. So the P1 opens a TCP server to write the data.

The device runs in MANUAL system mode with the system thread enabled, and it must handle a sudden loss and return of the Wi-Fi network. There is also an Off-Mode in which the Wi-Fi module is disconnected and turned off.
All of this works fine, unless a TCP connection was established beforehand.

So here is the behaviour of the P1:

  1. Connect to the device via the app and read data. All clean -> P1 breathing green.
  2. Wi-Fi network is deactivated -> P1 blinking green.
  3. Wi-Fi network is activated again -> P1 blinking red (SOS pattern with 1 blink in between, so a Hard Fault!)

The same happens when the device goes into Off-Mode and `WiFi.disconnect()` or `WiFi.off()` is called.
If no TCP connection was established beforehand, the P1 breathes white.

Here is some code so you can see what's going on:

```cpp
class Communication
{
   // ...
  TCPServer tcp_server = TCPServer(777);
  TCPClient tcp_client;
  // ...
};
```
Here is how the device handles the server-client communication:
```cpp
void Communication::init_Network()
{ 
    delay(250);
    WiFi.on();
    delay(100);

    if(WiFi.hasCredentials() && !WiFi.connecting() && !WiFi.ready())
    {
        WiFi.connect(WIFI_CONNECT_SKIP_LISTEN);
    }
    else
    {
        if(!WiFi.connecting()){
            WiFi.disconnect();
            delay(250);
            WiFi.off();
        }   
    }
    
    if(WiFi.ready()){
        init_tcp_server();
    }

    return;
}
//-------------------------------------------------------------------------------------------------------------------
void Communication::init_tcp_server()
{
    tcp_server.begin();
    delay(250);
}
//-------------------------------------------------------------------------------------------------------------------
void Communication::close_connections()
{
    tcp_client.flush();
    tcp_client.stop();
}
//-------------------------------------------------------------------------------------------------------------------
bool Communication::connection_tcp()
{
    if(WiFi.ready())
    {
        if (tcp_client.connected()) // There is a connected client
        { 
            if (!tcp_connection_status) {
                tcp_connection_status = true;
            }
        } 
        else // If no client is connected, check for a new connection
        { 
            if (tcp_connection_status) {
                tcp_connection_status = false;
            }
            tcp_client.flush(); 
            tcp_client.stop();
            tcp_client = tcp_server.available();
        }
    }
    else
    {
        tcp_client.flush();
        tcp_client.stop();
        tcp_connection_status = false;
    }
    
    return tcp_connection_status;
}
//-------------------------------------------------------------------------------------------------------------------

int Communication::read_tcp() 
{
    int CLBytes = 0;

    if (connection_tcp()) 
    {
        if (tcp_client.available()) {
            CLBytes = tcp_client.read();
        }
    }

    return CLBytes;
}
//-------------------------------------------------------------------------------------------------------------------

size_t Communication::write_tcp(uint8_t *buffer, size_t size) 
{
    size_t SEBytes = 0;   // write() returns size_t, matching the function's return type

    if (tcp_client.available()) {
        SEBytes = tcp_server.write(buffer, size);
    }
    
    return SEBytes;
}
```
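In `read_tcp()`, `CLBytes` stays 0 when nothing is available, so a caller cannot tell "no data" apart from a received 0x00 byte (Particle's `TCPClient::read()` itself returns -1 when no byte is available). If the app's protocol can ever send a 0x00 byte, a read helper that keeps the two cases apart may help. This is a minimal plain C++ sketch: `ByteSource` is a hypothetical stand-in for `TCPClient` so it compiles off-device, and `read_byte()` is likewise a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <optional>

// Hypothetical stand-in for TCPClient: available() and read() follow the
// Arduino-style convention (read() returns -1 when no byte is waiting).
struct ByteSource {
    std::deque<uint8_t> rx;

    int available() const { return static_cast<int>(rx.size()); }

    int read() {
        if (rx.empty()) {
            return -1;
        }
        int c = rx.front();
        rx.pop_front();
        return c;
    }
};

// Returns the next byte, or std::nullopt when nothing is waiting, so a
// received 0x00 byte is no longer confused with "no data".
std::optional<uint8_t> read_byte(ByteSource& src) {
    int c = src.read();
    if (c < 0) {
        return std::nullopt;
    }
    assert(c <= 0xFF);  // on success read() yields a single byte
    return static_cast<uint8_t>(c);
}
```

The caller then checks `has_value()` instead of comparing against 0.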

The Wi-Fi connection is simply initiated by:

```cpp
void Device::init_Device()
{
     // ...
     tcp_channel.init_Network();
     // ...
}
```


Here is how the device handles the Wi-Fi connection:
```cpp
void Device::check_network_health()
{

    if(WiFi.ready() && !sleep_mode && !ap_setup && !WiFi.connecting())
    {
        // ... (some uninteresting procedures) ...
        read_network_request();
    }
    else if(!WiFi.ready() && !sleep_mode && !ap_setup && WiFi.connecting() && network_reconnect_timeout)
    {
        // ... (some uninteresting procedures) ...
        tcp_channel.close_connections();
        // ...
    }
    else
    {
        tcp_channel.close_connections();

    }
}
```

When we put the device into Off-Mode:
```cpp
void Device::device_sleep()
{
    // ... (some uninteresting procedures) ...
    tcp_channel.close_connections();
    WiFi.disconnect();
    delay(500);
    WiFi.off();
    delay(500);
    // ...
}
```
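`device_sleep()` closes the client, but nothing shown here calls `tcp_server.stop()`, and `tcp_server.begin()` is only reached from `init_Network()` when `WiFi.ready()` is already true right after `WiFi.connect()`. rickkas7's example further down treats the listening socket as part of the Wi-Fi session: it stops client and server when Wi-Fi is lost and calls `server.begin()` again after reconnecting. Whether doing the same here avoids the hard fault is not established in this thread. The following sketch models that ordering in plain C++ so it can be checked off-device; `CallLog` and `NetSession` are hypothetical names, and the logged strings only mirror the Particle calls they stand for.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: each "Particle call" is appended to a log so the
// ordering can be checked off-device. Nothing here touches real hardware.
struct CallLog {
    std::vector<std::string> calls;
    void add(const std::string& c) { calls.push_back(c); }
};

class NetSession {
public:
    explicit NetSession(CallLog& log) : log_(log) {}

    // Called from the loop whenever Wi-Fi is ready: (re)start the server once per session.
    void on_wifi_ready() {
        if (!server_running_) {
            log_.add("server.begin");
            server_running_ = true;
        }
    }

    // Called when Wi-Fi is lost: release the client and the listening socket.
    void on_wifi_down() {
        if (server_running_) {
            log_.add("client.stop");
            log_.add("server.stop");
            server_running_ = false;
        }
    }

    // Off-Mode: release all sockets first, then take the radio down.
    void enter_off_mode() {
        on_wifi_down();
        assert(!server_running_);
        log_.add("WiFi.disconnect");
        log_.add("WiFi.off");
    }

    bool server_running() const { return server_running_; }

private:
    CallLog& log_;
    bool server_running_ = false;
};
```

After waking from Off-Mode, the next `on_wifi_ready()` call logs `server.begin` again, which is the "stop and begin it again" pattern from the example below.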

It seems that the Particle firmware doesn't close the sockets properly, so the Wi-Fi module runs into an error?

We would appreciate any help we can get on this.

Thank you very much!

I was able to reproduce something similar with a test program I wrote some time ago. I just ran it again on 0.5.0 on a Photon. It also uses MANUAL system mode with threading enabled and a TCP server.

  1. Flash the program. Breathes green as would be expected since it only enables Wi-Fi, not the cloud.
  2. Telnet to port 23, since that’s what the program tests.
  3. Disconnect the Wi-Fi. I powered off the Wi-Fi access point this Photon connects through. Photon blinks green.
  4. Turn Wi-Fi back on. Photon breathes green.
  5. Telnet to port 23 again. SOS!

Now I won’t guarantee that I don’t have a bug somewhere in my program, but at least this is an easy way to reproduce something similar.

```cpp
// Manual System Mode Telnet Example - Wi-Fi only, no Internet/cloud access
#include "Particle.h"

SYSTEM_MODE(MANUAL);

// System Thread enabled is not required for using manual mode. However, if you have code that needs to
// run constantly you should use it, even with manual mode. The reason is that certain operations when
// connecting to the cloud will block either outside of loop() (so your loop() won't be called for 20
// seconds or longer) or in Particle.process() (normally very quick, but it might take 2-3 seconds). These
// delays go away when the system thread is used, so your loop() will be called very regularly.
SYSTEM_THREAD(ENABLED);

TCPServer server = TCPServer(23);
TCPClient client;

enum State { WIFI_CONNECT, WIFI_CONNECT_WAIT, SERVER_CONNECT_WAIT, SERVER_HANDLE_CLIENT };
State state = WIFI_CONNECT;
unsigned long stateTime = 0;
String localIP;
unsigned long lastLoopExit = 0;

// This sample code will output a message via serial if it takes longer than timeWarnMs in one of these cases:
// 1. Outside of loop - it takes an unusually long time before your loop is called again
// 2. Your code inside loop
// 3. Time to make the Particle.process call
const unsigned long timeWarnMs = 100;

void setup() {
	Serial.begin(9600);
}

void loop() {
	unsigned long enterLoop = millis();

	if ((lastLoopExit != 0) && ((enterLoop - lastLoopExit) > timeWarnMs)) {
		Serial.printlnf("%lu ms since last loop call", (enterLoop - lastLoopExit));
	}

	if (WiFi.ready()) {
		switch(state) {
		case WIFI_CONNECT_WAIT:
			Serial.println("connected to Wi-Fi!");
			state = SERVER_CONNECT_WAIT;

			// It appears you must server.stop() and server.begin() if you lose your Wi-Fi connection.
			server.begin();

			// The other example using the cloud uses a Particle.variable here, but this example avoids
			// using the cloud at all so it just prints the IP address to serial
			localIP = WiFi.localIP();
			Serial.println(localIP);

			break;

		case SERVER_CONNECT_WAIT:
			if (client.connected()) {
				// A TCP client has connected to the server.
				state = SERVER_HANDLE_CLIENT;
				stateTime = millis();
			}
			else {
				// Check for an incoming connection. This is called repeatedly until a connection is made.
				client = server.available();
			}
			break;

		case SERVER_HANDLE_CLIENT:
			if (client.connected()) {
				int count = 0;

				// Echo bytes back to the client while we have input bytes, but not too many.
				// (If there are lots of bytes outstanding we want to process them in chunks to
				// avoid starving the rest of the system by spending too much time in the loop)
				while (client.available() && count++ < 128) {
					int c = client.read();

					// By the way, the actual telnet program will send a bunch of telnet escape sequences
					// which appear as random garbage characters in serial. This isn't really a bug,
					// it's just that in the interest of clarity this isn't really a telnet server.
					client.write(c);
					Serial.write(c);
				}

				if (count > 0) {
					// We received bytes, reset the timeout counter
					stateTime = millis();
				}
				else {
					// If we don't receive any bytes in 15 seconds, disconnect the client
					// In a real program you'd probably make this timeout much longer.
					if (millis() - stateTime > 15000) {
						Serial.println("client timeout");
						client.stop();
						state = SERVER_CONNECT_WAIT;
					}
				}
			}
			else {
				// Disconnected
				Serial.println("client disconnected");
				client.stop();
				state = SERVER_CONNECT_WAIT;
			}
			break;
		}
	}
	else {
		// WiFi.ready() is false here

		switch(state) {
		case WIFI_CONNECT:
			// Not connected to Wi-Fi; either we just started up or the connection
			// was broken and we need to reconnect
			Serial.println("attempting to connect to WiFi");
			WiFi.connect();
			stateTime = millis();
			state = WIFI_CONNECT_WAIT;
			break;

		case WIFI_CONNECT_WAIT:
			if (millis() - stateTime > 60000) {
				// Allow 60 seconds for connecting; if we fail to connect, try again
				Serial.println("failed to connect");
				state = WIFI_CONNECT;
			}
			break;

		default:
			Serial.println("WiFi connection lost, retry connect");
			WiFi.disconnect();

			// Important: If you're using a server socket, be sure to stop and begin it again
			// otherwise you won't be able to make a new connection to the server after losing
			// your network connection.
			client.stop();
			server.stop();
			state = WIFI_CONNECT;
			break;
		}
	}

	// Put your code that needs to run whether you're connected or not here

	if ((millis() - enterLoop) > timeWarnMs) {
		Serial.printlnf("%lu ms spent in loop (our code)", (millis() - enterLoop));
	}

	// Only necessary in manual mode
	unsigned long beforeProcess = millis();

	Particle.process();

	if ((millis() - beforeProcess) > timeWarnMs) {
		Serial.printlnf("%lu ms spent in Particle.process", (millis() - beforeProcess));
	}


	lastLoopExit = millis();
}
```
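The `loop()` above is a small state machine. Stripped of the Particle calls, its transitions can be written as a pure function, which makes it easy to check, for example, that losing Wi-Fi from either server state always leads back to `WIFI_CONNECT` (the path where the example calls `client.stop()` and `server.stop()`). This is a plain C++ sketch: `Inputs` and `next_state()` are hypothetical names, and side effects such as `WiFi.connect()` and `server.begin()` are only noted in comments.

```cpp
#include <cassert>

enum State { WIFI_CONNECT, WIFI_CONNECT_WAIT, SERVER_CONNECT_WAIT, SERVER_HANDLE_CLIENT };

// Hypothetical snapshot of what one loop() pass observes.
struct Inputs {
    bool wifiReady;            // WiFi.ready()
    bool clientConnected;      // client.connected()
    bool gotBytes;             // at least one byte was echoed this pass
    unsigned long msInState;   // millis() - stateTime
};

// Pure model of the state transitions in the example loop().
State next_state(State s, const Inputs& in) {
    assert(s >= WIFI_CONNECT && s <= SERVER_HANDLE_CLIENT);
    if (in.wifiReady) {
        switch (s) {
        case WIFI_CONNECT_WAIT:
            return SERVER_CONNECT_WAIT;              // server.begin() happens here
        case SERVER_CONNECT_WAIT:
            return in.clientConnected ? SERVER_HANDLE_CLIENT : SERVER_CONNECT_WAIT;
        case SERVER_HANDLE_CLIENT:
            if (!in.clientConnected) {
                return SERVER_CONNECT_WAIT;          // client disconnected
            }
            if (!in.gotBytes && in.msInState > 15000) {
                return SERVER_CONNECT_WAIT;          // client timeout
            }
            return SERVER_HANDLE_CLIENT;
        default:
            return s;                                // WIFI_CONNECT has no case in the ready branch
        }
    }
    switch (s) {
    case WIFI_CONNECT:
        return WIFI_CONNECT_WAIT;                    // WiFi.connect() is called here
    case WIFI_CONNECT_WAIT:
        return in.msInState > 60000 ? WIFI_CONNECT : WIFI_CONNECT_WAIT;
    default:
        return WIFI_CONNECT;                         // Wi-Fi lost: client.stop(), server.stop()
    }
}
```

Because every server state funnels through `WIFI_CONNECT` and then `WIFI_CONNECT_WAIT` after a loss, `server.begin()` runs again on each reconnect.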


@rickkas7 I could not reproduce the fault with your code; it works fine for me. Did you close the Telnet connection before powering off the Wi-Fi? Do you get the fault every time you test this?

I have now also tried our firmware with Telnet and Socket Test, and it worked perfectly! Even when the connection remained open, there was no fault!

So I wondered what the difference in connection handling might be between the app and Telnet/Socket Test. I used Wireshark and looked at the connection messages (sorted by destination IP address):

From Android:

From Windows 7 via Socket Test:

Can anyone tell whether there is a significant difference? Is the app perhaps overloading the Wi-Fi module with requests that arrive too quickly?