Reusing a TCPClient object after a disconnect

I’ve got a Photon that keeps losing Wi-Fi connectivity. It wouldn’t be a problem, since it looks like it reconnects fine, but I’m also using an MQTT library to push events to an MQTT broker. When the Photon drops offline, my connection to the MQTT broker is broken and I’m having trouble re-establishing it. At least…that’s what I think is going on. I tried to add UDP/Papertrail logging, but that’s not working all that well either. Alas…

I’m still not 100% sure what is going on, but looking in the MQTT code, I see that it creates a single TCPClient object and uses (and re-uses) that for connections. Is that safe? So, if I am connected, then the connection fails (and breaks/disconnects the underlying TCP socket), is it safe to re-use that TCPClient object to create a new connection? Or, is it better to create an entirely new TCPClient object? I’m kind of grasping at straws as to what could be preventing the reconnection.

[SOLVED] - It turns out that reusing the TCPClient object was not the underlying cause of my problems. As far as I can tell, that’s fine to do. In my case, I had to add a significant delay between reconnection attempts. At first I was delaying for 50ms between attempts. I switched this to 15000ms, based on a few posts I read talking about hard-wired timeouts for the underlying sockets on the Photon. Once I made this change, everything seemed to work a lot better. You may be able to get away with shorter delays, perhaps as short as 5000ms, but for my case a 15s reconnection delay every few hours when it loses its connection is not a problem.
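
For anyone who wants the delay without blocking loop() for 15 seconds, here is a minimal sketch of one way to space out reconnection attempts. ReconnectGate is a hypothetical helper written for this example; it is not part of the MQTT library or Device OS. The 15000ms figure is the value that worked above, not a documented limit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: allows a reconnect attempt only if at least
// minIntervalMs has elapsed since the previous attempt.
class ReconnectGate {
public:
    explicit ReconnectGate(uint32_t minIntervalMs)
        : minIntervalMs_(minIntervalMs) {
        assert(minIntervalMs > 0);  // a zero interval would never throttle
    }

    // Returns true if an attempt is allowed at time nowMs and records it.
    // Unsigned subtraction keeps the comparison correct when the
    // millisecond counter wraps around.
    bool shouldAttempt(uint32_t nowMs) {
        if (hasAttempted_ && (nowMs - lastAttemptMs_) < minIntervalMs_) {
            return false;
        }
        hasAttempted_ = true;
        lastAttemptMs_ = nowMs;
        return true;
    }

private:
    uint32_t minIntervalMs_;
    uint32_t lastAttemptMs_ = 0;
    bool hasAttempted_ = false;
};
```

In loop() this would be used roughly as: if the client is not connected and gate.shouldAttempt(millis()) returns true, try to connect; otherwise carry on with other work. Note that the very first attempt after a drop goes out immediately, and only the retries are spaced 15 seconds apart.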

Also, specific to the MQTT library, I found an issue with the MQTT host server IP address. I was passing that in as a byte array, and that array was scoped to the setup() function. That meant it could successfully connect once, but when setup() returned, the array went out of scope and its stack memory was free to be overwritten. So, the next time the MQTT library tried to reconnect, instead of a valid byte array for the IP address, I assume it had a dangling pointer to garbage. The lesson is: Make sure the variables you pass to the MQTT constructor outlive the MQTT object, for example by making them global or static.
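
To illustrate the lifetime problem, here is a small sketch. PointerKeepingClient is a hypothetical stand-in that, like the behavior described above suggests, keeps the caller's pointer rather than copying the four address bytes. The IP address is a placeholder.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a client that stores the pointer it is given
// and reads the address through it again on every reconnect.
class PointerKeepingClient {
public:
    explicit PointerKeepingClient(const uint8_t* ip) : ip_(ip) {}

    uint8_t octet(int i) const {
        assert(i >= 0 && i < 4);
        return ip_[i];
    }

private:
    const uint8_t* ip_;  // not a copy: the caller must keep the array alive
};

// File scope gives the array static storage duration, so the pointer
// held by the client stays valid for the whole program.
const uint8_t brokerIp[4] = {192, 168, 1, 10};
PointerKeepingClient client(brokerIp);

// By contrast, this pattern leaves the client with a dangling pointer:
//
// void setup() {
//     uint8_t localIp[4] = {192, 168, 1, 10};  // gone once setup() returns
//     client = PointerKeepingClient(localIp);
// }
```

Declaring the array static inside setup() would work as well, since that also gives it static storage duration.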

@micahwedemeyer, I recall seeing something like this in another thread. I believe the issue is that the socket is not closed when the connection is lost, leaving the socket in a bad state. The socket needs to be closed and a new one created. Perhaps @ScruffR or @Moors7 can chime in here.

I’m not really using the MQTT library, but I’m a bit puzzled about this, since the library calls MQTT::isConnected() all over the place and this function looks like this:

bool MQTT::isConnected() {
    bool rc = (int)_client->connected();
    if (!rc) _client->stop();
    return rc;
}

So either _client->connected() does not report the connection as closed, or an isConnected() check is missing at a crucial place.
A call to MQTT::disconnect() should help but may also drop pending messages.
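
As a rough sketch of the "close it before reconnecting" idea, the helper below always calls stop() before opening a new connection. reconnectCleanly is a hypothetical name; the template only assumes the client offers connected(), stop(), and connect(host, port), which is how TCPClient is used here. Whether this is needed at all depends on whether connected() reports the drop correctly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: make sure the old socket is released before a new
// connection is attempted. Returns true if the client ends up connected.
template <typename Client>
bool reconnectCleanly(Client& client, const char* host, uint16_t port) {
    assert(host != nullptr);
    if (client.connected()) {
        return true;  // nothing to do
    }
    client.stop();  // release whatever is left of the old socket
    return client.connect(host, port) != 0;
}
```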

I tried having the library create a new TCPClient when it reconnects, but this didn’t fix anything.

I’m going to try putting in a long (15 second) delay when it detects the connection failed (i.e., isConnected() returns false). Hopefully that will give it time to reclaim any underlying sockets. I read somewhere else that it has a total of 7 sockets available, so perhaps there’s something in there where I’m quickly claiming them all and there’s some kind of conflict.

Mainly I’m just stumped. Everything else seems to work fine, like setting Particle variables that I can see in the online dashboard. But somehow once I lose that MQTT connection, I can never get it back.

Yeah, I also use isConnected in many places in my code, so it has plenty of places to call _client->stop() and clean everything up. Just, for some reason…it doesn’t work.

After playing with it some more, it seems that reusing the TCPClient object is not the problem. Specifically, I found that if I increase the delay before it attempts to reconnect, that seems to fix the problem. At first, I was delaying for 50ms before attempting a reconnect. I switched that to 15000ms based on reading a few things that mentioned hard-wired timeouts for the underlying Photon sockets. Following that change, everything seems to work fine. Or, at least, my device has run for several days without crashing.

Thanks for all the help!