They do fail at the intervals I mentioned: every week to month or so, unless the cloud or my whole internet connection goes down, which I would say happens roughly monthly. I am not currently logging, but when I wrote my Spark.publish tutorial with “uptime” I gathered days of data, publishing every 15 seconds, with no failures. I had a lot of that data in a spreadsheet, where I was comparing timestamps against millis() to learn more about cloud latency (which, for the record, seems pretty constant at around 138ms from my house).
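For anyone who wants to check their own publish logs for dropouts, here is a minimal C++ sketch. It is an illustration, not a reconstruction of the spreadsheet analysis above. It assumes the event timestamps have already been extracted as whole seconds and sorted, and that the device publishes every 15 seconds; findGaps, Gap and the 1.5-period tolerance are hypothetical choices. Any interval longer than the tolerance is reported as a gap, with an estimate of how many publishes went missing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One detected gap in a series of publish timestamps (hypothetical type).
struct Gap {
    std::int64_t before;  // last timestamp received before the gap (seconds)
    std::int64_t after;   // first timestamp received after the gap (seconds)
    std::int64_t missed;  // estimated number of missing publishes
};

// Scan sorted timestamps from a device that publishes every periodSeconds.
// Intervals longer than tolerance * periodSeconds count as gaps; the
// tolerance absorbs normal jitter in cloud latency.
std::vector<Gap> findGaps(const std::vector<std::int64_t>& timestamps,
                          std::int64_t periodSeconds,
                          double tolerance = 1.5) {
    assert(periodSeconds > 0);
    std::vector<Gap> gaps;
    for (std::size_t i = 1; i < timestamps.size(); ++i) {
        std::int64_t delta = timestamps[i] - timestamps[i - 1];
        if (delta > tolerance * periodSeconds) {
            // Round to the nearest whole number of periods, then subtract
            // the one interval that would have been normal.
            std::int64_t periods = (delta + periodSeconds / 2) / periodSeconds;
            gaps.push_back({timestamps[i - 1], timestamps[i], periods - 1});
        }
    }
    return gaps;
}
```

For timestamps {0, 15, 30, 75, 90} and a 15-second period, it reports one gap between 30 and 75 with 2 missed publishes, while a 16-second interval is treated as ordinary jitter.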
I do think local network conditions affect the stability of your connections. I have my Sparks on a separate router that does not get a lot of other traffic. Because of this, I never saw the ARP problems that the Deep Update TI patch tries to fix, for instance. People here in the forum with complicated or slow networks just seem to have more trouble.
The TI part seems to have some timeouts built in that are around 6 seconds long. For instance, if you try to connect to a nonexistent host, it will ARP for that host for about 6 seconds before giving up. Some people have thought that was broken behavior, but I really can’t fault the core on that one. If you check every return value from every TCP and UDP call, you can recover, but not any quicker than about 6 seconds. The way I read it, you asked it to do something and it tried hard for 6 seconds before giving up.
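A minimal C++ sketch of the "check every return value" approach, assuming each network operation can be wrapped in a function that returns true on success. retry, RetryResult and worstCaseMs are hypothetical helpers, not part of the Spark firmware API; on the Core, the attempt would presumably wrap a TCPClient or UDP call and test its result, but that wiring is an assumption and is not shown. The second helper makes the arithmetic explicit: if each failed attempt blocks for about 6 seconds inside the WiFi stack, the code cannot give up any sooner than attempts × 6 seconds.

```cpp
#include <functional>

// Outcome of a bounded retry loop (hypothetical type).
struct RetryResult {
    bool ok;       // true if some attempt succeeded
    int attempts;  // how many attempts were actually made
};

// Call attempt() until it returns true or maxAttempts is reached.
// Every return value is checked; nothing is assumed to have worked.
RetryResult retry(const std::function<bool()>& attempt, int maxAttempts) {
    int made = 0;
    while (made < maxAttempts) {
        ++made;
        if (attempt()) {
            return {true, made};
        }
    }
    return {false, made};
}

// Worst-case time spent before giving up, assuming each failed attempt
// blocks for roughly timeoutMs (about 6000 ms on the TI part).
long worstCaseMs(int maxAttempts, long timeoutMs) {
    return static_cast<long>(maxAttempts) * timeoutMs;
}
```

With three attempts and a 6000 ms timeout, worstCaseMs(3, 6000) gives 18000 ms, which is why retry loops around these calls can look "hung" for a long time when a host is unreachable.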
Finally, I certainly agree that TCPServer does not work with high-speed data in volume! That is strange, because it is a pretty thin veneer over the client code, and I don’t have an explanation for it.
As I have said before, I advocated with the Spark guys for the next-generation Spark (now the Photon) to have a WiFi chip that was more, shall we say, professional. I have great hope for the new Photon, and I think it will be much more stable simply because the WiFi chip it uses has been used successfully in lots of other devices.