TCPClient unstable in 0.4.6

Just come back from my holiday, so I upgraded all my Photons to the latest firmware.
But one of my Photons uses the TCPClient. For the last 10 days, on 0.4.5 firmware, it had been connecting to my local server every minute to send it data, failing to connect only a few times.
Now it connects about 1 in 5 times, if that; after about 10 connections it does a red SOS reboot.

Now a solid green light … wow, from 10 days fully working to half an hour if you are lucky.

// Designed for CHIP_1 with two thermistors
#include "application.h"
#include "extras.h"
int counter = 0;
int samples = 64;
int mins = 0;
int secs = 0;
int ana_avg[4];
int total_val[4];
double temp_val[4];

int val[4];

int rtn_val;

int zero_point = 0; // value at zero degrees F
double vpd = 0; // values per degree F

String cubie_str;
String idStr = Spark.deviceID();

int led1 = D7; // Instead of writing D7 over and over again, we'll write led1

SYSTEM_MODE(AUTOMATIC);

TCPClient client;

byte server[] = { 192, 168, 0, 47 };
int port = 50007;

void setup() {
  int dif1 = 0; // value dif
  double dif2 = 0; // temp dif
  double low_temp = 69.6;
  double hi_temp  = 78.1;
  int   low_value = 1828;
  int   hi_value  = 2060;
  Spark.variable("counter", &counter , INT);
//  Spark.variable("rtn_val" , &rtn_val , INT);
  pinMode(led1, OUTPUT);
  pinMode(A0,INPUT);
  pinMode(A1,INPUT);
  pinMode(A2,INPUT);
  RGB.control(true); // take control of the RGB LED, then turn it off
  RGB.color(0, 0, 0);

  dif1 = hi_value - low_value;
  dif2 = hi_temp - low_temp;
  vpd = dif1 / dif2;
  zero_point = low_value - ( low_temp * vpd );
}

void loop()
{
  if ( secs != Time.second() )
  {
    secs = Time.second();
    seconds_loop();

    if ( mins != Time.minute() )
    {
      mins = Time.minute();
      minutes_loop();
    }
  }
  toggle_led();
//  Particle.process();
  delay(100); // 1/10 second loop
}

void send_cubie()
{
  client.connect(server, port);
  delay(10);

  if ( client.connected() == TRUE )
  {
    client.flush();
    RGB.color(0, 0, 0);

    // Build the message: deviceID/temp0/temp1/temp2/diff/time
    cubie_str = idStr;
    cubie_str.concat( "/");

    cubie_str.concat( temp_val[0] );
    cubie_str.concat( "/" );

    cubie_str.concat( temp_val[1] );
    cubie_str.concat( "/" );

    cubie_str.concat( temp_val[2] );
    cubie_str.concat( "/" );

    cubie_str.concat( ana_avg[3] );
    cubie_str.concat( "/" );

    cubie_str.concat( Time.timeStr() );

    client.print( cubie_str );

    //delay(1000);

    for ( int n = 0 ; n < 500 ; n++ ) // wait up to 5 seconds for a char
    {
      if ( client.available() )
      {
        break;
      }
      delay(10);
    }

    if ( client.available() )
    {
      rtn_val = client.read();

      if ( rtn_val == 1)
      {
        RGB.color(0, 128, 0);   // green: server replied 1
      }
      else
      {
        RGB.color(0, 0, 128);   // blue: server replied something else
      }
    }
    else
    {
      rtn_val = -1;
      RGB.color(64, 0, 0);      // dim red: no reply within 5 seconds
    }
  }
  else
  {
    RGB.color(255, 0, 0);       // bright red: could not connect
  }
}

void get_readings()
{
 double temp;

  total_val[0] = 0;
  total_val[1] = 0;
  total_val[2] = 0;

  for ( int n = 0 ; n < samples ; n++ )
  {
    val[0] = 4095 - analogRead(A0);
    val[1] = 4095 - analogRead(A1);
    total_val[0] = total_val[0] + val[0];
    total_val[1] = total_val[1] + val[1];
    delay(1);
  }
  ana_avg[0] = total_val[0]/samples;               // average out the analogue samples
  ana_avg[1] = total_val[1]/samples;
  ana_avg[3] =  ana_avg[0] - ana_avg[1];

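  temp = (( ana_avg[0] - zero_point ) / vpd );     // sensor 0 in degrees F
  temp_val[0] = ( temp - 32 ) * .5555555556 ;      // convert F to C (5/9)

  temp = (( ana_avg[1] - zero_point ) / vpd );     // sensor 1 in degrees F
  temp_val[1] = ( temp - 32 ) * .5555555556 ;      // convert F to C (5/9)

  temp_val[2] = ana_avg[3] / vpd;                  // difference between the sensors, in degrees F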

}

void minutes_loop()
{
    get_readings();
    send_cubie();
}

void seconds_loop()
{
    counter ++;
}

void toggle_led()
{
    int state = digitalRead(led1);
    if ( state == HIGH )
    {
        state = LOW;
    }
    else
    {
        state = HIGH;
    }
    digitalWrite(led1, state);
}

I'll try reflashing and unplugging the power for a while to see if that fixes it.


Thanks for reporting! I’ll test this as soon as I get a chance and pass it on to the firmware team.

Thanks,
David

I’m facing the same issue with 0.4.6 as well, using the SparkFun Phant library, which uses the TCPClient. The very first POST to the SparkFun cloud services works, and the second crashes the sketch and leads to SOS followed by one red flash.

Removing the POST request to SparkFun cloud services resolves the problem: no SOS.

Thx.
Markus


I’ve swapped Photons and it’s connecting better: now it connects to my server about 15 times, then does an SOS reboot, and that repeats.
Before 0.4.6 I never saw a red SOS light.


Created #672 to track that problem. I can reproduce it on two different Photons, both running different applications.

So far on 0.4.6 I’ve not had my Photon working for more than 12 hours without needing a manual reset.

Have you changed anything to fix the issue(s) you previously described?

The only way I've found to fix it is to go back to version 0.4.5. Since I downgraded, this is what has happened:
386 connects and 0 fails, and no reboots.

Hi @mhdevx
Just curious: are your apps running in automatic mode (always cloud connected)?
Asking because I am on 0.4.6 with three Photons using MQTT (so TCPClient is involved), but in semi-automatic mode.
The apps have been running for four days, publishing 15 messages/sec (on average) with zero disconnections until now.
So I’m asking myself: is it a TCPClient issue, or the cloud connection?

Claudio

@duffo64 You said “until now” so does this mean you are now experiencing the same problem as the OP?

@peter_a That is good to hear, I have two follow up questions for you if that’s alright.

  1. Are you using Multi-threading in your application?
  2. With or without Multi-threading, when your Photon(s) disconnect (WiFi goes down) do you experience a hardfault on the LED (SOS + 1 red flash + SOS) ?

No @UST. I’m just saying that last reboot was 4 days ago, and still running. Last time I checked was 10 minutes ago. “Until now” was for “fingers crossed” :slight_smile:


The full code is above, so you can see what I'm running. No, I'm using the 0.4.5 code without any changes; the only thing I've changed in the code above is to add published counters for seconds running and for connection hits or misses to my server. Yes, I get the red LED SOS flashes, but I'm only monitoring whether it's connected to my server, not the cloud.
So yes, it may be disconnecting from my router, I don't know.

@duffo64 my apps are running in automatic mode. The problem only appears when TCPClient is used. Particle cloud events or variables seem to be no problem.

Ok, so from what I see in short:

0.4.6 with TCPClient and AUTOMATIC: issues
0.4.6 with TCPClient and SEMI_AUTOMATIC: no issues

Could this make the difference ?

I see a client.connect() at the top of the send_cubie() function but I see no corresponding client.stop(). See this post by @bko:

Admittedly that’s for the Core, but the same principle should apply to the Photon. You would appear to be creating a new socket/connection every minute until you run out of sockets. Why it works in 0.4.5 and not 0.4.6 escapes me though :grinning: .
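
To make the socket-exhaustion argument concrete, here is a toy model in plain C++ (it is not Particle firmware code and can be compiled on a PC). `ToyStack`, `POOL_SIZE` and `successful_connects` are made-up names, and the pool size of 8 is an arbitrary figure, not the Photon's real limit. Whether the real firmware leaks a socket when `connect()` is called again on the same client is an assumption of this model, not something established here.

```cpp
#include <cassert>

// Toy model of a network stack with a fixed number of sockets.
// POOL_SIZE is a hypothetical figure chosen for illustration only.
constexpr int POOL_SIZE = 8;

struct ToyStack {
    int in_use = 0;

    // Hands out a socket if one is free; fails once the pool is exhausted.
    bool open_socket() {
        if (in_use >= POOL_SIZE) return false;
        ++in_use;
        return true;
    }

    // Returns a socket to the pool.
    void close_socket() {
        assert(in_use > 0);
        --in_use;
    }
};

// Simulates one connect per minute. When close_after is false the socket
// is never released, which is the leak described in the post above.
// Returns how many of the `minutes` connect attempts succeeded.
int successful_connects(int minutes, bool close_after) {
    ToyStack stack;
    int ok = 0;
    for (int m = 0; m < minutes; ++m) {
        if (!stack.open_socket()) continue;   // pool exhausted: connect fails
        ++ok;
        if (close_after) stack.close_socket();
    }
    return ok;
}
```

In this model an hour of once-a-minute connects without closing succeeds only `POOL_SIZE` times, while closing after each exchange lets every attempt succeed.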


OK, what’s different between 0.4.5 and 0.4.6 in how it deals with open sockets?
Why does it work forever in 0.4.5 and not in 0.4.6?

I'm relying on the other end shutting the door and closing the connection. But if that is the problem, it would be running out of sockets and then SOSing, which reboots it and renews the sockets.
I'll try:

  • connect :
    if connected
  • send data :
  • wait for a reply
  • close connection

and see if that works.
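
The connect / send / wait / close sequence listed above could look roughly like the sketch below. It is written as portable C++ so it can be compiled off-device, and it is only a sketch under assumptions: `send_once`, `SendResult`, `FakeClient` and the `wait_10ms` callback are hypothetical names for illustration. Only `connect()`, `connected()`, `print()`, `available()`, `read()` and `stop()` are meant to mirror TCPClient calls; all but `stop()` already appear in the code earlier in the thread.

```cpp
#include <cassert>

// Outcome of one send attempt.
enum class SendResult { ConnectFailed, NoReply, Ack, Nack };

// Client is any type with TCPClient-style connect/connected/print/
// available/read/stop. wait_10ms stands in for delay(10) (hypothetical).
template <typename Client, typename Host, typename Wait>
SendResult send_once(Client& client, Host host, int port,
                     const char* payload, Wait wait_10ms,
                     int max_polls = 500) {
    assert(max_polls > 0);
    if (!client.connect(host, port) || !client.connected()) {
        client.stop();                    // release the socket even on failure
        return SendResult::ConnectFailed;
    }
    client.print(payload);
    // Poll for a reply, up to max_polls * 10 ms (5 s by default).
    for (int n = 0; n < max_polls && !client.available(); ++n) {
        wait_10ms();
    }
    SendResult result = SendResult::NoReply;
    if (client.available()) {
        result = (client.read() == 1) ? SendResult::Ack : SendResult::Nack;
    }
    client.stop();                        // always close before returning
    return result;
}

// Test double (not part of any Particle API): counts sockets left open.
struct FakeClient {
    int open_sockets = 0;
    bool is_open = false;
    bool reachable = true;       // whether connect() succeeds
    int reply = 1;               // byte the fake server sends; -1 = silent
    bool reply_pending = false;

    int connect(const char*, int) {
        if (!reachable) return 0;
        ++open_sockets;
        is_open = true;
        reply_pending = false;
        return 1;
    }
    bool connected() const { return is_open; }
    void print(const char*) { reply_pending = (reply >= 0); }
    int available() const { return reply_pending ? 1 : 0; }
    int read() { reply_pending = false; return reply; }
    void stop() {
        if (is_open) { --open_sockets; is_open = false; }
    }
};
```

On a Photon the idea would be to pass the global `client`, the `server` address and a callback that calls `delay(10)`. That usage is untested here and is offered only as a starting point.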

Two things which might or might not be related here:

  • the SparkFun datastore is sometimes unstable; I get 50x errors in the browser as well. Those might be totally different service endpoints. I did not do any analysis at the network level.
  • the SparkFun Phant library, which I use in my application and also to reproduce the issue, uses the TCPClient inside a function. It is not declared globally. Each method call creates a new TCPClient.

It sounded like a good idea that the problems were related to me not closing the socket after each connection.
But the fact is that there is no change if I close the port after each connection.
It was a good try :smile:
So what do they say? “Done that”, “tried that”.

Added a stop to my code and left it running overnight. When I looked at my counter in the morning, it said my code had been running for 7 minutes.

Aw!! Too bad! Well, it certainly didn’t explain why it works on the older version of firmware and not the newer anyway.

Does anyone have a simple test case I can use to reproduce this issue? I’d love to have something added to our automated test suite so that this regression is dealt with and doesn’t reappear.