HTTP (maybe TCP?) Broken in 1.3.1-rc.1 on Electron, and on all versions for Boron

Hello!
I am running a slightly modified version of firmware I wrote for the Electron, which works great there, sending ~5500-character HTTP POSTs (the HTTP posting runs in its own thread) using data collected from an ESP8266 that the Particle device talks to via UART. Unfortunately, on the Boron, after enabling HttpClient's logging, I keep getting the error "Error: Timeout while reading response" with "Status code:" (usually no code at all, sometimes -1), followed by "Error: Can't find HTTP response body". Sometimes the device simply goes into SOS 1 (hard fault) after attempting the post, and sometimes it loses its cloud/cellular connection at random.
I discovered that by reducing the message size to ~400 characters, the post will actually go through most of the time, although it sometimes logs "Error: Response body larger than buffer.", and after a few of these successful tiny posts the firmware ends up in SOS 1 (hard fault) as well. I am perplexed, because if I have the Boron print out the collected data instead of sending it, it looks 100% correct and no different than that of the Electron, but for whatever reason the post fails on the Boron. The HTTP response of the endpoint I am using to test this is also extremely short:

{"response": {
  "metadata": {
    "requestId": "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxxxx",
    "status": "success",
    "statusCode": "200"
  },
  "result": "Recieved"
}}

This endpoint also sends me the raw content of every request as an email, and when the Boron sends the longer messages, they never come through (so it definitely isn't just the response that is having issues).
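
For readers decoding the log messages above, here is a minimal sketch in plain C++ (it runs off-device) of how a raw HTTP response splits into a status code and a body. It is an illustration written for this thread, not the HttpClient library's code; `ParsedResponse` and `parseResponse` are hypothetical names. The assumption is that a client needs both the status line and the blank line (`\r\n\r\n`) separating headers from body, so a response that never arrives, or arrives truncated, yields no status code and no body, which would be consistent with the two errors reported.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical result type for this sketch (not part of HttpClient).
struct ParsedResponse {
    int status;        // -1 when no status code could be read
    bool hasBody;      // false when the header/body separator is missing
    std::string body;  // everything after the blank line
};

// Split a raw HTTP response into its status code and body.
ParsedResponse parseResponse(const std::string& raw) {
    ParsedResponse out{-1, false, ""};

    // The status line looks like "HTTP/1.1 200 OK": three digits follow the first space.
    if (raw.compare(0, 5, "HTTP/") == 0) {
        std::size_t sp = raw.find(' ');
        if (sp != std::string::npos && sp + 4 <= raw.size()) {
            int code = 0;
            bool digits = true;
            for (std::size_t i = sp + 1; i <= sp + 3; ++i) {
                if (!std::isdigit(static_cast<unsigned char>(raw[i]))) {
                    digits = false;
                    break;
                }
                code = code * 10 + (raw[i] - '0');
            }
            if (digits) {
                out.status = code;
            }
        }
    }

    // Headers end at the first blank line; the body is whatever follows it.
    std::size_t sep = raw.find("\r\n\r\n");
    if (sep != std::string::npos) {
        out.hasBody = true;
        out.body = raw.substr(sep + 4);
    }
    return out;
}
```

Feeding it an empty string (nothing received before the timeout) gives a status of -1 and no body, while a complete response gives 200 and the JSON shown above.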
I created the super-simplified snippet below to demonstrate this issue with just a Boron, although I have noticed that this version does not fail catastrophically as often. My theory is that the full code also uses @rickkas7's serialBuffer library (thanks again for updating that!!), which creates a thread of its own, and I'm sure that leads to big issues when my HTTP posting thread gets hung up! I am not worried about that, though, because I do the same thing on the Electron, and when the HttpClient thread doesn't get hung up there are no issues between it and serialBuffer, so I suspect it won't be a problem on the Boron either once this odd issue is resolved!
Here is the super-simplified code (it needs the HttpClient library):

// This #include statement was automatically added by the Particle IDE.
#include <HttpClient.h>

STARTUP(startupFunction());

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

extern void startupFunction();

static char json_str[2048];
int timeAlive_s = 0;
int lastPost_s = 0;

HttpClient http;

http_header_t headers[] = {
  { "Accept" , "*/*"},
  { NULL, NULL } // NOTE: Always terminate headers with NULL
};

http_request_t request;
http_response_t response;

//Separate thread for HTTP requests so as not to block data collection
Thread *thread;

os_mutex_t mutex;

void startupFunction() {
  System.enableFeature(FEATURE_RESET_INFO);
  os_mutex_create(&mutex);
}

void setup()
{
  // SLAVE_SERIAL_CONSOLE.begin(57600);     //initialize uart port
  Serial.begin(115200);    //initialize debug port
  // SLAVE_SERIAL_CONSOLE.setup();

  os_mutex_lock(mutex);
  thread = new Thread("httpSendData", http_send_data);

  Serial.println("Calling Cellular.connect.");

  Cellular.connect();

  while (!Cellular.ready()) {
    Serial.print(".");
    delay(333);
  };

  Serial.println("!");
  delay(100);

  Serial.print("Calling Particle.connect");

  Particle.connect();

  while (!Particle.connected()) {
    Serial.print(".");
    delay(333);
  };
  Serial.println("!");

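  // Fill with 'A's to simulate data, keeping the final byte as the NUL terminator
  memset(json_str, 'A', sizeof(json_str) - 1);
  json_str[sizeof(json_str) - 1] = '\0';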
  Serial.println("Waiting");
}


void loop()
{
  uint32_t loopTime_ms = millis();
  if ((uint32_t)timeAlive_s*1000 <= loopTime_ms) {
    timeAlive_s++;
    Serial.print(".");
    if(timeAlive_s >= 30 && (timeAlive_s - 30) >= lastPost_s){
      Serial.println("!");
      Serial.println("Posting");
      os_mutex_unlock(mutex);
      lastPost_s = timeAlive_s;
      Serial.println("Waiting");
    }
  }
  Particle.process();
}

void http_send_data(){
  while(true){
    os_mutex_lock(mutex);
    request.hostname = "HOSTNAME_HERE";     //In my case for testing, I am just using an endpoint which sends the raw request to me as an email
    request.port = 80;
    request.path = "PATH_HERE";
    request.body = json_str;

    http.post(request, response, headers);
  }
}

Any help would be immensely appreciated. I am quite excited to begin deploying Borons for new projects, as well as replacing Electrons in old ones, but this seems to be the last hang-up blocking me!

I updated the title because I did some more testing and realized I had misunderstood the problem: the issue also shows up on the Electron once it is updated to 1.3.1-rc.1! On 1.2.1 and below, the above code runs perfectly fine on the Electron, and every post completes successfully and receives a 200 response, but on 1.3.1-rc.1 it exhibits the exact same issue as the Boron! Nothing else changed besides updating the device from 1.2.1 to 1.3.1-rc.1.

EDIT: I downgraded the Electron back to 1.2.1 to confirm, and it still had the issue, although it was working on 1.2.1 before (I think?). Anyway, I then downgraded it to 1.1.0 and it is working again, so 1.1.0 is probably the best baseline for confirming the issue.

Here are screenshots of the serial behavior (images not preserved):
1.1.0: [screenshot]

1.3.1-rc.1 (as well as all OS versions on Boron): [screenshot]

Can you try increasing the timeout for the HTTP client?

e.g.

  http.client.setTimeout(10000);

Thanks for the response!
I tried your suggestion (see updated code below) and also increased the timeout defined in HttpClient.cpp to 10000 ms, but the issue is still present; it just takes 10 seconds to throw the error now. (Tested on the Boron with both 1.2.1 and 1.3.1-rc.1, with the same results.)

// This #include statement was automatically added by the Particle IDE.
#include <HttpClient.h>

STARTUP(startupFunction());
#if PLATFORM_ID == PLATFORM_ELECTRON_PRODUCTION
STARTUP(cellular_credentials_set("broadband", "", "", NULL));
#endif

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

extern void startupFunction();

static char json_str[2048];
int timeAlive_s = 0;
int lastPost_s = 0;

HttpClient http;

http_header_t headers[] = {
  { "Accept" , "*/*"},
  { NULL, NULL } // NOTE: Always terminate headers with NULL
};

http_request_t request;
http_response_t response;

//Separate thread for HTTP requests so as not to block data collection
Thread *thread;

os_mutex_t mutex;

void startupFunction() {
  System.enableFeature(FEATURE_RESET_INFO);
  os_mutex_create(&mutex);
}

void setup()
{
  // SLAVE_SERIAL_CONSOLE.begin(57600);     //initialize uart port
  Serial.begin(115200);    //initialize debug port
  // SLAVE_SERIAL_CONSOLE.setup();
  os_mutex_lock(mutex);
  thread = new Thread("httpSendData", http_send_data);

  Serial.println("Calling Cellular.connect.");

  http.client.setTimeout(10000);

  Cellular.connect();

  while (!Cellular.ready()) {
    Serial.print(".");
    delay(333);
  };

  Serial.println("!");
  delay(100);

  // Connect to the cloud explicitly; SEMI_AUTOMATIC mode does not connect on its own.
  Serial.print("Calling Particle.connect");

  Particle.connect();

  while (!Particle.connected()) {
    Serial.print(".");
    delay(333);
  };
  Serial.println("!");

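  // Fill with 'A's to simulate data, keeping the final byte as the NUL terminator
  memset(json_str, 'A', sizeof(json_str) - 1);
  json_str[sizeof(json_str) - 1] = '\0';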
  Serial.println("Waiting");
}


void loop()
{
  // Update clocks
  uint32_t loopTime_ms = millis();
  // Update the time alive counter, and test for flag updates
  if ((uint32_t)timeAlive_s*1000 <= loopTime_ms) {
    timeAlive_s++;
    Serial.print(".");
    if(timeAlive_s >= 30 && (timeAlive_s - 30) >= lastPost_s){
      Serial.println("!");
      Serial.println("Posting");
      os_mutex_unlock(mutex);
      lastPost_s = timeAlive_s;
      Serial.println("Waiting");
    }
  }
  Particle.process();
}

void http_send_data(){
  while(true){
    os_mutex_lock(mutex);
    request.hostname = "YOUR HOSTNAME";
    request.port = 80;
    request.path = "PATH";
    request.body = json_str;

    http.post(request, response, headers);
  }
}
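
The observation that a longer timeout only delays the error can be illustrated with a small simulation in plain C++ (it runs off-device). This is a sketch, not the HttpClient source: it assumes the reader uses an idle timeout that restarts whenever a byte is received, and `timeUntilTimeout` and `arrivalsMs` are hypothetical names. If no byte ever arrives, the reader gives up exactly `timeoutMs` after it starts waiting, whatever that value is.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical simulation: `arrivalsMs` lists the times (in ms, ascending) at
// which a response byte shows up on the socket. Returns the simulated time at
// which the reader gives up with a timeout.
std::uint32_t timeUntilTimeout(const std::vector<std::uint32_t>& arrivalsMs,
                               std::uint32_t timeoutMs) {
    std::uint32_t now = 0;       // simulated millis()
    std::uint32_t lastRead = 0;  // time of the most recent byte (or of the request)
    std::size_t next = 0;        // index of the next byte that has not arrived yet

    while (now - lastRead < timeoutMs) {
        // Consume every byte that has arrived by `now`; each one restarts the idle timer.
        while (next < arrivalsMs.size() && arrivalsMs[next] <= now) {
            lastRead = now;
            ++next;
        }
        ++now;  // advance the simulated clock by 1 ms
    }
    return now;
}
```

With an empty `arrivalsMs`, a 5000 ms timeout fires at 5000 ms and a 10000 ms timeout fires at 10000 ms, which points at the response never arriving instead of arriving slowly.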

Forgive my ignorance: why does the HTTP process need to run in a separate thread? I am still trying to understand the need for mutexes, etc.

In the full firmware, the Electron (and, hopefully, the Boron going forward) continuously receives data from an ESP8266 via UART, which the Particle device collects and packages into a ~5500-character string; when the payload reaches a certain size, it is uploaded as an HTTP POST. Running the HTTP posting in its own thread keeps it from stopping the Particle device from performing its other responsibilities (it is also gathering device power data from ADCs, controlling peripherals, etc.).
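
The collect-then-hand-off arrangement described above can be sketched in plain C++ (it runs off-device). This is not the full firmware: `PayloadHandoff`, its methods, and the threshold are hypothetical, and the sketch assumes exactly one collecting thread and one uploading thread. It uses an atomic flag to pass ownership of a finished payload, so collection can continue while an upload is in progress.

```cpp
#include <atomic>
#include <cstddef>
#include <string>

// Hypothetical single-producer/single-consumer hand-off buffer.
class PayloadHandoff {
public:
    explicit PayloadHandoff(std::size_t threshold)
        : threshold_(threshold), ready_(false) {}

    // Collector side: add a sample. Returns true when a full payload was
    // handed over to the uploader by this call.
    bool append(const std::string& sample) {
        building_ += sample;
        // Hand over only once the threshold is reached and the uploader has
        // taken the previous payload; otherwise keep accumulating.
        if (building_.size() >= threshold_ &&
            !ready_.load(std::memory_order_acquire)) {
            pending_.swap(building_);
            building_.clear();
            ready_.store(true, std::memory_order_release);
            return true;
        }
        return false;
    }

    // Uploader side: fetch a finished payload if one is waiting.
    bool take(std::string& out) {
        if (!ready_.load(std::memory_order_acquire)) {
            return false;
        }
        out.swap(pending_);
        pending_.clear();
        ready_.store(false, std::memory_order_release);
        return true;
    }

private:
    std::size_t threshold_;
    std::string building_;     // touched only by the collector
    std::string pending_;      // owned by whichever side ready_ indicates
    std::atomic<bool> ready_;  // true: the uploader owns pending_
};
```

On the device, `append` would be called from `loop()` and `take` from the posting thread before it calls `http.post`; the posted code uses a mutex to signal the posting thread instead.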

Since I hadn't tested this yet, I just modified the code to run without threading (see code below), and it has the exact same issue.

// This #include statement was automatically added by the Particle IDE.
#include <HttpClient.h>

STARTUP(startupFunction());
#if PLATFORM_ID == PLATFORM_ELECTRON_PRODUCTION
STARTUP(cellular_credentials_set("broadband", "", "", NULL));
#endif

SYSTEM_MODE(SEMI_AUTOMATIC);

extern void startupFunction();

static char json_str[2048];
int timeAlive_s = 0;
int lastPost_s = 0;

HttpClient http;

http_header_t headers[] = {
  { "Accept" , "*/*"},
  { NULL, NULL } // NOTE: Always terminate headers with NULL
};

http_request_t request;
http_response_t response;

void startupFunction() {
  System.enableFeature(FEATURE_RESET_INFO);
}

void setup()
{
  // SLAVE_SERIAL_CONSOLE.begin(57600);     //initialize uart port
  Serial.begin(115200);    //initialize debug port
  // SLAVE_SERIAL_CONSOLE.setup();

  #if PLATFORM_ID == PLATFORM_BORON
  Cellular.setActiveSim(INTERNAL_SIM);
  Cellular.clearCredentials();
  #endif

  Serial.println("Calling Cellular.connect.");

  Cellular.connect();

  while (!Cellular.ready()) {
    Serial.print(".");
    delay(333);
  };

  Serial.println("!");
  delay(100);

  // Connect to the cloud explicitly; SEMI_AUTOMATIC mode does not connect on its own.
  Serial.print("Calling Particle.connect");

  Particle.connect();

  while (!Particle.connected()) {
    Serial.print(".");
    delay(333);
  };
  Serial.println("!");

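  // Fill with 'A's to simulate data, keeping the final byte as the NUL terminator
  memset(json_str, 'A', sizeof(json_str) - 1);
  json_str[sizeof(json_str) - 1] = '\0';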
  Serial.println("Waiting");
}


void loop()
{
  // Update clocks
  uint32_t loopTime_ms = millis();
  // Update the time alive counter, and test for flag updates
  if ((uint32_t)timeAlive_s*1000 <= loopTime_ms) {
    timeAlive_s++;
    Serial.print(".");
    if(timeAlive_s >= 30 && (timeAlive_s - 30) >= lastPost_s){
      Serial.println("!");
      Serial.println("Posting");
      http_send_data();
      lastPost_s = timeAlive_s;
      Serial.println("Waiting");
    }
  }
  Particle.process();
}

void http_send_data(){
  request.hostname = "HOST";
  request.port = 80;
  request.path = "PATH";
  request.body = json_str;

  http.post(request, response, headers);
}

I submitted this as a GitHub issue here, with detailed instructions for reproducing it, but I haven't heard anything back there, here, or from Particle support, and I just confirmed that the issue is still present in 1.4.0. Has anyone else encountered it? I feel like this is pretty straightforward to reproduce, and it is blocking our entire migration from 3G to 4G. If this is not something that will be fixed, it would still be useful to know ASAP, since I will need to adjust our Q4 planning to account for a redesign.

I’m happy to report this issue is solved in Device OS v1.5.2