Electron in "Connected" state is actually dead


#1

Hi,

We have eight 2G Electrons (our own SIM cards) in our offices that are set up pretty much the same with the exception of some hard-coded credentials in the firmware. All of them have been online for a few days. Two of these devices have some issues.

One of them works fine for a few hours, then it loses its data connection (we can observe this in a SIM management platform) and becomes unable to send any data. The weird part is that the RGB LED keeps on breathing cyan (Connected) and the Particle dashboard shows the device as online. We implemented a feature that calls System.reset() whenever the device receives an SMS or a call, but this one just sits there looking cyan :smile: . The SMS is delivered just fine. In that regard, we can see it registering on the GSM network after the initial failure (and a few more times after that) but it never requests a data connection.

The other device has a bit of a reset problem: it resets at intervals of anywhere from 30 minutes to 3 hours. Otherwise it has no problem. After it resets, it starts sending data and everything is peachy.

Could these problems be the effects of some hardware faults? Is there any way to identify them?

Has this kind of behaviour ever been observed before? Does anyone have any idea what the problem could be? Please let us know, even if it’s something that seems far-fetched.

Thank you!


#2

These indicators can’t be trusted for the UDP connection Electrons use.

Have you set the keepAlive() according to the needs for the respective SIMs?

How do these devices behave with ordinary code like Tinker or Blinky?


#3

Hi,

We investigated this issue further but weren’t able to reproduce the problem, even though three other Electrons exhibited the same behaviour. Funnily enough, after a reset, they went back to working properly and haven’t had this “getting stuck” problem since (for now).

A few days ago we were trying to obtain cell environment information by means of AT commands. AT+CGED=3 (serving cell) always returned the expected result but AT+CGED=0 or 5 (serving and neighbour cells) rarely returned anything at all. Most of the time it would just time out.

The interesting part (where we got really lucky) is that if we tried to call Cellular.disconnect() or Cellular.off() after such a timeout, the Electron would just go into that weird state of looking connected but actually being dead.
So we got to a point where we could consistently reproduce the problem. Here is the code we’re using:

CellLocation.h

#ifndef CellLocation_H
#define CellLocation_H

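#include <string>         // std::string (the C header "string.h" does not declare it)
#include "application.h"  // Serial, Particle, Cellular, RESP_* and TYPE_* constants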

using namespace std;

void setupSerial() {
  Serial.begin(9600);

  while(!Serial.available()) {
    Particle.process();
  }

  while(Serial.available()) {
    Serial.read();
  }

  Serial.println("Serial start.");
}

void fullReset() {
  Particle.disconnect();
  Cellular.disconnect();
  Cellular.off();
  System.reset();
}

int getCellInfoCallback(int type, const char* response, int responseLength, string* const cellInfo) {
  int returnValue = RESP_ERROR;

  /* Debugging */
  // string resp;
  // resp.append(response, 2, responseLength - 4);
  // Serial.printlnf("t: %d. l: %d. r: %s.", type, responseLength, resp.c_str());

  switch (type) {
    case TYPE_UNKNOWN:
    case TYPE_TEXT:
    case TYPE_PLUS:
      cellInfo->append(response, 2, responseLength - 4);

      returnValue = WAIT;
      break;
    case TYPE_OK:
      returnValue = RESP_OK;
      break;
    case TYPE_ERROR:
      returnValue = RESP_ERROR;
      break;
    default:
      returnValue = RESP_ERROR;
      break;
  }

  return returnValue;
}

string getCellInfo() {
  int retries = 0, result;
  string cellInfo;

  while (retries < 3
          && (result = Cellular.command(getCellInfoCallback, &cellInfo, 10000, "AT+CGED=5\r\n")) != RESP_OK) {

    delay(30);
    retries++;
    Particle.process();

    Serial.printlnf("Retry. Result: %d.", result);
  }

  if (retries >= 3) {
    fullReset();
  }

  return cellInfo;
}

#endif

CellLocation.cpp

#include "application.h"
#include "CellLocation.h"

SYSTEM_MODE(SEMI_AUTOMATIC);
STARTUP(cellular_credentials_set("our.own.apn", "", "", NULL));

int ledPin = D7;
string info;

void setup() {
  pinMode(ledPin, OUTPUT);

  setupSerial();

  Cellular.connect();

  while(!Cellular.ready()) {
    Particle.process();
  }
  
  //Particle.connect();
}

void loop() {
  digitalWrite(ledPin, HIGH);
  delay(500);

  info = getCellInfo();
  Serial.printlnf("Cell info: %s.", info.c_str());

  fullReset();

  digitalWrite(ledPin, LOW);
  delay(500);
}

We’ve experimented with different timeout values and number of retries but the result is the same: when CGED times out, the fullReset function will just get stuck at Cellular.off(). If we changed CGED=5 to CGED=3 (which never times out) everything works fine.
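
As a side note on the callback above: the call append(response, 2, responseLength - 4) assumes every response line is at least four characters long and wrapped in CR/LF on both sides. Nothing in this thread establishes whether that always holds for the modem's output, and this is not claimed to be the cause of the hang. The following is only a minimal sketch of a more tolerant slice in plain C++. The helper name appendTrimmedLine is hypothetical and not part of the Particle API.

```cpp
#include <cassert>  // only needed for quick host-side checks of the helper
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the Particle API): copies one AT response
// line into `out`, dropping any CR/LF characters at its start and end.
// Unlike a fixed "skip 2, drop 4" slice, it tolerates short or empty lines.
inline void appendTrimmedLine(std::string& out, const char* response, int responseLength) {
  if (response == nullptr || responseLength <= 0) {
    return;  // nothing to append
  }

  std::size_t begin = 0;
  std::size_t end = static_cast<std::size_t>(responseLength);

  // Skip CR/LF at the front of the line.
  while (begin < end && (response[begin] == '\r' || response[begin] == '\n')) {
    ++begin;
  }

  // Skip CR/LF at the back of the line.
  while (end > begin && (response[end - 1] == '\r' || response[end - 1] == '\n')) {
    --end;
  }

  // Append only the payload between the line terminators.
  out.append(response + begin, end - begin);
}
```

Inside getCellInfoCallback, the append call could then be replaced with appendTrimmedLine(*cellInfo, response, responseLength), assuming the callback signature stays as posted.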


#4

First, what is the point of resetting the system on every cycle?
If you use your own SIM card, you should reduce the keep-alive ping interval from the default 23 minutes to 120 seconds; for some providers it needs to be reduced to 45-60 seconds.

Particle.keepAlive(45);
https://docs.particle.io/reference/firmware/electron/#particle-keepalive-

If you want, you can also incorporate the Application Watchdog into your code.
https://docs.particle.io/reference/firmware/electron/#application-watchdog

The following links point to two wonderful tutorials from @rickkas7.


SOS Hard Fault recovery - Electron?


#5

I have had the same problem with multiple Electrons. Check your firmware version. I had your exact issue with version 0.4.8 of the firmware. Updating to 0.5.3 seems to have corrected it. I am still running tests to make sure that the modems stay connected, so I’ll post if they fail again.

I also had to do the firmware update over the CLI, which seems to be much more reliable than the firmware manager.


#6

@developer_bt
The point of the reset is to reproduce the problem. When the fullReset() function gets to Cellular.disconnect() or Cellular.off() the Electron “freezes” as described above.
I’ve tried using different ping values but it did nothing. Anyway, I don’t believe that the keep alive has anything to do with this problem since it can be reproduced in the first 20-30 seconds of running the code.

One thing I didn’t mention before is that after the Electron enters that weird state, any “soft” reset won’t work. The board will reset but it won’t be able to connect to the network again. It will just keep blinking green. A full power-cycle is needed to get it working again.

That being said, the ApplicationWatchdog won’t help us, since it will reset the Electron but won’t help it reconnect. Even if it worked, what good would it be? It would just reset the Electron all over again, since it would always get stuck after trying to execute AT+CGED.

@LabSpokane
Unfortunately we run 0.5.3 on all our Electrons. What we did try though was to build the system firmware locally. I managed to install the entire toolchain, compile and flash the firmware and application locally but the problem still appears.

Thank you both for the help! If you have more advice, it would be greatly appreciated!


#7

Hello @StefanAnghel.
I run into a problem when I do not use the keepAlive function with a SIM from another carrier.
Simply adding keepAlive with a value of 45 seconds solves it.
In your case, though, the problem is probably something in the firmware. Try updating one of your Electrons to the latest beta, 0.6.0 rc2.
The latest version has many changes.