How to handle Cellular connection loss on Electron

Hello. I’ve been struggling for a few weeks to get a stable cellular connection on the Electron, without success.

I’m using a third party SIM. At first I thought Automatic Mode would do the hard work for me, but it didn’t. After a few hours of normal operation, the code would hang and the Electron would stay offline while still breathing cyan. I was publishing data via MQTT every 3 seconds, so I assumed no keepalive was needed (I also tested the code with the keepalive and it behaved the same). I then read about System Threading; after enabling it, my code at least kept running so I could detect the fault, but I was still unable to recover from it.

I did some tests with Semi-automatic Mode and Manual Mode with similar results. In my final test version I tried to connect to the MQTT broker only, without connecting to the Particle Cloud, but it didn’t give better results. The only way to recover from a connection fault is to call System.reset(), but that’s not acceptable because I want the output pins to remain stable. I could get anywhere between 15 minutes and 4 hours of reliable operation with the following code.

I would appreciate any suggestions.

#include "MQTT-TLS.h"

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);

//setup for third party SIM card
#include "cellular_hal.h"
STARTUP(cellular_credentials_set("net", "", "", NULL));

const char TLScertificate[] = R"(-----BEGIN CERTIFICATE-----
-----END CERTIFICATE-----)";

#define MQTT_USER                     "user"
#define MQTT_PASSWORD                 "password"
#define LED                           D7             //on board led
#define CHARGER_OUT                   D6             //charger out
#define READ_METER_INTERVAL_ENABLED   3000           //read meter interval when charger is enabled
#define READ_METER_INTERVAL_DISABLED  20000          //read meter interval when charger is disabled

//command
const String enable_cmd           = "c1";
const String disable_cmd          = "c2";
const String reset_cmd            = "c3";

//command_response
const String charger_enabled      = "r1";
const String charger_disabled     = "r2";
const String charger_reset        = "r3";
const String unrecognized_command = "r4";

//error
const String energy_meter_error   = "e1";
const String reconnected          = "e2";
const String power_fault          = "e3";

enum EReadMeter
{
    EReadMeter_Failed = -1,     //Read meter failed
    EReadMeter_Skipped = 0,     //Read meter skipped
    EReadMeter_Success = 1      //Read meter success
};


void receiveMsg(char* topic, byte* payload, unsigned int length);
void commandParse(String command);

//MQTT setup
byte server[] = { 0,0,0,0 };
MQTT client(server, 0, receiveMsg);

String coreId;
ApplicationWatchdog wd(100000, System.reset);

double current = 1.234;
double voltage = 227.3;
long long energy = 10;
String energyString;
EReadMeter readResult = EReadMeter_Skipped;
bool reconnectedFlag = true;
bool chargerEnabled = false;


class PowerCheck {
public:

	PowerCheck();
	virtual ~PowerCheck();

	/**
	 * You must call this out of setup() to initialize the interrupt handler!
	 */
	void setup();

	/**
	 * Returns true if the Electron has power, either a USB host (computer), USB charger, or VIN power.
	 *
	 * Not interrupt or timer safe; call only from the main loop as it uses I2C to query the PMIC.
	 */
	bool getHasPower();

	/**
	 * Returns true if the Electron has a battery.
	 */
	bool getHasBattery();

	/**
	 * Returns true if the Electron is currently charging (red light on)
	 *
	 * Not interrupt or timer safe; call only from the main loop as it uses I2C to query the PMIC.
	 */
	bool getIsCharging();

private:
	void interruptHandler();

	PMIC pmic;
	volatile bool hasBattery = true;
	volatile unsigned long lastChange = 0;
};

PowerCheck::PowerCheck() {
}

PowerCheck::~PowerCheck() {
}

void PowerCheck::setup() {
	// This can't be part of the constructor because it's initialized too early.
	// Call this from setup() instead.

	// BATT_INT_PC13
	attachInterrupt(LOW_BAT_UC, &PowerCheck::interruptHandler, this, FALLING);
}

bool PowerCheck::getHasPower() {
	// Bit 2 (mask 0x4) == PG_STAT. If non-zero, power is good
	// This means we're powered off USB or VIN, so we don't know for sure if there's a battery
	byte systemStatus = pmic.getSystemStatus();
	return ((systemStatus & 0x04) != 0);
}

/**
 * Returns true if the Electron has a battery.
 */
bool PowerCheck::getHasBattery() {
	if (millis() - lastChange < 100) {
		// When there is no battery, the charge status goes rapidly between fast charge and
		// charge done, about 30 times per second.

		// Normally this case means we have no battery, but return hasBattery instead to take
		// care of the case that the state changed because the battery just became charged
		// or the charger was plugged in or unplugged, etc.
		return hasBattery;
	}
	else {
		// It's been more than 100 ms since the charge status changed, assume that there is
		// a battery
		return true;
	}
}


/**
 * Returns true if the Electron is currently charging (red light on)
 */
bool PowerCheck::getIsCharging() {
	if (getHasBattery()) {
		byte systemStatus = pmic.getSystemStatus();

		// Bit 5 CHRG_STAT[1] R
		// Bit 4 CHRG_STAT[0] R
		// 00 – Not Charging, 01 – Pre-charge (<VBATLOWV), 10 – Fast Charging, 11 – Charge Termination Done
		byte chrgStat = (systemStatus >> 4) & 0x3;

		// Return true if battery is charging if in pre-charge or fast charge mode
		return (chrgStat == 1 || chrgStat == 2);
	}
	else {
		// Does not have a battery, can't be charging.
		// Don't just return the charge status because it's rapidly switching
		// between charging and done when there is no battery.
		return false;
	}
}

void PowerCheck::interruptHandler() {
	if (millis() - lastChange < 100) {
		// We very recently had a change; assume there is no battery and we're rapidly switching
		// between fast charge and charge done
		hasBattery = false;
	}
	else {
		// Note: It's quite possible that hasBattery will be false when there is a battery; the logic
		// in getHasBattery() takes this into account by checking lastChange as well.
		hasBattery = true;
	}
	lastChange = millis();
}

PowerCheck powerCheck;

String getCoreID()
{
  String coreIdentifier = "";
  char id[12];
  memcpy(id, (char *)ID1, 12);
  char hex_digit;
  for (int i = 0; i < 12; ++i)
  {
    //high nibble: '0'..'9', then skip ahead to 'a'..'f'
    hex_digit = 48 + (id[i] >> 4);
    if (57 < hex_digit)
    {
      hex_digit += 39;
    }
    coreIdentifier = coreIdentifier + hex_digit;

    //low nibble
    hex_digit = 48 + (id[i] & 0xf);
    if (57 < hex_digit)
    {
      hex_digit += 39;
    }
    coreIdentifier = coreIdentifier + hex_digit;
  }
  return coreIdentifier;
}

void receiveMsg(char* topic, byte* payload, unsigned int length)
{
    char p[length + 1];
    memcpy(p, payload, length);
    p[length] = '\0';
    String message(p);

    commandParse(message);
}

void commandParse(String command)
{

  //enable charging
  if (command.equals(enable_cmd))
  {
    digitalWrite(LED, HIGH);
    digitalWrite(CHARGER_OUT, HIGH);
    //publish response
    client.publish(coreId + "/command_response", charger_enabled);
    chargerEnabled = true;
  }

  //disable charging
  else if (command.equals(disable_cmd))
  {
    digitalWrite(LED, LOW);
    digitalWrite(CHARGER_OUT, LOW);
    //publish response
    client.publish(coreId + "/command_response", charger_disabled);
    chargerEnabled = false;
  }

  //remote reset
  else if (command.equals(reset_cmd))
  {
    client.publish(coreId + "/command_response", charger_reset);
    System.reset();
  }

  //unrecognized command
  else
  {
    client.publish(coreId + "/command_response", unrecognized_command);
  }
}

String int64ToString(long long value)
{
    char outbuf[21]{};
    char *o = &outbuf[sizeof(outbuf) - 2];
    if(value == 0)
    {
       *o = '0';
       o--;
    } else
    {
      while(value > 0)
      {
        *o = (value % 10) + '0';
        value /= 10;
        --o;
      }
    }
    return String(o + 1);
}

EReadMeter readMeter()
{
    static unsigned long readPrevMillis;
    unsigned long currentMillis = millis();

    EReadMeter result = EReadMeter_Skipped;

    unsigned long readInterval;
    if (chargerEnabled)
    {
      readInterval = READ_METER_INTERVAL_ENABLED;
    } else
    {
      readInterval = READ_METER_INTERVAL_DISABLED;
    }

    if (currentMillis - readPrevMillis > readInterval)
    {
        result = EReadMeter_Success;
        readPrevMillis = currentMillis;
        energyString = int64ToString(energy);
    }

    return result;
}

bool publishData()
{
    bool result = true;

    if (client.isConnected())
    {
      if (readResult == EReadMeter_Failed)
      {
          result = client.publish(coreId + "/errors", energy_meter_error) && result;
      } else
      {
        //publish meter data
        result = client.publish(coreId + "/energy", energyString) && result;
        result = client.publish(coreId + "/current", String(current)) && result;
        result = client.publish(coreId + "/voltage", String(voltage)) && result;
      }

      //report errors
      if (!powerCheck.getHasPower())
      {
        result = client.publish(coreId + "/errors", power_fault) && result;
      }

      if (reconnectedFlag)
      {
        reconnectedFlag = false;
        ///////////////////////////////////////////////////////////////
        client.publish(coreId + "/debug", "connected in publishData");
        ///////////////////////////////////////////////////////////////
        result = client.publish(coreId + "/errors", reconnected) && result;
      }
    }
    //client is disconnected
    else
    {
      result = false;
    }

    return result;
}

void setup()
{

  RGB.control(true);
  RGB.color(0, 255, 0);
  wd.checkin();
  pinMode(LED, OUTPUT);
  pinMode(CHARGER_OUT, OUTPUT);
  digitalWrite(LED, LOW);
  digitalWrite(CHARGER_OUT, LOW);

  Cellular.connect();

  if (!waitFor(Cellular.ready, 60000))
  {
      System.reset();
  }
  wd.checkin();
  coreId = getCoreID();
  client.enableTls(TLScertificate, sizeof(TLScertificate));
  // connect to the server
  client.connect(coreId, MQTT_USER, MQTT_PASSWORD);

  if(!waitFor(client.isConnected, 20000))
  {
    System.reset();
  }

  ///////////////////////////////////////////////////////////////
  client.publish(coreId + "/debug", "connected in setup");
  ///////////////////////////////////////////////////////////////

  client.subscribe(coreId + "/command");
  wd.checkin();
}

void loop()
{
  static int publishFails;

  //read meter
  readResult = readMeter();

  //publish data via MQTT and count fails for error handling
  if (readResult != EReadMeter_Skipped)
  {
    if (chargerEnabled)
    {
      energy++;
      current = 1.234;
    } else
    {
      current = 0;
    }
    if (publishData())
    {
        publishFails = 0;
    }
    else
    {
      publishFails++;
    }
  }

  if (publishFails > 1)
  {
    client.disconnect();
    wd.checkin();
    Cellular.command(30000, "AT+CFUN=16\r\n");
    wd.checkin();
    Cellular.off();
    Cellular.on();
    Cellular.connect();
    if (!waitFor(Cellular.ready, 60000))
    {
      //System.reset();
      RGB.color(255, 0, 0);
    }

    client.enableTls(TLScertificate, sizeof(TLScertificate));

    // connect to the server
    client.connect(coreId, MQTT_USER, MQTT_PASSWORD);

    if (waitFor(client.isConnected, 10000))
    {
        ///////////////////////////////////////////////////////////////
        client.publish(coreId + "/debug", "connected in loop");
        ///////////////////////////////////////////////////////////////
        client.publish(coreId + "/errors", reconnected);
        client.subscribe(coreId + "/command");
    } else
    {
      RGB.color(255, 255, 0);
      //System.reset();
    }
    publishFails = 0;
  }

  //MQTT listener
  if (client.isConnected())
  {
    client.loop();
  }
  wd.checkin();
}

Have I missed it, or have you not set Particle.keepAlive() in your code?
That's required to keep the UDP connection to the cloud alive with 3rd party SIMs.

How many seconds did you set for that test? You could start with Particle.keepAlive(30);

I'd also try to reduce the use of String objects wherever possible to prevent heap fragmentation, especially when dealing with strings of varying size.

Thank you for your quick response.

This code does not connect to the Particle cloud. It only connects to my MQTT broker. I’m publishing data every 20 seconds in the worst case. Do I have to use some sort of keepalive in this situation? Does Particle.keepAlive() work when a connection to the Particle cloud is not desired?

If you are running AUTOMATIC mode, your device will connect to the cloud even if you don't use it, and it will stall your code once the cloud can't be reached anymore (unless you explicitly call Particle.disconnect()).
And only communicating with your MQTT broker does not act as a keep alive ping to the cloud.

So if you really don't need the cloud, you should go with SYSTEM_MODE(SEMI_AUTOMATIC) and for good measure I'd always go for SYSTEM_THREAD(ENABLED) too.

As suggested above, I set the system mode to SEMI_AUTOMATIC and changed some of the Strings to arrays of char. Neither change solved the problem, and I’m still only able to recover from a lost connection via a system reset. I want to be able to recover from a fault without resetting the microcontroller, and I also want to find the cause of the connection loss.

Is there anything else I should try?

Have you also added SYSTEM_THREAD(ENABLED) and tried Cellular.off(), Cellular.on() and Cellular.connect() (possibly with some delay in between) once your (now still running) code detects loss of connectivity?

Yes. I just found out why I'm not able to recover. Here's what I do when I detect a loss of connectivity:

client.disconnect(); //disconnect MQTT
Cellular.off();
Cellular.on();
Cellular.connect();
if (!waitFor(Cellular.ready, 60000))
{
    System.reset();
}
//reconnect MQTT
client.enableTls(TLScertificate, sizeof(TLScertificate));
client.connect(coreId, MQTT_USER, MQTT_PASSWORD);
if (waitFor(client.isConnected, 10000))
{
    //on success, do something
}

The problem is that Cellular.ready returns true immediately after calling Cellular.connect(), although the modem is obviously not ready yet. The following attempt to reconnect to the MQTT broker then times out and causes a reset. Can somebody from Particle tell me what is behind Cellular.ready? Am I using it wrong, or is it a bug in this function?

Have you tried this part of my post?

I haven't tried it out myself yet, but I could imagine that the propagation time of the !ready state from the cellular modem to the microcontroller might be too slow.

How does your reconnect behave with this instead?

  Cellular.off();
  delay(500);
  Cellular.on();
  Cellular.connect();
  delay(500);
  Particle.process(); // explicitly trigger the background task
  if (!waitFor(Cellular.ready, 60000))
    System.reset();

Thank you so much for your answers. Now it recovers properly. I have to run some longer tests for a few days or so to make sure it stays online. The particular line that made it work was the delay between Cellular.connect() and the Cellular.ready check.
But isn’t it a bug in the Cellular class? I certainly didn’t expect this behavior. In my programming experience, a delay has never really fixed a problem, and even when it did, it wasn’t the proper approach for solving the issue.

I would not necessarily consider it a bug, but it’s definitely worth a GitHub issue to let Particle know about the possible troubles.
Can you file an issue report on https://github.com/spark/firmware/issues ?

As pointed out, we are dealing with asynchronous communication between the modem and the controller, where race conditions can always play a role.
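For a race like this, one alternative to a fixed delay is a two-phase wait: first give a possibly stale "ready" flag a bounded time to drop, then wait for it to come back. The following is only a sketch of that idea. The helper `waitForFreshReady` and its injected callbacks are hypothetical; on the Electron they might wrap Cellular.ready(), millis() and Particle.process(), but that mapping is an assumption, and whether the flag actually drops after Cellular.off() is exactly what is uncertain here.

```cpp
#include <functional>

// Hypothetical helper, not a Particle API. All dependencies are injected so the
// logic can be exercised off-device with a fake clock.
//   ready - returns the current "ready" flag
//   nowMs - returns a millisecond tick counter (may wrap around)
//   idle  - called between polls (e.g. to run background processing)
bool waitForFreshReady(const std::function<bool()>& ready,
                       const std::function<unsigned long()>& nowMs,
                       const std::function<void()>& idle,
                       unsigned long clearTimeoutMs,
                       unsigned long readyTimeoutMs)
{
    // Phase 1: give a stale "ready" flag a bounded time to drop.
    // If it never drops within clearTimeoutMs, the flag is trusted as-is.
    unsigned long start = nowMs();
    while (ready() && (nowMs() - start) < clearTimeoutMs) {
        idle();
    }

    // Phase 2: wait for "ready" to be reported (again), with its own timeout.
    // Unsigned subtraction keeps the elapsed-time check correct across wrap-around.
    start = nowMs();
    while (!ready()) {
        if ((nowMs() - start) >= readyTimeoutMs) {
            return false;
        }
        idle();
    }
    return true;
}
```

Compared with delay(500), the first phase ends as soon as the flag is seen to drop, and it still terminates if the flag never changes.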

But a blunt approach to settle this issue would be to reset all “ready” indicators, the IP and whatnot inside any of Cellular.off(), Cellular.disconnect(), …
A more tricky bit would be if any Cellular.command() calls invalidated the “ready” state.
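To make the blunt approach concrete, here is a toy model of it. This is not Particle firmware and the class `LinkState` is invented purely for illustration: every teardown path goes through a single function that clears all connection indicators, so a later "ready" query cannot return a leftover value.

```cpp
#include <string>

// Toy model, NOT the actual Cellular class: it only illustrates clearing every
// connection indicator whenever the link is torn down.
class LinkState {
public:
    // Called when the (hypothetical) modem driver reports a data connection.
    void onConnected(const std::string& ip) { ip_ = ip; ready_ = true; }

    // Both teardown paths share one invalidation routine.
    void off()        { invalidate(); }
    void disconnect() { invalidate(); }

    bool ready() const { return ready_; }
    const std::string& ip() const { return ip_; }

private:
    void invalidate() { ready_ = false; ip_.clear(); }

    bool ready_ = false;
    std::string ip_;
};
```

With this shape, a check made right after off() reports not ready until a new connection has actually been reported.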


Side note:
I filed a possibly related issue a while ago and referred to this thread in an additional comment there:
https://github.com/spark/firmware/issues/1028

Once you have filed the separate issue, could you also reference this one to keep them linked?

I filed an issue and I referenced your issue as well. Thank you so much once again.
