Electron - Advice needed on good practice for managing connection in manual mode

Hello.
We have a growing fleet of Electrons in the field. Most of the time they run fine, but from time to time some of them get stuck trying to reconnect to the Cloud (LED blinking green), and the only solution is a full power cycle (power off, battery disconnected) followed by a reconnect. Calling System.reset() or triggering the reset pin does nothing. Similar issues are described in other threads:

[Electron Flashing Green - Will not connect until battery removed]
[Electron OS 1.0.1 no cloud connection after deep sleep]
[Electron sleep problems, yet again]

In an effort to avoid this state, we’re trying to manage the cloud connection manually. The objective is to detect when the flashing-green state happens. A full modem reset also seems to recover the Electron from this state.
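To make the detection idea concrete, here is a minimal sketch of a timer that flags a connection that has been down for too long. It is plain C++ with no Device OS calls; `StuckDetector`, `update` and `limit_ms` are hypothetical names chosen for illustration, not something from our firmware or from any Particle library.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Device OS): tracks how long the
// connection has been down and reports when a recovery action is due.
class StuckDetector {
public:
    explicit StuckDetector(uint32_t limit_ms) : limit_ms_(limit_ms) {
        assert(limit_ms > 0);  // a zero limit would fire immediately
    }

    // Call once per loop() pass with the current millis() value and the
    // current connection status. Returns true once the link has been
    // down continuously for limit_ms or longer.
    bool update(uint32_t now_ms, bool connected) {
        if (connected) {
            down_ = false;
            return false;
        }
        if (!down_) {
            down_ = true;
            down_since_ms_ = now_ms;  // first pass of this outage
        }
        // Unsigned subtraction stays correct across the millis() rollover.
        return static_cast<uint32_t>(now_ms - down_since_ms_) >= limit_ms_;
    }

private:
    uint32_t limit_ms_;
    uint32_t down_since_ms_ = 0;
    bool down_ = false;
};
```

On the device, the idea would be to call `update(millis(), Particle.connected())` on every pass of loop() and trigger the recovery action when it returns true. The limit itself is an assumption that would need tuning for each deployment.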

All systems run with the system thread enabled and in manual mode. Some are on 0.7.0 and some on 1.0.0, but the flashing-green state happens on both OS releases. All Electrons reboot daily at midnight using System.sleep(SLEEP_MODE_DEEP, 30), so the problem might lie in reconnecting after this reboot.

We also use the Electron Sample library, but only its connection events log.

We manage the connection manually using a finite state machine. Most of the code was borrowed from the cellular helper library but simplified for our scenario. The code is below; we would appreciate your help in improving it.

The code works OK in the lab, where we simulate connectivity faults by removing the antenna, waiting for the Electron to fall into flashing green, and then reconnecting the antenna.

enum ConnectStates {STARTUP_WAIT_STATE, CONNECT_STATE, CONNECT_WAIT, IDLE_CONNECTED, DISCONNECT_STATE, DO_NOT_CONNECT};
ConnectStates con_state = STARTUP_WAIT_STATE;
unsigned long stateTime = 0;

//constants for the cloud connection state machine
const unsigned long STARTUP_WAIT_TIME_MS = 2000;
const unsigned long CONNECT_WAIT_TIME_MS = 60000;
const unsigned long CLOUD_WAIT_TIME_MS = 40000;
const unsigned long DISCONNECT_WAIT_TIME_MS = 3000;
const int MAX_ATTEMPTS_TO_CONNECT = 3;
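int max_disconnects = 6;

// Declared elsewhere in the application (not shown in this post):
// attempts_to_connect, cloud_disconnect_count, last_process and
// espera_process (interval between Particle.process() calls)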

void setup(){
// nothing relevant for the FSM managing the connection
}

void loop() { 

// all main application tasks run in another FSM
// in the main FSM, we do a full modem reset if the connection FSM exceeds the maximum number of attempts to reconnect to the network

switch (con_state){
    
    case STARTUP_WAIT_STATE: //wait 2 seconds before starting connection
    
      if (millis() - stateTime >= STARTUP_WAIT_TIME_MS) { //"start up wait finished - go to connect"
			stateTime = millis();
			Cellular.connect();
			con_state = CONNECT_STATE;
        }
	break;
	
	case CONNECT_STATE:
	
		if (Cellular.ready()) { //"attempting to connect to the Particle cloud..."
		    Particle.process();
		    Particle.connect();
		    con_state = CONNECT_WAIT;
		    stateTime = millis();
		  }
		else if (millis() - stateTime >= CONNECT_WAIT_TIME_MS) { //"cellular not ready - disconnect & try again"
		    stateTime = millis();
		    Cellular.on();
		    con_state = DISCONNECT_STATE;
		 } 
	
	break;
	
	case CONNECT_WAIT: //wait up to CLOUD_WAIT_TIME_MS (40 s) for the cloud connection
	
	  if (Particle.connected()){ // ...Cloud Connected, move to idle_connected state
	      attempts_to_connect = 0;
	      cloud_disconnect_count = 0;
	      con_state = IDLE_CONNECTED;
	      last_process = millis();
	      Particle.process();
	     }
	  else if (millis() - stateTime < CLOUD_WAIT_TIME_MS) {
			// Not time yet;
			break;
		}
	  else if (!Particle.connected())  { // attempt to connect unsuccessful, disconnect and retry
	        Cellular.on();
	        con_state = DISCONNECT_STATE;
	        stateTime = millis();
	    }
	
	break;
	
	case IDLE_CONNECTED:
	
	if (millis() - last_process >= espera_process){ //maintains connection live
           last_process = millis();
           Particle.process();
	  }
	
	if (!Particle.connected()){
	       cloud_disconnect_count++; //disconnection count to decide whether to reconnect or to do a full modem reset
	       stateTime = millis();
	       if (cloud_disconnect_count < max_disconnects){//Cloud disconnected - simple reconnect
	             Particle.disconnect();
	             con_state = STARTUP_WAIT_STATE;
	           }
	       else { //Cloud disconnected many times - go to Disconnect State
	           cloud_disconnect_count = 0;
	           Cellular.on();
	           con_state = DISCONNECT_STATE;
	        }
       }
	
	break;
	
	case DISCONNECT_STATE:
	
	  if (millis() - stateTime >= DISCONNECT_WAIT_TIME_MS) { //wait 3 seconds before turning modem off
                attempts_to_connect++; //count number of times falling to this state.
                Cellular.off();
                con_state = STARTUP_WAIT_STATE;
                stateTime = millis();
            }
	break;
	
	case DO_NOT_CONNECT: //used from the main FSM when we don't want to connect to the cloud
	break;
	
 } //*************************** end of Connection FSM
}//end of loop 
  • Is there anything above that could be done better?
  • Is there anything above that could make the Electron go into panic HARD_FAULT? (We have some of these panic events too.)

Thanks in advance
Fabio

The full modem reset function (copied from the Electron Sample library) is:

void full_modem_reset(){
      if(Particle.connected()) {
          Particle.publish("boot",String(reset_reason),PRIVATE); //publish to the cloud that we're rebooting
          delay(1000);
       } 
       
       Particle.publish("spark/device/session/end", "", PRIVATE);  //reset the session and force full handshake (!! warning!! can use 5K of data)
	   Particle.disconnect();  // Disconnect from the cloud

	    unsigned long tiempo_espera = millis();  // Wait up to 15 seconds to disconnect
	    while (Particle.connected() && millis() - tiempo_espera < 15000) {
	        delay(100);
	    }
	   // Reset the modem and SIM card
	    Cellular.command(30000, "AT+CFUN=16\r\n"); // 16:MT silent reset (with detach from network and saving of NVM parameters), with reset of the SIM card
        delay(1000);
	    System.sleep(SLEEP_MODE_DEEP, 30);// Go into deep sleep for 30 seconds to try to reset everything.
}
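
For completeness, the escalation rule mentioned in the loop() comments (a full modem reset once the connection FSM exceeds the maximum number of attempts) can be sketched roughly as below. This is a simplified illustration in plain C++, not our production code; `Recovery` and `decide_recovery` are hypothetical names, and reading "exceeds" as strictly greater than is an assumption.

```cpp
#include <cassert>

// Hypothetical names for illustration only; not part of Device OS.
enum class Recovery { Retry, FullModemReset };

// Decide the next recovery step from the number of failed connection
// attempts (the counter incremented in DISCONNECT_STATE).
inline Recovery decide_recovery(int attempts_to_connect, int max_attempts) {
    assert(max_attempts > 0);
    // Assumption: "exceeds the maximum" means strictly greater than.
    return attempts_to_connect > max_attempts ? Recovery::FullModemReset
                                              : Recovery::Retry;
}
```

In the main FSM this would be called as decide_recovery(attempts_to_connect, MAX_ATTEMPTS_TO_CONNECT), with full_modem_reset() invoked when the result is Recovery::FullModemReset.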