Issues with sleep again - need advice (not waking from deep sleep)

Dear community
This is a continuation of this thread: Issues with sleep - Need advice on troubleshooting ideas. I decided to open a new thread to keep my issues separate from those of the original author, as I really want to get to the bottom of them and find solutions. Apologies if this goes against forum etiquette.

This is the context I feel necessary to provide to get more meaningful help:

  • My devices are Electrons running 0.8.0 rc10 (some on rc11). They perform real-time remote monitoring of machines, mainly in car maintenance workshops, and read 9 signals from the client’s machine. 5 of these signals trigger interrupts when the machine is on/off and the other 4 are digital alarms that need monitoring in real time if they’re on.
  • The machine performs a service for a period of time (30 or 40 min) while our device takes measurements and reports to the cloud. Then, after detecting no activity, it goes to sleep, mainly to save on data usage, as the device is connected to a power supply through VIN all the time. It wakes up on the rising edge of the on/off signal from the client’s machine.
  • After migrating the firmware from 0.7 to 0.8 and introducing new firmware features, the application grew from roughly 900 lines of code to almost 2000 lines! :frowning: Even though tests over several weeks were successful, once the new software was deployed to all machines (20 or so) I started having issues with the systems not waking up from System.sleep(pin, RISING), or even locking up a few times (constant cyan LED). Several small changes were made in an attempt to stabilize the software (explained in the previous thread if you’re interested) but with no real improvement.
  • As suggested in the previous thread, and in an attempt to isolate application issues from possible firmware bugs, I restructured the application around the finite state machine (FSM) concept. Loved it! I was able to shrink the code back to roughly 1100 lines while keeping the intended new functionality. It is a really good approach, and I have started migrating our other customers (different use cases) to FSM too.
  • The FSM style new code runs smoothly and more responsively without lock ups. I think we managed to get rid of all potential code blocks. Great! However, I did sacrifice the sleep feature given the bad experiences we were having in the previous app firmware version.

The current version is awake all the time, and only does a deep sleep for 10 seconds at midnight as a “maintenance” type of activity to be fresh every day (suggested by @ScruffR in the previous thread as well). When the client’s machine is not working, the software “rests” in a small “OFF_IDLE” state. This is the code that has been working fine for 2 weeks already. When the device does the “maintenance reset” at midnight, it reports back going into INIT_STATE, reads SoC and then goes to OFF_IDLE again.

STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));
STARTUP(System.enableFeature(FEATURE_RESET_INFO));
SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);

// STATE Machine Variables definition
enum MainStates {INIT_STATE, OFF_IDLE, READY_TO_START, TURN_ON_STATE, IN_SERVICE, PAUSE_STATE, END_OF_SERVICE, WAIT_RESP_STATE, ERROR_STATE, MEMORY_STATE, MEMORY_RESP_STATE};
MainStates main_state = INIT_STATE;
retained MainStates old_main_state = OFF_IDLE;

void setup() {

    fuel.begin();
    fuel.quickStart();
    Time.zone(-5);
    
    if(!Cellular.ready()) Cellular.connect();
   
    // Particle functions, variables and webhook handlers added here
 
    //input pins - I deleted all pins from this snippet with the exception of the one below, as it uses A7 (WKP), which may interfere with the sleep mode. This pin is an alarm from the machine that is normally at 0V and completely shut down when the machine is OFF. In addition, all interrupts are disabled in the OFF_IDLE state with the exception of the turn_on button (pinSwitch)

    pinMode(pinTemperatura, INPUT_PULLDOWN);
     
    attachInterrupt(pinSwitch, pin_switch_ISR, CHANGE,1);
    last_reboot_day = Time.day();
    Particle.connect();
    
} //************* END OF SETUP 

void loop() {
maintain_manual_connection();

switch (main_state){
 //all other states deleted

case INIT_STATE:
    
       waitFor(Particle.connected, 30000);
       read_SoC(); //read & publish batteries State of Charge
       if (verboseMode && main_state != old_main_state) publishStateTransition();
       main_state = OFF_IDLE;
       
    break;

case OFF_IDLE:
    
      if (verboseMode && main_state != old_main_state) publishStateTransition();
      
      debounce_inicio(); //Machine's switch is activated through the ISR. Debounce (i.e wait for safe reading)
        
      if (pin_reading_ready){
          if (estado_inicio) { //machine normal ON. Waits 15 seconds to start service
              if (!estado_pausa) flag_start_in_pause = TRUE; //machine is started with the Pause button activated. It waits until pause is manually deactivated and then 15 secs more to start service 
              main_state = READY_TO_START;
            }
           
           if (!estado_inicio && estado_pausa) { // we were busy while machine started so jump straight to turn_on_state directly & miss ready_to_start state
                record_start_time();  //starts a millis based timer to measure the time of service
                main_state = TURN_ON_STATE;
            }
            
           reset_variables(); //initialise some counters
           pin_reading_ready = FALSE;
        } 
        
       if (svc_in_memory || maq_in_memory){
          if(Time.now() - last_msg_in_memory >= wait_republish) main_state = MEMORY_STATE; //if there were errors in the webhooks and messages did not reach the Ubidots server, we retry later when the machine is off_idle. If resending the webhooks is successful, the webhook error counter below is reset to avoid going into error_state
        }
        
        if (cloud_disconnect_count >= max_disconnects || webhook_error >= 1) main_state = ERROR_STATE; // too many cloud disconnects force a simple system.reset. Webhook errors force a full_modem_reset
        
        if (Time.day() != last_reboot_day) maintenance_reset(); //forces a reboot every day at 00.00 
        
    break;
  }
}

void maintain_manual_connection(){
    
    if (millis() - last_process >= espera_process){ //performs particle.process every 5 seconds or reconnect to Cloud
     
     last_process = millis();
      if (Particle.connected()) Particle.process();
      else {
         cloud_disconnect_count++;
         Particle.disconnect();
         Particle.process();
          for(uint32_t ms = millis(); millis() - ms < 500; Particle.process());
         Particle.connect();
         waitFor(Particle.connected, 30000);
       }
    }   
}  //**** end of maintain manual connection

void record_start_time(){
    startTime = millis(); //registers service start time
    maq_timestamp_on = Time.now(); //registers Machine ON timestamp to be published in a webhook later
}

void debounce_inicio(){ //pins are too bouncy so it waits 300ms or so after the ISR is triggered for a safe pin reading
    
if (flag_pin_switch_isr) { 
      if(millis() - inicio_triggered >= espera_inicio){
         estado_inicio = pinReadFast(pinSwitch); 
         estado_pausa = pinReadFast(pinPausa); 
         pin_reading_ready = TRUE;
         flag_pin_switch_isr = FALSE; 
       }
    }   
}


//turn-on button in the machine is sensed in an ISR. However, the pin reading is done in the main loop a few ms later to make sure we get a stable reading.

void pin_switch_ISR () { 
   
   if((micros() - last_micros_inicio) >= debouncing_inicio) { //first debounce
       last_micros_inicio = micros();
       inicio_triggered = millis();
       flag_pin_switch_isr = TRUE;
   }    
}

void maintenance_reset() {
  Particle.publish("boot","1",PRIVATE);
  delay(300);
  System.sleep(SLEEP_MODE_DEEP, 10);     
}

Thanks @chipmc. I read your FSM examples several times and “borrowed” a few bits, such as the verbose mode.

However, as I wanted to slowly recover the sleep features and also save on the data plan, I introduced a longer deep sleep of 6 hours (midnight till 6am) and, on Sundays, a sleep from midnight until Monday 6am, when the machines are not working. It worked the first weekend, but on the second weekend one of the 2 test devices never woke up. To the code above I added the following, borrowed from @rickkas7 in When is Electron Powered By Battery, to check whether the VIN supply was still OK:

class PowerCheck {
public:
	PowerCheck();
	virtual ~PowerCheck();
	void setup();
	bool getHasPower();
	bool getHasBattery();
	bool getIsCharging();

private:
	void interruptHandler();

	PMIC pmic;
	volatile bool hasBattery = true;
	volatile unsigned long lastChange = 0;
};

PowerCheck::PowerCheck() {}

PowerCheck::~PowerCheck() {}

void PowerCheck::setup() {
	attachInterrupt(LOW_BAT_UC, &PowerCheck::interruptHandler, this, FALLING);
}

bool PowerCheck::getHasPower() {
	// Bit 2 (mask 0x4) == PG_STAT. If non-zero, power is good
	// This means we're powered off USB or VIN, so we don't know for sure if there's a battery
	byte systemStatus = pmic.getSystemStatus();
	return ((systemStatus & 0x04) != 0);
}

void PowerCheck::interruptHandler() {
	if (millis() - lastChange < 100) {
		hasBattery = false;
	}
	else {
		hasBattery = true;
	}
	lastChange = millis();
}

And replaced the maintenance reset commands with this:

if (Time.weekday() == fin_de_semana) weekend_deep_sleep(); //fin_de_semana is set to 1 = Sunday in setup
        else if (Time.day() != last_reboot_day) maintenance_reset();

void weekend_deep_sleep(){
  Particle.publish("boot","4",PRIVATE);
  delay(300);
  System.sleep(SLEEP_MODE_DEEP, 30*3600);     //weekend sleep from 00.00 on Sunday to 6.00 Monday = 30hrs
}

void maintenance_reset() {
  Particle.publish("boot","1",PRIVATE);
  delay(300);
  System.sleep(SLEEP_MODE_DEEP, 6*3600);     // Night Sleep 6hrs = 6*3600 secs
}
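One thing that may be worth checking with the fixed durations above: if the device wakes early while it is still Sunday, the Time.weekday() == fin_de_semana check is true again and a fresh 30-hour sleep starts from that moment. A minimal sketch of computing the remaining seconds until a target wake time instead, written as plain C++ so it can be checked off-device. The function name and parameters are assumptions for illustration (not part of the original firmware); weekdays follow the 1 = Sunday convention mentioned above, and on the device the inputs would come from Time.weekday(), Time.hour(), Time.minute() and Time.second().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not part of the original firmware.
// Returns the seconds from the given local time until the next occurrence of
// wakeHour:00:00 on wakeWeekday (1 = Sunday ... 7 = Saturday).
// If "now" is exactly the target time, a full week is returned.
uint32_t secondsUntilWeekly(int weekday, int hour, int minute, int second,
                            int wakeWeekday, int wakeHour) {
    assert(weekday >= 1 && weekday <= 7);
    assert(wakeWeekday >= 1 && wakeWeekday <= 7);

    const int32_t kWeek = 7 * 24 * 3600;  // seconds in one week

    // Seconds elapsed since Sunday 00:00:00 for both points in time.
    int32_t now    = ((weekday - 1) * 24 + hour) * 3600 + minute * 60 + second;
    int32_t target = ((wakeWeekday - 1) * 24 + wakeHour) * 3600;

    int32_t diff = target - now;
    if (diff <= 0) diff += kWeek;         // target already passed this week
    return static_cast<uint32_t>(diff);
}
```

Called at Sunday 00:00:00 with a target of Monday (2) at 6, this gives 108000 s, the same 30 hours as before. Called after a premature wake at 03:52 on Sunday it gives 94080 s, so the device would still wake at 06:00 on Monday rather than at 09:52. That a restarted 30-hour timer played a part in the Monday observation is only one possible explanation, not something the thread confirms.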

The test device was supposed to be asleep from Sunday 00.00, but it woke up on Sunday at 3.52am (reported its status as OFF_IDLE) and then apparently went back to sleep, as I was never able to communicate with it again.
Maybe an issue with the internal clock?
Did the WKP/A7 pin manage to pick up some noise and wake the device? But if that happened, the device should have been reachable all Sunday (after the premature wake-up), and I was never able to reach it.

On Monday morning it was still sleeping. I reset it manually around 9.30am and downgraded to the original working version with only the 10-second midnight sleep.

  • Do you see any problems in the code above?
  • Does the internal clock have a problem?
  • I’ve seen some of my other devices not going to sleep or not waking up at the times they’re supposed to. Some go to sleep again right after start-up, as if the Time.zone settings were not properly applied and the last_reboot_day = Time.day(); in setup did not work?
  • Is the use of the A7/WKP pin an issue even though the interrupts are disabled?
  • Is there a bug in the firmware?

Thanks in advance for the help. I hope I was clear, and sorry for such a long explanation.

@fenriquez I am using Photons, not Electrons, so bear with me. There is a lot in your post. I found it very difficult to get into sleep correctly with 0.7.0 and had to use 0.8.0 and be very careful with what I did around the Photon on the motherboard. I am running the Photon off a battery backup solution that requires the motherboard to shut down power to all the peripherals (GPIO, SD controller, LCD screen, etc.). The Photon would sort of half go to sleep: it kept drawing power but in other ways appeared asleep. It could then only be started/woken with a reset.

Probably the key to solving your problem is to make it repeatable, or at least to narrow down the potential causes. Can you get the same condition to occur on your bench/desk? In my case the key was that sleep worked the first time but subsequent calls did not. I was using EEPROM, and you are using retained memory?

Also key to diagnosing the issue in my case was putting in deep trace logging.

Have you looked at the diagnostics history - any clues from the network and cloud connection/reconnection?

In your functions weekend_deep_sleep() and maintenance_reset(), it might be a good idea to check Particle.connected() before doing Particle.publish(). In any case, 300 milliseconds doesn’t seem long enough to wait for the publish to complete before calling sleep. In other words, you may not be going to sleep properly, and then a stray signal on WKP wakes the device, but not correctly.
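To make that suggestion concrete, here is a minimal sketch of "publish only when connected, then keep servicing the connection for a while before sleeping". It is plain C++ with the device calls passed in as callables so it can be tried off-device. publishBeforeSleep, its parameters and the settle time are illustrative assumptions, not an existing Particle API; on an Electron the callables would wrap Particle.connected(), Particle.publish(...), Particle.process() and millis().

```cpp
#include <cstdint>

// Hypothetical helper, not a Particle API.
// connected : returns true when the cloud connection is up
// publish   : sends the event, returns true if it was accepted
// process   : services the cloud connection once
// nowMs     : returns a free-running millisecond counter
// settleMs  : how long to keep servicing the connection after publishing
template <typename ConnectedFn, typename PublishFn,
          typename ProcessFn, typename MillisFn>
bool publishBeforeSleep(ConnectedFn connected, PublishFn publish,
                        ProcessFn process, MillisFn nowMs,
                        uint32_t settleMs) {
    if (!connected()) return false;       // skip the publish when offline

    bool accepted = publish();

    // Unsigned subtraction keeps the comparison valid across counter wrap.
    const uint32_t start = nowMs();
    while (nowMs() - start < settleMs) {
        process();
    }
    return accepted;
}
```

The caller would then enter deep sleep only after this returns. How long settleMs needs to be is something to find by testing; the thread only establishes that 300 ms looks too short.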

Thanks Armor for the reply. I have been testing the same software for several days in the lab and it’s always OK. Nothing happens: it always wakes up, no code blocks, etc. It’s in the field that the problem occurs, so I suspect the noisy environment the machines are located in and potentially poorer connectivity. On this front, I am now controlling the connection manually (manual mode), and there are error conditions in which the Electron is reset, or the modem is fully reset, when there are a lot of disconnects.

I need to learn how to do the deep trace logging. However, that’s one of the reasons I moved the code to an FSM structure: I know where it hangs, and it’s always in the OFF_IDLE state, whose code I posted above.

I will put the Particle.connected() checks in the maintenance routines and make the delay longer than 300ms so that these publish events reach the cloud. This will potentially avoid some code blocks; however, I doubt it explains why the short 10-second deep sleep works while after the 6hrs+ deep sleep the Electron does not recover (i.e. does not wake up), which is my main issue.

@fenriquez,

I know how frustrating these issues can be, I would only have one suggestion in looking at your code.

Since you want to put the device into deep sleep, I suggest you might try managing the connection. Here is how I manage this in my sleep sequence:

    detachInterrupt(intPin);                                          // Done sensing for the day
    pinSetFast(disableModule);                                        // Turn off the pressure module for the hour
    pinResetFast(ledPower);                                           // Turn off the LED on the module
    if (verboseMode && state != oldState) publishStateTransition();
    if (hourlyPersonCount) {                                          // If this number is not zero then we need to send this last count
      state = REPORTING_STATE;
      break;
    }
    if (connectionMode) {
      Particle.disconnect();
      Cellular.off();
      delay(1000);
    }
    FRAMwrite16(FRAM::currentDailyCountAddr, 0);                      // Reset the counts in FRAM as well
    FRAMwrite8(FRAM::resetCountAddr,0);
    FRAMwrite16(FRAM::currentHourlyCountAddr, 0);
    digitalWrite(blueLED,LOW);                                        // Turn off the LED
    digitalWrite(tmp36Shutdwn, LOW);                                  // Turns off the temp sensor
    watchdogISR();                                                    // Pet the watchdog
    int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
    System.sleep(SLEEP_MODE_DEEP,wakeInSeconds);                      // Very deep sleep till the next hour - then resets

This requires, of course, that I reconnect to Particle after waking, but perhaps adding a sequence that manages the cellular module makes a difference, especially since your 10-second test may be too short for the cellular radio to power down.

I hope this helps,

Chip


For deep trace logging - I am referring to logging like this. You would need a monitor connected to the microUSB or write it to an SD card. I appreciate that out in the field this may not be feasible. Given you are only seeing this problem when in the field then it sounds like an environmental issue.

// Use primary serial over USB interface for logging output

SerialLogHandler logHandler;

void setup() {
    // Log some messages with different logging levels
    Log.info("This is info message");
    Log.warn("This is warning message");
    Log.error("This is error message");

    // Format text message
    Log.info("System version: %s", (const char*)System.version());
}

void loop() {
}

I think there are many useful threads that explain how to check (with ACK) that the Particle publish has succeeded before calling sleep.

My main learning point - sorry if this wasn’t clear - is that the problem is likely to be with going to sleep properly, and that something (an unfinished publish, etc.) could be blocking that. The system is then in an odd state and you can get odd behaviour.

There is one known issue with deep sleep not being able to wake again when the WKP happens to be HIGH just at the very moment the device enters deep sleep.
If you are not using the WKP/A7 pin, you could tie it to LOW with a strong pull-down to exclude this from your list of possible causes.
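When the pin is in use and cannot be tied LOW, a software check just before sleeping is one possible mitigation. Below is a minimal sketch, assuming a hypothetical helper wakePinIsQuiet that samples the pin several times and only reports it safe when every sample was LOW. The reader callable stands in for something like digitalRead(WKP) == HIGH on the device. This narrows the window but cannot close it, since the pin could still go HIGH between the last sample and the actual entry into deep sleep.

```cpp
#include <cstdint>

// Hypothetical guard, not a Particle API.
// readPinHigh : returns true when the wake pin currently reads HIGH
// samples     : number of consecutive reads that must all be LOW
// Returns true only if at least one sample was taken and none was HIGH.
template <typename ReadFn>
bool wakePinIsQuiet(ReadFn readPinHigh, int samples) {
    for (int i = 0; i < samples; ++i) {
        if (readPinHigh()) return false;  // pin is HIGH: postpone the sleep
    }
    return samples > 0;
}
```

A sleep routine could call this first and, when it returns false, stay awake and retry on a later pass through the loop instead of entering deep sleep.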

Thanks @chipmc. I will add this to the sleep sequence!

Understood, it makes sense now. I am adding this Particle.connected() check as well as making the delay longer so the publish can finish before going into deep sleep. Thanks @armor

Thanks @ScruffR. Unfortunately I am using the A7 pin for an external alarm coming from the client's machine. The risk on this front is lower, as the machine is completely turned off (and therefore the pin is LOW) before the Electron runs the sleep routine. It is still a possibility, but something I cannot address now with the current hardware.
I am making changes to the hardware though and will make the new hw version more flexible on this front (e.g leaving space for a hard pull down resistor) in addition to changing the use of the WKP/A7 pin for a hardware watchdog.
