Dear community,
This is a continuation of the thread “Issues with sleep - Need advice on troubleshooting ideas”. I decided to open a new thread to keep my problems separate from those of the original author, as I really want to get to the bottom of my issues and find solutions. Apologies if this goes against forum etiquette.
This is the context I feel is necessary to get more meaningful help:
- My devices are running on Electron 0.8.0 rc10 (some on rc11). They perform real-time remote monitoring of machines working mainly in car maintenance workshops, and read 9 signals from the client’s machine. 5 of these signals trigger interrupts when the machine is switched on/off, and the other 4 are digital alarms that need monitoring in real time while they’re on.
- The machine performs a service over a period of time (30 or 40min) and our device takes some measurements and reports to the cloud. Then, after detecting no activity, it goes to sleep, mainly to save on data usage, as the device is powered through VIN all the time. It wakes up on the rising edge of the on/off signal from the client’s machine.
- After migrating the firmware from 0.7 to 0.8 and introducing new firmware features, the application grew from roughly 900 lines of code to almost 2000 lines! Even though several weeks of tests were successful, once the new software was deployed to all machines (20 or so) I started having issues with the systems not waking up from System.sleep(pin, RISING) or even locking up a few times (constant cyan LED). I made several small changes in an attempt to stabilize the software (explained in the previous thread if you’re interested), but saw no real improvement.
- As suggested in the previous thread, and in an attempt to separate application issues from possible firmware bugs, I restructured the application around the finite state machine (FSM) concept. Loved it! I was able to shrink the code back to roughly 1100 lines while keeping the intended new functionality. It is a really good approach, and I have started migrating our other customers (different use cases) to FSM as well.
- The new FSM-style code runs smoothly and more responsively, without lock-ups. I think we managed to get rid of all potentially blocking code. Great! However, I did sacrifice the sleep feature, given the bad experiences we were having with the previous version of the application firmware.
The current version is awake all the time and only does a deep sleep for 10 seconds at midnight, as a “maintenance” type of activity to start fresh every day (also suggested by @ScruffR in the previous thread). When the client’s machine is not working, the software “rests” in a small “OFF_IDLE” state. This is the code that has been working fine for 2 weeks already. When the device does the “maintenance reset” at midnight, it reports going into INIT_STATE, reads the SoC and then goes into OFF_IDLE again.
STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));
STARTUP(System.enableFeature(FEATURE_RESET_INFO));
SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);
// STATE Machine Variables definition
enum MainStates {INIT_STATE, OFF_IDLE, READY_TO_START, TURN_ON_STATE, IN_SERVICE, PAUSE_STATE, END_OF_SERVICE, WAIT_RESP_STATE, ERROR_STATE, MEMORY_STATE, MEMORY_RESP_STATE};
MainStates main_state = INIT_STATE;
retained MainStates old_main_state = OFF_IDLE;
void setup() {
    fuel.begin();
    fuel.quickStart();
    Time.zone(-5);
    if (!Cellular.ready()) Cellular.connect();
    // Particle functions, variables and webhook handlers added here
    // Input pins: I deleted all pins from this snippet except the one below, as it uses A7 (WKP),
    // which may interfere with sleep mode. This pin is an alarm from the machine that is normally
    // at 0V and completely shut down when the machine is OFF. In addition, all interrupts are
    // disabled in the OFF_IDLE state except the turn-on button (pinSwitch).
    pinMode(pinTemperatura, INPUT_PULLDOWN);
    attachInterrupt(pinSwitch, pin_switch_ISR, CHANGE, 1);
    last_reboot_day = Time.day();
    Particle.connect();
} //************* END OF SETUP
void loop() {
    maintain_manual_connection();
    switch (main_state) {
        // all other states deleted
        case INIT_STATE:
            waitFor(Particle.connected, 30000);
            read_SoC(); // read & publish the battery's State of Charge
            if (verboseMode && main_state != old_main_state) publishStateTransition();
            main_state = OFF_IDLE;
            break;
        case OFF_IDLE:
            if (verboseMode && main_state != old_main_state) publishStateTransition();
            debounce_inicio(); // machine's switch is detected through the ISR; debounce (i.e. wait for a safe reading)
            if (pin_reading_ready) {
                if (estado_inicio) { // machine normal ON. Waits 15 seconds to start service
                    // Machine was started with the Pause button activated. It waits until pause is
                    // manually deactivated and then 15 secs more to start service.
                    if (!estado_pausa) flag_start_in_pause = TRUE;
                    main_state = READY_TO_START;
                }
                if (!estado_inicio && estado_pausa) {
                    // We were busy while the machine started, so jump straight to TURN_ON_STATE and skip READY_TO_START.
                    record_start_time(); // starts a millis-based timer to measure the time of service
                    main_state = TURN_ON_STATE;
                }
                reset_variables(); // initialise some counters
                pin_reading_ready = FALSE;
            }
            if (svc_in_memory || maq_in_memory) {
                // If there were webhook errors and messages did not reach the Ubidots server, we retry later
                // while the machine is in OFF_IDLE. If resending the webhooks is successful, the webhook
                // error counter below is reset to avoid going into ERROR_STATE.
                if (Time.now() - last_msg_in_memory >= wait_republish) main_state = MEMORY_STATE;
            }
            // Too many cloud disconnects force a simple System.reset. Webhook errors force a full_modem_reset.
            if (cloud_disconnect_count >= max_disconnects || webhook_error >= 1) main_state = ERROR_STATE;
            if (Time.day() != last_reboot_day) maintenance_reset(); // forces a reboot every day at 00.00
            break;
    }
}
void maintain_manual_connection() {
    if (millis() - last_process >= espera_process) { // performs Particle.process every 5 seconds or reconnects to the Cloud
        last_process = millis();
        if (Particle.connected()) Particle.process();
        else {
            cloud_disconnect_count++;
            Particle.disconnect();
            Particle.process();
            for (uint32_t ms = millis(); millis() - ms < 500; Particle.process());
            Particle.connect();
            waitFor(Particle.connected, 30000);
        }
    }
} //**** end of maintain manual connection
void record_start_time() {
    startTime = millis(); // registers service start time
    maq_timestamp_on = Time.now(); // registers Machine ON timestamp to be published in a webhook later
}
void debounce_inicio() { // pins are too bouncy, so it waits 300ms or so after the ISR is triggered for a safe pin reading
    if (flag_pin_switch_isr) {
        if (millis() - inicio_triggered >= espera_inicio) {
            estado_inicio = pinReadFast(pinSwitch);
            estado_pausa = pinReadFast(pinPausa);
            pin_reading_ready = TRUE;
            flag_pin_switch_isr = FALSE;
        }
    }
}
// The turn-on button of the machine is sensed in an ISR. However, the pin reading is done in the main loop a few ms later to make sure we get a stable reading.
void pin_switch_ISR() {
    if ((micros() - last_micros_inicio) >= debouncing_inicio) { // first debounce
        last_micros_inicio = micros();
        inicio_triggered = millis();
        flag_pin_switch_isr = TRUE;
    }
}
void maintenance_reset() {
    Particle.publish("boot", "1", PRIVATE);
    delay(300);
    System.sleep(SLEEP_MODE_DEEP, 10);
}
Thanks @chipmc. I read your FSM examples several times and “borrowed” a few bits, such as the verbose mode.
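For readers who have not seen the verbose-mode pattern: the body of publishStateTransition() is not part of the snippet above, so the following is only a guess at what such a helper might do, written as plain C++ so it can be tried off-device. The names state_names and describe_transition and the message format are hypothetical, and the enum is shortened to three states.

```cpp
#include <cstdio>
#include <string>

// Shortened, hypothetical copy of the state list, for illustration only.
enum MainStates { INIT_STATE, OFF_IDLE, READY_TO_START };

// Human-readable names, in the same order as the enum.
static const char* const state_names[] = {"INIT_STATE", "OFF_IDLE", "READY_TO_START"};

// Builds a "From X to Y" message and records the new state as the old one,
// so the caller's "main_state != old_main_state" test fires once per transition.
std::string describe_transition(MainStates& old_state, MainStates new_state) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "From %s to %s",
                  state_names[old_state], state_names[new_state]);
    old_state = new_state;
    return std::string(buf);
}
```

On the device, the returned string would presumably be the payload of a Particle.publish() call, with old_main_state kept in retained memory as in the code above.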
However, as I wanted to gradually recover the sleep features and also save on the data plan, I introduced a longer deep sleep: 6 hours every night (midnight to 6am) and 30 hours at the weekend (Sunday 00.00 to Monday 6am), when the machines are not working. It worked on the first weekend, but on the second weekend one of the 2 test devices never woke up. To the code above I added the following, borrowed from @rickkas7 in “When is Electron Powered By Battery”, to check whether the VIN supply was still OK:
class PowerCheck {
public:
    PowerCheck();
    virtual ~PowerCheck();
    void setup();
    bool getHasPower();
    bool getHasBattery();
    bool getIsCharging();
private:
    void interruptHandler();
    PMIC pmic;
    volatile bool hasBattery = true;
    volatile unsigned long lastChange = 0;
};
PowerCheck::PowerCheck() {}
PowerCheck::~PowerCheck() {}
void PowerCheck::setup() {
    attachInterrupt(LOW_BAT_UC, &PowerCheck::interruptHandler, this, FALLING);
}
bool PowerCheck::getHasPower() {
    // Bit 2 (mask 0x4) == PG_STAT. If non-zero, power is good
    // This means we're powered off USB or VIN, so we don't know for sure if there's a battery
    byte systemStatus = pmic.getSystemStatus();
    return ((systemStatus & 0x04) != 0);
}
void PowerCheck::interruptHandler() {
    if (millis() - lastChange < 100) {
        hasBattery = false;
    }
    else {
        hasBattery = true;
    }
    lastChange = millis();
}
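The class above declares getHasBattery() and getIsCharging() but the snippet only shows getHasPower(). As a rough illustration of how the PMIC status byte could be decoded, here is a standalone sketch that runs off-device. The PG_STAT mask 0x04 is taken from the code above; the CHRG_STAT position (bits 4 and 5, with 1 = pre-charge and 2 = fast charge) is an assumption based on the BQ24195 system status register layout, and PowerStatus and decode_system_status are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical result type for illustration.
struct PowerStatus {
    bool power_good;  // PG_STAT: powered from USB or VIN
    bool charging;    // CHRG_STAT reports pre-charge or fast charge
};

// Decodes the byte that pmic.getSystemStatus() would return on the device.
PowerStatus decode_system_status(uint8_t status) {
    PowerStatus s;
    s.power_good = (status & 0x04) != 0;            // bit 2 (mask 0x04), as in getHasPower()
    uint8_t chrg = (status >> 4) & 0x03;            // bits 4-5, assumed CHRG_STAT
    s.charging = (chrg == 1 || chrg == 2);          // 0 = not charging, 3 = charge done
    return s;
}
```

Checking power_good just before calling System.sleep() and publishing the result would at least show whether VIN was present when the device went to sleep.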
And I replaced the maintenance reset commands with this:
if (Time.weekday() == fin_de_semana) weekend_deep_sleep(); // fin_de_semana is set to 1 = Sunday in setup
else if (Time.day() != last_reboot_day) maintenance_reset();
void weekend_deep_sleep() {
    Particle.publish("boot", "4", PRIVATE);
    delay(300);
    System.sleep(SLEEP_MODE_DEEP, 30*3600); // weekend sleep from 00.00 on Sunday to 6.00 Monday = 30hrs
}
void maintenance_reset() {
    Particle.publish("boot", "1", PRIVATE);
    delay(300);
    System.sleep(SLEEP_MODE_DEEP, 6*3600); // night sleep 6hrs = 6*3600 secs
}
The test device was supposed to be asleep from Sunday 00.00, but it woke up on Sunday at 3.52am (it reported its status as OFF_IDLE) and then apparently went back to sleep, as I was never able to communicate with it again.
Maybe an issue with the internal clock?
Did the WKP/A7 pin manage to pick up some noise and wake the device? But if this had happened, the device should have been reachable all Sunday (after the premature wake-up), and I was never able to reach it.
On Monday morning it was still sleeping. I went to reset it manually around 9.30am and downgraded it to the original working version with only the 10-second midnight sleep.
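One reading that fits this timeline: whenever the device wakes on a Sunday, the OFF_IDLE check sees Time.weekday() == fin_de_semana again and starts a fresh 30-hour sleep, so a wake-up at 3.52am would push the next wake-up to about 9.52am on Monday, after the manual reset at 9.30am. This does not explain the early wake-up itself. A minimal sketch of computing the sleep length from the current local time, instead of using a fixed duration, is below. It is plain C++ so it can be tested off-device; seconds_until_wake is a hypothetical helper, the 6.00 wake hour comes from the description above, and the weekday numbering (1 = Sunday) is assumed to match Time.weekday().

```cpp
#include <cstdint>

// weekday: 1 = Sunday ... 7 = Saturday (assumed to match Time.weekday()).
// Returns the number of seconds from the given local time until the next
// 06:00:00 that does not fall on a Sunday.
uint32_t seconds_until_wake(int weekday, int hour, int minute, int second) {
    const int32_t day_secs = 24 * 3600;
    const int32_t wake_secs = 6 * 3600;  // 06:00 local time
    int32_t now_secs = hour * 3600 + minute * 60 + second;

    int32_t remaining = wake_secs - now_secs;
    int wake_day = weekday;
    if (remaining <= 0) {                // 06:00 already passed today
        remaining += day_secs;
        wake_day = weekday % 7 + 1;      // next day, wrapping Saturday -> Sunday
    }
    if (wake_day == 1) {                 // never wake on a Sunday: wait one more day
        remaining += day_secs;
    }
    return static_cast<uint32_t>(remaining);
}
```

On the device, the result would be passed as the sleep duration, for example System.sleep(SLEEP_MODE_DEEP, seconds_until_wake(Time.weekday(), Time.hour(), Time.minute(), Time.second())), so that a premature wake-up only sleeps for the time remaining until Monday 6.00.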
- Do you see any problems in the code above?
- Does the internal clock have a problem?
- I’ve seen some of my other devices not going to sleep or not waking up at the times they’re supposed to. Some go back to sleep right after start-up, as if the Time.zone setting had not been applied properly and the last_reboot_day = Time.day(); in setup() did not work.
- Is the use of the A7/WKP pin an issue even though the interrupts are disabled?
- Is there a bug in the firmware?
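On the third point, one thing that could produce a sleep right after start-up is last_reboot_day being captured in setup() before the clock holds a valid time, so that Time.day() appears to change as soon as the time is synced. Whether that is what happens here is not confirmed. A minimal off-device sketch of a guard against it follows; DailyResetGuard is a hypothetical helper, and the time_valid input is assumed to come from something like Time.isValid() on the device.

```cpp
// Hypothetical helper: remembers the boot day only once the clock is valid,
// and reports when the day of the month has changed since then.
struct DailyResetGuard {
    bool armed = false;  // true once a valid boot day has been recorded
    int boot_day = 0;

    // Call on every loop pass. Returns true when a maintenance reset is due.
    bool update(bool time_valid, int day_of_month) {
        if (!time_valid) return false;   // never decide on an unsynced clock
        if (!armed) {                    // first valid reading: remember it
            boot_day = day_of_month;
            armed = true;
            return false;
        }
        return day_of_month != boot_day;
    }
};
```

In OFF_IDLE this would replace the direct Time.day() != last_reboot_day comparison, and the assignment in setup() would no longer be needed.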
Thanks in advance for the help. I hope I was clear, and sorry for such a long explanation.