External Watchdog and Sleep - Advice needed


#1

As many of you know, I am working on making my Electron-based boards more reliable. One way I am doing this is by adding an external watchdog timer to the Electron carrier board. I am also working to improve my code so that watchdog-initiated resets are rare. So far, so good: 5 devices have now been deployed for over 120 days with no lockups.

However, I need to place another device where there is no utility power, so it will require solar. I have built solar-powered devices before and am used to using SLEEP modes. In this case, however, my watchdog timer (TI TPL5010) is causing issues. It is connected to the Electron on three pins: WKP, to wake the Electron if needed; DONE, the pin the Electron uses to pet the watchdog; and RST, where the watchdog can reset the Electron if needed.

I can disable the WKP portion easily by simply changing the pin mode to OUTPUT. I was hoping that I could disable the reset as well using the System.disableReset() function, but it does not seem to prevent the watchdog from resetting the Electron while it is sleeping. I could add a logic gate to my carrier to allow the Electron to disable the watchdog but, before I do, I wanted to make sure I am not missing a software-based trick.

For example, can I use the same pinMode trick on RST that I did on WKP - if so, what would be the pin designator?

Thanks,

Chip


#2

RST is hard wired and can’t be disabled internally.


#3

What exactly is the issue? I assume it is power related but can you elaborate?


#4

@peekay123,

The issue is that the watchdog is waking the Electron up every minute. With the 20-30 seconds it takes for the Electron to connect to the cellular network, register with Particle and start “Breathing Cyan”, it is not getting much sleep.

On previous projects, this was not a big deal as the process of waking, petting the dog, and going back to sleep took less than a second.

If I cannot disable RST, is there a way to have the cycle time of wake, pet and sleep go faster?

Chip


#5

@chipmc, how long are you sleeping the Electron for?


#6

@peekay123,

This is for a device that is next to a large trail in a local park. It counts pedestrians walking by. The volume of walkers varies based on day, time and weather from a few a minute to one or two per hour during the park’s open hours.

Two sleep scenarios:

  1. During the day if there has not been an event for 60 seconds. The sensor would then sleep until the PIR sensor wakes it again with an interrupt.

  2. Once the park closes, the device would sleep until the next morning (about 10 hours).
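
For the overnight case, the sleep duration can be computed from the current time instead of being hard-coded. Here is a minimal sketch in plain C++ (so it can be checked off-device). `secondsUntilOpen` and its `opensHour` parameter are hypothetical names, not part of any Particle API; on the Electron the inputs would presumably come from the `Time` class.

```cpp
#include <cassert>

// Seconds from the given local time until the next occurrence of opensHour:00:00.
// All names here are illustrative; opensHour is an hour of day in the range 0-23.
long secondsUntilOpen(int hour, int minute, int second, int opensHour) {
    assert(hour >= 0 && hour < 24);
    assert(minute >= 0 && minute < 60);
    assert(second >= 0 && second < 60);
    assert(opensHour >= 0 && opensHour < 24);

    const long secondsPerDay = 24L * 3600L;
    const long now = hour * 3600L + minute * 60L + second;
    const long open = opensHour * 3600L;

    long delta = open - now;
    if (delta <= 0) {
        delta += secondsPerDay;  // Opening time has passed (or is right now): use tomorrow's
    }
    return delta;
}
```

With a 7:00 opening, calling this at 21:00 gives 36000 seconds (the roughly 10 hours mentioned above), and it still gives a sensible answer if the device happens to wake after midnight.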

Chip


#7

@chipmc, in the first scenario, there is no predictable time and wake is event based. In the second, sleep is for about 10 hours. These are not good conditions for the use of a watchdog.

One possibility is to stretch the TPL5010 timeout to its maximum of 7200s (2hrs) and wake every 2hrs to “pet” (as you say) the watchdog. In the case of motion-activated wake, implementing this is easy. In the case of deep sleep, you could set a counter to 4 in retained RAM so that on wake, you simply decrement the counter by 1 and go back to deep sleep for 2hrs. Once the counter reaches 0, you wake as usual. When you do a 10hr sleep again, you reset the counter to 4 and deep sleep for 2hrs. Of course, this requires that you run in SEMI_AUTOMATIC mode to prevent a connection attempt on reboot.
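
The counter idea can be sketched as a small decision function. This is plain C++ that only models the logic; `WakeDecision` and `onTimedWake` are hypothetical names, and on the device the counter would live in retained RAM and the result would drive a deep-sleep call. The sketch checks for zero before decrementing, so a counter of 4 gives the initial sleep plus four more (5 x 2hrs = 10hrs). That ordering is one reading of the description, not something it spells out.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kChunkSeconds = 7200;  // Maximum watchdog period quoted above (2 hours)

struct WakeDecision {
    bool resumeNormalOperation;  // true: run the normal wake-up path (connect, report, ...)
    uint32_t sleepSeconds;       // how long to deep sleep again when not resuming
    int remainingChunks;         // value to store back into retained memory
};

// Called on each timed wake with the counter read from retained memory.
WakeDecision onTimedWake(int remainingChunks) {
    assert(remainingChunks >= 0);
    if (remainingChunks == 0) {
        return {true, 0, 0};  // Long sleep is over
    }
    // Pet the watchdog (not modelled here), then sleep one more chunk.
    return {false, kChunkSeconds, remainingChunks - 1};
}
```

Each wake in this scheme is a full reboot from deep sleep, which is why keeping the connection from starting automatically matters for the power budget.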


#8

@peekay123,

OK, so, I have been thinking about this whole watchdog thing. Please tell me if I am off on this:

The goal is reliability. I am thinking there are three layers to improving the reliability of my devices and, I think I may have put them in the wrong order before. But, after our other discussions and reading the Particle Reference guides, I think it may look like this:

  1. Reliable Software, using better techniques (such as finite state machines) and testing to improve the reliability and resiliency of the software.

  2. Taking full advantage of the Particle platform. I break this out from #1 as there are some things that Particle brings to the party that I have not yet taken full advantage of, such as its internal watchdog timer, reset source codes, and persistent memory to store the state of the machine if it resets.

  3. Finally, the Platform has to operate in an imperfect world: static electricity (or worse) can deliver a damaging shock, the i2c bus can hang, cellular data might not get delivered and Webhooks can timeout. These events should be rare but they are what I hope to address in the carrier board with circuit protection, more memory and the watchdog timer.

Does this make sense? If so, I have an idea to make the watchdog controllable by the Electron instead of separate as it is today. The Electron will engage the watchdog when it is doing things that may hang and may not be easily recovered from and turn it off when it needs to sleep. The idea would be to AND one of the digital IO pins with the output of the watchdog. A pull-up on the digital IO pin will make sure the default behavior is the same as today but with the option to turn off the watchdog when needed.

So, am I crazy? Should I just drop the external watchdog and focus on #1 and #2 alone? I would appreciate your and others' opinions.

Thanks,

Chip


#9

@chipmc, you are not crazy. Those are all good scenarios and points. Now imagine the Electron crapping out before you set the IO pin for the AND. Or the GPIO pin stops working. You can easily go down a rabbit hole of scenarios.

This is where you need to look at the likelihood of a given failure and build protection accordingly. Making a matrix of possible failures vs probability of failure is a good start. Classic threat/risk analysis. :slight_smile:
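
A failure matrix like that can be as simple as a scored list. The sketch below is plain C++ with made-up numbers: the failure names are taken from this thread, but every likelihood and impact value is a placeholder to be replaced with field data, and `Risk`, `score`, `rankRisks` and `exampleRisks` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

struct Risk {
    std::string failure;
    int likelihood;  // 1 (rare) .. 5 (frequent), placeholder scale
    int impact;      // 1 (minor) .. 5 (device unreachable), placeholder scale
};

// Simple likelihood x impact score.
int score(const Risk& r) {
    assert(r.likelihood >= 1 && r.likelihood <= 5);
    assert(r.impact >= 1 && r.impact <= 5);
    return r.likelihood * r.impact;
}

// Returns the risks ordered from highest to lowest score.
std::vector<Risk> rankRisks(std::vector<Risk> risks) {
    std::stable_sort(risks.begin(), risks.end(),
                     [](const Risk& a, const Risk& b) { return score(a) > score(b); });
    return risks;
}

// Illustrative entries only; the numbers are NOT measurements.
std::vector<Risk> exampleRisks() {
    return {
        {"Webhook timeout", 4, 2},
        {"I2C bus hang", 3, 4},
        {"GPIO pin failure", 1, 5},
        {"Static discharge", 1, 5},
    };
}
```

Whatever ends up at the top of the ranked list is where protection effort (software or hardware) is probably best spent first.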


#10

@peekay123,

Agreed, I think I am starting to get some data on that threat/risk part.

Each time the sensor is reset, I count it and store the number in a counter in the FRAM on the carrier. Based on this number, it looks like the watchdog timer resets the sensors 3 to 5 times a month. The most likely reason is an I2C lockup for the accelerometer-based sensors or an issue with a webhook response. This tells me that, based on the reliability of these systems, I still need the external watchdog today.

However, I think I could do more to reduce these reset events. It may take me some time to realize this roadmap but, I am thinking it would look like this:

  1. Rewrite the code using the Finite State Machine model
  2. In addition to counting resets, I could keep track of the last known state before the reset
  3. I could try more recovery efforts before I reset the device (such as retrying a webhook)

Perhaps watchdog resets could be a proxy for my code quality.
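
As a rough illustration of items 2 and 3, here is a plain C++ sketch of "retry before reset" plus a record of the last known state. Everything in it is hypothetical: the state names, `Persisted` (standing in for whatever is kept in FRAM), and the retry limit are assumptions, not the actual project code.

```cpp
#include <cassert>

enum class State { Idle, Reporting, Error };

// Stand-in for values that would be kept in FRAM across resets.
struct Persisted {
    State lastState;
    int resetCount;
};

// Record the state being entered so it can be inspected after a reset.
Persisted enterState(Persisted p, State next) {
    p.lastState = next;
    return p;
}

// Called once at boot when the reset was caused by the watchdog.
Persisted recordReset(Persisted p) {
    p.resetCount++;
    return p;
}

// Decide the next state after a publish attempt.
// failedAttempts counts consecutive failures, including this one if it failed.
State afterPublish(bool success, int failedAttempts, int maxAttempts) {
    assert(maxAttempts > 0);
    assert(failedAttempts >= 0);
    if (success) return State::Idle;
    if (failedAttempts < maxAttempts) return State::Reporting;  // Try again
    return State::Error;                                        // Out of retries
}
```

With the last state stored alongside the reset count, a reset that always follows the reporting state points at the webhook path rather than the sensor bus.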

Thanks for your ongoing engagement on this,

Chip


#11

@peekay123,

I have disabled the external watchdog timer for now and am updating this project on the Finite State Machine thread. Comments welcome.

Thanks,

Chip


#12

@chipmc @peekay123 @BDub @ScruffR

You could forget about the external watchdog if we knew for sure that the built-in watchdog feature would be enabled via a future firmware update.

We hear that it’s coming but do any of you guys know when we could expect this to happen? Weeks? Months? Years?

Hopefully, that new round of Particle funding can help get this vital feature enabled for everybody sooner rather than later.


#13

We Elites keep frequently nudging Particle on this topic, but have not had any info about its priority on the backlog either :frowning:


#14

All,

I have rewritten the code and simplified things to get my head wrapped around how power management works with the Electron. I am making progress, but it does not seem like SYSTEM_MODE(SEMI_AUTOMATIC); is working as I expected.

I thought that, in this mode, I could wake from an interrupt and not have to wait for the Electron to reconnect before my program started executing. But, that is not what I am observing. It takes about 15 seconds for execution to resume. This includes the system LED cycling through “white”, “flashing green”(most of the time is here - connecting to the cellular network) and finally “breathing green”. I expected “breathing green” as I do not give the Particle.connect() command until I am ready to report. Still, why does it take so long since the manual said user execution would start within a second or so?

Do I need to do something to tell it not to connect to the cellular network on wake?

/*
* Project Cellular-Logger-PIR-SOLAR
* Description: Cellular Connected Data Logger
* Author: Chip McClelland
* Date:8 October 2017
*/


// Finally, here are the variables I want to change often and pull them all together here
#define SOFTWARERELEASENUMBER "0.1"
#define PARKCLOSES 22
#define PARKOPENS 7

// Prototypes and System Mode calls
SYSTEM_MODE(SEMI_AUTOMATIC);    // This will enable user code to start executing automatically.
FuelGauge batteryMonitor;                              // Prototype for the fuel gauge (included in Particle core library)

// State Machine Variables
enum State { INITIALIZATION_STATE, ERROR_STATE, IDLE_STATE, SLEEPING_STATE, NAPPING_STATE, REPORTING_STATE, RESP_WAIT_STATE };
State state = INITIALIZATION_STATE;

// Pin Constants
const int intPin = D3;              // PIR sensor interrupt pin
const int blueLED = D7;              // This LED is on the Electron itself
const int userSwitch = D5;           // User switch with a pull-up resistor

// Timing Variables
unsigned long resetWaitTimeStamp = 0;       // Starts the reset wait clock
unsigned long resetWaitTime = 30000;        // Will wait this long before resetting.
unsigned long sleepDelay = 60000;             // Amount of time to stay awake after an event - too short and could cost power
unsigned long lastEvent = 0;                  // Keeps track of the last time there was an event
unsigned long oneMinuteMillis = 60000;      // For Testing the system and smaller adjustments
bool waiting = false;
int currentPeriod = 0;                      // Change length of period for testing 2 places in main loop

// Program Variables
volatile bool ledsEnabled = true;    // Start with the lights on
int hourlyPersonCount = 0;
boolean ledState = LOW;             // variable used to store the last LED status, to toggle the light
const char* releaseNumber = SOFTWARERELEASENUMBER;  // Displays the release on the menu

// PIR Sensor variables
volatile bool sensorDetect = false;       // This is the flag that an interrupt is triggered

// Battery monitor
int stateOfCharge = 0;            // stores battery charge level value


void setup()
{
  Particle.connect();             // Connect to Particle on bootup - will disconnect on nap or sleep
  Serial.begin(9600);
  Serial.println("");                 // Header information
  Serial.print(F("Electron-Sleep-Test-PIR - release "));
  Serial.println(releaseNumber);

  pinMode(intPin,INPUT);            // PIR interrupt pinMode
  pinMode(blueLED, OUTPUT);           // declare the Blue LED Pin as an output

  attachInterrupt(intPin,sensorISR,RISING);          // Will know when the PIR sensor is triggered

  Particle.variable("Release",releaseNumber);
  Particle.variable("stateOfChg", stateOfCharge);

  Time.zone(-4);                   // Set time zone to Eastern USA daylight saving time
  stateOfCharge = int(batteryMonitor.getSoC()); // Percentage of full charge

  state = IDLE_STATE;         // Idle and look for events to change the state
}

void loop()
{
  switch(state) {
  case IDLE_STATE:
    if (Time.hour() != currentPeriod)                       // Spring into action each hour on the hour
    {
      currentPeriod = Time.hour();                          // Set the new current period
      state = REPORTING_STATE;
      break;
    }
    if (sensorDetect) recordCount();               // The ISR had raised the sensor flag
    if (millis() - lastEvent >= sleepDelay) state = NAPPING_STATE;   // Subtraction form stays correct when millis() rolls over
    if (Time.hour() >= PARKCLOSES) state = SLEEPING_STATE;
    break;

  case SLEEPING_STATE:
    Particle.disconnect();     // Disconnect from Particle in prep for sleep
    Serial.println("Park closed go to sleep");
    static long secondsToOpen = ((24 - Time.hour())+PARKOPENS)*3600;
    System.sleep(SLEEP_MODE_DEEP,secondsToOpen);
    state = REPORTING_STATE;
    break;

  case NAPPING_STATE:
    Particle.disconnect();     // Disconnect from Particle in prep for sleep
    sensorDetect = true;                              // Woke up so there must have been an event
    lastEvent = millis();
    Serial.print("Going to sleep ...");
    static int secondsToHour = (60 - Time.minute())*60;
    System.sleep(intPin,RISING);
    attachInterrupt(intPin,sensorISR,RISING);         // Sensor interrupt from low to high
    state = IDLE_STATE;
    break;

  case REPORTING_STATE:
    static bool success = false;
    Particle.connect();
    success = Particle.publish("State","Reporting");
    if (success) state = IDLE_STATE;
    else state = ERROR_STATE;
    break;

  case ERROR_STATE:                               // Set up so I could have other error recovery options than just reset in the future
    if (!waiting)
    {
      waiting = true;
      resetWaitTimeStamp = millis();
      Particle.publish("State","Resetting in 30 sec");
    }
    if (millis() - resetWaitTimeStamp >= resetWaitTime) System.reset();   // Rollover-safe comparison
    break;
  }
}

void recordCount() // This is where we act on a sensor interrupt, whether it arrived while awake or woke the Electron
{
  lastEvent = millis();
  sensorDetect = false;      // Reset the flag
  Serial.println("It is a tap - counting");
  hourlyPersonCount++;                    // Increment the PersonCount
  Serial.print("Hourly: ");
  Serial.print(hourlyPersonCount);
  Serial.print("  Time: ");
  Serial.println(Time.timeStr()); // Prints time string example: Wed May 21 01:08:47 2014
  ledState = !ledState;              // toggle the status of the LEDPIN:
  if (ledsEnabled) digitalWrite(blueLED, ledState);    // update the LED pin itself
}

void sensorISR()
{
  sensorDetect = true;  // sets the sensor flag for the main loop
}

Thanks,

Chip


#15

SEMI_AUTOMATIC will only run your code immediately after waking from Stop Mode sleep if you explicitly disconnected completely (including Cellular.disconnect()) prior to sleep. Otherwise the system will first reestablish the connection as it was.
If you want your code to run “independent” of the connection, SYSTEM_THREAD(ENABLED) would be the option to go for.

SEMI_AUTOMATIC won’t initiate the connection on startup, but once it’s initiated it’ll just behave the same as AUTOMATIC mode.

BTW, if you only intend to sleep for a short period, you might want to use SLEEP_NETWORK_STANDBY to reduce the time for a subsequent reconnect considerably.


#16

@ScruffR,

Thank you. Makes sense. I have several of these sensors in the park already so, I have an idea of how many “events” I will see in an average day. The number of hikers changes all the time so the sensor will see (on average) a hiker every 6 minutes with as few as one an hour and as many as 60.

So, I think what makes sense is to nap as quickly as possible, wake within 3 seconds (the debounce between sensor events) during the day and take a deep sleep at night. I have modified the code to include the Cellular.off(); commands but, to be honest, I am not seeing much of a reduction. The Electron is using about 28mA while sleeping with the modem off; this seems high to me.

Any idea what I am missing to get a lower sleep power consumption rate? Modified code below with some additions from @RWB

/*
* Project Cellular-Logger-PIR-SOLAR
* Description: Cellular Connected Data Logger
* Author: Chip McClelland
* Date:8 October 2017
*/


// Easy place to change global numbers
#define SOFTWARERELEASENUMBER "0.15"
#define PARKCLOSES 14
#define PARKOPENS 7

// Prototypes and System Mode calls
SYSTEM_MODE(SEMI_AUTOMATIC);    // This will enable user code to start executing automatically.
SYSTEM_THREAD(ENABLED);         // Means my code will not be held up by Particle processes.
FuelGauge batteryMonitor;       // Prototype for the fuel gauge (included in Particle core library)
PMIC pmic;                      // Initialize the PMIC class so you can call the Power Management functions below.


// State Machine Variables
enum State { INITIALIZATION_STATE, ERROR_STATE, IDLE_STATE, SLEEPING_STATE, NAPPING_STATE, REPORTING_STATE, RESP_WAIT_STATE };
State state = INITIALIZATION_STATE;

// Pin Constants
const int intPin = D3;                      // PIR sensor interrupt pin
const int blueLED = D7;                     // This LED is on the Electron itself
const int userSwitch = D5;                  // User switch with a pull-up resistor

// Timing Variables
unsigned long resetWaitTimeStamp = 0;       // Starts the reset wait clock
unsigned long resetWaitTime = 30000;        // Will wait this long before resetting.
unsigned long sleepDelay = 60000;           // Amount of time to stay awake after an event - too short and could cost power
unsigned long lastEvent = 0;                // Keeps track of the last time there was an event
bool waiting = false;
int currentPeriod = 0;                      // Change length of period for testing 2 places in main loop

// Program Variables
int hourlyPersonCount = 0;
bool ledState = LOW;                        // variable used to store the last LED status, to toggle the light
const char* releaseNumber = SOFTWARERELEASENUMBER;  // Displays the release on the menu

// PIR Sensor variables
volatile bool sensorDetect = false;         // This is the flag that an interrupt is triggered

// Battery monitor
int stateOfCharge = 0;                      // stores battery charge level value

void setup()
{
  Particle.connect();                         // Connect to Particle on bootup - will disconnect on nap or sleep
  Serial.begin(9600);                         // Serial for debugging, will come out later
  pmic.begin();                               // For power management
  Serial.println("");                         // Header information
  Serial.print(F("Electron-Sleep-Test-PIR - release "));
  Serial.println(releaseNumber);

  pinMode(intPin,INPUT);                      // PIR interrupt pinMode
  pinMode(blueLED, OUTPUT);                   // declare the Blue LED Pin as an output

  attachInterrupt(intPin,sensorISR,RISING);   // Will know when the PIR sensor is triggered

  Particle.variable("Release",releaseNumber); // Make sure we know what version of software is running
  Particle.variable("stateOfChg", stateOfCharge); // Track Battery Level

  Time.zone(-4);                              // Set time zone to Eastern USA daylight saving time

  pmic.setChargeCurrent(0,0,1,0,0,0);         //Set charging current to 1024mA (512 + 512 offset) thank you @RWB for these two lines
  pmic.setInputVoltageLimit(4840);            //Set the lowest input voltage to 4.84 volts. This keeps my 5v solar panel from operating below 4.84 volts.
  stateOfCharge = int(batteryMonitor.getSoC()); // Percentage of full charge

  state = IDLE_STATE;                         // Idle and look for events to change the state
}

void loop()
{
  switch(state) {
  case IDLE_STATE:
    if (Time.hour() != currentPeriod)                       // Spring into action each hour on the hour
    {
      currentPeriod = Time.hour();                          // Set the new current period
      state = REPORTING_STATE;                              // We want to report on the hour
      break;
    }
    if (sensorDetect) recordCount();                        // The ISR had raised the sensor flag
    if (millis() - lastEvent >= sleepDelay) state = NAPPING_STATE;    // Too long since last sensor flag - time to nap (rollover-safe)
    if (Time.hour() >= PARKCLOSES) state = SLEEPING_STATE;  // The park is closed, time to sleep
    break;

  case SLEEPING_STATE:
    Particle.disconnect();                                   // Disconnect from Particle in prep for sleep
    Cellular.disconnect();                                   // Disconnect from the cellular network
    Cellular.off();                                          // Turn off the cellular modem
    Serial.println("Park closed go to sleep");
    static long secondsToOpen = 600;  // Test - sleep for 10 minutes
    //static long secondsToOpen = ((24 - Time.hour())+PARKOPENS)*3600;  // Set the alarm (in seconds) till park opens again
    System.sleep(SLEEP_MODE_DEEP,secondsToOpen);              // Sleep till morning
    state = REPORTING_STATE;                                  // Report when we wake from sleep
    break;

  case NAPPING_STATE:
    Particle.disconnect();                                   // Disconnect from Particle in prep for sleep
    Cellular.disconnect();                                   // Disconnect from the cellular network
    Cellular.off();                                          // Turn off the cellular modem
    digitalWrite(blueLED,LOW);                               // Turn off the on-board light
    sensorDetect = true;                                     // Woke up so there must have been an event
    lastEvent = millis();                                    // Reset millis so we don't wake and then nap again
    Serial.print("Going to sleep ...");
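    static int secondsToHour;                                // Declared without an initializer so it is legal inside the switch
    secondsToHour = (60 - Time.minute())*60;                 // Recomputed on every nap: time till the top of the hour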
    System.sleep(intPin,RISING,secondsToHour);               // Sensor will wake us with an interrupt
    attachInterrupt(intPin,sensorISR,RISING);                // Sensor interrupt from low to high
    sleepDelay = 10000;                                      // Sets the sleep delay to 10 seconds after a nap
    state = IDLE_STATE;                                      // Back to the IDLE_STATE after a nap
    break;

  case REPORTING_STATE:
    static bool success = false;                             // Was data received
    Cellular.on();                                           // turn on the Modem
    Cellular.connect();                                      // Connect to the cellular network
    Particle.connect();                                      // Connect to Particle
    success = Particle.publish("State","Reporting");         // Sample message.
    sleepDelay = 60000;     // Sets the sleep delay to 60 seconds after reporting to give time to flash if needed
    if (success) state = IDLE_STATE;                         // Success - go to IDLE_STATE
    else state = ERROR_STATE;                                // Failure - go to ERROR_STATE
    break;

  case ERROR_STATE:                                          // To be enhanced - where we deal with errors
    if (!waiting)                                            // Will use this flag to wait 30 seconds before reset
    {
      waiting = true;
      resetWaitTimeStamp = millis();
      Particle.publish("State","Resetting in 30 sec");
    }
    if (millis() - resetWaitTimeStamp >= resetWaitTime) System.reset();   // Today, only way out is reset (rollover-safe)
    break;
  }
}

void recordCount()                                          // Handles counting when the sensor triggers
{
  lastEvent = millis();                                     // Important to keep from napping too soon
  sensorDetect = false;                                     // Reset the flag
  Serial.println("It is a tap - counting");
  hourlyPersonCount++;                                      // Increment the PersonCount
  Serial.print("Hourly: ");                                 // Serial reporting for debugging
  Serial.print(hourlyPersonCount);
  Serial.print("  Time: ");
  Serial.println(Time.timeStr());                           // Prints time string example: Wed May 21 01:08:47 2014
  ledState = !ledState;                                     // toggle the status of the LEDPIN:
  digitalWrite(blueLED, ledState);                          // update the LED pin itself
}

void sensorISR()
{
  sensorDetect = true;                                      // sets the sensor flag for the main loop
}

Thank you,

Chip


#17

But has this helped getting rid of the 15sec reconnection time on wake?
That was the prime focus of my answer.

What system version are you running?
0.6.2 and lower still have a “bug” where the modem may end up in a “limbo state” when going to sleep too soon after disconnect.


#18

@ScruffR,

Yes, your suggestions have solved the issue of my user code being blocked during reconnect. Sorry, I failed to mention that in my update.

Also, I am using the 0.6.2 firmware. Do I need to add a test for disconnect before turning off the modem?

Thank you for all the help you are giving me. I hope others will benefit from this as much as I have.

Chip


#19

A delay of a few seconds beforehand should help, since any test in code may still fool you into believing it’s done when it isn’t.
The best option would be to use 0.7.0 if you can.


#20

@ScruffR,

Updated to 0.7.0 and still no luck. Will do some more testing on my end.

On the PMIC commands: where can I get more information on the registers and how to manage them? Once I move over to a solar panel, I will want to change three things:

  1. Set a lower limit on the charge voltage.
  2. Raise the max charging current beyond 500mA if I end up using a 6V panel bigger than 3W.
  3. Raise the charge limit above 80% when the temperature allows (remember I have a temp sensor on the carrier board).
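
On the charge current, the six arguments to `pmic.setChargeCurrent()` appear to map onto a 6-bit field in the charger IC. The plain C++ sketch below decodes that field, assuming 64mA per step plus a fixed 512mA offset. Those weights are an assumption taken from my reading of the charger datasheet, though they agree with the "512 + 512 offset" comment in the code above. `chargeCurrentFromField_mA` and `chargeCurrent_mA` are hypothetical helper names, not Particle functions.

```cpp
#include <cassert>

// Charge current for a raw 6-bit field value, assuming 64 mA per step and a 512 mA offset.
int chargeCurrentFromField_mA(unsigned ichg) {
    assert(ichg < 64);  // Only six bits are meaningful
    return 512 + 64 * static_cast<int>(ichg);
}

// Same calculation, taking the bits in the order they are passed to setChargeCurrent():
// bit7 = 2048 mA, bit6 = 1024 mA, bit5 = 512 mA, bit4 = 256 mA, bit3 = 128 mA, bit2 = 64 mA.
int chargeCurrent_mA(bool bit7, bool bit6, bool bit5, bool bit4, bool bit3, bool bit2) {
    const unsigned ichg = (static_cast<unsigned>(bit7) << 5) |
                          (static_cast<unsigned>(bit6) << 4) |
                          (static_cast<unsigned>(bit5) << 3) |
                          (static_cast<unsigned>(bit4) << 2) |
                          (static_cast<unsigned>(bit3) << 1) |
                          static_cast<unsigned>(bit2);
    return chargeCurrentFromField_mA(ichg);
}
```

Under those assumptions, the `(0,0,1,0,0,0)` call in the code above works out to 1024mA, and all zeros would be the 512mA floor.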

I appreciate you sticking with me on this.

Thanks,

Chip