Project Share - Build a better external watchdog to improve device reliability

I have been using the Texas Instruments TPL5010 in my remote outdoor projects and it has served me well by saving me trips to reset errant devices. However, as my projects have matured and with the capabilities of the new Generation 3 devices, I found the TPL5010 lacking.

There are three things the TPL5010 cannot do:

  1. Allow me to dynamically set the interrupt interval
  2. Reset the interval when the watchdog is “petted”. - @TI, this one is a serious miss!
  3. Support more robust “petting” to preclude the possibility of a flailing device twiddling the “done” line.

So, with @shanevanj’s encouragement and help, I have started to build the watchdog of my dreams. I am sharing this work here in the hope it helps some of you and in case you have suggestions to improve this project.

Hardware: I used the ATTINY85 for this first iteration as I am very familiar with its operation, it is cheap, and it does not need many external components. Here is the schematic including the programming pads.

For this first iteration, I am simply going to implement the same functionality as the TPL5010 with the (HUGE!) added benefit of resetting the interrupt interval when it is “petted”.

A short aside on why this is a big deal: if my device needs to report on the hour, potentially sleeping for the hour between reporting periods, I can set an interrupt period that is just over one hour. This gets me maximum sleep while preventing the watchdog from waking the device unnecessarily. If the device locks up, the watchdog still resets it and I miss only one reporting period. However, this requires me to align the watchdog interrupt interval to the hourly sleep cycle, and there is no way to do this with the TPL5010.
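To put numbers on the alignment point, here is a minimal host-side sketch in plain C++ (not ATtiny code). The two helper functions are hypothetical. They model a free-running timer as described in this post, not anything taken from the TI datasheet, and they assume the watchdog starts counting at t = 0.

```cpp
#include <cstdint>

// Free-running timer (assumed model of the behaviour described above):
// wakes land on fixed multiples of the interval, no matter when the pet arrives.
uint32_t nextWakeFreeRunning(uint32_t nowSeconds, uint32_t intervalSeconds) {
    return (nowSeconds / intervalSeconds + 1) * intervalSeconds;
}

// Pet-resetting timer: the interval restarts from the most recent pet.
uint32_t nextWakeAfterPet(uint32_t lastPetSeconds, uint32_t intervalSeconds) {
    return lastPetSeconds + intervalSeconds;
}
```

Suppose the device pets the watchdog at t = 3000 s and then sleeps for 3600 s (until t = 6600 s) with a 3660 s interval. The free-running model wakes it at t = 3660 s, only 660 s into its sleep. The pet-resetting model waits until t = 6660 s, after the device has already woken on its own.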

Here is the simple code for the watchdog and the code for the Particle device that you can use to test it.

ATTINY Watchdog Code

// ATtiny85 Watchdog Timer - Basic Example
// Author: Chip McClelland
// Date: 3-14-20
// License - GPL3

// ATMEL ATTINY85, 1MHz, Internal clock
//
//                           +-\/-+
//  Reset - Ain0 (D 5) PB5  1|    |8  Vcc
//  Wake  - Ain3 (D 3) PB3  2|    |7  PB2 (D 2) Ain1 - SCL - Not used in this example
//  Done  - Ain2 (D 4) PB4  3|    |6  PB1 (D 1) MISO - Reset uC
//                     GND  4|    |5  PB0 (D 0) MOSI - SDA - Not used in this example
//                           +----+
// Interrupt code: https://gammon.com.au/forum/?id=11488&reply=9#reply9

/*
This is my dream Watchdog timer. It will function exactly as I want it - unlike the TPL5010.
First - You can set the interval in the header - wakeIntervalSeconds
Second - At the end of the interval, it will bring WAKE HIGH
Third - It will start a timer to reset which you can set in the header - resetIntervalSeconds
Finally - It will either Reset the device or restart the interval if it received a HIGH / LOW on Done
Version 1.0 - Minimally Viable Product - No Sleep
*/

#define adc_disable() (ADCSRA &= ~(1<<ADEN))  // disable ADC (before power-off)

#include <avr/power.h>    // Power management

enum State {INITIALIZATION_STATE, IDLE_STATE, INTERRUPT_STATE, DONE_WAIT_STATE, RESET_STATE};
State state = INITIALIZATION_STATE;

// Pin assignments for the ATTINY85
const int resetPin  =   PB1;              // Pin to reset the uC - Active LOW
const int wakePin   =   PB3;              // Pin that wakes the uC - Active HIGH
const int donePin   =   PB4;              // Pin the uC uses to "pet" the watchdog
 
// Timing Variables
const unsigned long wakeIntervalSeconds = 3660UL;                 // One Hour and one minute
const unsigned long resetIntervalSeconds = 5UL;                   // You have this many seconds to pet the watchdog
unsigned long lastWake = 0;
unsigned long resetWait = 0;

// Program Variables
volatile bool donePinInterrupt = false; // Volatile as this flag is set in the Interrupt Service Routine

void setup() {
  pinMode(resetPin,OUTPUT);             // Pin to reset the uC
  pinMode(wakePin,OUTPUT);              // Pin to wake the uC
  pinMode(donePin,INPUT);               // uC to Watchdog pin

  digitalWrite(resetPin, HIGH);         // Unlike the TPL5010, we don't want to reset on startup - Reset is active LOW
  digitalWrite(wakePin, LOW);           // Wake pin is active HIGH     

  adc_disable();                        // This saves power

  PCMSK  |= bit (PCINT4);               // Pinchange interrupt on pin D4 / pin 3
  GIFR   |= bit (PCIF);                 // clear any outstanding interrupts
  GIMSK  |= bit (PCIE);                 // enable pin change interrupts

  state = IDLE_STATE;
}

void loop() {
  switch (state) {
    case IDLE_STATE:
      if (millis() - lastWake >= wakeIntervalSeconds * 1000UL) {    // Time to send a "wake" signal?
        state = INTERRUPT_STATE;
      }
      if (donePinInterrupt) {           // This is where we can reset the wake cycle using the Done pin
        lastWake = millis();            // A "done" signal will reset the interrupt interval - unlike the TPL5010!
        donePinInterrupt = false;
      }
    break;

    case INTERRUPT_STATE:              // Here we will send the "wake" signal and start the timer for a response
      digitalWrite(wakePin, HIGH);
      resetWait = millis();
      state = DONE_WAIT_STATE;
    break;
    
    case DONE_WAIT_STATE:              // We will wait here until we receive a "done" if time runs out - we will reset
      if (millis() - resetWait >= resetIntervalSeconds * 1000UL) {   // No response - reset
        digitalWrite(wakePin,LOW);
        state = RESET_STATE;
      }
      else if (donePinInterrupt) {                                // Got a response - reset interval
        donePinInterrupt = false;
        digitalWrite(wakePin,LOW);
        lastWake = millis();
        state = IDLE_STATE;
      }

    break;

    case RESET_STATE:
      digitalWrite(resetPin, LOW);      // Reset is active low
      delay(1000);                      // How long do we hold the reset pin
      digitalWrite(resetPin, HIGH);     // Need to bring high for device to come out of reset
      lastWake = millis();
      state = IDLE_STATE;
    break;
  }
}

ISR (PCINT0_vect) {                    // Interrupt service routine
  donePinInterrupt = true;
}
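If you want to check the timing logic without hardware, the state machine above can be modelled on a desktop compiler. This is only a sketch under assumptions: `WatchdogModel` and `step()` are hypothetical names, `now` stands in for `millis()`, `done` stands in for the `donePinInterrupt` flag, the INTERRUPT_STATE step is folded into the idle transition, and the one-second reset pulse is replaced by a counter.

```cpp
#include <cstdint>

// Host-side model of the watchdog state machine (hypothetical names).
enum class WdState { Idle, DoneWait, Reset };

struct WatchdogModel {
    uint32_t wakeIntervalMs;    // stands in for wakeIntervalSeconds * 1000UL
    uint32_t resetIntervalMs;   // stands in for resetIntervalSeconds * 1000UL
    uint32_t lastWake = 0;
    uint32_t resetWait = 0;
    WdState state = WdState::Idle;
    bool wakeHigh = false;      // models the level of the WAKE pin
    unsigned resets = 0;        // counts resets instead of driving a pin

    WatchdogModel(uint32_t wakeMs, uint32_t resetMs)
        : wakeIntervalMs(wakeMs), resetIntervalMs(resetMs) {}

    // One pass of loop(): 'now' replaces millis(), 'done' replaces the ISR flag.
    void step(uint32_t now, bool done) {
        switch (state) {
        case WdState::Idle:
            if (now - lastWake >= wakeIntervalMs) {    // interval expired: raise WAKE
                wakeHigh = true;
                resetWait = now;
                state = WdState::DoneWait;
            }
            if (done) lastWake = now;                  // a pet restarts the interval
            break;
        case WdState::DoneWait:
            if (now - resetWait >= resetIntervalMs) {  // no pet in time: go reset
                wakeHigh = false;
                state = WdState::Reset;
            } else if (done) {                         // pet received: back to idle
                wakeHigh = false;
                lastWake = now;
                state = WdState::Idle;
            }
            break;
        case WdState::Reset:
            ++resets;                                  // real code pulses resetPin LOW
            lastWake = now;
            state = WdState::Idle;
            break;
        }
    }
};
```

Because the elapsed-time checks use unsigned subtraction, as the ATtiny code does, the comparisons keep working when the millisecond counter rolls over.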

Particle Device Test Code

/*
* Project Watchdog Test Sketch
* Description: Simplest possible sketch to put the new Watchdog Timer through its paces
* Author: Charles McClelland
* Date: Started 3-12-2020 
* 
* Implements the following Tests
* 1 - Detects and reports a watchdog interrupt  - ISR pets the watchdog
* 
* v0.10 - Initial Release - Basic functionality - Straight Non-Programmable Watchdog Timer 
* 
*/

// Pin Constants for Boron
const int donePin = D5;                                           // Pin the Boron uses to "pet" the watchdog
const int wakeUpPin = D8;                                         // This is the Particle Boron WKP pin

// Program Variables
volatile bool watchDogFlag = false;                               // variable used to see if the watchdogInterrupt had fired                          

// setup() runs once, when the device is first turned on.
void setup() {
  pinMode(wakeUpPin,INPUT);                                       // This pin is active HIGH
  pinMode(donePin,OUTPUT);                                        // Allows us to pet the watchdog

  Particle.publish("Status", "Beginning Test Run",PRIVATE);

  attachInterrupt(wakeUpPin, watchdogISR, RISING);                 // Need to pet the watchdog when needed - may not be relevant to your application
}

void loop() {

  if (watchDogFlag) {                                             // Publish that we detected a watchdog event
    Particle.publish("Watchdog","Detected",PRIVATE);
    watchDogFlag = false;
  }
}

void watchdogISR()                                                // Watchdog Interrupt Service Routine
{
  digitalWrite(donePin,HIGH);
  digitalWrite(donePin,LOW);
  watchDogFlag = true;
}

Please take a look and let me know if you have questions / suggestions.

Next step is to add the programmability via i2c.

Thanks, Chip

This is a great initiative in a number of ways - firstly it serves a vital purpose and secondly it gets people to take a wider view of embedded technologies and how they can interact.

For those not familiar with the ATTINY and its programming, I use the following setup that has worked well for me:

  • Install the Arduino IDE (you could use ATOM or VSCODE but it's overly complicated for my needs)
  • Install the [ATTINY Core](https://github.com/SpenceKonde/ATTinyCore) into the Arduino IDE.
  • Setup an Arduino UNO or similar as a programmer for the ATTINY85
  • Then use the basic libraries from the UNO IDE (such as TinyWire for I2C etc)

It is easy to use and quick to develop with, and the added benefit is that you can have your Workbench, the WEBIDE and the ARDUINO IDE all open at the same time for easy debugging.

A tip for ATTINY debugging - set up an LED on one of the pins and use it to flash patterns as debugging hints, plus it can also be used in operation to show your various states etc. I shamelessly used the concept of the Particle SOS patterns and created a “poor man's” version that has been very useful so far:


#define LED_BUILTIN 3   // no trailing semicolon - it would be pasted into every digitalWrite() call
//---------------------------------------------------------------------
void flashLed(byte numOfTimes)
//---------------------------------------------------------------------
{
  for (byte i = 0 ; i < numOfTimes; i++ ) {
    digitalWrite(LED_BUILTIN, HIGH);
    delay(200);
    digitalWrite(LED_BUILTIN, LOW);
    delay(300);
  }
}

//---------------------------------------------------------------------
void flashLedSOS(byte howMany)
//---------------------------------------------------------------------
{
  for (byte n = 0 ; n < 1; n++) {
    for (byte i = 0 ; i < 3; i++ ) {
      digitalWrite(LED_BUILTIN, HIGH);
      delay(75);
      digitalWrite(LED_BUILTIN, LOW);
      delay(200);
    }
    delay(300);
    for (byte i = 0 ; i < 3; i++ ) {
      digitalWrite(LED_BUILTIN, HIGH);
      delay(300);
      digitalWrite(LED_BUILTIN, LOW);
      delay(200);
    }
    delay(300);
    for (byte i = 0 ; i < 3; i++ ) {
      digitalWrite(LED_BUILTIN, HIGH);
      delay(75);
      digitalWrite(LED_BUILTIN, LOW);
      delay(200);
    }
  }
  delay(1500);
  flashLed(howMany);
  delay(1500);
}

Chip

Nice one.

Did you need an ATTiny85 or would a 45 have worked with less memory? I guess when you add I2C the answer would be no. Surely programmability in this case is limited to changing the interrupt period. Could you not do this by toggling one pin for the time required? This would avoid needing to include an I2C slave library.
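One way to read the single-pin idea is that the host holds a pin HIGH for as long as the interval it wants, and the watchdog measures that pulse. A minimal host-testable sketch of that reading follows. `IntervalLearner`, `onEdge()` and the dedicated "program" pin are all hypothetical and nothing here comes from the posted code.

```cpp
#include <cstdint>

// Learns a new interval from the width of a HIGH pulse on a
// hypothetical "program" pin. Times are in milliseconds.
struct IntervalLearner {
    bool timing = false;     // true while a pulse is in progress
    uint32_t startMs = 0;    // time of the rising edge
    uint32_t intervalMs;     // current interval, replaced by each full pulse

    explicit IntervalLearner(uint32_t defaultMs) : intervalMs(defaultMs) {}

    // Call on every edge of the pin, e.g. from a pin-change handler.
    void onEdge(bool pinHigh, uint32_t nowMs) {
        if (pinHigh) {                       // rising edge: start timing
            timing = true;
            startMs = nowMs;
        } else if (timing) {                 // falling edge: pulse width is the interval
            intervalMs = nowMs - startMs;    // unsigned math survives counter rollover
            timing = false;
        }
    }
};
```

Holding a pin for a full hour to program an hourly interval is clumsy, so a real design would probably scale the pulse (for example one millisecond of pulse per second of interval). That scale factor is also an assumption.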

Nice work! I was looking at this exact problem right now.

If any of you are cheaters like me and would like to experiment with this idea without making a circuit board, Adafruit has the Trinket, which works right out of the gate and can be programmed via USB through the Arduino IDE. It’s got I2C and other handy dandy features too. It’s a fair amount more expensive than the ATTINY85 alone, but that’s the price of convenience. I’m about to drop your code on it and see how things go :slight_smile:

https://www.adafruit.com/product/3500

@armor,

Very good question. I am working on the programmable version now and will let you know if it would fit into an ATTINY45.

I am open to suggestions, but my intent is that the programmable version will be able to:

  • accept an interrupt interval
  • accept a reset wait interval
  • send the current values of both to the Particle device

I am open to suggestions but am not sure how to do this without i2c. I want to be able to change the interrupt and reset wait intervals whenever I need to, so I can move to adaptive sleep / sample periods.

Thanks, Chip

@aklein24,

Nothing wrong with using something that you have to move forward quickly. I like the trinkets and even have an ATTINY85 version which I might dig out to speed iterations. We will see.

Please let me know if you have any questions / suggestions on the code thus far and I hope you find it useful.

Chip

The I2C serves as a secondary detection method - this makes sure the main MPU is “logically” awake and doesn’t just have a thread running a “pet dog” toggle on an I/O. I would expect the I2C conversation to have an incrementing counter check or something similar, so you know the main loop of the monitored MPU is actually running properly.
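A minimal sketch of what such a counter check could look like on the watchdog side, written as plain C++ so it can be tested off-device. `HeartbeatChecker` and `accept()` are hypothetical names, and the one-byte counter is an assumption about what the host would send over I2C.

```cpp
#include <cstdint>

// Accepts a heartbeat only when its counter is exactly one more than the
// previous value (modulo 256). A task that resends a stale value is rejected.
class HeartbeatChecker {
    bool haveLast = false;   // false until the first heartbeat arrives
    uint8_t last = 0;        // most recent counter value seen
public:
    bool accept(uint8_t counter) {
        // The first value is accepted so the host can start from any count.
        bool ok = !haveLast || counter == static_cast<uint8_t>(last + 1);
        haveLast = true;
        last = counter;      // resync on failure so a rebooted host can recover
        return ok;
    }
};
```

The watchdog would treat only an accepted heartbeat as a pet. The host would increment the counter once per pass of its main loop, so a stuck loop shows up as a repeated value.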

I sense a bit of scope creep - but to answer Chip’s requirements - serial might be simpler and not require the I2C slave library. From memory I have fitted the I2C slave library on the ATtiny45 but then there isn’t much space left - RAM is the issue. Cents difference between the 45 and 85.

@armor,

Thank you for the scope creep comment - this is a recurring issue in my projects and one I need to watch out for.

Interesting suggestion of serial vs. i2c. I checked on Digikey and there is a ~$0.10 difference between the ATTINY45 and the ATTINY85. At this point, I am going to see if I can get the i2c version working (already have the board) but will be thinking about a serial option as well. I suspect one issue will be power usage as it seems the i2c requirement will have the ATTINY running at 8MHz.

As for the scope, I think this all fits under the three points I made in the first post. The basic watchdog code posted above meets criteria #2 (not sure why I numbered it this way). The i2c version I am working on now would meet #1. Requirement #3 is all about a more robust validation that the Particle device is working as intended.

My hope - and it looks like Shane’s as well - is to explore a few different ways of doing this. For example:

  • time window petting - perhaps two pets are required with some requirements on how short or long between them.
  • passing a “magic byte” - in order to change the intervals to prevent inadvertent changing of the interval
  • some way of confirming that the Particle device has connected to the cellular network (?).
  • some combination of petting the “done” pin and sending data over i2c.

The idea is to reduce the chance that a malfunction will leave the Particle device disconnected while still managing to satisfy the watchdog timer.
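As a rough illustration of the first idea in the list, time window petting, here is a host-testable sketch. `WindowedPet`, `onPet()` and the gap limits are hypothetical. A pair of pets only counts when the second arrives no sooner than `minGapMs` and no later than `maxGapMs` after the first.

```cpp
#include <cstdint>

// Requires two pets separated by a gap inside [minGapMs, maxGapMs].
class WindowedPet {
    uint32_t minGapMs;
    uint32_t maxGapMs;
    bool armed = false;        // true once a first pet has been seen
    uint32_t firstPetMs = 0;   // time of that first pet
public:
    WindowedPet(uint32_t minGap, uint32_t maxGap)
        : minGapMs(minGap), maxGapMs(maxGap) {}

    // Feed every pet edge; returns true when a valid pair completes.
    bool onPet(uint32_t nowMs) {
        if (armed) {
            uint32_t gap = nowMs - firstPetMs;   // rollover-safe unsigned math
            if (gap >= minGapMs && gap <= maxGapMs) {
                armed = false;
                return true;
            }
        }
        // Either no first pet yet or the gap was out of range:
        // this pet becomes the new first pet.
        armed = true;
        firstPetMs = nowMs;
        return false;
    }
};
```

Restarting the window on every out-of-range pet is deliberate. A device toggling the line rapidly keeps producing gaps shorter than `minGapMs`, so it never completes a valid pair.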

Thank you! Chip

I've used a Pub/Sub to accomplish this for a few 12V Timer Relay projects.
The Boron would subscribe to its own Publish.
It will only "pet" the external 12V Timer Relay (used as a fool-proof Watchdog) when the Pub/Sub was successful, thus ensuring Cloud Connectivity. There have been a couple of instances in the Past that required a physical power cycle for other Borons to recover (Cloud Related). But my Borons and Electrons with the "Connection WatchDog" quickly recovered after automatic power cycles.
The Pub/Sub Connection WatchDog might be something you could incorporate into your System ?

It takes the WatchDog concept one step further [versus only ensuring Loop() is running].
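The gating logic behind this can be sketched in plain C++. This is not the actual code described above, which has not been posted. `ConnectionWatchdog`, `onEcho()`, `shouldPet()` and the maximum echo age are hypothetical. On a Particle device, `onEcho()` would be called from the `Particle.subscribe()` handler when the device's own published event comes back.

```cpp
#include <cstdint>

// Allows a pet only while a recent publish/subscribe round trip is on record.
class ConnectionWatchdog {
    uint32_t maxEchoAgeMs;     // how stale the last round trip may be
    bool echoSeen = false;     // false until the first echo arrives
    uint32_t lastEchoMs = 0;   // time of the most recent echo
public:
    explicit ConnectionWatchdog(uint32_t maxAgeMs) : maxEchoAgeMs(maxAgeMs) {}

    // Call when the device receives its own event back from the cloud.
    void onEcho(uint32_t nowMs) {
        echoSeen = true;
        lastEchoMs = nowMs;
    }

    // True only if the cloud round trip succeeded recently enough.
    bool shouldPet(uint32_t nowMs) const {
        return echoSeen && (nowMs - lastEchoMs) <= maxEchoAgeMs;
    }
};
```

The main loop would check `shouldPet(millis())` before toggling the done line. If the cloud path fails, the pets stop and the external watchdog power cycles the device.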

@Rftop,

Love that idea. I currently have this line in my code which keeps track of the time between successful webhook responses.

      else if (Time.now() - sysStatus.lastHookResponse > 7200L) { //It has been more than two hours since a successful hook response
        waitUntil(meterParticlePublish);
        if (Particle.connected()) Particle.publish("State","Error State - Power Cycle", PRIVATE);  // Broadcast Reset Action
        delay(2000);
        sysStatus.resetCount = 0;                                  // Zero the ResetCount
        systemStatusWriteNeeded=true;
        digitalWrite(deepSleepPin,HIGH);                              // This will cut all power to the Boron AND everything it powers
        rtc.setAlarm(10);

This way I don’t even need to create a new webhook. However, having a dedicated Pub/Sub for this purpose does seem more straightforward. My model would fail, for example, if the Webhook service provider went down.

Once I get the programmable timer part working, this will be the focus.

Chip

That might be useful though.
I just looked back at my code and apparently I used (2) different versions.
One is the simple Pub/Sub previously mentioned.
The second version is a Webhook Callback when the data is successfully posted to ThingSpeak.
The former focuses on Cloud connectivity, while the latter ensures that the data ended up at the final destination.
The Webhook Callback version uses System.deviceID() in the Sub, so (1) Webhook can service all the Devices and ThingSpeak Channels.
I'll be glad to share them with you, but your coding skills are way past mine :smiley:

@Rftop,

At this point, it would be helpful to collect ideas and then let folks decide what use case is more important for them.

Please do share your code snippet for this function. I don’t consider myself more than an adequate programmer and I learn something each time I look at someone’s code.

BTW, I signed up for the upcoming Particle ThingSpeak webinar. What do you think about their service?

Thanks, Chip

I didn't see the webinar announcement.
I love ThingSpeak. It's an easy place to store data and their Graphing API is very useful.
It's never going to touch the swanky Dashboard Apps/Services, but that's not TS's target market.
Having "onboard" access to MATLAB is useful for real Scientific Analysis for certain projects.

My personal opinion is ThingSpeak has a lot of advantages for Projects with 250 or Less endpoints... which is how many Independent Channels (each with 8 fields) you get with a basic Subscription. There are better choices for smaller price projects that need a larger scale for the business case.
For instance, I'll use a TS account for a Commercial or Industrial Mesh Network with <250 nodes including future expansion. But if that client wanted to deploy a Mesh at multiple locations, then TS wouldn't be the best direction (I'd lean towards AWS, etc).

Sorry for the rambling. I look forward to seeing your board in the future.

You have a point, but in my testing of Arduino based code (IMO at this stage the easiest way to work with ATTINY chips) - the serial library is much the same size as the I2C one - and you have the added complication of having to dedicate 2 pins on the host MCU to talk to it - vs I2C where you are sharing 2 pins that (in this carrier board case) are in use already talking to the RTC

I like this idea - since various parts of your app could have very different WDT period requirements - i.e. when connecting to Particle you want maybe a 120s timeout, but in your I/O loop, perhaps a shorter period.

@Rftop,

If you are interested, here is the link to the Webinar: https://www.particle.io/particle-and-thingspeak-webinar/

Topics Include:

AN IN-DEPTH INTEGRATION TALK
Join us on March 26th to learn best practices for integrating your IoT solutions with your existing cloud, systems, and applications.

YOU WILL LEARN:
How to securely connect your IoT solutions to Azure, Google and more
How to best utilize Particle’s edge-to-cloud platform
How to connect Particle with ThingSpeak for data visualization
Using ThingSpeak and MATLAB for data processing and advanced visualization
The webinar will also include a live Q&A session so you can get all your integration questions answered from experts. Register today to learn more!
