Issues with sleep - Need advice on troubleshooting ideas


#14

@ScruffR,

OK, back from Peru (and back to sea level!) and was able to give it a try. Detaching the interrupt before sleep and reattaching after sleep does not change the behavior.

To recap:

  1. This code works perfectly until it does an hourly report.
  2. After doing an hourly report, it goes into a weird mode - almost like it cannot read the status of the int2pin. It wakes from sleep on a tap, connects to Particle and then immediately disconnects. I have no idea how it is doing this given the code flow as I understand it.
  3. The device then goes back to sleep with the int2pin HIGH so it does not wake up until the next hour.

Any ideas on troubleshooting would be appreciated.

Full code repository here.

Napping function - where I think the wheels are falling off is here:

  case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      if (connectionMode) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      System.sleep(int2Pin, RISING, wakeInSeconds);                   // Wake on either int2Pin or the top of the hour
      if (digitalRead(int2Pin)) {                                     // Need to test if Tap or Time woke us up
        awokeFromNap = true;                                          // This flag will allow us to bypass the debounce in the recordCount function
        recordCount();                                                // Count the tap that awoke the device
        stayAwakeTimeStamp = millis();                                // Allows us to ensure we stay awake long enough to debounce
      }
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;

Thank you all for your help and advice.

Chip

PS, I liked the suggestion on constraining the sleep seconds. - thank you.
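
A side note for anyone checking the sleep-time arithmetic: the constrain(...) expression can be exercised off-device. Below is a minimal sketch in plain C++, not taken from the repository. secondsUntilBoundary is a hypothetical helper name, and the 3600-second boundary in the usage note is an assumption (the thread only says "top of the hour").

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring
//   constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary)
// 'now' is Unix time in seconds, 'boundary' is the reporting period in seconds.
inline int32_t secondsUntilBoundary(int64_t now, int32_t boundary) {
    assert(boundary > 0 && now >= 0);
    // now % boundary lies in 0..boundary-1, so 'remaining' lies in 1..boundary.
    int32_t remaining = boundary - static_cast<int32_t>(now % boundary);
    // The clamp is only a safety net, like constrain() in the original code.
    return std::min(std::max(remaining, int32_t{1}), boundary);
}
```

With a 3600-second boundary, a call exactly on the hour yields 3600 and a call one second before the hour yields 1, so the device never asks for a zero-length sleep.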


#15

Some other things you may need to consider

  • if the pin is already in the state you want to trigger for, the wake interrupt may not work, so wait for the pin to return to passive state before going to sleep
  • I have found some issue with detachInterrupt() which over time tends to barf things up
  • reconnect after sleep depends on the state of the connection prior to sleep, if it was in a limbo state before the system tends to see this as the desired state for next wake :stuck_out_tongue_closed_eyes:
  • while others seem to trust the auto-reconnect, I usually put my own connection monitoring in place and tear down the connection and manually reconnect, when I see odd behaviour
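
The last point could be sketched roughly as below. This is not the code I actually run, just a minimal illustration in plain C++ so it can be tried off-device. ConnectionMonitor and its hook names are hypothetical; on a device the hooks would presumably wrap calls such as Particle.connected(), Particle.disconnect() plus Cellular.off(), and Cellular.on() plus Particle.connect(). The timeout value is an assumption to tune, and the monitor should only be polled while a connection is actually wanted.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>

// Hypothetical connection monitor: tears the connection down and asks for a
// fresh one when the link has been reported down for longer than timeoutMs.
class ConnectionMonitor {
public:
    ConnectionMonitor(std::function<bool()> isConnected,
                      std::function<void()> tearDown,
                      std::function<void()> reconnect,
                      uint32_t timeoutMs,
                      uint32_t startMs = 0)
        : isConnected_(std::move(isConnected)),
          tearDown_(std::move(tearDown)),
          reconnect_(std::move(reconnect)),
          timeoutMs_(timeoutMs),
          lastSeenMs_(startMs) {
        assert(isConnected_ && tearDown_ && reconnect_);  // all hooks must be set
    }

    // Call regularly with the current millis() value.
    // Returns true when a teardown/reconnect was triggered on this call.
    bool poll(uint32_t nowMs) {
        if (isConnected_()) {
            lastSeenMs_ = nowMs;          // link is healthy, restart the timer
            return false;
        }
        // Unsigned subtraction stays correct across millis() rollover.
        if (nowMs - lastSeenMs_ < timeoutMs_) {
            return false;                 // still within the grace period
        }
        tearDown_();
        reconnect_();
        lastSeenMs_ = nowMs;              // give the new attempt a full grace period
        return true;
    }

private:
    std::function<bool()> isConnected_;
    std::function<void()> tearDown_;
    std::function<void()> reconnect_;
    uint32_t timeoutMs_;
    uint32_t lastSeenMs_;
};
```

The hooks are injected rather than hard-coded so the timing logic can be tested on a desktop with fake callbacks before it goes anywhere near a modem.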

Just some personal views

Over time I came to the conclusion that it’s best to avoid #define for constant definitions wherever possible, due to their “obscure” range of scope, sometimes interfering with otherwise unrelated code.
Hence I’d replace things like this

#define VERSIONNUMBER 9             // Increment this number each time the memory map is changed
#define WORDSIZE 8                  // For the Word size the number of bytes in a "word"
#define PAGESIZE 4096               // Memory size in bytes / word size - 256kb FRAM
#define HOURLYOFFSET 24             // First word of hourly counts (remember we start counts at 1)
#define HOURLYCOUNTNUMBER 4064      // used in modulo calculations - sets the # of hours stored - 256k (4096-14-2)
// First Word - 8 bytes for setting global values
#define VERSIONADDR 0x0             // Where we store the memory map version number
#define SENSITIVITYADDR 0x1         // Sensitivity for Accelerometer sensors
#define DEBOUNCEADDR 0x2            // Where we store debounce in cSec or 1/10s of a sec (ie 1.6sec is stored as 16)
#define RESETCOUNT 0x3              // This is where we keep track of how often the Electron was reset
#define TIMEZONE  0x4               // Store the local time zone data                                    // One byte is open here
#define OPENTIMEADDR 0x5            // Hour for opening the park / store / etc - military time (e.g. 6 is 6am)
#define CLOSETIMEADDR 0x6           // Hour for closing of the park / store / etc - military time (e.g 23 is 11pm)
#define CONTROLREGISTER 0x7         // This is the control register for storing the current state
//Second and Third words bytes for storing current counts
#define CURRENTHOURLYCOUNT 0x8      // Current Hourly Count - 16 bits
#define CURRENTDAILYCOUNT 0xC       // Current Daily Count - 16 bits
#define CURRENTCOUNTSTIME 0xE       // Time of last count - 32 bits
#define HOURLYPOINTERADDR 0x11      // Two bytes for hourly pointer

with an enum that has an explicit scope also allowing for the use of namespaces.

Any other kinds of constants and literals I’d just define as actual const variables

#define HOURLYCOUNTOFFSET 4         // Offsets for the values in the hourly words
#define HOURLYBATTOFFSET 6          // Where the hourly battery charge is stored
// Finally, here are the variables I want to change often and pull them all together here
#define SOFTWARERELEASENUMBER "0.59"

as

const int    HOURLYCOUNTOFFSET     = 4;  // Offsets for the values in the hourly words
const int    HOURLYBATTOFFSET      = 6;  // Where the hourly battery charge is stored
// Finally, here are the variables I want to change often and pull them all together here
const char   SOFTWARERELEASENUMBER[] = "0.59";

#16

@ScruffR,

First, thank you for taking a look at my code and suggesting a better approach than the #defines. That section now looks like this:

namespace FRAM {                                    // Moved to namespace instead of #define to limit scope
  enum Addresses {
    versionAddr            = 0x0,                   // Where we store the memory map version number
    sensitivityAddr        = 0x1,                   // Sensitivity for Accelerometer sensors
    debounceAddr           = 0x2,                   // Where we store debounce in cSec or 1/10s of a sec (ie 1.6sec is stored as 16)
    resetCountAddr         = 0x3,                   // This is where we keep track of how often the Electron was reset
    timeZoneAddr           = 0x4,                   // Store the local time zone data
    openTimeAddr           = 0x5,                   // Hour for opening the park / store / etc - military time (e.g. 6 is 6am)
    closeTimeAddr          = 0x6,                   // Hour for closing of the park / store / etc - military time (e.g 23 is 11pm)
    controlRegisterAddr    = 0x7,                   // This is the control register for storing the current state
    currentHourlyCountAddr = 0x8,                   // Current Hourly Count - 16 bits
    currentDailyCountAddr  = 0xC,                   // Current Daily Count - 16 bits
    currentCountsTimeAddr  = 0xE,                   // Time of last count - 32 bits
  };
}

and when I need one of these values I call it using FRAM::closeTimeAddr. I understand this will reduce the chance of conflict with variable names in libraries and all. This is my first time using a namespace, so I hope I got it right - the code compiles and works anyway.

Now, onto the issue at hand. I read your issue on detachInterrupt() but I am using 0.8.0rc-4 - has this issue been addressed in the 0.8.0 release? I have tried checking the value of the interrupt pin and only going to sleep once it is low but, in the weird state it gets into, these checks don’t seem to work.

You had mentioned not trusting the auto-reconnect and that you tear down the connection. What are the steps I need to take in order to try this approach?

Thanks again for all your help,

Chip


#17

Since I didn’t see this issue getting any attention so far (not even after my bump) I doubt it was addressed yet.
However, if some dev just happened to see and squash that bug without being aware of my issue report, it could still be - worth a try with the code I provided there.


#18

@ScruffR,

Well, I hope this does get addressed.

Looking at your code, it seemed that the change for me to try was to put noInterrupts() before and interrupts() after the detachInterrupt() command. I also added a check for the int2Pin before sleeping:

case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      if (connectionMode) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      interrupts();
      if (!digitalRead(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
      if (digitalRead(int2Pin)) {                                     // Need to test if Tap or Time woke us up
        awokeFromNap = true;                                          // This flag will allow us to bypass the debounce in the recordCount function
        recordCount();                                                // Count the tap that awoke the device
        stayAwakeTimeStamp = millis();                                // Allows us to ensure we stay awake long enough to debounce
      }
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;
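
One variation on the pre-sleep pin check, sketched here for illustration only: instead of skipping sleep when the pin is high, wait a bounded time for it to return low and only then sleep. This is plain C++ with the pin reader and clock passed in, so it can be run off-device. waitForPinLow is a hypothetical helper; on the Electron the two callables would presumably wrap digitalRead(int2Pin) (or pinReadFast) and millis(), and the timeout is an assumption to tune.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: polls readPin() until it reports low (false) or until
// timeoutMs has elapsed according to millisNow().
// Returns true if the pin went low in time, false if it stayed high.
template <typename ReadPin, typename Millis>
bool waitForPinLow(ReadPin readPin, Millis millisNow, uint32_t timeoutMs) {
    assert(timeoutMs > 0);
    const uint32_t start = millisNow();
    while (readPin()) {
        const uint32_t now = millisNow();
        // Unsigned subtraction stays correct across millis() rollover.
        if (now - start >= timeoutMs) {
            return false;   // pin is stuck high; the caller decides what to do
        }
    }
    return true;
}
```

A false return is the interesting case: it means the interrupt line never released, so going to sleep on a RISING edge would not be woken by the next tap.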

Will give this a try.

Thanks,

Chip


#19

Update,

No luck. So, here is where I am now:

  1. Goes to sleep and wakes on a hardware pin interrupt - no issues
  2. Goes to sleep and wakes at the hour - does not function as expected
    • Connects to Particle - even though it should not
    • Ignores the state of the int2Pin
    • if the int2Pin is high, ignores the conditional that should prevent going to sleep
    • Goes to sleep with the int2Pin high and therefore cannot wake until the next hour

I am at my wits’ end with this. At this point, I have to assume there is a bug in the System.sleep(wakeUpPin, edgeTriggerMode, seconds) command or in how it is handling interrupts. It looks like my only option at this point is to give up on sleep, which will have a significant impact on battery performance.

If anyone has a suggestion on what else to try, I am all ears.

Thanks, Chip


#20

Hmm, I’ve not looked for this particular point in your full code, but NAPPING_STATE only disconnects when connectionMode == true. If that flag was false while a connection was actually present, the device would go to sleep still connected, and this might play a role.

This might be due to a race condition.
Allow for some more time after wake before checking the state.

When using the pin for a RISING edge trigger, the system will implicitly attach the internal pull-down resistor, which - when you don’t have pinMode(int2Pin, INPUT_PULLDOWN) in your other code - needs to be removed again after wake.
BTW, checking the pin state after wake isn’t a reliable way to actually know whether it was a pin wake or not - there are several threads about this topic.


#21

@ScruffR,

Thank you for your continued assistance. I do hope we can find a solution.

  1. Good point about only checking the flag. I have added a Particle.connected() check to the conditional so I hope it will disconnect even if the flag is improperly set. One point on this, I have read that in previous releases, Particle.connected() was not very reliable. I hope this is fixed in 0.8.0 otherwise, I might try Cellular.RSSI() instead.

  2. Thank you for pointing out the connection state being reinstated after sleep. It helped me see that not fully disconnecting may be a core part of my problem.

  3. I would like to avoid adding delay()s in my code. By avoiding the digitalRead(), I hope I can avoid having to put a delay after sleep. If it does become necessary, I was unable to find any guidance as to how long it needs to be. 30mSec enough?

  4. Thank you for pointing this out. I saw that one of the new features in 0.8.0 is the reason for waking from “stop” - sleep mode.

  5. Having a Pull down resistor won’t hurt so I added it to the pinMode() statement in Setup.

My Napping function looks like this now - testing to see if it fixes things:

case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      if (connectionMode || Particle.connected()) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      interrupts();
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      if (!digitalRead(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
      if (System.wokenUpByPin()) {                                           // Need to test if Tap or Time woke us up
        awokeFromNap = true;                                          // This flag will allow us to bypass the debounce in the recordCount function
        recordCount();                                                // Count the tap that awoke the device
        stayAwakeTimeStamp = millis();                                // Allows us to ensure we stay awake long enough to debounce
      }
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;

Seems to be working - went through the reporting cycle once. Will continue testing over the weekend. Fingers crossed!

Chip


#22

While avoiding delay() in running code is sensible, having one right after a System.sleep() isn’t really comparable - especially when only delaying for 100ms.
From the point of code flow, you wouldn’t be able to distinguish a slightly longer sleep from a sleep + delay :wink:

BTW, digitalRead() has some internal sanity check before actually checking the state of the pin. There is a faster way to read the state pinReadFast() which you could try too.


#23

@ScruffR,

Good point, implemented delay() and pinReadFast() and tried every trick I could think of.

I have been doing torture sessions over the weekend. I don’t think there is a reliable way to implement the System.sleep(interrupt pin, pin state, sleep in seconds) function. It works fine for up to 1,000 cycles but eventually fails. This is not reliable enough for my trail counter use.

I am afraid that the issue you found with interrupts and 0.7.0 also applies to 0.8.0. Is there a way I can register this as an issue so that it might get fixed?

Thanks,

Chip


#24

Everybody can file an issue with the open source firmware repo.


#25

@ScruffR,

OK, will do that. Here is the net of my testing. If I comment out the sleep related code, my device runs without issues. To test this, I simulate the accelerometer with a vibration motor and measure at least 25,000 “taps”.

  case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      stayAwakeTimeStamp = millis();                                  // Allows us to ensure we stay awake long enough to debounce
      if (connectionMode || Particle.connected()) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      if (!pinReadFast(int2Pin)) readRegister(MMA8452_ADDRESS,0x22);  // Reads the PULSE_SRC register to reset it - just in case
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      interrupts();
      //if (!pinReadFast(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);                   // Wake on either int2Pin or the top of the hour
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      delay(20);
      /*
      if (System.wokenUpByPin()) {
        awokeFromNap = sensorDetect = true;                 // This flag will allow us to bypass the debounce in the recordCount function
      }
      */
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;

If I uncomment the two commented-out parts (the System.sleep() line and the System.wokenUpByPin() block), the system will go to sleep with the int2Pin HIGH, which prevents it from waking until the next hour. With them uncommented, the device will only go about 2,000-3,000 “tap” cycles before locking up. Is this a clear enough indication that there is an issue with Sleep which should be reported?

Thanks,

Chip


#26

Hi Chip… did you find any solution to the System.sleep(interrupt pin, pin state, sleep in seconds) problem you found? Was any issue acknowledged for this feature as a result?

I happened to have found this thread as I am experiencing similar issues with that function. I have a firmware running on over 20 electrons and they little by little started to fall into unknown states (unresponsive) since I introduced some changes. They rarely crashed before.

In reviewing those changes, and reading this thread, my suspicion falls on System.sleep(interrupt pin, pin state, sleep in seconds), which I am now using to wake up the electron at 7pm every day. I wanted to use this wake up period to perform planned firmware upgrades and/or to signal that the module is alive in case it has not been used during the day (some context: the electrons are the core of a module connected to machines that may or may not have been used during the day. If the machine is not used, then the electron just stays asleep waiting for the wake up/interrupt pin).
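
For the daily wake-up, the seconds argument has to be recomputed each time the device goes to sleep. A minimal sketch of that arithmetic in plain C++ follows. It is not my production code: secondsUntilHour is a hypothetical helper, and it assumes localNow is Unix time already shifted by the local time-zone offset.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: seconds from 'localNow' until the next occurrence of
// targetHour:00:00 local time. If it is exactly that time now, the result is
// a full day, so the device never requests a zero-length sleep.
inline int32_t secondsUntilHour(int64_t localNow, int targetHour) {
    assert(targetHour >= 0 && targetHour < 24 && localNow >= 0);
    const int32_t secondsPerDay = 86400;
    const int32_t secondsIntoDay = static_cast<int32_t>(localNow % secondsPerDay);
    int32_t delta = targetHour * 3600 - secondsIntoDay;
    if (delta <= 0) {
        delta += secondsPerDay;   // target already passed today, aim for tomorrow
    }
    return delta;
}
```

For a 7pm wake-up the call would be secondsUntilHour(localNow, 19), and the result would be passed as the last argument of the System.sleep() call.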

The electrons report very nicely and punctually for some days, but some (not all) get stuck and need an onsite reset.

Thanks for the insights


#27

Have you considered adding a deep sleep (System.sleep(SLEEP_MODE_DEEP, period)) from time to time?
Unlike Stop Mode (the one you are currently using), deep sleep (Standby Mode) causes a system reset, which can help alleviate potential heap fragmentation issues.
If your interrupt would happen to provide a rising edge, you can also wake-on-interrupt on the WKP pin.

To preserve the state of some variables across deep sleep/reset cycle you can use retained variables.


#28

Thanks ScruffR
Yes. I have one of those System.sleep(SLEEP_MODE_DEEP, period) as part of the code to “force” a reset every 7 or so days.
The modules were working fine waking with the interrupt pin using System.sleep(intpin, RISING) but started to behave erratically after upgrading to 0.8.0(rc10) and user firmware upgrades.
The two main things introduced in the user firmware upgrade were webhook response handlers (which I will send separately for your kind review so we rule out heap fragmentation or other issues) and this maintenance feature, which consisted of two things:

  1. to have a soft reset every x number of days using the deep sleep you mention. Most of this functionality is reusing the code in the electronsample library and
  2. a daily scheduled wake up using System.sleep(interrupt pin, pin state, sleep in seconds). Upon wake up the electron publishes a “live” signal to cloud.

As mentioned before, I removed number 2 above to see if it improves.
Btw, I was also using #define for constant definition, and changed them for const as per the recommendation in this thread.


#29

@fenriquez,

First of all, I like @ScruffR’s suggestion and I put all my devices into SLEEP_MODE_DEEP each night. My devices are in remote areas so I went to some lengths to ensure their reliability:

  1. Changed my code to a Finite State Machine format to make it easier to know what code is running when problems occur

  2. Added a hardware watchdog timer to my carrier board

  3. Started tracking the reason for resets in Setup. If there are too many soft resets, I have circuitry on the carrier that allows the Electron to power cycle itself and all the peripherals

  4. Put the Electron into DEEP sleep each night as suggested above.
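
The reset tracking in step 3 can be sketched roughly as follows. This is a simplified illustration in plain C++, not the code from my repository. ResetTracker, the limit, and the rule that any non-soft reset clears the counter are all assumptions. On the device, the soft/hard decision might come from System.resetReason(), and the count would be persisted in FRAM between boots.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical tracker: counts consecutive soft resets and signals when a
// full power cycle should be requested from the carrier board.
class ResetTracker {
public:
    // storedCount is the value read back from persistent storage at boot.
    explicit ResetTracker(uint8_t limit, uint8_t storedCount = 0)
        : limit_(limit), softResets_(storedCount) {
        assert(limit > 0);
    }

    // Call once in setup(). Returns true when the limit has been reached.
    bool recordReset(bool wasSoftReset) {
        if (!wasSoftReset) {
            softResets_ = 0;      // assumption: a power-on reset clears the streak
            return false;
        }
        if (++softResets_ < limit_) {
            return false;
        }
        softResets_ = 0;          // start over after asking for the power cycle
        return true;
    }

    // Value to write back to persistent storage after recordReset().
    uint8_t count() const { return softResets_; }

private:
    uint8_t limit_;
    uint8_t softResets_;
};
```

Keeping the decision in a small class like this makes it easy to test the threshold logic on a desktop before trusting it with a device in a remote location.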

With these changes, I have only had to physically reset two devices over two years and dozens of installations.

Hope this helps,

Chip


#30

@fenriquez, Are you operating without a Li-po by any chance?
I had the same situation with Electrons after upgrading to 0.8.x as described in this Bug Report.
I wasn’t using the Li-Po’s as I provided 2+amp external power supplies with a carrier board.
The Solution was to use the Li-Po for now.
I’m not sure if this is related to your problem, but it never hurts to ask.


#31

Hi @Rftop. Yes. Hardware is standard issue: electron + standard battery + some low-power electronics to interface with & protect the electron pins.
Also, plenty of regulated power is supplied via VIN.


#32

I too have issues with sleep…
the price of having a toddler…

sorry couldn’t resist because…
issues with sleep… lol… :slight_smile:


#33

Hi @chipmc and @ScruffR . My devices are connected to machines that can operate at any moment so I put them to sleep with System.sleep(interrupt pin, RISING) when the machines are not operating. The electron wakes up when the machine is turned on.

As said before, I also implemented a scheduled full-modem reset every few days as per the electronsample library (disconnect session, disconnect particle, SIM reset and SLEEP_MODE_DEEP for 10 secs), but this will use a lot of 3G data if we do it every time it goes to sleep (this can be several times per day).

Would adding a System.sleep(SLEEP_MODE_DEEP, seconds), without the full modem reset, help with the potential issues you mention?
Thanks again