Issues with sleep - Need advice on troubleshooting ideas


#17

Since I haven’t seen this issue get any attention so far (not even after my bump), I doubt it has been addressed yet.
However, if some dev happened to see and squash that bug without being aware of my issue report, it could still be worth a try with the code I provided there.


#18

@ScruffR,

Well, I hope this does get addressed.

Looking at your code, it seemed that the change for me to try was to put noInterrupts() before and interrupts() after the detachInterrupt() call. I also added a check of int2Pin before sleeping:

case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      if (connectionMode) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      interrupts();
      if (!digitalRead(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
      if (digitalRead(int2Pin)) {                                     // Need to test if Tap or Time woke us up
        awokeFromNap = true;                                          // This flag will allow us to bypass the debounce in the recordCount function
        recordCount();                                                // Count the tap that awoke the device
        stayAwakeTimeStamp = millis();                                // Allows us to ensure we stay awake long enough to debounce
      }
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;
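The wakeInSeconds line computes how many seconds remain until the next multiple of wakeBoundary (the top of the hour when wakeBoundary is 3600), clamped to the range 1..wakeBoundary. Below is a minimal host-side sketch of the same arithmetic, assuming Time.now() returns Unix seconds; clampTo() is a hypothetical stand-in for the Wiring/Arduino constrain() so the code runs off-device.

```cpp
#include <cstdint>

// Hypothetical stand-in for the Wiring constrain(): clamp x to [lo, hi].
static int32_t clampTo(int32_t x, int32_t lo, int32_t hi) {
  return x < lo ? lo : (x > hi ? hi : x);
}

// Seconds from 'now' (Unix time) until the next multiple of 'boundary'.
// If 'now' falls exactly on a boundary, a full 'boundary' period is returned.
int32_t wakeInSeconds(int64_t now, int32_t boundary) {
  int32_t remaining = static_cast<int32_t>(boundary - now % boundary);
  return clampTo(remaining, 1, boundary);
}
```

For example, 1234 s past an hour boundary (20 min 34 s) with wakeBoundary = 3600 gives 2366 s. Exactly on the boundary it gives a full 3600 s rather than 0, and the clamp only matters as a safety net.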

Will give this a try.

Thanks,

Chip


#19

Update,

No luck. So, here is where I am now:

  1. Goes to sleep and wakes on a hardware pin interrupt - no issues
  2. Goes to sleep and wakes at the hour - does not function as expected
    • Connects to Particle - even though it should not
    • Ignores the state of the int2Pin
    • If int2Pin is high, ignores the conditional that should prevent it from going to sleep
    • Goes to sleep with the int2Pin high and therefore cannot wake until the next hour

I am at my wits’ end with this. At this point, I have to assume there is a bug in the System.sleep(wakeUpPin, edgeTriggerMode, seconds) command or in how it is handling interrupts. It looks like my only option at this point is to give up on sleep, which will have a significant impact on battery performance.

If anyone has a suggestion on what else to try, I am all ears.

Thanks, Chip


#20

Hmm, I haven’t checked this particular point in your full code, but NAPPING_STATE only disconnects when connectionMode == true, so if the flag was false while a connection was still present, that might play a role.

This might be due to a race condition.
Allow for some more time after wake before checking the state.

When you use the pin as a RISING edge trigger, the system implicitly enables the internal pull-down resistor, which needs to be removed again after wake unless you also have pinMode(int2Pin, INPUT_PULLDOWN) in your other code.
BTW, checking the pin state after wake isn’t a reliable way to know whether it actually was a pin wake; there are several threads on this topic.
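The point about checking the pin after wake can be illustrated with a toy timing model. It assumes, hypothetically, that the sensor's interrupt output is a pulse of fixed width and that the MCU needs some latency after the edge before it can sample the pin; the numbers in the usage below are made up for illustration, not measured Electron figures.

```cpp
#include <cstdint>

// Toy model: an interrupt line that is HIGH from pulseStartMs for pulseWidthMs.
struct PulseLine {
  uint32_t pulseStartMs;
  uint32_t pulseWidthMs;
  bool levelAt(uint32_t tMs) const {
    return tMs >= pulseStartMs && tMs < pulseStartMs + pulseWidthMs;
  }
};

enum class WakeGuess { Pin, Timer };

// Classify the wake by sampling the pin 'wakeLatencyMs' after the rising edge,
// which is effectively what "if (digitalRead(int2Pin))" after System.sleep() does.
WakeGuess guessFromPinLevel(const PulseLine& line, uint32_t wakeLatencyMs) {
  return line.levelAt(line.pulseStartMs + wakeLatencyMs) ? WakeGuess::Pin
                                                         : WakeGuess::Timer;
}
```

With a 5 ms pulse, a 2 ms latency reads HIGH and is classified as a pin wake, while a 10 ms latency reads LOW and the same pin wake is misclassified as a timer wake. A wake reason recorded by the system does not depend on when the pin is sampled, which is why it is the more robust check.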


#21

@ScruffR,

Thank you for your continued assistance. I do hope we can find a solution.

  1. Good point about only checking the flag. I have added a Particle.connected() check to the conditional, so I hope it will disconnect even if the flag is improperly set. One point on this: I have read that in previous releases Particle.connected() was not very reliable. I hope this is fixed in 0.8.0; otherwise, I might try Cellular.RSSI() instead.

  2. Thank you for pointing out the connection state being reinstated after sleep. It helped me see that not fully disconnecting may be a core part of my problem.

  3. I would like to avoid adding delay()s to my code. By avoiding the digitalRead(), I hope I can avoid having to put a delay after sleep. If it does become necessary, I was unable to find any guidance on how long it needs to be. Is 30 ms enough?

  4. Thank you for pointing this out. I saw that one of the new features in 0.8.0 is the ability to get the reason for waking from “stop” sleep mode.

  5. Having a pull-down resistor won’t hurt, so I added it to the pinMode() statement in setup().

My Napping function looks like this now - testing to see if it fixes things:

case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      if (connectionMode || Particle.connected()) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      interrupts();
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      if (!digitalRead(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
      if (System.wokenUpByPin()) {                                           // Need to test if Tap or Time woke us up
        awokeFromNap = true;                                          // This flag will allow us to bypass the debounce in the recordCount function
        recordCount();                                                // Count the tap that awoke the device
        stayAwakeTimeStamp = millis();                                // Allows us to ensure we stay awake long enough to debounce
      }
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;

Seems to be working - went through the reporting cycle once. Will continue testing over the weekend. Fingers crossed!

Chip


#22

While it’s good to avoid delay() in running code, a delay right after System.sleep() isn’t really the same thing, especially when it is only 100 ms.
From a code-flow point of view, you wouldn’t be able to distinguish a slightly longer sleep from a sleep + delay :wink:

BTW, digitalRead() performs some internal sanity checks before actually reading the state of the pin. There is a faster way to read the state, pinReadFast(), which you could try too.


#23

@ScruffR,

Good point. I implemented delay() and pinReadFast() and tried every trick I could think of.

I have been running torture tests over the weekend. I don’t think there is a reliable way to use the System.sleep(interrupt pin, pin state, sleep in seconds) function. It works fine for up to 1,000 cycles but eventually fails. This is not reliable enough for my trail counter use.

I am afraid that the issue you found with interrupts and 0.7.0 also applies to 0.8.0. Is there a way I can register this as an issue so that it might get fixed?

Thanks,

Chip


#24

Everybody can file an issue with the open source firmware repo.


#25

@ScruffR,

OK, will do that. Here is the net result of my testing: if I comment out the sleep-related code, my device runs without issues. To test this, I simulated the accelerometer with a vibration motor and counted at least 25,000 “taps”.

  case NAPPING_STATE: {
      if (connectionMode && verboseMode && state != oldState) publishStateTransition();
      stayAwake = debounce;                                           // Ensures that we stay awake long enough to debounce a tap
      stayAwakeTimeStamp = millis();                                  // Allows us to ensure we stay awake long enough to debounce
      if (connectionMode || Particle.connected()) disconnectFromParticle();                   // If connected, we need to disconnect and power down the modem
      watchdogISR();                                                  // Pet the watchdog
      int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
      if (!pinReadFast(int2Pin)) readRegister(MMA8452_ADDRESS,0x22);  // Reads the PULSE_SRC register to reset it - just in case
      noInterrupts();
      detachInterrupt(int2Pin);                                       // Detach since sleep will monitor the int2Pin
      interrupts();
      //if (!pinReadFast(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);                   // Wake on either int2Pin or the top of the hour
      attachInterrupt(int2Pin,sensorISR,RISING);                      // Reattach Accelerometer interrupt from low to high
      delay(20);
      /*
      if (System.wokenUpByPin()) {
        awokeFromNap = sensorDetect = true;                 // This flag will allow us to bypass the debounce in the recordCount function
      }
      */
      state = IDLE_STATE;                                             // Back to the IDLE_STATE after a nap will come back after the stayAwake time is over
  } break;

If I uncomment these two lines, the system will eventually go to sleep with int2Pin HIGH, which prevents it from waking until the next hour. With these two lines uncommented, the device only gets through about 2,000-3,000 “tap” cycles before locking up. Is this a clear enough indication that there is an issue with sleep that should be reported?
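Why going to sleep with int2Pin already HIGH blocks a pin wake can be shown with a minimal model of edge-triggered wake: a RISING trigger fires only on a LOW-to-HIGH transition seen after sleep begins, so a line that is already HIGH (for example, a latched accelerometer interrupt that was not cleared) produces no edge until it drops and rises again. Representing the line as a list of samples is purely illustrative, not how the MCU's interrupt hardware works internally.

```cpp
#include <cstddef>
#include <vector>

// Returns true if a RISING-edge trigger armed at sleep entry would fire,
// given the line level at sleep entry (index 0) and at later sample points.
bool risingEdgeWouldWake(const std::vector<bool>& levels) {
  for (std::size_t i = 1; i < levels.size(); ++i) {
    if (!levels[i - 1] && levels[i]) return true;  // LOW -> HIGH transition
  }
  return false;  // no edge: only the RTC timeout can end the sleep
}
```

This is also why a guard like `if (!pinReadFast(int2Pin))` narrows the problem but cannot fully close it: the line can still go HIGH in the gap between that check and the moment sleep is actually entered.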

Thanks,

Chip


#26

Hi Chip, did you find any solution to the System.sleep(interrupt pin, pin state, sleep in seconds) problem you ran into? Was an issue acknowledged for this feature as a result?

I happened to find this thread because I am experiencing similar issues with that function. I have firmware running on over 20 Electrons, and since I introduced some changes they have gradually started to fall into unknown (unresponsive) states. They rarely crashed before.

After reviewing those changes and reading this thread, I have confirmed my suspicion about System.sleep(interrupt pin, pin state, sleep in seconds), which I now use to wake the Electron at 7 pm every day. I wanted to use this wake-up period to perform planned firmware upgrades and/or to signal that the module is alive in case it has not been used during the day. (Some context: the Electrons are the core of a module connected to machines that may or may not have been used during the day. If a machine is not used, the Electron just sleeps, waiting for the wake-up/interrupt pin.)

The Electrons report nicely and punctually for some days, but some (not all) get stuck and need an onsite reset.

Thanks for the insights


#27

Have you considered adding a deep sleep (System.sleep(SLEEP_MODE_DEEP, period)) from time to time?
Unlike Stop Mode (the one you are currently using), deep sleep (Standby Mode) causes a system reset, which can help mitigate potential heap fragmentation issues.
If your interrupt source happens to provide a rising edge, you can also wake from deep sleep on the WKP pin.

To preserve the state of some variables across a deep sleep/reset cycle, you can use retained variables.
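A common pattern with retained variables is to store a marker alongside the data, so the code can tell a deep-sleep wake (contents preserved) from a cold power-up (contents undefined). Here is a host-side sketch of that check. On a device the struct would be declared with the `retained` keyword and retained memory enabled (FEATURE_RETAINED_MEMORY); here an ordinary global stands in for it, and the struct layout, field names and magic value are hypothetical.

```cpp
#include <cstdint>

// Hypothetical layout of state kept across a deep-sleep reset.
struct RetainedState {
  uint32_t magic;       // marks the contents as initialized by this firmware
  uint32_t dailyCount;  // example payload: taps counted so far today
  uint32_t warmResets;  // resets survived since the last cold power-up
};

constexpr uint32_t kRetainedMagic = 0xC0FFEE01u;  // arbitrary, illustration only

// On the device this would be: retained RetainedState gRetained;
RetainedState gRetained;

// Call early in setup(): reinitialize if the marker is missing (cold power-up),
// otherwise keep the values that survived the reset.
// Returns true if the existing contents were kept.
bool restoreOrInitRetained(RetainedState& s) {
  if (s.magic == kRetainedMagic) {
    ++s.warmResets;
    return true;
  }
  s = RetainedState{kRetainedMagic, 0, 0};
  return false;
}
```

A bare magic number only guards against obviously uninitialized memory; a checksum over the payload is a stricter variant of the same idea.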


#28

Thanks, ScruffR.
Yes. I have one of those System.sleep(SLEEP_MODE_DEEP, period) calls as part of the code to “force” a reset every 7 or so days.
The modules were working fine waking with the interrupt pin using System.sleep(intpin, RISING), but started to behave erratically after upgrading to 0.8.0 (rc10) and after user firmware upgrades.
The two main things introduced in the user firmware upgrade were webhook response handlers (which I will send separately for your kind review, so we can rule out heap fragmentation or other issues) and this maintenance feature, which consists of two things:

  1. A soft reset every x days using the deep sleep you mention. Most of this functionality reuses the code in the electronsample library.
  2. A daily scheduled wake-up using System.sleep(interrupt pin, pin state, sleep in seconds). Upon wake-up, the Electron publishes a “live” signal to the cloud.
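Both maintenance features come down to time arithmetic. Here is a sketch of the seconds until the next daily wake (19:00 in the post) and of the every-N-days reset check. It assumes local time is available as Unix seconds already shifted by the time-zone offset, and the function names and parameters are hypothetical.

```cpp
#include <cstdint>

constexpr int64_t kSecondsPerDay = 24 * 60 * 60;

// Seconds from localNow until the next occurrence of hour:00 local time.
// If localNow is exactly hour:00, the next occurrence is a full day away.
int64_t secondsUntilDailyWake(int64_t localNow, int hour) {
  int64_t secOfDay = ((localNow % kSecondsPerDay) + kSecondsPerDay) % kSecondsPerDay;
  int64_t target = static_cast<int64_t>(hour) * 3600;
  int64_t delta = target - secOfDay;
  return delta > 0 ? delta : delta + kSecondsPerDay;
}

// True once at least resetEveryDays have passed since the last forced reset.
bool dueForDeepSleepReset(int64_t localNow, int64_t lastResetLocal, int resetEveryDays) {
  return localNow - lastResetLocal >= static_cast<int64_t>(resetEveryDays) * kSecondsPerDay;
}
```

At 18:00 this gives 3600 s until a 19:00 wake, and at 20:00 it gives 82,800 s (until 19:00 the next day).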

As mentioned before, I removed number 2 above to see if things improve.
Btw, I was also using #define for constant definitions and changed them to const as per the recommendation in this thread.


#29

@fenriquez,

First of all, I like @ScruffR’s suggestion, and I put all my devices into SLEEP_MODE_DEEP each night. My devices are in remote areas, so I went to some lengths to ensure their reliability:

  1. Changed my code to a Finite State Machine format to make it easier to know what code is running when problems occur

  2. Added a hardware watchdog timer to my carrier board

  3. Started tracking the reason for resets in setup(); if there are too many soft resets, circuitry on the carrier board allows the Electron to power-cycle itself and all the peripherals

  4. Put the Electron into DEEP sleep each night, as suggested above.
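Item 3 can be sketched as a small counter kept in retained memory: count consecutive resets that were not expected, and request a full power cycle once a threshold is reached. The cause values, the threshold and the power-cycle request are hypothetical placeholders. On an Electron the cause would come from System.resetReason() (with FEATURE_RESET_INFO enabled), and the power cycle from whatever carrier-board circuit is used.

```cpp
#include <cstdint>

// Hypothetical, simplified reset causes; "Planned" covers intentional resets
// such as the nightly deep sleep. The real firmware defines its own codes.
enum class ResetCause { PowerOn, Planned, Watchdog, Panic };

struct ResetTracker {
  uint32_t consecutiveAbnormal = 0;  // would live in retained memory on-device
  uint32_t threshold = 3;            // illustrative value

  // Call once from setup(). Returns true when the carrier board should be
  // asked to power-cycle the Electron and its peripherals.
  bool onBoot(ResetCause cause) {
    if (cause == ResetCause::PowerOn || cause == ResetCause::Planned) {
      consecutiveAbnormal = 0;  // expected start: clear the streak
      return false;
    }
    ++consecutiveAbnormal;
    if (consecutiveAbnormal >= threshold) {
      consecutiveAbnormal = 0;  // clear the streak before requesting the power cycle
      return true;
    }
    return false;
  }
};
```

Counting only consecutive abnormal resets means a device that occasionally trips the watchdog but otherwise recovers is left alone, while one stuck in a reset loop gets power-cycled.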

With these changes, I have only had to physically reset two devices over two years and dozens of installations.

Hope this helps,

Chip


#30

@fenriquez, are you operating without a LiPo by any chance?
I had the same situation with Electrons after upgrading to 0.8.x, as described in this bug report.
I wasn’t using the LiPos, as I provided 2+ A external power supplies with a carrier board.
The solution, for now, was to use the LiPo.
I’m not sure if this is related to your problem, but it never hurts to ask.


#31

Hi @Rftop. Yes. Hardware is standard issue: Electron + standard battery + some low-power electronics to interface with and protect the Electron pins.
Plenty of regulated power is also supplied via VIN.


#32

I too have issues with sleep…
the price of having a toddler…

sorry couldn’t resist because…
issues with sleep… lol… :slight_smile:


#33

Hi @chipmc and @ScruffR. My devices are connected to machines that can operate at any moment, so I put them to sleep with System.sleep(interrupt pin, RISING) when the machines are not operating. The Electron wakes up when the machine is turned on.

As I said before, I also implemented a scheduled full modem reset every few days, following the electronsample library (disconnect session, disconnect Particle, SIM reset and SLEEP_MODE_DEEP for 10 s), but this would use a lot of 3G data if done every time the device goes to sleep (which can be several times per day).

Would adding a System.sleep(SLEEP_MODE_DEEP, seconds), without the full modem reset, help with the potential issues you mention?
thanks again


#34

@fenriquez,

Increasing the long-term reliability of any system is hard work. I understand your requirement to be able to respond at any time, as I have a few systems with the same requirement (monitoring industrial control systems). Putting a system into DEEP sleep will cause it to reset, which may help with your issue, but make sure you try these steps as well.

I will assume that your system works as expected under normal development and testing and these issues are rare and intermittent - the hardest bugs to squash.

Broadly speaking, you can take three approaches to fixing these problems:

  • Preemptively squashing them in software - the DEEP sleep approach
  • Preemptively squashing them in hardware - external watchdog timers or the power-cycle functionality I mentioned above
  • Fixing the software - the software is almost always the real issue. This approach is the hardest, but it is the best in the long run.

Here is my approach to finding where I have made a coding error that only rarely and intermittently causes an issue:

  1. Figure out a way to torture test your system so you can trigger the flaw. If it locks up once every few days, use accelerated testing to get it to fail within an hour. For example, I have a vehicle counter that counts up to 400 cars a day, but it would lock up every few days. I built a test rig to mechanically simulate 20,000 cars an hour. This test rig allowed me to validate my fixes much more quickly than field testing.
  2. Try to capture the state of your system when it fails. Add logging, serial or Particle.publish() code to help you determine what state the system is in when it fails.
  3. Write your code so that you are reviewing an ever smaller block of code as you progress. The Finite State Machine approach is a great way to narrow down the amount of code you need to troubleshoot.
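Items 2 and 3 fit together well: if every state change goes through one place, logging it (as publishStateTransition() does in the napping code earlier in this thread) records where the device was when it stopped responding. Here is a host-side sketch. The state names mirror the thread, while the transition rules and the in-memory log are illustrative assumptions; on a device the log line would go to Serial or Particle.publish() instead.

```cpp
#include <string>
#include <vector>

enum class State { Idle, Reporting, Napping };

const char* stateName(State s) {
  switch (s) {
    case State::Idle:      return "IDLE";
    case State::Reporting: return "REPORTING";
    case State::Napping:   return "NAPPING";
  }
  return "UNKNOWN";
}

struct Machine {
  State state = State::Idle;
  State oldState = State::Idle;
  std::vector<std::string> log;  // stand-in for Serial / Particle.publish()

  // One pass of loop(): log only when the state actually changed,
  // mirroring the "state != oldState" check in NAPPING_STATE.
  void step(bool tapPending, bool reportDue) {
    if (state != oldState) {
      log.push_back(std::string(stateName(oldState)) + " -> " + stateName(state));
      oldState = state;
    }
    switch (state) {
      case State::Idle:
        if (reportDue) state = State::Reporting;
        else if (!tapPending) state = State::Napping;
        break;
      case State::Reporting:
        state = State::Idle;
        break;
      case State::Napping:
        state = State::Idle;  // after waking, always return to IDLE
        break;
    }
  }
};
```

If the last logged transition before a lockup is always, say, "IDLE -> NAPPING", the search narrows to the code in that one state.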

I hope this helps,

Chip


#35

@chipmc Hey, just saw this on Adafruit and it made me think about your custom vehicle counter setup. This may make building those counters easier.

https://www.adafruit.com/product/3965


#36

@RWB,

Thank you for sending this link. I wish I had seen this before I invested the time to develop my own pressure sensor breakout board. Might have saved me some time.

Thank you also for all the advice and help you have given me as I started developing on Particle.

Chip