Issues with sleep - Need advice on troubleshooting ideas

@ScruffR,

Thank you for your continued assistance. I do hope we can find a solution.

  1. Good point about only checking the flag. I have added a Particle.connected() check to the conditional so that it disconnects even if the flag is set incorrectly. One caveat: I have read that Particle.connected() was not very reliable in previous releases. I hope this is fixed in 0.8.0; otherwise, I might try Cellular.RSSI() instead.

  2. Thank you for pointing out the connection state being reinstated after sleep. It helped me see that not fully disconnecting may be a core part of my problem.

  3. I would like to avoid adding delay()s in my code. By avoiding digitalRead(), I hope I can avoid having to put a delay after sleep. If a delay does become necessary, I could not find any guidance on how long it needs to be. Is 30 ms enough?

  4. Thank you for pointing this out. I saw that one of the new features in 0.8.0 is reporting the reason for waking from “stop” sleep mode.

  5. Having a pull-down resistor won’t hurt, so I added it to the pinMode() statement in setup().

My NAPPING_STATE case now looks like this (testing to see if it fixes things):

```cpp
case NAPPING_STATE: {
  if (connectionMode && verboseMode && state != oldState) publishStateTransition();
  stayAwake = debounce;                                                  // Ensures that we stay awake long enough to debounce a tap
  if (connectionMode || Particle.connected()) disconnectFromParticle();  // If connected, we need to disconnect and power down the modem
  watchdogISR();                                                         // Pet the watchdog
  noInterrupts();
  detachInterrupt(int2Pin);                                              // Detach since sleep will monitor the int2Pin
  interrupts();
  int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
  if (!digitalRead(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
  if (System.wokenUpByPin()) {                                           // Need to test whether a tap or the timer woke us up
    awokeFromNap = true;                                                 // This flag allows us to bypass the debounce in the recordCount function
    recordCount();                                                       // Count the tap that woke the device
    stayAwakeTimeStamp = millis();                                       // Allows us to ensure we stay awake long enough to debounce
  }
  attachInterrupt(int2Pin, sensorISR, RISING);                           // Reattach accelerometer interrupt (low to high)
  state = IDLE_STATE;                                                    // Back to IDLE_STATE after a nap; we come back once the stayAwake time is over
} break;
```
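The wakeInSeconds line is modular arithmetic: Time.now() % wakeBoundary is how far we are into the current period, so subtracting it from wakeBoundary gives the seconds left until the next boundary. Below is a minimal host-side sketch of that calculation, assuming wakeBoundary is in seconds (3600 for the top of the hour). secondsToNextBoundary is a hypothetical helper, nowSeconds stands in for Time.now(), and the min/max pair stands in for Wiring's constrain().

```cpp
#include <algorithm>
#include <cassert>

// Host-side sketch of the wakeInSeconds calculation from NAPPING_STATE.
// nowSeconds stands in for Time.now(); wakeBoundary is assumed to be in seconds.
long secondsToNextBoundary(long nowSeconds, long wakeBoundary) {
    assert(wakeBoundary > 0 && nowSeconds >= 0);
    // How far we are into the current period, subtracted from the period length.
    long remaining = wakeBoundary - nowSeconds % wakeBoundary;
    // Same effect as constrain(remaining, 1, wakeBoundary).
    return std::max(1L, std::min(remaining, wakeBoundary));
}
```

For non-negative times the result already lies between 1 and wakeBoundary, so constrain() is only a safety net. When Time.now() falls exactly on a boundary, the device sleeps a full period rather than 0 s.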

Seems to be working - went through the reporting cycle once. Will continue testing over the weekend. Fingers crossed!

Chip

Avoiding delay() in running code is sensible, but a delay right after System.sleep() isn't really comparable, especially when it is only 100 ms.
From the point of view of code flow, you wouldn't be able to distinguish a slightly longer sleep from a sleep plus delay :wink:

BTW, digitalRead() performs some internal sanity checks before actually reading the pin. pinReadFast() is a faster way to read the pin state, which you could try too.

@ScruffR,

Good point. I implemented delay() and pinReadFast() and tried every trick I could think of.

I have been running torture tests over the weekend. I don’t think there is a reliable way to use the System.sleep(interrupt pin, pin state, sleep in seconds) function. It works fine for up to 1,000 cycles but eventually fails, which is not reliable enough for my trail counter.

I am afraid that the issue you found with interrupts and 0.7.0 also applies to 0.8.0. Is there a way I can register this as an issue so that it might get fixed?

Thanks,

Chip

Everybody can file an issue with the open source firmware repo.

@ScruffR,

OK, will do. Here is the net of my testing: if I comment out the sleep-related code, my device runs without issues. To test this, I simulate the accelerometer with a vibration motor and count at least 25,000 “taps”.

```cpp
case NAPPING_STATE: {
  if (connectionMode && verboseMode && state != oldState) publishStateTransition();
  stayAwake = debounce;                                                  // Ensures that we stay awake long enough to debounce a tap
  stayAwakeTimeStamp = millis();                                         // Allows us to ensure we stay awake long enough to debounce
  if (connectionMode || Particle.connected()) disconnectFromParticle();  // If connected, we need to disconnect and power down the modem
  watchdogISR();                                                         // Pet the watchdog
  int wakeInSeconds = constrain(wakeBoundary - Time.now() % wakeBoundary, 1, wakeBoundary);
  if (!pinReadFast(int2Pin)) readRegister(MMA8452_ADDRESS, 0x22);        // Reads the PULSE_SRC register to reset it - just in case
  noInterrupts();
  detachInterrupt(int2Pin);                                              // Detach since sleep will monitor the int2Pin
  interrupts();
  //if (!pinReadFast(int2Pin)) System.sleep(int2Pin, RISING, wakeInSeconds);  // Wake on either int2Pin or the top of the hour
  attachInterrupt(int2Pin, sensorISR, RISING);                           // Reattach accelerometer interrupt (low to high)
  delay(20);
  /*
  if (System.wokenUpByPin()) {
    awokeFromNap = sensorDetect = true;                                  // This flag allows us to bypass the debounce in the recordCount function
  }
  */
  state = IDLE_STATE;                                                    // Back to IDLE_STATE after a nap; we come back once the stayAwake time is over
} break;
```
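The stayAwake / stayAwakeTimeStamp pair sets up a minimum awake window after each nap. The thread doesn't show the IDLE_STATE side, so the check below is an assumption about how that window is consumed. millis() returns a 32-bit unsigned count that wraps after roughly 49.7 days, and computing the elapsed time with unsigned subtraction keeps the comparison correct across that wrap. stayAwakeElapsed is a hypothetical helper, and nowMs stands in for millis().

```cpp
#include <cassert>
#include <cstdint>

// Sketch of a wrap-safe "have we stayed awake long enough?" check.
// nowMs stands in for millis(); the other two parameters mirror the thread's variables.
bool stayAwakeElapsed(uint32_t nowMs, uint32_t stayAwakeTimeStamp, uint32_t stayAwake) {
    // Unsigned subtraction is modulo 2^32, so this stays correct when millis() wraps.
    uint32_t elapsed = nowMs - stayAwakeTimeStamp;
    return elapsed >= stayAwake;
}
```

The tempting alternative, `nowMs >= stayAwakeTimeStamp + stayAwake`, breaks when the sum overflows, which is why the elapsed-time form is the usual idiom.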

If I uncomment these two lines, the system will go to sleep with int2Pin HIGH, which prevents it from waking until the next hour. With these two lines uncommented, the device only gets through about 2,000-3,000 “tap” cycles before locking up. Is this a clear enough indication that there is an issue with sleep that should be reported?
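Why a pin that is already HIGH at sleep time matters: a RISING wake source only fires on a LOW-to-HIGH transition. If the accelerometer holds INT2 HIGH until its PULSE_SRC register is read (which is what the readRegister(MMA8452_ADDRESS, 0x22) call appears to rely on), further taps cannot produce a new edge, and only the timer ends the nap. The toy model below is an illustration of that logic, not of how the Electron's hardware works internally; simulateNap and WakeCause are hypothetical names, and the pin is assumed to be sampled once per second.

```cpp
#include <cassert>
#include <vector>

// Host-side model of a Stop-mode nap armed for a RISING edge plus a timer.
// levelAtSleep is the pin level when sleep starts; levels holds one sample per second afterwards.
enum class WakeCause { Pin, Timer };

WakeCause simulateNap(bool levelAtSleep, const std::vector<bool>& levels, int wakeInSeconds) {
    assert(wakeInSeconds >= 1);
    bool previous = levelAtSleep;
    for (int t = 0; t < wakeInSeconds && t < static_cast<int>(levels.size()); ++t) {
        if (!previous && levels[t]) return WakeCause::Pin;  // LOW -> HIGH: a rising edge
        previous = levels[t];
    }
    return WakeCause::Timer;  // no edge seen before the timer expired
}
```

In this model, a pin that starts HIGH and stays HIGH always ends in a timer wake, however many taps happen; making sure the line is LOW (latch cleared) before arming the sleep is what restores pin wakes.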

Thanks,

Chip

Hi Chip, did you find a solution to the System.sleep(interrupt pin, pin state, sleep in seconds) problem you found? Was an issue acknowledged for this feature as a result?

I found this thread because I am experiencing similar issues with that function. I have firmware running on over 20 Electrons, and since I introduced some changes they have gradually started to fall into unknown (unresponsive) states. They rarely crashed before.

Reviewing those changes and reading this thread confirmed my suspicion of System.sleep(interrupt pin, pin state, sleep in seconds), which I now use to wake the Electron at 7pm every day. I wanted to use this wake-up period to perform planned firmware upgrades and/or to signal that the module is alive in case it has not been used during the day. (Some context: the Electrons are the core of a module connected to machines that may or may not have been used during the day. If the machine is not used, the Electron just sleeps, waiting for the wake-up/interrupt pin.)
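For a fixed daily wake-up such as 7pm, the seconds argument has to be recomputed before every nap. A minimal sketch, assuming the device's local time is available as seconds since the epoch (on Particle, Time.local() after Time.zone() has been set is one candidate) and ignoring daylight-saving changes; secondsUntilHour is a hypothetical helper, not a Particle API.

```cpp
#include <cassert>

// Seconds from localNow until the next targetHour:00:00 local time.
// localNow: local time in seconds since the epoch (assumed non-negative).
long secondsUntilHour(long localNow, int targetHour) {
    assert(localNow >= 0 && targetHour >= 0 && targetHour < 24);
    const long kSecondsPerDay = 86400L;
    long secondsIntoDay = localNow % kSecondsPerDay;
    long wait = targetHour * 3600L - secondsIntoDay;
    if (wait <= 0) wait += kSecondsPerDay;  // today's target has passed: aim for tomorrow
    return wait;
}
```

Waking exactly at 19:00:00 yields a full day rather than 0 s, so a device that wakes on time does not immediately re-trigger the daily wake.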

The Electrons report very nicely and punctually for some days, but some (not all) get stuck and need an onsite reset.

Thanks for the insights

Have you considered adding a deep sleep (System.sleep(SLEEP_MODE_DEEP, period)) from time to time?
Unlike Stop Mode (the one you are currently using), deep sleep (Standby Mode) causes a system reset, which can help mitigate potential heap fragmentation issues.
If your interrupt source provides a rising edge, you can also wake from deep sleep on the WKP pin.

To preserve the state of some variables across a deep sleep/reset cycle, you can use retained variables.
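The pattern described here (an occasional deep sleep for a clean reset, with state carried across in retained variables) comes down to a small piece of bookkeeping. The sketch below models it on a host: on an Electron the struct would be declared `retained` (with retained memory enabled via FEATURE_RETAINED_MEMORY), now would come from Time.now(), and the caller would then enter System.sleep(SLEEP_MODE_DEEP, ...). The magic-number check is a defensive addition, not something the thread requires, and all names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Host-side model. On an Electron this struct would be declared `retained`
// so it survives the reset caused by deep sleep.
struct RetainedState {
    uint32_t magic;          // marks the contents as initialized (defensive, assumed)
    uint32_t lastDeepSleep;  // Unix time of the last deep-sleep reset
};

const uint32_t kMagic = 0xC0FFEE01u;  // arbitrary marker value

// Call early in setup(); returns true if the state had to be (re)initialized.
bool validateOrInit(RetainedState& s, uint32_t now) {
    if (s.magic == kMagic) return false;
    s.magic = kMagic;
    s.lastDeepSleep = now;
    return true;
}

// True when the periodic System.sleep(SLEEP_MODE_DEEP, ...) is due.
bool deepSleepDue(const RetainedState& s, uint32_t now, uint32_t intervalSeconds) {
    assert(s.magic == kMagic);
    return now - s.lastDeepSleep >= intervalSeconds;
}
```

Before entering deep sleep, the firmware would set lastDeepSleep to the current time so the next interval counts from that reset.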

Thanks ScruffR.
Yes, I have one of those System.sleep(SLEEP_MODE_DEEP, period) calls as part of the code to “force” a reset every 7 or so days.
The modules were working fine waking on the interrupt pin with System.sleep(intpin, RISING), but started to behave erratically after upgrading to 0.8.0 (rc10) and updating the user firmware.
The two main things introduced in the user firmware update were webhook response handlers (which I will send separately for your kind review, so we can rule out heap fragmentation or other issues) and this maintenance feature, which consisted of two things:

  1. a soft reset every x days using the deep sleep you mention (most of this functionality reuses code from the electronsample library), and
  2. a daily scheduled wake-up using System.sleep(interrupt pin, pin state, sleep in seconds). Upon waking, the Electron publishes a “live” signal to the cloud.

As mentioned before, I removed item 2 above to see if things improve.
BTW, I was also using #define for constants and changed them to const, as recommended in this thread.

@fenriquez,

First of all, I like @ScruffR’s suggestion and I put all my devices into SLEEP_MODE_DEEP each night. My devices are in remote areas so I went to some lengths to ensure their reliability:

  1. Changed my code to a Finite State Machine format to make it easier to know what code is running when problems occur

  2. Added a hardware watchdog timer to my carrier board

  3. Started tracking the reason for resets in setup(). If there are too many soft resets, circuitry on the carrier lets the Electron power-cycle itself and all the peripherals

  4. Put the Electron into DEEP sleep each night, as suggested above.

With these changes, I have only had to physically reset two devices over two years and dozens of installations.
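Step 3 in the list above (counting soft resets and escalating to a full power cycle) can be sketched as a small counter. This is a host-side model under assumptions: on the device the count would need to live in retained memory or EEPROM, the reason would come from System.resetReason() (which requires FEATURE_RESET_INFO to be enabled), and the power cycle would be whatever the carrier-board circuit does. ResetKind, ResetTracker and the threshold are illustrative.

```cpp
#include <cassert>

// Hypothetical classification of the reset reason read at boot.
enum class ResetKind { PowerOn, Watchdog, Software };

struct ResetTracker {
    int softResets = 0;  // on a device: kept in retained memory or EEPROM
    int limit;           // soft resets tolerated before a full power cycle

    explicit ResetTracker(int maxSoftResets) : limit(maxSoftResets) { assert(maxSoftResets > 0); }

    // Call once from setup(); returns true when the carrier should power-cycle everything.
    bool onBoot(ResetKind reason) {
        if (reason == ResetKind::PowerOn) {  // a clean power-up clears the history
            softResets = 0;
            return false;
        }
        ++softResets;                        // watchdog or software reset
        return softResets > limit;
    }
};
```

Counting watchdog and software resets together keeps the logic simple; a design that treats them differently is equally valid.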

Hope this helps,

Chip

@fenriquez, are you operating without a LiPo by any chance?
I had the same situation with Electrons after upgrading to 0.8.x, as described in this bug report.
I wasn't using the LiPos, as I provided 2+ A external power supplies through a carrier board.
The solution, for now, was to use the LiPo.
I'm not sure if this is related to your problem, but it never hurts to ask.

Hi @Rftop. Yes. The hardware is standard issue: Electron + standard battery + some low-power electronics to interface with and protect the Electron pins.
Plenty of regulated power is also supplied via VIN.

I too have issues with sleep…
the price of having a toddler…

sorry couldn’t resist because…
issues with sleep… lol… :slight_smile:

Hi @chipmc and @ScruffR. My devices are connected to machines that can start operating at any moment, so I put them to sleep with System.sleep(interrupt pin, RISING) when the machines are not running. The Electron wakes up when the machine is turned on.

As I said before, I also implemented a scheduled full modem reset every few days following the electronsample library (disconnect the session, disconnect from Particle, reset the SIM and enter SLEEP_MODE_DEEP for 10 s), but this would use a lot of 3G data if we did it every time the device goes to sleep (which can be several times per day).

Would adding a System.sleep(SLEEP_MODE_DEEP, seconds), without the full modem reset, help with the potential issues you mention?
Thanks again

@fenriquez,

Increasing the long-term reliability of any system is hard work. I understand your requirement to be able to respond at any time, as I have a few systems with the same requirement (monitoring industrial control systems). Putting a system into DEEP sleep causes it to reset, which may help with your issue, but make sure you try these steps as well.

I will assume that your system works as expected under normal development and testing and these issues are rare and intermittent - the hardest bugs to squash.

Broadly speaking, you can take three approaches to fixing these problems:

  • Preemptively squashing them in software - the DEEP sleep approach
  • Preemptively squashing them in hardware - external watchdog timers or the power-cycle functionality I mentioned above
  • Fixing the software - this is almost always where the problem lies. It is the hardest approach, but the best in the long run.

Here is my approach to finding where I have made a coding error that only rarely and intermittently causes an issue:

  1. Figure out a way to torture test your system so you can trigger the flaw. If it locks up once every few days, use accelerated testing to get it to fail within an hour. For example, I have a vehicle counter that counts up to 400 cars a day, but it would lock up every few days. I built a test rig to mechanically simulate 20,000 cars an hour, which let me validate my fixes far more quickly than field testing.
  2. Try to capture the state of your system when it fails. Add logging (Serial or Particle.publish()) to help you determine what state the system is in when it fails.
  3. Write your code so that you review an ever smaller block of code as you progress. A Finite State Machine approach is a great way to narrow down the amount of code you need to troubleshoot.
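Steps 2 and 3 work well together: if every state change is recorded, the last entry tells you which case the device was in when it stopped responding. A minimal host-side sketch, assuming a three-state machine (real firmware will have more states); the names and inputs are illustrative, and on a device the log string would be replaced by Serial output or Particle.publish().

```cpp
#include <cassert>
#include <string>

// Toy version of the IDLE/NAPPING/REPORTING structure discussed in the thread.
enum class State { Idle, Napping, Reporting };

const char* stateName(State s) {
    switch (s) {
        case State::Idle:      return "IDLE";
        case State::Napping:   return "NAPPING";
        case State::Reporting: return "REPORTING";
    }
    return "UNKNOWN";
}

struct Machine {
    State state = State::Idle;
    State oldState = State::Idle;
    std::string log;  // on a device: Serial output or Particle.publish()

    // One pass of the main loop; the inputs stand in for real events.
    void step(bool reportDue, bool napAllowed) {
        assert(state == oldState);  // every earlier transition has been logged
        switch (state) {
            case State::Idle:
                if (reportDue) state = State::Reporting;
                else if (napAllowed) state = State::Napping;
                break;
            case State::Napping:    // waking always goes back to IDLE
            case State::Reporting:
                state = State::Idle;
                break;
        }
        if (state != oldState) {    // record the transition before moving on
            log += std::string(stateName(oldState)) + "->" + stateName(state) + ";";
            oldState = state;
        }
    }
};
```

Because each case is small and every transition is logged, a lock-up can be narrowed to the code in one case rather than the whole loop.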

I hope this helps,

Chip

@chipmc Hey, just saw this on Adafruit and it made me think of your custom vehicle counter setup. It may make building those counters easier.

https://www.adafruit.com/product/3965

@RWB,

Thank you for sending this link. I wish I had seen this before I invested the time to develop my own pressure sensor breakout board. Might have saved me some time.

Thank you also for all the advice and help you have given me as I started developing on Particle.

Chip

Hello again. Apologies for coming back to this issue. The sleep modes and v0.8.0rcX are giving me headaches. I have found several Electrons running 0.8.0rc9 or rc10 that are unable to wake up from sleep via an interrupt pin. The Electrons just stay asleep, unresponsive to the interrupt pin, and need to be reset (push the reset button) to wake up again.

I originally had the following sequence to send the device to sleep (with no issues) on Electrons running 0.7.0 (0.6.4 in some cases):

```cpp
if (Time.now() >= time_to_sleep && flag_to_sleep) {
  initialize_variables();
  Cellular.on();
  delay(2000);
  Cellular.off();
  delay(1000);
  System.sleep(pinSwitch, RISING);
  // The device is in a very noisy environment and wakes up unintentionally. This flag sends it
  // through another small routine that sets time_to_sleep and flag_to_sleep appropriately, so the
  // device comes back here and goes to sleep again.
  flag_from_sleep = true;
}
```

After migrating to 0.8.0, I introduced `System.sleep(pinSwitch, RISING, wakeup_time);` but had stability issues similar to those mentioned earlier in this thread, so I removed it. I went back to waking only on the pin, but also added a Particle.connected() check to make sure the device is connected before it goes to sleep. The “going to sleep” routine now looks like this:

```cpp
if (Time.now() >= time_to_sleep && flag_to_sleep && Particle.connected()) {
  initialize_variables();
  check_if_reboot_needed();  // Small routine that sends the device to SLEEP_MODE_DEEP for 10 s and resets the cloud connection every 4 days (code from electron_sample)
  System.sleep(pinSwitch, RISING);
  delay(100);
  Particle.connect();        // In case it has trouble reconnecting?
  flag_from_sleep = true;
}
```

Note that I removed the Cellular.on() / Cellular.off() steps. I had them before to save battery before going to sleep, but the devices are 100% externally powered, so I decided to remove them.

Do you see anything weird here? Could there be a bug in 0.8.0? The issue happens after many iterations, i.e. the device goes to sleep and wakes up on the pin with no problems for several days or weeks and then stops waking up. It has happened on 4 devices already.

Thanks in advance
PS: in all cases the firmware uses SYSTEM_THREAD(ENABLED); and SYSTEM_MODE(SEMI_AUTOMATIC);

I seem to have issues myself with not being able to wake after several rounds of deep sleep, but I haven’t yet come across the same with Stop Mode sleep.
However, I’ll have to run some tests with that mode myself anyway.

One thing I also plan to test is adding stronger external pull-downs, so the pin isn't HIGH when entering sleep mode, which could mask any future rising edge.

Thanks @ScruffR. Do you see any issues in the piece of code I used?

The devices were working with the original version of the code without any issue for several months, so I doubt adding hardware will help. They started falling into this unresponsive sleep state after migrating to 0.8.0 and making those changes to the code before System.sleep().

I can’t see anything obviously wrong, but there might be issues in the undisclosed code that lay the foundation for a problem with sleep.

But it’s certainly possible that a 0.8.0 “bug” plays a part too.