Bug Bounty: Electron not booting after battery discharges completely

Hi all,

We have recently seen a number of reports of Electrons that no longer boot after being powered from a battery that has completely discharged. The type of battery and where it is connected vary. Here are a few related posts:

We have investigated this since the first report without success in reproducing, and after seeing more reports we ramped up the investigation yet again with little success. This is a top priority for us at this time, and since this issue remains elusive we need your help. We're placing a bounty of $200 in Particle store credit on this issue in the hopes that some of you in the Community have more information that can help us find the root cause of this issue.

Let's break down the details of the problem and tests done to date, then we'll talk about the bounty!

Problem

The Electron will not boot properly after the battery discharges completely. This could be the included LiPo battery, or a different one attached to the VIN and GND pins. One report simply said that the on-board blue D7 LED was only dimly lit after the Electron had been left in a drawer for a month. This may indicate that some corruption has occurred in the bootloader, DCD, system or user flash memory. Pressing RESET does not clear the problem. The Electron may not be able to enter DFU mode, or if it can enter DFU mode, some other part of flash memory is corrupted. In general, it could be a hardware issue, a firmware issue or, most likely, an interaction between the two.

Bounty Goal

There are ways to forcibly corrupt the firmware on embedded devices, but the goal with this bounty is to identify a repeatable way in which the Electron will no longer boot properly as described above.

What data to collect?

  • Ensure you are starting with a known base system firmware (0.4.8-rc.6 is the firmware Electrons are manufactured with and 0.5.2 is the current default).
  • There are an infinite number of things you could test for in user firmware. If you think some code is more prone to making this issue occur, go ahead and use it, but keep it limited to things most users would do. I.e., do not attempt to write low-level code that specifically erases memory... that's invalid. Do use our Documented Firmware Reference for the Electron; everything in there is fair game! Whenever this has been reported in the past, the user has reported that they weren't doing anything special in their code. Some of them might have even been running Tinker. One of the posts above even includes the code used.
  • Record voltages (and currents if you have enough equipment) :chart_with_upwards_trend:
  • Record times :clock430:
  • Record cycles :bicyclist: :bar_chart:
  • Record which pins are hooked up to which things.
  • Take pictures or video of the setup :camera:
  • What steps lead up to the Electron not booting any longer? Can you repeat them after backup/restore (below)?
  • Maybe you end up shipping us your hardware for post mortem.
  • Take a snapshot of firmware before and after testing. Before should be easy with a DFU command (dfu-util -d 2b04:d00a -a 0 -s 0x8000000:0x100000 -U electron_backup.bin), but after could require a JTAG tool if your bootloader is not operational. Note: Be sure not to post your Electron flash image on the forum since it contains IDs and keys which belong to you, not including any compiled firmware you may have written. Please email that with your data below, but there is no need to send a pre-death image without a post-death image or hardware.
  • More information is better, as long as it's organized :memo:
  • Please email your data collected to , or you may post it here if you don't have a complete entry and want to share your findings so that others may learn and build up the solution.
  • If you have a JTAG tool, you can restore your Electron with the combined binary image found here: (0.4.8-rc.6), and make sure to upload a copy of the Flash memory (post-death) before a complete erase! You will have to also run particle keys doctor <device_id> with the CLI after your Electron is back up and running again to connect to the Cloud.

The following are all of the tests I have done to date. These were performed on a G350 Electron running v0.4.8-rc.6 firmware with Tinker.

Test #1 (manual - LiPo on JST):

Setup: Discharge the supplied LiPo to 2.85V (its protection cut-off voltage).

  • Revive battery with a bit of charging voltage supplied from VIN.
  • Remove VIN source.
  • Electron will boot and battery will drain again until the cut-off voltage is reached.
  • While the battery is approaching its cut-off voltage, the modem will turn on and draw a higher current that causes the battery voltage to sag very rapidly. This causes the STM32 to reset, and the process repeats itself over and over very rapidly with the RGB LED flashing white. Eventually, after ~14 seconds, the battery cut-off voltage is reached and the battery goes into a HI-Z state.
  • These steps can be easily repeated over and over.

Results Test #1

  • The bootloader or DCD does not appear to be corrupting, however a test of one is hardly a test at all. An automated test rig was set up to apply the charging voltage over and over (Test #2), with a dwell period of 60 seconds to ensure the battery completely drains again.
  • Electron was also powered from Li+ and VIN separately with no ill effects.
  • The Electron will shut off at about 2.4V on Li+. This is well below the battery cut-off voltage of 2.85V, which is the protection circuit opening the battery terminals (HI-Z). A charging voltage must be applied to the battery again to kick the protection circuit out of the HI-Z state.
  • The Electron will shut off at just below 4.33V on VIN without a battery attached on Li+. With a battery attached on Li+ it will stop charging a dead battery when VIN falls below 3.8V, after which the battery will take over powering the electron until it’s dead again at 2.85V.

Test #2 (automated cycling - LiPo on JST):

Setup: Discharge the supplied LiPo to 2.85V (its protection cut-off voltage).

  • Revive battery with a bit of charging voltage supplied from VIN for 3 seconds.
  • Remove VIN source for 60 seconds.
  • Electron will boot and battery will drain again until the cut-off voltage is reached.
  • While the battery is approaching its cut-off voltage, the modem will turn on and draw a higher current that causes the battery voltage to sag very rapidly. This causes the STM32 to reset, and the process repeats itself over and over very rapidly with the RGB LED flashing white. Eventually, after ~14 seconds, the battery cut-off voltage is reached and the battery goes into a HI-Z state.
  • These steps are automated over and over using a Particle relay shield powered by a Photon ( Photon code here )
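The on/off schedule used in these cycling tests can be sketched as a pure timing function. This is only an illustration of the schedule described above, not the Photon code that actually drove the relay shield; the names `relayShouldBeOn` and `completedCycles` are hypothetical.

```cpp
#include <cstdint>

// Returns true while the relay should connect the charging source to VIN.
// Each cycle is onSeconds of charging followed by offSeconds of discharge.
bool relayShouldBeOn(uint32_t elapsedSeconds, uint32_t onSeconds, uint32_t offSeconds) {
    const uint32_t period = onSeconds + offSeconds;
    if (period == 0) return false;  // degenerate schedule: keep the relay open
    return (elapsedSeconds % period) < onSeconds;
}

// Number of complete on/off cycles finished after elapsedSeconds.
uint32_t completedCycles(uint32_t elapsedSeconds, uint32_t onSeconds, uint32_t offSeconds) {
    const uint32_t period = onSeconds + offSeconds;
    return period == 0 ? 0 : elapsedSeconds / period;
}
```

With 3 seconds on and 60 seconds off, each cycle lasts 63 seconds, so 1088 cycles correspond to roughly 19 hours of run time.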

Results Test #2:

After 1088 cycles of 3 seconds on, 60 seconds off, the electron still boots fine. It doesn’t have enough charge to make a connection though, so Test #3 will expand on this.


Test #3 (automated cycling - LiPo on JST):

Same as Test #2, but with 30 seconds on and 5 minutes off. This allows the electron to boot, connect to cellular, and handshake with the Cloud before it discharges.

Results Test #3:

30 seconds of charging seems to be enough to get it to connect to the cloud in most cases, and 5 minutes is more than enough to discharge it all of the way again. This one ran for several days, over 1175 cycles, and still boots fine!


Test #4 (automated cycling - 6V SLA on VIN):

Same as Test #3, except that a 6V SLA battery is now set up to apply power only to VIN (for enough time to allow the Electron to connect to the Cloud) with the LiPo battery detached; the charging current to the battery is then removed and it is allowed to discharge. I’m using a simple 6V sealed lead acid battery for this test, as it will discharge similarly to many other battery types.

Results Test #4:

1085 cycles and the electron was still booting and connecting to the Cloud.


Test #5 (automated cycling - 6V SLA on VIN):

Same as Test #4, except off time is set to 1 hour instead of 5 minutes.

Results Test #5:

After a week of testing (about 200 cycles) there had been no issue. Soon after that I saw that VIN was at 5V and the Electron was off (dead). When I manually connected my relay terminals to deliver charge, I got about 0.5A or so, but no red LED on the Electron and no dim blue LED either. The 3V3 output was not reaching more than about 2.5V or so with 5V applied to VIN. After plugging in a battery to Li+, the unit powered up instantly. This appears to have been some temporary latch-up condition with the PMIC.


So those were my tests.... many cycles later and yet hardly the results we were after :frowning:

We are issuing the following bounty:

  • If you (1) submit your findings in a well documented format (see What data to collect? above), and (2) your findings identify or lead to the discovery of the root cause of this issue; we will (3) send you a $200 credit for Particle stuff (Electrons, Photons, accessories, or kits).
  • If multiple findings are submitted, the first that we accept will get the bounty. If we conclude that multiple individuals contributed to the final root cause discovery, we'll split the bounty up accordingly and each entry will also receive a free Particle T-Shirt to reward your team effort :particle:!

Thanks for your help on this one, and if you have any questions, please let me know.

Firmware for the Electron can be found here, however you should not need to build firmware locally to encounter this issue. You may want to have a peek at the source code though.

For testing, you can try the firmware that's shipped with all manufactured Electrons (0.4.8-rc.6), or try one of the latest releases (0.5.2 is the current default).

If you'd like to discuss below, please do!


:books: For anyone looking to mitigate the risk of this issue (it feels very low though), you can check out the techniques employed in this electron-maintain-capacity app:


Just wanted to say that I’m really excited to make progress against this issue! We’ve had a lot of success working with the community in the past, and I’m excited to tackle another set of elusive symptoms together again!


Hi guys, being new to the Electron, but planning to implement it in a situation where, if the power goes out, the battery will discharge and there will be no way to manually intervene, I am wondering if there is a recommended protection scheme to avoid the issue while the problem is unresolved?

e.g. Watchdog checking the battery voltage and at some threshold putting the system into deep sleep until the power returns.

What is the best way to wake it up when the power is back and the battery is sufficiently charged? Is a hardware connection required to do this? Or can it be done in the firmware?

The last thing we want is unrecoverable Electrons in the field months from now…

I have not looked into this, but there may be a way to set or raise the battery voltage level that the PMIC chip stops supplying power to the Electron. Getting the PMIC to cut power to the Electron before the battery management board on the actual battery shuts off would probably be a good move.

I think right now it lets the battery drain down to a point where the battery management board cuts output power from the battery. But I’m not 100% sure.

@BDub Have you looked into this?

I had the battery discharge problem as well after accidentally leaving the battery connected with the power off for several days. The battery itself was destroyed in the process.

Fortunately, I had an STlink device available and was able to look at the Electron’s flash memory. I found that it had been completely erased – all bits set to 1.
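For anyone comparing a post-failure flash dump against this symptom, a minimal sketch of an "is it erased?" check follows. It assumes the dump has already been read into memory as raw bytes; the `EraseSummary` type and `summarizeFlashImage` function are hypothetical names, not part of any Particle or ST tool.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Summary of how much of a flash dump is in the erased state (0xFF).
struct EraseSummary {
    std::size_t totalBytes;
    std::size_t programmedBytes;  // bytes that differ from 0xFF
    bool fullyErased;             // true only for a non-empty, all-0xFF image
};

// Counts the bytes that are not 0xFF; erased flash reads back as all 1 bits.
EraseSummary summarizeFlashImage(const std::vector<uint8_t>& image) {
    std::size_t programmed = 0;
    for (uint8_t byte : image) {
        if (byte != 0xFF) ++programmed;
    }
    return EraseSummary{image.size(), programmed, !image.empty() && programmed == 0};
}
```

A partially programmed image (a small `programmedBytes` count rather than zero) would point toward corruption rather than a full erase, which is a useful distinction to record in a report.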

Using the STlink, I was able to restore the device to functionality by loading an image of the binary code. That enabled me to get to DFU mode, from which tech support was able to guide me to full restoration. There was no hardware damage aside from the battery itself.

I share the concern that others have expressed – If devices are installed in remote locations and the power fails, will the device survive? I have given the matter some thought and believe that there is a solution. The battery could be disconnected if main power fails. This could be accomplished by connecting the battery through a relay or MOSFET controlled by the main power supply. I haven’t tried it yet, but put it out there as a potential workaround.

@DanL, a full wipe of the device!!! How did the battery destroy itself? Did it not have under-voltage (ie over-discharge) protection hardware?

The fact that you found the Electron wiped is consistent with some reported Photon failures under similar conditions, if I recall correctly. It seems that the STM32 goes weird under low voltage conditions. There is a brown-out detector on the STM32 that will reset the CPU (and hold it there) until the voltage comes back up again (there is some code for this somewhere in the forum). It would be interesting to see if @BDub can replicate this failure.

This was a standard Electron, so it had whatever protection is built in. After the event I found that the battery had a next to 0 voltage and would not take a charge. I substituted a standard 18650 type cell to get the unit functional and that cell seems to charge without problems.

For me the takeaway is that the failure was a flash memory problem. After memory was restored, all hardware appears to be working properly.

Interesting! One way I know that this can happen: if the Option Bytes register is being operated on while power is lost or a hard reset is asserted, the Read Protection Level can be set to 1 (default is 0). If an ST-Link is then used to set the Read Protection Level back to 0, the STM device will erase itself completely. This is by STM32 design and documented in the Flash Programming manual. We have seen very few instances of the RDP level being set to 1, but even in this case the device can operate normally. It is only if you want to use JTAG tools that the device must be set back to RDP level 0 first.

When connecting your ST Link for this electron issue, please check out the Read Protection Level first by examining the Option Bytes register and take note of the level set. Then perform a Read of flash memory.
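To make the level you note down unambiguous, here is a small sketch of how the RDP value maps to a protection level. It assumes the usual STM32F2 encoding as I understand it from the reference manual (0xAA for Level 0, 0xCC for Level 2, anything else Level 1) and that the RDP field occupies bits 15:8 of the option control register value; please verify both against ST's documentation. The function and enum names are hypothetical.

```cpp
#include <cstdint>

// Read protection level encoded by the RDP option byte.
enum class RdpLevel { Level0, Level1, Level2 };

// Assumed encoding: 0xAA selects Level 0 (no protection), 0xCC selects
// Level 2, and every other value is treated as Level 1.
RdpLevel decodeRdpByte(uint8_t rdp) {
    if (rdp == 0xAA) return RdpLevel::Level0;
    if (rdp == 0xCC) return RdpLevel::Level2;
    return RdpLevel::Level1;
}

// Extracts the RDP field (assumed to be bits 15:8) from an option control
// register value and decodes it.
RdpLevel decodeRdpFromOptcr(uint32_t optcr) {
    return decodeRdpByte(static_cast<uint8_t>((optcr >> 8) & 0xFF));
}
```

Under this encoding nearly any disturbed RDP value decodes as Level 1, which would fit an interrupted option-byte write leaving the device at Level 1.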

I'm going to work on a user application example that can help mitigate this issue. It will involve use of the Fuel Gauge on board the Electron so it will only be for users that have LiPos connected to their Electrons. For users with batteries connected to VIN, there is currently no way to measure the battery percentage or voltage built into the PMIC. The Fuel Gauge only monitors the LiPo. In the future we will have a System event that will be easy to hook for low power alerts. Because a wide range of power sources could be attached to VIN, exactly how you'd monitor them will vary for each application. I'll work on an example for how to do this with a SLA battery which seems to be a common choice.

There is :smile: Right now it's set to a default of 4.3ish volts (4.33V in testing; the PMIC hardware default is 4.36V, which in firmware is PMIC power; power.setInputVoltageLimit(4360);), which is to ensure there is enough voltage to fully charge the LiPo. Part of the problem, though, is that even though the PMIC may switch off, when it does the load presented to the battery decreases and its voltage typically recovers, switching the PMIC back on. It will oscillate on and off in this fashion until the battery is too low to power the PMIC any longer. What is needed is a more active control system based on the application and the projected usage of the battery and re-charging system. So you would want to create a system that monitors the battery and powers up only when you have determined that there is more than enough power to operate for as long as you need to connect and send data (or whatever else you need to do). Also, if you sense that you are getting close to running out of acceptable power, you could drop down into a lower power operating state where you wait for the system to charge back up. You can think of how this might work with a solar system that has a lot of unexpected cloudy days... where you'll need a plan that will make it through the days when the system won't be charging much. This is a fairly involved example, so it will require a longer discussion.
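One common way to avoid that kind of on/off oscillation is to use two thresholds instead of one (hysteresis). The sketch below shows only the decision logic, written as plain C++ so it can be checked off-device; the type names and the example thresholds are hypothetical, and on an Electron the state-of-charge input would come from something like FuelGauge::getSoC().

```cpp
// Operating state chosen by the application.
enum class PowerState { Sleeping, Active };

// Two thresholds: the gap between them is what prevents rapid toggling.
struct HysteresisConfig {
    float wakeAbovePercent;   // resume normal operation above this SoC
    float sleepBelowPercent;  // drop to low power below this SoC
};

// Returns the next state; between the two thresholds the state is unchanged.
PowerState nextPowerState(PowerState current, float socPercent,
                          const HysteresisConfig& cfg) {
    if (current == PowerState::Active && socPercent < cfg.sleepBelowPercent) {
        return PowerState::Sleeping;
    }
    if (current == PowerState::Sleeping && socPercent > cfg.wakeAbovePercent) {
        return PowerState::Active;
    }
    return current;
}
```

For example, with a wake threshold of 30% and a sleep threshold of 20%, a battery that recovers from 19% to 25% once the load is removed stays asleep instead of switching back on.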

Yes this is a typical solution for battery operated systems. You want to protect the battery from deep discharge, and in the case of this issue you want to protect the system from whatever bug is causing it not to boot. In fact, it's the same solution miniaturized in LiPo protection circuits built right into the battery pack. The Electron ships with these types of batteries.

Which battery did you use (part numbers would be helpful)? Connected to which input on the Electron? How did you determine the battery is destroyed (won't charge, won't hold a charge, etc..)?

Thanks all! Let's keep the discussion going so we can squash this bug :bug:

@BDub I'm referring to the voltage where the PMIC disconnects the battery output that feeds the microprocessor.

Can you set the voltage at which the PMIC disconnects the battery from the processor? If we can do that, then maybe raising that setting to a higher voltage, somewhere above the point where the battery's Low Voltage Cutoff Protection kicks in, would help.

If there is no way to do that via the PMIC then nevermind, your software solutions would be the only other solution.

Thank you. This would save us some wheel spinning while we are still on the learning curve.

I think I might have run into this problem: the D7 LED is dimly lit, and the RGB LED is dead no matter what I do. The battery is completely discharged, but isn’t destroyed. I’m using the one that shipped with the Electron.

After several minutes of being plugged into USB, the RGB LED went white and everything is fine again.


So, presuming that you are seeing a battery voltage collapse during a critical operation…

Do you have the MPU enabled? If there is a supply voltage droop and you are running with the flash timings (wait states, etc) and flash accelerator enabled then you can absolutely see bitflips in both instruction and data fetches during these events. We see them all the time on imp devices low on battery. The MPU can help catch these illegal accesses and prevent more bad things happening (even if you can’t correct for the issue, you can at least stop doing random things once an exception has been thrown).

A worthwhile experiment is to try to program the flash accelerator, wait states, and flash parallelism (for writes) for the most conservative settings (ie 1.8v operation even when you believe you’re at 3.3v). Does the problem still exhibit itself? If not then a brownout could be the issue.
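To give a feel for what "most conservative settings" means for wait states, here is a rough sketch of the calculation. The per-wait-state frequency steps in the usage note (30 MHz at 2.7 to 3.6V and 16 MHz at 1.8 to 2.1V) and the 120 MHz clock are my assumptions based on my reading of the STM32F2 reference manual, so check them against ST's table before relying on them; `flashWaitStates` is a hypothetical helper.

```cpp
#include <cstdint>

// Flash wait states needed for a given HCLK, where maxMhzPerWaitState is the
// highest HCLK supported with zero wait states at the present supply voltage
// (an input you must take from the reference manual for your voltage range).
uint32_t flashWaitStates(uint32_t hclkMhz, uint32_t maxMhzPerWaitState) {
    if (hclkMhz == 0 || maxMhzPerWaitState == 0) return 0;
    // Ceiling division, minus one because the first step needs no wait state.
    return (hclkMhz + maxMhzPerWaitState - 1) / maxMhzPerWaitState - 1;
}
```

Under those assumed steps, a 120 MHz clock needs 3 wait states at the 3.3V setting and 7 at the 1.8V setting, which is the cost of the conservative configuration.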

The PVD can be used to trigger reconfiguration of the flash timings, too. We use a combination of all these things to give the best behavior when faced with unreliable power, but sometimes you just gotta crash. Are all your flash operations safe and recoverable, even if they fail mid-byte?


This is great info from your experience and I am sure the team appreciates it! But the symptom is that the entire flash memory is erased (all 1’s) when there were no planned flash writes.

I guess it is possible that bit flips lead to a code path that erases all memory but that seems a bit unlikely to me. More likely there is some hardware limitation that the team is not aware of on low voltage behavior. For instance, I know that one of your published designs has a 1.8V detector that forces the processor into reset on low voltage to avoid bad behavior. Particle devices have no such protection.

Do you have any insight on that?

That's true; about the only simple thing that could do that is exit from RDP1 as has been noted… but why is anyone writing the option bytes?

A second simple experiment (beyond the 1.8v timings, because you don’t lose much performance - maybe 30%?) would be to program the MPU to prevent any access to the option bytes… that’s a pretty minimal MPU setup and shouldn’t affect any normal operation, so minimal code change.

Yes, Nora (environmental sensor) has a battery cut-off circuit, but that’s more for neatness. The STM itself has a BOR circuit (which you can reprogram to give a reset at higher voltages - worth considering if you’re 3v3 only) - and being used in plenty of industrial applications, it’s pretty good at resetting itself when correctly set up. The bitflips are more what you should be worrying about, which generally only happen when you’re running out of spec (ie temporarily below configured voltage) - but the sensitivity to voltage changes chip to chip due to process variations.

edit: if you’re always expecting to be at 3v3, then setting BOR to V(BOR3) will trigger a reset by 2.75v at the absolute lowest. That would allow you to continue using the fast flash settings. The default BOR setting is OFF so you have no brownout protection.
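If you read the option bytes with a debugger, the configured brownout level can be decoded with a sketch like the one below. It assumes the BOR_LEV field sits in bits 3:2 of the option control register value with the encoding 00 = level 3, 01 = level 2, 10 = level 1, 11 = off, which is my understanding of the STM32F2 reference manual and should be verified there; the names are hypothetical.

```cpp
#include <cstdint>

// Brownout reset level selected by the BOR_LEV option bits.
enum class BorLevel { Level3, Level2, Level1, Off };

// Decodes BOR_LEV (assumed to be bits 3:2) from an option control register value.
BorLevel decodeBorLevel(uint32_t optcr) {
    switch ((optcr >> 2) & 0x3u) {
        case 0:  return BorLevel::Level3;  // highest threshold
        case 1:  return BorLevel::Level2;
        case 2:  return BorLevel::Level1;
        default: return BorLevel::Off;     // only the power-on/power-down reset threshold applies
    }
}
```

Recording this value alongside the Read Protection Level would show whether a failed unit was running with brownout reset disabled.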


If the issues are happening when the battery voltage goes low, can’t we just turn off the power FET that is supplying power to the Electron before letting the battery voltage go too low?

I was looking through the PMIC datasheet and firmware functions and found that the PMIC and power FET still draw power when the battery is attached and the Electron is turned OFF. This may be causing issues for devices sleeping for long periods of time while the battery is low.

Not sure, but if we stop feeding the STM processor with low voltage it may stop erasing itself.

Just thinking out loud here. I have not tested any of this because I have not bricked an Electron yet, but I have bricked some Photons from low battery voltage.

@BDub I have a quick coding question for a Solar Powered Electron I’m testing.

I let the Electron run until there was a cloudy day and the battery's PCM cut the output to the Electron to protect the battery from over-discharging.

The next day when the sun came back out, the red PMIC LED lit up to indicate it was charging, but the Electron got stuck in a state where the main LED is white, and it never recovers and starts running again. I can hit the reset and mode buttons a few times and eventually it will recover and my code starts working again.

I’m thinking the easiest way to prevent this from happening is to put the Electron to sleep when the Battery SOC hits something like 20% which will keep the Electron properly sleeping until it’s solar charged above 20% again when the sun comes out the next day.

I ended up just writing this code to prevent the Electron from connecting to cellular network unless the battery SOC is above 20%. Let me know if you see any obvious ways to improve it.

I ended up not putting the fuel gauge to sleep because it caused the fuel gauge SOC level to make sudden jumps in SOC when it’s turned OFF and ON. I saw lower sleep power consumption when not putting the fuel gauge to sleep for some reason also.

I’m only sending data to Ubidots and not connecting to the Particle Cloud in this test, which makes sending data to Ubidots super quick, and I only see the system awake to publish for 1 second every time it wakes up. Connecting to the Particle Cloud usually takes much longer, even when using SLEEP_NETWORK_STANDBY.

// This #include statement was automatically added by the Particle IDE.
#include "Ubidots/Ubidots.h"

SYSTEM_MODE(SEMI_AUTOMATIC);
SYSTEM_THREAD(ENABLED);

#define TOKEN "YourToken#"  // Put your Ubidots TOKEN here
#define DATA_SOURCE_NAME "ElectronSleepNew"

Ubidots ubidots(TOKEN); // A data source with the Particle name will be created in your Ubidots account

int button = D0;         // wake-up input
int ledPin = D7;         // on-board LED on D7
int sleepInterval = 60;  // seconds; the awake path sleeps for twice this

// Blink the LED `count` times, `ms` milliseconds on and `ms` off between blinks.
void blink(int count, int ms) {
  for (int i = 0; i < count; i++) {
    digitalWrite(ledPin, HIGH);
    delay(ms);
    digitalWrite(ledPin, LOW);
    if (i < count - 1) delay(ms);
  }
}

void setup() {
  //Serial.begin(115200);
  pinMode(button, INPUT_PULLDOWN);  // sets pin as input
  pinMode(ledPin, OUTPUT);          // sets pin as output

  ubidots.setDatasourceName(DATA_SOURCE_NAME);

  PMIC pmic;
  // set charging current to 1024mA (512 + 512 offset)
  pmic.setChargeCurrent(0,0,1,0,0,0);
  pmic.setInputVoltageLimit(4840);
}

void loop() {
  FuelGauge fuel;

  if (fuel.getSoC() > 20)
  {
    // Enough charge: read the battery, connect and publish
    float value1 = fuel.getVCell();
    float value2 = fuel.getSoC();

    ubidots.add("Volts", value1);  // Change for your variable name
    ubidots.add("SOC", value2);

    Cellular.connect();
    // Cellular.connect() does not block with SYSTEM_THREAD(ENABLED), so wait
    // (up to 60 s, an arbitrary timeout) for the network before sending
    if (waitFor(Cellular.ready, 60000)) {
      ubidots.sendAll();
    }

    blink(2, 250);

    System.sleep(D0, RISING, sleepInterval * 2, SLEEP_NETWORK_STANDBY);
  }
  else
  {
    // Low battery: power the modem down and deep sleep for 30 seconds
    Cellular.on();
    delay(10000);
    Cellular.command("AT+CPWROFF\r\n");
    delay(2000);
    //FuelGauge().sleep();
    //delay(2000);
    blink(4, 150);
    System.sleep(SLEEP_MODE_DEEP, 30);
  }
}
   

Hey all! I’ll catch up on the back scroll soon and address all of your questions and comments. First though, I wanted to share something I’ve been working on. It’s a user app that maintains a minimum battery capacity on the Electron. I also added this to the bottom of the first post above.

:books: For anyone looking to mitigate the risk of this "Electron not booting after battery discharges completely" issue (it feels very low though), you can check out the techniques employed in this electron-maintain-capacity app:

The first goal with any good portable application is to never run out of power!

Please let me know what your experience is with using the techniques presented in the app. Thanks!


@BDub I’ll check out your code but using the code I wrote to put the Electron to Sleep when SOC hit 20% is working out perfectly.

Based on some tests I’m running, I estimate that an Electron at 20% SOC could sleep and wake up to check SOC every hour for 25 days straight before the battery would die and cause the dreaded Electron manual reset or, even worse, Electron failure.
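As a rough cross-check of an estimate like this, the average current the remaining charge can sustain is simple arithmetic. The sketch below is only that arithmetic; the 2000 mAh capacity used in the usage note is an assumed example value, not a measurement from this test, and `averageCurrentBudgetmA` is a hypothetical helper.

```cpp
// Average current (mA) that a battery can supply for the given number of
// days, starting from socPercent of capacityMah and running down to empty.
double averageCurrentBudgetmA(double capacityMah, double socPercent, double days) {
    if (days <= 0.0) return 0.0;
    const double usableMah = capacityMah * socPercent / 100.0;
    const double hours = days * 24.0;
    return usableMah / hours;
}
```

For an assumed 2000 mAh pack at 20% SOC, that is 400 mAh spread over 600 hours, or roughly 0.67 mA on average including the hourly wake-ups.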

This should be enough to keep Electrons alive in the wild without power loss problems. I’ll create a post on my solar charging experiences in the near future also to help others considering doing the same thing.