Bug Bounty: Electron not booting after battery discharges completely


#136

I’ve seen you mention this code, but did not care much to get the link. Now that we’re at it, please point me to it.
By “dead”, I presume you mean “flat”. The absolute minimum voltage to which Li-ion batteries can safely be discharged is about 2.8V, so 3.0V is quite safe.
A 1W solar panel in Uganda is actually overqualified for our deployment, in the sunny season. It’s currently fixed at 1W for research purposes though, so we need to have it in the rainy season as well.
What I removed was the battery protection IC in the battery. Perhaps I should start charging through the TI PMIC as designed. The panel is 6V though, so I’ll need to check the maximum input voltage the PMIC can take.


#137

Yeah, flat: empty enough that it can’t support the Electron’s startup and connection load, which causes the brownout issue you’re seeing.

It’s underqualified based on how you’re trying to use it, as your graph clearly shows. It can work with proper coding though. You would be better off with a bigger panel, especially in less sunny parts of the year.

The Battery protection IC will prevent short circuits.

The PMIC can accept the solar input directly from the solar panel without any problems; the max input voltage is 12+ volts. My code optimizes the solar panel by not allowing it to operate too far below its maximum power point.


#138

@BDub @RWB This is an update.
I included a simple condition: don’t turn on the modem if the SOC is below 20%. The Electron logs the SOC about every 7 minutes. When the SOC was finally OK, the data was uploaded, and the SOC graph now looks like below:

I recorded a SOC of 1.1% at one point in time.
What this could indicate is that the Electron is happy going about its other duties (waking up, receiving data over serial and flushing to the SD card) at extremely low SOCs and things go haywire once you decide to turn the modem on.
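For anyone wanting to try the same gate, here is a minimal sketch of the 20% condition as a plain function. The names `kMinSocForModemPercent` and `modemAllowed` are made up for illustration and are not from my actual firmware; on the device the SOC would presumably come from the fuel gauge (e.g. `FuelGauge::getSoC()`), which is left out here so the logic can be checked on a desktop compiler.

```cpp
// Hypothetical threshold: the 20% figure is the one mentioned in this post.
constexpr float kMinSocForModemPercent = 20.0f;

// Returns true only when the reported state of charge is high enough
// to risk the modem's startup and connection load.
// (Illustrative helper, not part of any Particle API.)
constexpr bool modemAllowed(float socPercent) {
    return socPercent >= kMinSocForModemPercent;
}
```

With this, a reading like the 1.1% above keeps the modem off while logging to the SD card continues, and the upload only happens once the SOC is back at or above the threshold.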

Since the guys at Particle failed to replicate this bug (is this still true?), I will try to deliberately reproduce it and see. (Has anyone done repeatability tests with results?)


#139

What I do when the battery SOC hits 20% is nothing but go back to deep sleep for 60 minutes, then wake up and check the SOC again to see if it’s above 20% before running code normally again.

If you only turn off the modem when the SOC is below 20% but still do your other tasks, then there is the possibility of the battery still draining to empty, causing the Electron to brown out and corrupt memory.

From my testing, I could go at least a week in this deep sleep mode, waking up every hour, before the battery would be empty, which was more than enough time to recharge the battery via solar power.
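A rough sketch of that wake-up decision is below. It is not the actual code from my library; `WakePlan`, `planOnWake` and the constants are hypothetical names, and the 20% / 60 minute values are just the ones described above. On an Electron the "sleep again" branch would presumably end in something like `System.sleep(SLEEP_MODE_DEEP, seconds)`, which is omitted here so the logic compiles anywhere.

```cpp
#include <cstdint>

// What to do after waking up (illustrative names).
enum class WakeAction { RunNormally, SleepAgain };

struct WakePlan {
    WakeAction action;
    std::uint32_t sleepSeconds;  // 0 when running normally
};

// Values taken from the description above.
constexpr float kLowSocPercent = 20.0f;
constexpr std::uint32_t kLowBatterySleepSeconds = 60u * 60u;  // 60 minutes

// Run normally only when the SOC is above the threshold;
// otherwise do nothing and go straight back to deep sleep.
inline WakePlan planOnWake(float socPercent) {
    if (socPercent > kLowSocPercent) {
        return WakePlan{WakeAction::RunNormally, 0u};
    }
    return WakePlan{WakeAction::SleepAgain, kLowBatterySleepSeconds};
}
```

The point of skipping all other work, not just the modem, is that nothing in the low-SOC branch can keep draining the battery toward a brownout.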


#140

I agree 1W is not enough to keep the Electron charged. I would recommend 10W unless you are sleeping a lot, in which case 2-3W might be enough to get you by. I guess it really depends on how much sun you have (sounds like tons), how often you wake, and how long you stay awake.

The battery protection IC in the battery should not limit the default current rate at all. It should allow charging rates of up to 1C easily, and we only charge at 0.25C (512mA) by default.
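As a quick sanity check on those numbers, the C-rate relation is just current = C-rate × capacity. The sketch below uses made-up helper names, and the capacity it derives is only what the 0.25C / 512mA figures imply when taken at face value, not a datasheet value.

```cpp
// Charge current in mA for a given C-rate and battery capacity in mAh.
// (Illustrative helpers, not part of any Particle API.)
constexpr double chargeCurrentmA(double cRate, double capacitymAh) {
    return cRate * capacitymAh;
}

// Capacity in mAh implied by a charge current and the C-rate it represents.
constexpr double impliedCapacitymAh(double currentmA, double cRate) {
    return currentmA / cRate;
}
```

So 512mA at 0.25C corresponds to roughly a 2048mAh cell, and a 1C charge on that same cell would be about 2048mA, four times the default rate.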

As you can see from the first post, with data, we have tried but don’t have an exact repeatable test that we can perform over and over. There are many ways to mitigate this issue though, as you are testing now.

Please do submit a minimal repeatable test case and collect the bounty :)


#141

We were having a communications problem and decided to hook up 6 Electrons to the same USB hub and power supply. Although all appeared to be charging, when someone went in 2 days later, 3 of them had dropped offline.

All showed the blue LED bug when first inspected. Two of them recovered once they were connected to a better power supply and reset.

However, 1 of them has not reset properly. I tried a different battery with no luck.
This one had no board connected at all. The only connections were the USB, battery and antenna.

We are using the 7.0.0-rc6 firmware. We have ~12 Particle devices in the field (with the same firmware) and all of them have had the battery run down (multiple times). None have exhibited this behaviour before. NB. These all charge off the Vin port, whereas this one was charged via USB. They also have a screen attached that runs off the 3V3 rail.


#142

Have the same symptoms, but I had my battery and USB cable plugged in non-stop. Don’t know if this is somehow related, as it’s possible that the USB power supply failed (power strip unplugged/turned off?) while I was gone.

Going to try to use my ST-Link to reflash the bootloader. Hopefully that will unbrick it.


#143

Two more data points. This time I had two 0.7.0 units in the field powered by a 12V 144AHr SLA, going through a Murata OKI-T/3-W32P-C DC/DC converter (4.5V output).

The battery deeply discharged after several days (resting voltage right now is 8.8V), and when I recovered the two Electrons one had a corrupted key[*] and the other shows the same dim D7. The incoming data shows that the device which stopped transmitting first was the one with the corrupted key, whereas the one with the (hopefully just) corrupted bootloader ran an extra 32 minutes.

The corrupted key was fixed using the Particle CLI’s keys doctor command. I haven’t gotten out my J-Link yet to see about reflashing the bootloader, so I don’t know for sure that it will come back to life.

So over a set of about 15 Electrons, I have now seen three corrupted keys and two dim D7 instances. I hadn’t linked the lost keys to the low power event till now, but it seems to me likely that the prior two key corruptions were because I ran the LiPo battery low and not because of some mishandling on my part.

Suggestions? I’m loving Particle and the Electron, but 5 failures which require sending the devices back to me for reflashing/re-keying isn’t something we can live with.

[*] I’m calling it corrupted because the device is unable to get back online until I reset the key certificate. Then everything works normally.


#144

Use the Battery Check code in the library below to prevent running the electron on low batteries.


#145

Very nice code package. I would love to see this level of debugging built into Particle’s main firmware. For most applications I imagine this is well worth the code footprint.

Unfortunately, in this case it won’t work with an upstream DC-DC unless there is a voltage divider on the battery, and that would necessitate a more complex circuit just to keep the Electron from being corrupted/damaged by a brownout. This is something which the hardware should be able to tolerate gracefully.
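To make the divider idea concrete, here is a rough sketch of what reading the upstream SLA through a divider might look like. Everything in it is an assumption for illustration: a 12-bit ADC with a 3.3V reference, a 100k/27k divider (which keeps a 14.4V input just over 3V at the pin), and an 11.0V cutoff that I picked arbitrarily. None of these values come from a tested design.

```cpp
// Assumed ADC characteristics (12-bit, 3.3V reference).
constexpr double kAdcRefVolts = 3.3;
constexpr int kAdcMaxCount = 4095;

// Hypothetical divider: battery+ -> 100k -> ADC pin -> 27k -> GND.
constexpr double kRTopOhms = 100000.0;
constexpr double kRBottomOhms = 27000.0;

// Hypothetical cutoff below which the application should stop working.
constexpr double kCutoffVolts = 11.0;

// Convert a raw ADC count back to the voltage at the top of the divider.
inline double supplyVoltsFromAdc(int adcCount) {
    const double pinVolts = adcCount * kAdcRefVolts / kAdcMaxCount;
    return pinVolts * (kRTopOhms + kRBottomOhms) / kRBottomOhms;
}

// True when the upstream supply has sagged below the cutoff.
inline bool supplyTooLow(int adcCount) {
    return supplyVoltsFromAdc(adcCount) < kCutoffVolts;
}
```

With these assumed values, a count of about 2322 corresponds to roughly 8.8V (the resting voltage I measured after the deep discharge) and would trip the cutoff, while a count near 3298 is about 12.5V and would not.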

I’ll try your package for the dev units on my desk. These sometimes get unplugged from USB but are left plugged into battery, which is what led to the undervoltage scenario.


#146

If you connect a battery to your Electron, then when your 5V input cuts out the battery will take over, and the Battery Check code will work.


#147

That’s a good idea, but in our case that would mean deploying a LiPo to the field, which isn’t really an option (temp specs, increase in complexity, increase in flammability/explosiveness, air shipping issues, etc.).

I think the fix probably needs to be in firmware, or understanding how the Electron’s power-supply issues can cause errant writes so that this can be definitively avoided. Before deploying a bunch of these units to the field in mission critical applications, it’s important to understand our failure modes.

The STM32 is a pretty robust chip, and in years of using it for drones I can’t recall running into this corruption problem. I figure it’s either a Particle firmware problem (perhaps the bootloader?) or the power-supply has uncovered some hitherto unknown errata.


#148

I know fixes have been worked into the latest firmware to prevent this from happening. Not sure if this update is in the firmware you’re using or not.

I’ve never experienced this myself, but I know that preventing low-voltage input seems to prevent it from happening.


#149

Is this still considered an open issue? I have 75+ Electrons and I have had two of them fail with the symptoms described in this thread. I was able to recover one by resetting the keys, but the other one is dead. I am waiting for my ST-Link to arrive (I could not recover it using a Photon).

My Electrons are all running 0.7.0 and they are not powered by a battery. They are powered by a 5V, 2A power supply. I am not using the lithium battery.

The devices are installed in a location where there is very bad cellular reception and the devices frequently attempt re-connection to the cloud.


#150

I’ve got 2 Electrons that I let run down too far and that now only enter DFU state (with D7 dimly lit). I’ve spent several weeks reading posts and experimenting, with no success. Is there any progress on this bug?


#151

Me too. I have a Special Box in the back of a drawer for devices that have ended up in the same state.


#152

If I’m understanding the commit message correctly, there’s a chance that this bug was just addressed: https://github.com/particle-iot/firmware/pull/1578

Unfortunately, from looking at the changelog, the bug “Electron power-on connectivity issues with 0.8.0-rcX” doesn’t seem to have been addressed, so many of us have to stay put with 0.7.0. With luck they’ll backport this fix, since the original bug was from some three years ago and would thus have affected all final releases.


#153

I hope you all know that devices with a dim D7 LED can be fixed with a JTAG programmer so they are not really dead, dead.

It is not a simple procedure but it is actually what a lot of the non-Particle world uses to program their devices.


#154

I am not worried about the restoration of the Electron itself. I am mainly concerned about the failure of the Electron in the field and having to send someone to swap the device out.

I have to note that the Device Key Helper library seems to have fixed the device key problem. I haven’t noticed any of my devices exhibit the blinking cyan behavior since I started using it.

The dim D7 LED (bootloader corruption) has only happened once so far.


#155

Louder for the people in the back!

From the pull request:

The idea being that a power glitch could result in the read of sector 0 write protection bits being misinterpreted as unprotected, which would unlock the Option Bytes register to change these bits to be protected. While writing to Option Bytes register, if power is lost or the MCU is reset, this can result in Read Protection level being set to 1. This is fairly well understood and easy to reproduce, so not attempting to write protect the bootloader on every boot is a great mitigation technique to avoiding RDP level 1.

Always be very wary of ‘un-journaled’ (irreversible) writes to flash, especially on boot. Something like this ended up with me spending a week on a project rescue mission in Phoenix, which I do not recommend.

(I’m sure it’s a fine place to live, but as a work trip it’s a pretty boring town)
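
To illustrate the failure mode from the pull request: a single glitched read says "unprotected", and that one read triggers an irreversible write. The sketch below is NOT what the PR does (the PR simply stops attempting the write on every boot). It is a different, hypothetical guard that only allows the write when several independent reads all agree, shown here just to make the "don’t trust one read before an un-journaled write" point concrete. All names are made up.

```cpp
#include <array>
#include <cstddef>

// Result of one read of the write-protection bits (illustrative type).
enum class Protection { Protected, Unprotected };

// Allow the risky option-byte write only if every sample reads Unprotected.
// A single Protected reading is treated as evidence that the other
// readings may be glitches, so the write is skipped.
template <std::size_t N>
bool shouldWriteProtection(const std::array<Protection, N>& samples) {
    static_assert(N > 0, "need at least one sample");
    for (Protection sample : samples) {
        if (sample == Protection::Protected) {
            return false;
        }
    }
    return true;
}
```

This only reduces the odds of acting on a bad read; it does nothing about power being lost during the write itself, which is why avoiding the write at boot altogether is the stronger mitigation.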