Bug Bounty: Electron not booting after battery discharges completely

Have the same symptoms, but I had my battery and USB cable plugged in non-stop. Don’t know if this is somehow related, as it’s possible that the USB power supply failed (power strip unplugged/turned off?) while I was gone.

Going to try to use my ST-Link to reflash the bootloader. Hopefully that will unbrick it.

Two more data points. This time I had two 0.7.0 units in the field powered by a 12V 144AHr SLA, going through a Murata OKI-T/3-W32P-C DC/DC converter (4.5V output).

The battery deeply discharged after several days (resting voltage right now is 8.8V), and when I recovered the two Electrons one had a corrupted key[*] and the other shows the same dim D7. The incoming data shows that the device which stopped transmitting first was the one with the corrupted key, whereas the one with the (hopefully just) corrupted bootloader ran an extra 32 minutes.

The corrupted key was fixed by running the Particle CLI’s keys doctor command (particle keys doctor). I haven’t gotten out my J-Link yet to see about reflashing the bootloader, so I don’t know for sure that it will come back to life.

So over a set of about 15 Electrons, I have now seen three corrupted keys and two dim D7 instances. I hadn’t linked the lost keys to the low power event till now, but it seems to me likely that the prior two key corruptions were because I ran the LiPo battery low and not because of some mishandling on my part.

Suggestions? I’m loving Particle and the Electron, but 5 failures which require sending the devices back to me for reflashing/re-keying isn’t something we can live with.

[*] I’m calling it corrupted because the device is unable to get back online until I reset the key certificate. Then everything works normally.


Use the Battery Check code in the library below to avoid running the Electron on a low battery.

Very nice code package. I would love to see this level of debugging built into Particle’s main firmware. For most applications I imagine this is very worth the code footprint.

Unfortunately, in this case it won’t work with an upstream DC-DC, unless there were a voltage divider on the battery, but now that’s necessitating a more complex circuit in order to keep the Electron from being corrupted/damaged by a brownout. This is something which the hardware should be able to tolerate gracefully.
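For what it's worth, here is a rough sketch of the arithmetic such a divider would involve, written in Python just to show the numbers. The resistor values (100 kΩ over 27 kΩ) and the 11.0 V cutoff are assumptions picked for illustration, not recommendations; the 12-bit, 0 to 3.3 V ADC range is what the Electron's analog inputs provide.

```python
# Assumed values for illustration only: pick resistors and cutoff for your own supply.
ADC_MAX_COUNTS = 4095      # 12-bit ADC full scale
ADC_REF_VOLTS = 3.3        # analog input range of the Electron
R_TOP_OHMS = 100_000.0     # hypothetical: battery+ to ADC pin
R_BOTTOM_OHMS = 27_000.0   # hypothetical: ADC pin to ground
CUTOFF_VOLTS = 11.0        # hypothetical low-voltage cutoff for a 12 V SLA


def divider_ratio(r_top=R_TOP_OHMS, r_bottom=R_BOTTOM_OHMS):
    """Fraction of the battery voltage that appears at the ADC pin."""
    return r_bottom / (r_top + r_bottom)


def battery_volts(adc_counts):
    """Convert a raw ADC reading back to the voltage at the battery terminals."""
    pin_volts = adc_counts / ADC_MAX_COUNTS * ADC_REF_VOLTS
    return pin_volts / divider_ratio()


def should_shut_down(adc_counts, cutoff_volts=CUTOFF_VOLTS):
    """True when the battery has sagged below the cutoff and the device should stop."""
    return battery_volts(adc_counts) < cutoff_volts
```

With these made-up values, the 8.8 V resting voltage mentioned earlier would read roughly 2320 counts, well below the cutoff, so firmware using a check like this would have gone to sleep long before that point.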

I’ll try your package for the dev units on my desk. These sometimes get unplugged from USB but are left plugged into battery, which is what led to the undervoltage scenario.

If you connect a battery to your Electron then when your 5v input cuts out the battery will take over and the Battery Check code will work.

That's a good idea, but in our case that would mean deploying a LiPo to the field, which isn't really an option (temp specs, increase in complexity, increase in flammability/explosiveness, air shipping issues, etc...)

I think the fix probably needs to be in firmware, or understanding how the Electron's power-supply issues can cause errant writes so that this can be definitively avoided. Before deploying a bunch of these units to the field in mission critical applications, it's important to understand our failure modes.

The STM32 is a pretty robust chip, and in years of using it for drones I can't recall running into this corruption problem. I figure it's either a Particle firmware problem (perhaps the bootloader?) or the power-supply has uncovered some hitherto unknown errata.

I know fixes have been worked into the latest firmware to prevent this from happening. Not sure whether that update is in the firmware you’re using.

I’ve never experienced this myself, but I do know that preventing low input voltage seems to keep it from happening.

Is this still considered an open issue? I have about 75+ Electrons and I have had two of them fail with the symptoms described in this thread. I was able to recover one with resetting the keys but the other one is dead. I am waiting for my ST-Link to arrive (I could not recover it using a Photon).

My Electrons are all running 0.7.0 and they are not powered by a battery. They are powered by a 5V, 2A power supply. I am not using the lithium battery.

The devices are installed in a location where there is very bad cellular reception and the devices frequently attempt re-connection to the cloud.

I’ve got 2 Electrons that I let run down too far and now only enter DFU state (with D7 dimly lit). I’ve spent several weeks reading posts and experimenting, with no success. Is there any progress on this bug?

Me too. I have a Special Box in the back of a drawer for devices that have ended up in the same state.

If I’m understanding the commit message correctly, there’s a chance that this bug was just addressed: https://github.com/particle-iot/firmware/pull/1578

Unfortunately, from looking at the changelog, the bug described in “Electron power-on connectivity issues with 0.8.0-rcX” doesn’t seem to have been addressed, so many of us have to stay put on 0.7.0. With luck they’ll backport this fix, since the original bug dates from some three years ago and would thus have affected all final releases.


I hope you all know that devices with a dim D7 LED can be fixed with a JTAG programmer so they are not really dead, dead.

It is not a simple procedure but it is actually what a lot of the non-Particle world uses to program their devices.

I am not worried about the restoration of the Electron itself. I am mainly concerned about the failure of the Electron in the field and having to send someone to swap the device out.

I have to note that the Device Key Helper library seems to have fixed the device key problem. I haven’t noticed any of my devices exhibit the blinking cyan behavior since I started using it.

The dim D7 LED (bootloader corruption) has only happened once so far.

Louder for the people in the back!

From the pull request:

The idea being that a power glitch could result in the read of sector 0 write protection bits being misinterpreted as unprotected, which would unlock the Option Bytes register to change these bits to be protected. While writing to Option Bytes register, if power is lost or the MCU is reset, this can result in Read Protection level being set to 1. This is fairly well understood and easy to reproduce, so not attempting to write protect the bootloader on every boot is a great mitigation technique to avoiding RPD level 1.
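To make the reasoning in that description concrete, here is a toy Python model, not Device OS code. It assumes the bootloader sector really is protected and that a glitched read occasionally reports it as unprotected; the glitch probability is invented purely for illustration. It counts how many boots would open the risky Option Bytes write window under each policy.

```python
import random


def boots_with_option_byte_write(n_boots, glitch_prob, protect_on_boot, seed=0):
    """Count boots that would attempt an Option Bytes write.

    n_boots         -- number of simulated power-ups
    glitch_prob     -- assumed chance that a read of the protection bits is
                       misread as "unprotected" (made-up number)
    protect_on_boot -- True models "re-protect on every boot if the bits look
                       unprotected"; False models "never write at boot"
    """
    rng = random.Random(seed)
    risky_writes = 0
    for _ in range(n_boots):
        # The sector is really protected; only the read can be wrong.
        read_says_protected = rng.random() >= glitch_prob
        if protect_on_boot and not read_says_protected:
            # Each such write is a window in which losing power could leave
            # the Option Bytes in a bad state.
            risky_writes += 1
    return risky_writes
```

Under these assumptions the "never write at boot" policy yields zero risky writes no matter how noisy the read is, which is the whole point of the mitigation.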

Always be very wary of 'un-journaled' (irreversible) writes to flash, especially on boot. Something like this ended up with me spending a week on a project rescue mission in Phoenix, which I do not recommend.

(I'm sure it's a fine place to live, but as a work trip it's a pretty boring town)
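For anyone unfamiliar with what a journaled write looks like, here is a minimal Python sketch of one common pattern: two slots, a sequence number, and a CRC, so that a write torn by a power cut leaves the previous record intact. The slot list stands in for two flash sectors, and the record layout is an assumption for illustration; it is not how Device OS stores keys or option bytes.

```python
import struct
import zlib


def encode_record(seq, payload):
    """Layout (assumed): 4-byte little-endian sequence, payload, 4-byte CRC32."""
    body = struct.pack("<I", seq) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def decode_record(raw):
    """Return (seq, payload), or None if the record is torn or corrupt."""
    if len(raw) < 8:
        return None
    body = raw[:-4]
    (stored_crc,) = struct.unpack("<I", raw[-4:])
    if zlib.crc32(body) != stored_crc:
        return None
    (seq,) = struct.unpack("<I", body[:4])
    return seq, body[4:]


def read_latest(slots):
    """Return the valid record with the highest sequence number, or None."""
    valid = [rec for rec in (decode_record(raw) for raw in slots) if rec is not None]
    return max(valid, key=lambda rec: rec[0]) if valid else None


def write_journaled(slots, payload):
    """Write into the slot that does NOT hold the newest valid record."""
    decoded = [decode_record(raw) for raw in slots]
    valid = [(rec[0], idx) for idx, rec in enumerate(decoded) if rec is not None]
    if valid:
        latest_seq, latest_idx = max(valid)
        target, seq = 1 - latest_idx, latest_seq + 1
    else:
        target, seq = 0, 0
    slots[target] = encode_record(seq, payload)
    return target


# Usage: two writes, then a third that is cut off partway through.
slots = [b"", b""]
write_journaled(slots, b"key-v1")
write_journaled(slots, b"key-v2")
slots[0] = encode_record(2, b"key-v3")[:5]   # simulated power loss mid-write
recovered = read_latest(slots)               # still the intact "key-v2" record
```

The design choice is simply that the newest good copy is never the one being overwritten, so an interrupted write costs you the update but not the data.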


I have tried 0.8.0-rc.11 and can make the device not boot reasonably reliably.

The process is:

No battery connected.
Powered via USB only.
Flash new Device-OS and then your application. (I have a batch file that does them all one after the other)

As soon as particle-cli says “Flash success!”, IMMEDIATELY disconnect the USB, thereby removing the power.

When you try and power it back up, it won’t boot.

What exactly does that mean?

And

have you tried with battery?
When powering via USB only, the power supply has to cope with fast, high-current demand spikes, which not a lot of USB wall warts can fulfill (and no USB port of any common computer can).

When you reconnect the USB to provide a PSU, after having flashed the device with the new Device-OS and application, it no longer boots (AKA it’s bricked and needs to be wiped using the ST-LINK)

I only reflashed the Device-OS and my application via “particle flash --usb”, so there was no need to have the battery connected. I did not connect to the cellular network at all.

Are you doing this right after the first of 4 "Flash success!" replies (3 system + 1 user module, is what I'm presuming your batch file is doing), or after the 4th?

If you do this after the first, you are likely corrupting the DFU process midway through the second system module. This will leave your device in a state where it will not boot the system firmware and might SOS hard fault as well. But you should be able to get back into DFU mode. Try removing power for a bit and restoring it while holding the MODE button.

If that's not what you experienced, please add a bit more detail about what you are doing and I'll try to reproduce it. Thanks @marshall !

Hi BDub

I’m doing it after the final (4 of 4).

I was loading code into 30 units and 6 (20%) of them never rebooted. I’m pretty sure that these were the ones I disconnected very quickly after the “Flash success!” message; the LED may not even have had a chance to flash white. I can retest it, but I have to make up a lead for my ST-LINK first. I don’t think it matters what the application code actually is, just the time delay between the flash finishing and the power being removed.

Here is my batch file

rem Extract the binary's file name from the name of the current directory.
for %%* in (.) do (
	rem @echo =%%~n*
	set filename=%%~n*
)


echo project name = %filename%

set firmware=0.8.0-rc.11
set binary_extn=bin
set binary_filename=%filename%.%binary_extn%
echo binary_filename = %binary_filename%
call upgrade_fw.bat %firmware%
particle flash --usb %binary_filename%

and this is the upgrade_fw.bat

SET firmware=%1

particle flash --usb c:\system_firmware\v%firmware%\system-part1-%firmware%-electron.bin
particle flash --usb c:\system_firmware\v%firmware%\system-part2-%firmware%-electron.bin
particle flash --usb c:\system_firmware\v%firmware%\system-part3-%firmware%-electron.bin

Thanks for the extra details, @marshall. I’ve now made 20 attempts with a similar script on Mac that flashes 0.8.0-rc.11 and then Tinker, and I’ve not had any problem. I’ve also tried repeatedly booting and yanking power before and after the white LED comes on, to simulate that last stage of the flashing script. Basically, what happens there is that the last particle update command causes a soft reset, and then you are yanking power just before the LED turns on, or right when it turns on. I’ve been plugging the USB in and unplugging it at different times (just before the white LED and just after), trying to vary it a bunch, and it continues to boot over and over.

I don’t doubt that it happened to you 6 times out of 30 like you said, but I haven’t had the same results here. What type of Electrons are these (G350, U260, etc.), and how long is your USB cable? Any chance you can recreate this with one of the good ones when you are trying to do it?

I wonder if some slight residual charge on the caps when I’m trying over and over keeps the issue from appearing, whereas when you did it, maybe the units were left powered off for a long time before you plugged them back in and finally noticed. So if you are actively retrying without waiting more than 10 seconds in between, like I was, maybe you never see it happen. I’d like to set this up on an automated rig that can run this test thousands of times and vary the timing between “Flash success!” and disconnecting the USB power.
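As a starting point for that kind of rig, here is a rough Python sketch of the sweep logic only. The four callables (flash, cut_power_after, restore_power, booted) are hypothetical hooks that a real rig would have to implement with its own relay or switchable USB hub and its own boot detection; nothing here talks to real hardware or to particle-cli.

```python
def sweep_delays(start_ms, stop_ms, step_ms):
    """Delays (in ms) between "Flash success!" and cutting USB power, inclusive."""
    return list(range(start_ms, stop_ms + 1, step_ms))


def run_sweep(delays_ms, trials_per_delay, flash, cut_power_after,
              restore_power, booted, off_time_s):
    """Run the flash / cut power / restore cycle and count boot failures.

    flash()              -- hypothetical: flash the device, return on success
    cut_power_after(ms)  -- hypothetical: wait ms, then remove USB power
    restore_power(secs)  -- hypothetical: stay off for secs, then reapply power
    booted()             -- hypothetical: True if the device came back up
    off_time_s           -- how long to stay unpowered, so the caps can drain
    Returns {delay_ms: number_of_failed_boots}.
    """
    failures = {}
    for delay in delays_ms:
        failed = 0
        for _ in range(trials_per_delay):
            flash()
            cut_power_after(delay)
            restore_power(off_time_s)
            if not booted():
                failed += 1
        failures[delay] = failed
    return failures
```

Keeping off_time_s as an explicit parameter makes it easy to test the residual-charge theory: run the same sweep once with a few seconds off and once with several minutes off, and compare the failure counts.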