Photons dying with over-the-air updates

Hi.

We recently pushed out an update to our Photon-based devices and roughly 10% have died. I just got back the first one (we’re having to replace them) and the only thing I can get is safe mode or DFU mode. I can get it into DFU mode and download the firmware by wire but cannot get it out of safe mode after that.

My first concern is that it happened at all. Seems doubtful it is a coincidence. At first I was guessing that the power supply was inadequate to burn new firmware – so it would work just fine until the PS was taxed – but the power supplies we use are rated 500mA, which I believe is the USB standard. My next guess is that if a customer has iffy WiFi, the update was interrupted and the firmware got corrupted. I’m guessing here. Ideas?

My second concern is getting the bad one(s) to work again. There is no “factory reset” on a Photon so how do I totally start from scratch??

Thanks!
Tahl

Hey @Tahl – that doesn’t sound good. I’m glad that you can put the devices into safe mode and DFU mode. 500mA should be enough for the Photon to receive an OTA update. I’d love to dig in more with you so that we can figure out what happened and how to prevent it in the future.

There are a ton of questions I’d love to ask, including:

  • What version of system firmware are your devices running?
  • What tool did you use to push out the update?
  • Are the devices integrated into a product, or were they bare development kits?
  • What changes were made in the firmware that you pushed out?
  • If you put the device into DFU mode and use the Particle CLI to run particle update and particle flash --usb tinker does the device resume normal behavior?

Hello @Tahl,

Sorry to hear of these problems. When you say the device dies, can you explain what you mean exactly? How does the device appear when it is powered on?

Will’s advice for restoring the devices is (of course!) spot on - using particle update followed by particle flash --usb tinker is close to a factory reset. But before attempting to restore these devices to working order, it would be useful to have a copy of the device memory to help us diagnose.

You can get the device memory by putting the device in DFU mode and then running this command:

dfu-util -d 2b04:d006 -a 0 -s 0x8000000:0x100000 -U dump.bin

Please perform this on these types of devices (if you have them to hand) - I’ve suggested some names for the files, so we can easily distinguish:

  1. a device that has not yet been upgraded (old-working.bin)
  2. a device that was successfully upgraded (new-working.bin)
  3. two devices that failed to upgrade correctly

and then please send these files to me: mat (at) particle.io.

I will take a look at these asap and hopefully be able to return a quick post mortem that can explain what has happened.

Thanks,
mat.


@will … hi

  1. system firmware: 0.6.0 for the ones that died
  2. tool for push: particle console
  3. this product uses the Internet Button
  4. this release only added some additional stuff to the end of a published var which we read on a poll … that extra data is another non-published variable and the WiFi RSSI value
  5. I’ll try tinker … I am successfully able to put the failed unit into DFU mode and do a “particle flash --usb foo.bin”, but after running for a couple of seconds it goes to set up mode (flashing blue) and then a few seconds later goes to safe mode (magenta)

Also for the units still in the field and still connected (but dead), they are breathing cyan in the console and when I click on the item, the console says there are no functions or variables although they should normally be. Also, the IDE has the units in the field breathing magenta. (Why doesn’t the console show that??)

Lastly, when our devices are powered up, they do a webhook to our cloud to log that boot. We had a couple of units that successfully flashed but called the webhook over 40 times! That’s a problem because the user sees lights and sound for each boot. All the units that did die called the webhook between 40 and 70 times!
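Since every boot is already logged through the webhook, the repeated boots could be flagged automatically on the receiving side. Here is a minimal sketch, assuming the boot events are available as a list of timestamps per device. The function name, the threshold of 5 boots and the 10 minute window are all made-up illustration values, not anything from Particle or from this product.

```python
from datetime import datetime, timedelta

def is_boot_looping(boot_times, max_boots=5, window=timedelta(minutes=10)):
    """Return True if more than max_boots boot events fall inside any
    time span of length `window`. Thresholds are illustrative assumptions."""
    times = sorted(boot_times)
    start = 0
    for end in range(len(times)):
        # Move the window start forward until the span fits inside `window`.
        while times[end] - times[start] > window:
            start += 1
        # Count the boots between start and end, inclusive.
        if end - start + 1 > max_boots:
            return True
    return False
```

A device that reports 40 boots a few seconds apart would be flagged, while one that boots once a day would not.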

Tahl

@mdma,

Dies: from the user’s perspective these devices are dead even after power cycles. From the Particle console, they show breathing cyan, but it says there are no variables or functions, which would not normally be the case. It still says v6 > v7 (i.e., it’s waiting to OTA update) and if I lock and try to force the update, the update doesn’t occur. The device seems to be booting over and over again. At least, it is calling the webhook that happens on a reboot. I have no end user to verify what it is actually doing in person.

If I look at these devices from the IDE (if available there), they are breathing magenta. If I try to flash from there, it says it’s going to safe mode.

I have one physically and the rest are still in the field with some still on line.

I read your message sequentially so I flashed the one I have physically with Tinker before I read the part about dumping memory. I’ll send you what I dumped anyway.

BUT as soon as I flashed Tinker, the console software automatically and successfully pushed a v7 update to the device! Note that that same device would go to safe mode (and would stay there) when I did “particle flash --usb foo.bin” but when the console pushed the same image it worked – after it was loaded with Tinker first.

In terms of dumps:

  1. All have been upgraded or attempted
  2. I’ll send you this
  3. I’ll send you this – but I only have one that’s been returned so far

So since the tinker flash helped the console push an update, any ideas of how to not have the other ones returned (or I have to make site visits!)? There are a few “not working” ones still on-line if I can get the console to push an update.

Then of course we need to figure how for this not to happen again.

Tahl

Hm, had you flashed them from the Console previously? I’m also curious about what system firmware version was on the devices, and if you’re definitely compiling the binary against that same version. If you ended up on 0.6.1 or something and then flashed that binary to devices with 0.6.0, then the devices would end up in safe mode.
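The version mismatch described above can be written as a simple pre-flight check. This is only a sketch under assumptions: the function names are hypothetical, this is not how the Console actually works, and it only handles plain dotted versions such as 0.6.0 (no release-candidate suffixes).

```python
def parse_version(text):
    """Turn '0.6.0' into (0, 6, 0) so versions compare numerically,
    not as strings. Assumes digits separated by dots only."""
    return tuple(int(part) for part in text.split("."))

def safe_to_flash(device_system_version, app_target_version):
    """An app compiled against a newer system firmware than the device
    runs is the case expected to end in safe mode, so reject it."""
    return parse_version(app_target_version) <= parse_version(device_system_version)
```

For example, `safe_to_flash("0.6.0", "0.6.1")` is False, which is the scenario above, while `safe_to_flash("0.6.1", "0.6.0")` is True.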


@Dick

Except for test devices, all of the devices are flashed from the Console. I was thinking the console flash brought with it the appropriate system firmware to the target device.

In any case, we need to have some way to manage this. At any given time I will have all different generations of product in the field. I would say the console should refuse to attempt a flash on a device if it will break it. I have no problem with some devices in the field not getting the latest update if I know why (like it’s too old) if it won’t break the product. All our vars have version numbers so we can deal with different versions if need be. But I can’t have broken product in the field really. Some locations have to be physically visited for a replacement (i.e., no way to mail it in).

Anyway, is it possible for the console to check the system firmware before attempting a flash?

For the record, the ones that died were some of the oldest and some of the newest devices…

Best,
Tahl

Depending on what type of port you're using, higher currents are allowed. Not to mention that wall chargers often provide more than that (tablet chargers being over 2A mostly)

The internet button has 11 neopixels on it, which can draw 60mA each, going well over the specs of the power supply (taking the Photon into consideration). Those combined might have the Photon drop out during network intensive periods.

Have you got some test devices locally you can play around with (using the same setup as in the field, obviously)? If so, try duplicating the issue if at all possible, which would make it easier to debug.

@Tahl, I haven’t read through all the posts in this thread yet, but hope that wasn’t asked already :blush:

Did your previous firmware use SYSTEM_THREAD(ENABLED) and any of System.sleep()/System.reset()/ApplicationWatchdog?

Due to some issue(s) (some already closed on newer system versions) an ongoing OTA update might get messed up during the process
e.g.
https://github.com/spark/firmware/issues/1166
https://github.com/spark/firmware/issues/1280 (already closed)

Also, 500mA is enough as long as there is no external circuitry that demands more.

Or on flash operations, which are also power intensive. I believe this is the smoking gun - the unit will be considerably underpowered when all 11 neopixels are on, since the Photon consumes 80-100mA average and 450mA peak, and the neopixels require 660mA - a total of 1110mA.
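The arithmetic behind that total, as a small sketch. The figures are the ones quoted in this thread (11 pixels at 60mA each, 450mA Photon peak); the function names are just for illustration.

```python
NEOPIXEL_COUNT = 11      # pixels on the Internet Button, per this thread
NEOPIXEL_MAX_MA = 60     # per pixel at full white, per this thread
PHOTON_PEAK_MA = 450     # Photon peak draw, per this thread

def worst_case_draw_ma(pixels=NEOPIXEL_COUNT,
                       per_pixel_ma=NEOPIXEL_MAX_MA,
                       photon_peak_ma=PHOTON_PEAK_MA):
    """Worst case: every pixel at full white while the Photon peaks."""
    return pixels * per_pixel_ma + photon_peak_ma

def headroom_ma(supply_ma):
    """Positive means the supply covers the worst case, negative means it does not."""
    return supply_ma - worst_case_draw_ma()
```

`worst_case_draw_ma()` gives 1110, so `headroom_ma(500)` is -610: a 500mA supply falls 610mA short of the worst case.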


It sounds like the devices are in safe mode. The Console doesn't show devices in safe mode, unlike the IDE. I've added an issue to our internal tracker to address this in the Console. Thank you for reporting it.

The webhook event - is it an event you have coded in your firmware? If that's the case then the devices are definitely running your application firmware. The continued reboots could be caused by the underpowered situation previously mentioned.


@Moors7

I’ve been assuming that since the Internet Buttons do not come with power supplies, and there is no unusual power supply requirement stated, it could be safely powered using a garden variety 500 mA wall wart. If the IB takes over an amp, you’d think that it would specify. I’m familiar with the Neopixels but was assuming that with multiplexing, the LEDs are never all on at the same time.

Does anybody know the “official” power requirement for the IB??

Thanks,
Tahl

Yeah, that really shouldn’t be it. We have a service that updates the system firmware to make sure this isn’t a problem, but I wanted to double check.

Ah, and for power consumption- Mat’s right and it’s up to about 1A if you turned them all on full white and the Photon was transmitting at the same time. We’ll add that to the datasheet.

@ScruffR

We do not use system.sleep, reset or watchdog. YET. The next version was going to implement a software watchdog. I’m beginning to wonder…

Tahl

Neopixels aren't multiplexed, each individual LED is running its own set color in parallel to all the others - so all on (255 white) is 60mA times pixel count.
You set their colors sequentially, but once set, they just do their power burning :wink:
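To put numbers on that for frames that are not full white, here is a rough sketch. It assumes current scales linearly with each channel value, so 60mA at full white works out to about 20mA per channel, and it ignores each pixel's idle current. The function names are hypothetical and the same arithmetic would have to be ported to firmware to be used on the device.

```python
MA_PER_CHANNEL_FULL = 20.0  # assumption: 60 mA at full white / 3 channels

def frame_current_ma(pixels):
    """Estimate LED current for a frame of (r, g, b) tuples, values 0..255,
    assuming draw is linear in the channel value."""
    return sum((r + g + b) / 255.0 * MA_PER_CHANNEL_FULL for r, g, b in pixels)

def scale_to_budget(pixels, budget_ma):
    """Dim every pixel by one common factor so the estimate fits the budget.
    Frames already under budget are returned unchanged."""
    draw = frame_current_ma(pixels)
    if draw <= budget_ma:
        return list(pixels)
    factor = budget_ma / draw
    # int() truncates, so the scaled frame never exceeds the budget.
    return [(int(r * factor), int(g * factor), int(b * factor))
            for r, g, b in pixels]
```

Eleven pixels at (255, 255, 255) come out at 660mA with this estimate. Scaling that frame to a 330mA budget halves every channel to 127.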

Thanks for inputs.

@Dick, it would be nice to have an official power requirement for the IB. Sounds like it is 1.1A. From a firmware POV I can simply dim them. We are never fully bright, fully white though… Also, while you’re doing the data sheet, can you please do mechanical drawings?? We know the PCB is 2.6" but can’t find the dimensions for anything else; e.g., the curve of the dome.

@mdma, yes having accurate indicators in the console would be nice. Also, while you’re at it, it would be super if the “Last Connection” was actually the last connection! I.e., the last time we or you guys connected to the device. E.g., we’ve had devices that show Last Connection as two days ago and we’ve been talking to the device two minutes ago. Not useful.

All, I hope we’ve discovered THE problem. Upgrading the power supplies ain’t cheap including the ones in the field already.

In the meantime, any last hope for pushing an update via console to the devices still in the field sitting in safe mode??

Tahl

UPDATE

Guys, I just checked and the power supplies we have for those devices are 1500mA. So we are back to square one.

If I go get the devices from the field (ick!) I can give @mdma a dump but other than that, what can we do to diagnose the problem?? And as requested in the last post, anything we can do for the current devices in the field would be good!

(PS. I will be at Target Open House tonight if anyone but Will will be there…)

Tahl

@Tahl, do you have a spec for your power supplies, including the wire length to the Photon?

@peekay123,

They are 120v-240v 1500mA power supplies. Amazon sells the same one here: https://www.amazon.com/gp/product/B01B8R28O4/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1.

The customer has a choice of keeping the cable that comes with the Internet Button or we offer a 4.5 ft. cable. I don’t know the gauge, but it seems too hefty to me. We are actually looking for something lighter.

Tahl