@ericpietrowicz Sounds good. I appreciate the update! I certainly understand that as we all should enjoy time with friends and family over the holidays. I just wanted to make sure this continues to be investigated and hopefully resolved soon after the New Year.
I have a large number of devices running on solar panels and LiPo batteries. The user expects the device to be always on and always connected, but our solar panels are not sized for this kind of constant current draw. This worked quite well at the 9mA rate but won’t anymore.
The biggest question I have is: do you think existing devices in the field that haven’t had any software updates since last year are impacted by this as well? My use case is very seasonal (Feb - April) and most devices will be coming online in the next month. I am not sure if they will be impacted or not. If it’s a cloud-side or cell-tower-side issue, I suspect they might be.
If it’s a device-side issue and I leave the device at Device OS 5.5.0 but deploy new application code, would it receive an update to the cellular modem that would cause it to be impacted by this? I was planning on deploying new application code to the fleet, but I’d hate to introduce this issue to the fleet for some minor, mostly cosmetic updates.
I understand you may not know the answers to those yet, but as soon as you have any insights to share it would be greatly appreciated. You can share here or if you prefer, a DM is fine as well! Thanks!
As Eric mentioned - the engineers we need to look at this have not been able to over the break. We'll hopefully have feedback by next Friday during our triage call. (I'm also blocked from testing, as I'm in ZA where there is no M1.)
Unless you update DeviceOS when compiling your FW, nothing in terms of the modem/non-app FW behaviour would change. (Only when flashing FW compiled for a higher FW version will DeviceOS update).
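To make the rule above concrete, here is a minimal sketch of the comparison it describes. The `Version` struct and the function names are hypothetical, made up for illustration only; this is not Particle's actual update logic, just the "only a higher target version triggers a DeviceOS update" rule as stated in this post.

```cpp
#include <cassert>
#include <tuple>

// Hypothetical helper type: a Device OS version split into its three parts.
struct Version {
    int majorPart;
    int minorPart;
    int patchPart;
};

// True when `a` is strictly newer than `b` (lexicographic compare of the parts).
inline bool isNewer(const Version& a, const Version& b) {
    return std::tie(a.majorPart, a.minorPart, a.patchPart) >
           std::tie(b.majorPart, b.minorPart, b.patchPart);
}

// Sketch of the rule as stated: flashing application firmware only pulls in a
// Device OS update when the firmware was compiled for a higher version than
// the one already on the device.
inline bool flashWouldUpdateDeviceOs(const Version& appTarget, const Version& onDevice) {
    assert(appTarget.majorPart >= 0 && onDevice.majorPart >= 0);
    return isNewer(appTarget, onDevice);
}
```

Under this rule, an application compiled against 5.5.0 and flashed to a device already on 5.5.0 would leave Device OS untouched, while one compiled against 6.2.0 would trigger an update.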
Have you observed this behaviour on anything but newer DeviceOS versions? Is 5.5.0 still stable?
Yeah, I totally understand with the holiday break and what not... likewise I sure appreciate the update and responsiveness here.
To answer your question, I've tested a new out-of-the-box device on both Device OS 5.5.0 and Device OS 6.2.0 (I think I tested 6.1.1 as well). The device exhibited the same behavior on both Device OS versions. This seems to be related to the modem firmware somehow, as it happens even in sleep with network while the main MCU is sleeping. The MCU also does not wake up when it happens. My hunch is that the cellular modem firmware is updated along with Device OS, so a new out-of-the-box device likely gets a cellular modem firmware update along with any Device OS update.
The knowledge gap I have now is this: if a device was not behaving this way back in, say, March/April 2024, and that device never had a Device OS update, does it exhibit the issue today? Does it exhibit the issue after an application software update? Does it exhibit the issue after a Device OS update at the same Device OS level? And finally, does it exhibit the issue after a new Device OS update (6.2.0)? I'll try to round up an older device and see if I can run those tests.
On that note... is there a way to access what version of firmware is running within the cellular modem? Is anything like that published behind the scenes that we do not see in the console or does that concept even exist? Is the only firmware on the device as a whole, device OS? This complexity is abstracted away so I'm not sure if I'm even asking the question right?
I was just playing around with this a bit more. I'm viewing the consumption for a device on OS 5.0.0 and there seems to be a notable improvement in the frequency and duration of high current conditions. I found a note that OS 5.3.1 had a packet switch update to the modem. It might be worth trying an OS earlier than 5.3.1 to see what you discover.
Just a note - we don’t update modem FW with DeviceOS. All 404X devices have the same modem firmware. The update process is possible, but highly undesirable.
@ericpietrowicz, @no1089 Here are some additional test results at various Device OS versions prior to OS 5.5.0. In short, I am able to replicate the behavior all the way back to Device OS 4.0.0, and the issue is present in all Device OS versions. In all scenarios, I've been able to call a Particle function (digitalRead()) and it temporarily corrects the behavior. In a smaller number of cases, if I call a Particle function when it's not occurring, it will induce it. Sometimes it occurs within 30 seconds of a cloud connect; sometimes it takes 15-20 minutes.
In some cases, I'm able to call a particle function when it's not happening and it induces it. This was at OS 6.2.1:
One other observation I'd like to share: during the periods of high current, there is some sort of cyclical pattern. It repeats at about 62-64 Hz (~12.5 repetitions in 195 ms). RF is black magic to me, but I wonder if we would see something on the RF side that matches this profile/frequency? Given the high current, the cyclical pattern, and the ability to affect it by calling a Particle function, do you think the radio is transmitting or receiving?
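For anyone who wants to check the arithmetic behind that estimate, here is a small sketch. The function names are my own and purely illustrative; the only inputs are the cycle count and the 195 ms window read off the power profiler capture.

```cpp
#include <cassert>

// Repetition rate in Hz for `cycles` repetitions counted inside a capture
// window that is `windowMs` milliseconds long.
inline double repetitionRateHz(double cycles, double windowMs) {
    assert(windowMs > 0.0);
    return cycles / (windowMs / 1000.0);
}

// Length of one repetition in milliseconds for the same measurement.
inline double repetitionPeriodMs(double cycles, double windowMs) {
    assert(cycles > 0.0);
    return windowMs / cycles;
}
```

Counting 12 repetitions in 195 ms gives roughly 61.5 Hz, and 12.5 repetitions gives roughly 64.1 Hz, so each repetition is about 15.6 to 16.3 ms long.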
There are about 12 periods of the cyclical profile in this 195 ms snapshot:
@ericpietrowicz, @no1089. I do love this Nordic Power Profiler... I don't know how I went without one previously.
I was able to capture 8-9 hours or so. In that time it transitioned in/out of the high current state roughly 11 times and was in that state about 50% of the time. Episodes lasted anywhere from as little as 1 minute up to about 23 minutes, with the majority of them lasting just over 23 minutes. If I recall, 23 minutes is the keep-alive time for the Boron and BSOM, right?
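In case it helps anyone reproduce these numbers from their own capture, here is a rough sketch of how the episode count and the time-in-state percentage could be derived from an exported trace. It assumes evenly spaced current samples in mA and a hand-picked threshold separating "high" from "normal"; the struct, function name, and threshold are all hypothetical and not part of the Power Profiler tooling.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Summary of a current trace: how many times the device entered the
// high-current state and what fraction of samples were spent in it.
struct EpisodeStats {
    std::size_t episodes;
    double highFraction;
};

// Assumes evenly spaced samples in mA. A sample counts as "high" when it is
// above `thresholdMilliamps` (a hand-picked, hypothetical cut-off).
inline EpisodeStats summarizeTrace(const std::vector<double>& currentMilliamps,
                                   double thresholdMilliamps) {
    EpisodeStats stats{0, 0.0};
    if (currentMilliamps.empty()) {
        return stats;
    }
    std::size_t highSamples = 0;
    bool wasHigh = false;
    for (double sample : currentMilliamps) {
        const bool isHigh = sample > thresholdMilliamps;
        if (isHigh) {
            ++highSamples;
            if (!wasHigh) {
                ++stats.episodes;  // rising edge: a new high-current episode starts
            }
        }
        wasHigh = isHigh;
    }
    stats.highFraction =
        static_cast<double>(highSamples) / static_cast<double>(currentMilliamps.size());
    return stats;
}
```

With one averaged sample per minute, an episode's duration is simply the number of consecutive high samples, which would make the roughly 23-minute episodes easy to spot.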
This is about 8 hours' worth, and you'll notice extended periods of higher current. Notice that the extended periods seem to repeat:
My guess is that it's sending a keep-alive message every 23 minutes and, just like calling a Particle function, any traffic on the cellular modem will induce a high current state if it was previously in a good state, and vice versa: if it was in a high current state, the traffic will put it in a low current state. Since this is the keep-alive, it further indicates it's the R510 cellular modem.
If there is anything you'd like me to try, I'm more than happy to try something.
Jeff, did you get a chance to test an orphan device (a 1-device Particle account) on an extended run since you got the Nordic Power Profiler, to rule out a cloud issue?
@Rftop, Yes, I forgot to mention. What you see above in my tests with the power profiler is an orphan device claimed to a dedicated account with only 1 product and this device is the only device in that product. The only caveat to that statement is the device at one point was in the product with the fleet of devices. It’s been isolated to its own account for roughly 2 weeks now. I would think any residual effects from being in that product are gone.
I am out of town until tomorrow evening. I did bring the power profiler and a Boron with me. I’m going to test again from a different location to see if I learn anything. Maybe it’s something funky with a specific cell tower or carrier? And then when I return on Saturday, I’ll test again with a brand new Boron out of the box.
Well... I guess whether this is good news or bad news depends on your perspective. We are at least learning. I was hoping this was a cloud-side issue, as that would be multiple orders of magnitude easier to fix (once the issue was identified, that is) than a cellular modem firmware issue that is affecting countless devices. Deploying firmware to the cellular modem probably comes with a high risk of bricking the device as well. I'm hopeful the fine folks at Particle can get to the bottom of it and have something available via Device OS only. I recognize their team will need some time to investigate why this is happening and determine the best course of action.
What's running through my mind is what can I do in the present scenario to mitigate the impact to the affected devices. I have an idea or two but not sure how well any of them will work.
@no1089 Any update you could share by chance from today's call with engineering? If you prefer, you can always send a DM as well. My use case is very seasonal (supporting the harvest of maple sap to produce maple syrup), which occurs every year primarily in Feb and March. The Particle devices in the field support remote monitoring for hundreds of wooded locations, many of which operate on solar panels only or internal battery only. My season is kicking off in the next few weeks and I need to start planning communication and contingency plans if this high current issue persists. Thus the sensitivity, and why an update would be appreciated!
Alright... I appreciate the update as well as escalating this to the CTO. I'd greatly appreciate keeping me in the loop as this investigation progresses within the Particle team. Frequent communication would be great! If there is anything I can do to help from this side in the meantime just let me know.
For a short-term band-aid, is there any way I can detect when it's in this high current state via application code or some Device OS command? Maybe there is some sort of AT command I can call to ask the cellular modem its state? If it's in that state, I'd call another Particle.publish() to try to kick it back to a lower current consumption state. I'd then just check it every minute or two. Yeah, it might lead to higher data operations, but I'll gladly accept that tradeoff for much lower power consumption. Or more generically, if the team has any creative workarounds in the meantime, I'm all ears...
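To make the idea concrete, here is a rough sketch of just the scheduling part of such a workaround. It deliberately leaves the detection step as an input, because nothing in this thread has identified an AT command or Device OS call that reports the high current state; `highCurrentSuspected` is a placeholder for whatever signal turns out to be available. The class name and interval are hypothetical too.

```cpp
#include <cassert>
#include <cstdint>

// Decides when to send a small "kick" publish. Detection of the high-current
// state is NOT implemented here: the caller passes in its own best guess.
class KickPolicy {
public:
    explicit KickPolicy(std::uint32_t checkIntervalMs) : intervalMs_(checkIntervalMs) {
        assert(checkIntervalMs > 0);
    }

    // Call regularly with the current uptime in ms. Returns true when a check
    // is due and the modem is suspected to be in the high-current state.
    bool shouldKick(std::uint32_t nowMs, bool highCurrentSuspected) {
        // Unsigned subtraction keeps working across a millisecond-counter rollover.
        if (hasChecked_ && (nowMs - lastCheckMs_) < intervalMs_) {
            return false;  // not due yet
        }
        hasChecked_ = true;
        lastCheckMs_ = nowMs;
        return highCurrentSuspected;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastCheckMs_ = 0;
    bool hasChecked_ = false;
};
```

In application firmware this would be called from loop() with millis(), and a true result would be followed by a Particle.publish() of a small event. With a 60000 ms interval that caps the extra traffic at one publish per minute, which is the data-operations tradeoff mentioned above.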