Boron LTE Stuck Flashing Green

Dave,
Did you receive the email from earlier today?

My setup has not changed since I began this thread. I ordered the additional Borons to confirm that it was never the hardware, and that it was your firmware, bootloader, or modem firmware, or issues with your carriers. The issue is with all the Borons, in the exact same places where they had been working fairly stably for about a month, with only the occasional disconnect, maybe once a week, and they would reconnect on their own when I let them be. Even that is still nowhere near where it was from a stability standpoint prior to the 1.2.+ firmwares. My power did not go out. I was asleep and had not touched the firmware or changed any code since this issue finally, mysteriously, went away a few weeks ago with zero explanation from you guys. Again, it went away with zero changes on my end and was fine, but now it’s back. These issues began again right around the time you guys put out 1.3.0RC1, even though these devices were all still running 1.2.1RC1. It wasn’t until they had been offline for 3 days that I tried rebooting them, upgraded the firmware and bootloaders, and started testing with only Tinker, again to make sure it wasn’t anything I wrote and that it could not be blamed on anything I changed on my end.

I have also now explained this in 3 places, counting here. The first two are private, one of them from earlier today, which is where these kinds of discussions should probably be had. I would really appreciate it if you would just email me, as you won’t call me. Please also touch base with Matt, or have him copy you in on my conversation with him, so you have all the data. I know this is the weekend and I very much appreciate you reaching out, but I am getting conflicting instructions from you and Matt in different places, so I am trying to understand what I can do to help you make this better, faster, as I feel I have given you all the data I can. If you need me to do logging, I can send you the logs. I just need to know which one of you wants to take the lead on this, so I am not having two conversations about the same thing with two different people simultaneously. Again, I appreciate the effort, but who should I send them to, or be dealing with, to get the fastest results?

My only idea at this point is to look into the modem firmware for bugs, or to talk to your SIM carrier providers, because this issue seems to be primarily impacting the LTE devices right now. If you would like, I can contact AT&T, as I believe that is your provider for the built-in SIM, to inquire about outages, but at this point I’m having a really hard time believing an outage would last this long.

Help me to help you :grin:

Hi @sdevo619

I provided more information on your concerns in private. In short, there are currently no known causative links between device OS and the issues you are seeing.

A general idea of the impacted use cases is known. The exact circumstances leading to the bad behavior are still unknown, but are currently being investigated. There is no blame being cast for the issues here. We acknowledge that issues exist and wish only to understand them.

Both Dave and I are happy to assist you moving forward. However, for the sake of simplicity, feel free to consider me the primary contact on this matter. :slight_smile:

As a follow-up to Dave’s input: Dave’s instructions are certainly useful. It is good to understand any changes that may have occurred along the way. As your setup has not changed much, I believe we may not need to collect much additional information at this time, at least for the Borons.

From our direct message conversation, I understand there are some concerns about the Electron as well. The issues observed here, and what is known at this point, do not suggest that they are impacting the Electron. As such, Dave’s suggestions are recommended, as we should be treating this as a different issue with no pre-existing context.

To my understanding of what is known at this time, the hardware is currently undergoing a deep dive by our engineering team. As hard as it is, I’m afraid the best option here is to wait patiently for updates. As we receive updates with actionable test points, we can test them as soon as possible to confirm progress has been made.

I just got one of my Borons up and running on the bench. It had a few days on the 0.9.0 firmware running fine. I just upgraded it to 1.2.1 and will see how it works. Where it is sitting I have very poor signal, so hopefully that will help trigger the issue.

@darkstar2002 What code are you actually using to confirm it is running fine? Are you getting any dropouts of 5 minutes or longer? How are you recording that it is actually connected 24/7?
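
One way to answer that kind of question is to have the device publish a heartbeat at a fixed interval, record the arrival time of each one on the receiving side, and then scan the record for gaps. The following is a minimal sketch of the scanning step only, assuming the arrival times are already available as a list of timestamps. The function name, the one-per-minute rate, and the sample dates are made up for illustration and are not taken from anyone's setup in this thread.

```python
from datetime import datetime, timedelta


def find_dropouts(heartbeats, threshold=timedelta(minutes=5)):
    """Return (gap_start, gap_end, gap_length) for every gap >= threshold."""
    ordered = sorted(heartbeats)
    gaps = []
    # Compare each heartbeat with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        gap = later - earlier
        if gap >= threshold:
            gaps.append((earlier, later, gap))
    return gaps


# Hypothetical log: one heartbeat per minute, with a hole between
# minute 2 and minute 9.
start = datetime(2019, 8, 1, 12, 0)
log = [start + timedelta(minutes=m) for m in (0, 1, 2, 9, 10)]

for gap_start, gap_end, length in find_dropouts(log):
    print(f"no heartbeat between {gap_start} and {gap_end} ({length})")
```

A gap in this log only shows that no heartbeat arrived. It does not say whether the device was flashing green, stuck on the cloud handshake, or simply powered off, so it complements watching the status LED rather than replacing it.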

Hi ALL,

We are also experiencing issues with Particle devices losing connection and not automatically reconnecting to the cloud. We had been running 0.9 on the device successfully for a few weeks and then this issue started. We upgraded to OS 1.2.1, and after 2-3 weeks we now see the same issue.

We are using reasonably simple code on the Particle and have no connection-based code. We simply loop through our application code until an RFID tag has been read, then we call a Particle publish. We are also a startup company and about to initiate a pilot programme, and unfortunately we are currently entirely reliant on these Borons. Our application is fortunate in that we can tolerate long periods of no connection, but we will get nowhere if the Borons are not capable of reconnecting.

I have a few questions for the Particle engineers:

Have these issues been reproduced in problem solving exercises?
Does a complete power disconnect and reconnect resolve the issue?
I’d love to see a basic project plan or problem-solving roadmap for these issues. Has a root cause been identified? At what stage are the corrective and preventative actions?
Essentially I would just like some more information on this issue.

Again, when our Borons work as intended they are great, but our entire startup depends on this device and on the resolution of this issue.

Hi @StngBo,

All great questions.

There are still ongoing investigations into the issue. Flashing green itself is a rather broad error (quite simply, no cellular connectivity) and, as a result, can have several causes.

We are still investigating and working on reproducing all of the root causes in order to address them. Being able to reproduce each scenario and understand how they interact with each other will take some time.

With that in mind, progress is being made. We will be offering a more formal update this coming week with a Device OS release candidate to test. In the near term, I can offer you some insights as to where we stand:

For devices that are able to come online at first, but eventually degrade into blinking green, we have filed and merged pull request #1862. This request tackles a memory leak in the muxer that occurs when a device establishes connectivity. In a poor connection area, a device may re-establish connectivity several times, allocating (but never de-allocating) memory each time.

On the current device OS, devices with the above symptom can be recovered by resetting them or by turning them off and on.
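
To make the mechanism concrete, here is a toy model of that kind of leak. It is not the device OS muxer code, and every number in it is invented for illustration rather than measured on a Boron. It only assumes that each reconnect permanently consumes a fixed amount of heap and that a connection attempt needs some minimum amount of free heap to succeed.

```python
def reconnects_until_exhausted(free_heap, leak_per_reconnect, needed_to_connect):
    """Count how many reconnects succeed before free heap runs too low."""
    if leak_per_reconnect <= 0:
        raise ValueError("leak_per_reconnect must be positive")
    count = 0
    while free_heap >= needed_to_connect:
        # Memory is taken for the new session and never handed back.
        free_heap -= leak_per_reconnect
        count += 1
    return count


# Illustrative byte counts only (hypothetical, not from a real device).
budget = reconnects_until_exhausted(
    free_heap=60_000, leak_per_reconnect=2_000, needed_to_connect=8_000
)
print(f"reconnects before the device can no longer connect: {budget}")
```

Under these assumptions a device in a good signal area, which reconnects rarely, takes a long time to use up its budget, while one in a poor signal area burns through it quickly. A reset starts the count again from a full heap, which would be consistent with a reset or power cycle recovering devices that show this symptom.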

We are hoping, after the release of the next device OS release candidate, to gather feedback and continue investigating the other types of symptoms we are seeing. At present, the other ongoing investigations are into devices that turn on in blinking green but never actually connect, and into misconfigured I2C slave devices causing device OS to stall.

In short, progress is being made. We will have some actionable testing coming soon. :slight_smile:

Thank you mstanley for this update. That memory leak bug is extremely motivating. It is spectacular to think that that might - as in, could possibly - be the cause for why my Borons repeatedly entered a no-reconnect stage when left at my test site, and that therefore its resolution would allow me to go back to using Particle Boron there.

I have 4 useful things to say to you:

  1. Your efforts on this front are important and much appreciated. In the meantime I had to go back to using a Pycom GPy with an always-on, brute-force, catch-all restart structure (i.e., on any single connection loss or exception, restart the whole board and re-obtain the cell connection). This requires tons of power and tons of data and is wasteful, but unlike the Particle Boron it does stay up 100% of the time, so I’m currently using it. It wastefully uses 20 MB a DAY, even though it’s only actually uploading 190 kB per day (132 bytes every minute), and I’m paying $21/month for a 1 GB NimbeLink plan. You better believe that as soon as the Boron reconnectivity gets fixed, I will throw the GPy in the trash and go back to the Boron!

  2. I have a second site with good cell signal which rarely drops out, unlike the other site. The Boron there (1.2.1-rc3) has been up rock-solid since 7/14 and is still uploading right now.

  3. Is it possible to have some sort of low-level watchdog feature that would accomplish the equivalent of a power cycle simply by detecting the infinite green-flashing, cannot-reconnect condition? This would solve the issue for me, even if the root cause is not fixed. I can tolerate 99%, even 95%, uptime. What I can’t tolerate is a device that never reconnects without a physical power cycle. I think many others’ applications are the same.

  4. Here is a specific test result which may be helpful. I have one particular Boron that I took to the test location. It connects to the cloud perfectly at home, but at the test location it connects to cell and then fails to connect to the Particle cloud (it flashes orange three times in the middle of the cyan flashing sequence, and repeats the cyan sequence indefinitely), i.e., it is stuck on flashing cyan instead of flashing green. So this is different from the primary issue with my other Borons, which is the primary topic here (i.e., connects, stays connected, then loses the connection at some point and remains permanently in flashing green). It is a different but related issue, where a certain Boron was permanently inoperable at a certain location, not because it couldn’t connect to cell, but because, once it was on cell, the separate Particle cloud connection consistently failed. I have no idea why that is, or why it works at home but not at the test site. Just wanted to send this data point along.
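
The watchdog idea in point 3 comes down to a small piece of decision logic: remember when the link was last known to be up, and ask for a full restart once it has been down for longer than some limit. Below is a minimal sketch of that logic in plain Python, written so it can be run and checked on a desktop. The class name, the 10-minute limit, and the timestamps are hypothetical. On a Boron the same check would have to be ported to the application firmware, presumably polling something like `Particle.connected()` and calling `System.reset()`, and whether a software reset is a full substitute for a physical power cycle in every stuck state is an open assumption here.

```python
class ReconnectWatchdog:
    """Request a restart when the link has been down for `timeout_s` seconds."""

    def __init__(self, timeout_s, now_s):
        self.timeout_s = timeout_s
        # Treat start-up as the last moment the link was known to be good,
        # so a device that never connects still trips the watchdog.
        self.last_connected_s = now_s

    def should_reset(self, connected, now_s):
        if connected:
            self.last_connected_s = now_s
            return False
        return now_s - self.last_connected_s >= self.timeout_s


# Hypothetical 10-minute limit, polled with made-up times in seconds.
dog = ReconnectWatchdog(timeout_s=600, now_s=0)
print(dog.should_reset(connected=True, now_s=60))    # link is up
print(dog.should_reset(connected=False, now_s=300))  # down for 240 s
print(dog.should_reset(connected=False, now_s=660))  # down for 600 s
```

Because the timer starts at boot, this covers both the device that connects and later drops into flashing green and the device that comes up flashing green and never connects. The trade-off is the choice of limit: too short and a device in a weak signal area restarts in the middle of a slow but otherwise successful reconnect, too long and the outage drags on before recovery starts.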

@mstanley, @Dave,

Just read through this thread, and I’d like to know if the Electron LTE is affected by this issue. It’s hard to determine with all the back and forth. I’ve got 4 Electron LTEs in testing, with none of them showing this issue.

Hi Heath.

The muxer issue in #1862 is known to impact Boron devices (both 2G/3G and LTE), but it is not something that impacts E Series devices.

For LTE devices, you should be running no earlier than device OS v1.2.1 to get all the latest LTE fixes. As far as I am aware, no known LTE issues exist past v1.2.1 at this time.

@mstanley, thanks for the update. Just for clarity, we are testing and planning to deploy the Electron LTE, not E-Series LTE.

Hi Heath.

My apologies for misspeaking there. I had intended to speak of the Electron LTE, specifically. The Electron LTE and E Series LTE are very similar in nature. No Gen 2 device (the platform the Electron LTE is based on) will be impacted by this issue, so there are no worries to be had. It still stands that no known issues exist for the Electron LTE or any other Gen 2 devices past v1.2.1 at this time.

Thanks for the clarification!

I can attest to coming across the same issue recently: the device disconnects when the signal is weak and never connects again, staying in the blinking green state. I have been running OS 1.1.0 for a few months. I also observed slightly different behavior that I would like to draw attention to, which I described in another post: Boron thinks it's online, console disagrees.

I’m about to place an order for 100 Borons for our pilot run of connected devices, and it’s important to know we can proceed with the launch confidently, as the product’s successful reception depends on it.

I would look at this issue before purchasing any more Borons: Boron LTE Fails to Connect.

In our test of 20 Borons in the field, 3 have failed this way. We’d like to put another 40-60 in the field, but I think we will hold off on that until Particle gets to the root cause of these issues.

For some context, our carrier board can take either a Boron or an Electron. We have not had an Electron failure, and we have similar numbers of Electrons in the field.

Hi Paul,

The connectivity symptom you are describing is in line with an understood issue we have identified for all Boron devices. A release candidate will be out this upcoming week to address the issue. I certainly encourage you to update to it as soon as it’s out and let us know if it helps alleviate the connectivity issues you’re seeing here.

The memory leak issue identified on the Boron is specific to the Boron. It would make sense that your Electrons are not exhibiting similar behavior. Our upcoming v1.3.1 release candidate should alleviate this issue for your devices as well.

Thanks, Matthew. Will be looking out for the upcoming release.

@mstanley,

If this is a memory leak issue, wouldn’t a full power cycle resolve it?

My devices cannot connect even after a full power cycle. I’ve power cycled them in person, AND my carrier board has a hardware watchdog that power cycles them if there is no communication.

Hi Heath,

I would expect a power cycle to fix a memory leak issue, yes.

However, we are aware of more than one issue that can lead to blinking green. The case where a device boots up fine and later goes into a blinking green state is believed to be a memory leak. The case where it initializes in blinking green is known to have multiple causes, and only some are understood. Some fixes to mitigate this are also in device OS v1.3.1-rc, but they may not mitigate all cases. It would be helpful to upgrade to v1.3.1-rc and see whether the issues are still exhibited, so we can understand how this bad state interacts with the latest bugfixes.

Do you mean “I would expect a power cycle to fix a memory leak issue, yes.” ?

Aha! So I did! Transposing some phrases there. :stuck_out_tongue: Fixed!