Boron LTE Stuck Flashing Green

The Boron disconnects from everything, including dropping from the cell tower.

Hey folks – wanted to jump back in to let you know that the Particle team is watching this thread closely and is actively working to triage these symptoms into multiple root cause issues with the help of AT&T and u-blox.

In the meantime, if you are experiencing this issue we'd love to have your help capturing cloud debug logs that we will use to categorize and resolve these issues. You can find instructions for doing so here:

We will be a little slower to respond than usual due to the July 4 holiday, but want to reassure everyone that this is a top priority for our team. We're committed to continuing to improve the reliability of LTE M1 connectivity and look forward to squashing some nasty :bug:s with your help over the upcoming weeks.

PS - @ric_hard I really appreciate the effort you’ve gone through already to add detail to what’s going on with your device. Super impressed by the u-blox updater that you made!

It looks like the device is using a 3rd party SIM – is that correct?

Correct. Using a 3rd party SIM (Telstra).

@ric_hard I love the modem update jig! Can you share some details on how you made it? Is the board CNC routed? What did you use for the pins?

Hey mate, in order not to hijack or steer this post in the wrong direction, I’ll post all the details in a separate post, Ublox modem firmware update. That way this one can stay on track with the bigger LTE connectivity issues.

@mstanley @will @Paul_M
Let me just start with, I have evidence to back this theory up :slight_smile:

So yesterday I had to drive 500 km away (approx 7 hrs). I decided to take one of my Borons with me and left one at home.
Home Boron running 1.3.0 powered by 12v.
Vehicle Boron running 1.1.1 powered by about 7v on battery.
Both are running a very similar version of code except for a few minor changes (only about what they post and when). Both are running updated Ublox firmware.

I left home about 12:30pm yesterday. (lunch time)
The Home Boron was junk, dropping its connection randomly; however, there were periods of hours where it wouldn’t drop. A total of 57 dropouts from when I left to when I got home.

Vehicle Boron had a dropout or two before I left. On the 7 hr drive it dropped out a few times, I think about 12 or so. There was not always reception, I was travelling at 100 km/h and it was in the boot, so I’m not too worried about that. I got into town about 7:30pm and arrived at the motel about 9pm. The Boron did not drop out once from 7:30pm to 10:30am (15 hrs), which for me is unheard of. At 10:30am I was on the road and dropping in and out of reception. On the 7 hr drive home the Vehicle Boron dropped out about 11 times. Pretty consistent with the trip up.

Here is the weird part. I arrived home at 4:30pm, where both Borons are within range of each other.

The Vehicle Boron is now dropping out randomly and is up to 35 dropouts, whilst the Home Boron has been solid for the last 5.5 hours, still at 57 dropouts.

So… is there a possibility that the mesh is causing issues with the Borons? I generally have 2 Borons testing at most times… In my code I have called Mesh.off(); in the setup.

Sorry it’s long-winded. I’m going to post this now and turn off the Home Boron as my final test, and will update the results in the morning. In theory the Vehicle Boron, which is outside (but still in mesh range), should become good again… :-/

UPDATE :frowning: I guess my theory is a bust. It dropped 7 times overnight.
I will try to take it to another location today and leave it there. I am now starting to wonder if it is tower related?

Hey folks,

I have some Electrons experiencing a similar issue after upgrade to 1.1.0: Frequent Cloud Disconnects, then flashing green for extended periods of time, eventually resetting after something like 1-3 hours (our firmware resets itself after 2 hours offline). Occasionally one will flash cyan for a moment, then revert to flashing green.

Our Firmware: SEMI-AUTOMATIC mode, SYSTEM_THREAD enabled.
Reconnect: every 20 minutes while connected (for possible sleep mode)
Reconnect: every 6 minutes while offline (!Particle.connected())
Modem-Reset: every hour, while offline
System-Reset: every two hours, while offline.
A system reset also resets the timer for these events.
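
A minimal sketch of that offline escalation, written in Python rather than device firmware so it can be run anywhere. The function name `offline_actions` and the tie-breaking rule (when two timers fall due at the same moment, only the stronger action is taken) are assumptions for illustration and are not taken from the actual firmware.

```python
# Intervals from the schedule above, in seconds.
RECONNECT_OFFLINE_S = 6 * 60      # reconnect attempt every 6 minutes offline
MODEM_RESET_S = 60 * 60           # modem reset every hour offline
SYSTEM_RESET_S = 2 * 60 * 60      # system reset after two hours offline


def offline_actions(offline_s):
    """List the (seconds_offline, action) pairs fired while offline.

    Assumption: if two timers are due at the same time, only the
    stronger action runs (system reset > modem reset > reconnect).
    """
    events = []
    t = RECONNECT_OFFLINE_S
    while t <= offline_s:
        if t % SYSTEM_RESET_S == 0:
            events.append((t, "system_reset"))
            break  # a system reset restarts all of the timers
        if t % MODEM_RESET_S == 0:
            events.append((t, "modem_reset"))
        else:
            events.append((t, "reconnect"))
        t += RECONNECT_OFFLINE_S
    return events


if __name__ == "__main__":
    for seconds, action in offline_actions(2 * 60 * 60):
        print(f"{seconds // 60:>4} min  {action}")
```

Under these assumptions a device that never gets back online makes 18 reconnect attempts and one modem reset before the system reset at the two-hour mark.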

We have the same code running on 0.6.4 with zero issues. Somewhere between 0.6.4 and 1.1.0 the connection was crippled. We’re tracking some unrelated issues with our firmware; I just wanted to add that this doesn’t seem limited to LTE, as we’re seeing this with 3G as well.

I will be firing up one of our boxed Borons and will try to do some logging here in a few days. I want to say I think this is not isolated to the cell boards. I have several Photons at home and work on different firmware versions, and I can say that the newer the firmware, the worse the cloud connection is. Two of the Photons are within 15 ft of the AP with very strong signal readings. This might be a combination of Device OS and hardware issues for both Wi-Fi and cell.

Interesting. I’m now having a reconnection problem on my main development Boron LTE sitting on my desk. It has a high gain external antenna attached, and has been powered up and connected for about two weeks, including periodic resets. I pulled the USB connection to test battery life (2000 mAh LiPo) and let it run until the battery drained. It was offline for a day or so before I reconnected power. Now it will not connect with or without the battery attached. I’m using a Hologram SIM. I have tried firmware 1.2.1-rc.3 and 1.3.0-rc.1. Same behavior on both.

Behavior I’m seeing:

  • At startup AT commands may complete as expected, or they may take 30+ seconds (e.g. AT+CGDCONT=2,"IP","hologram")
  • The modem does not connect, and after several minutes my code resets the Boron to try again
  • I now see two items in the serial log. “ERROR: Failed to power off modem” and “ERROR: No response from NCP”. This occurs even after a full power cycle.

The location, antenna, circuitry, etc. did not change from the time it connected successfully to now.

I’m wondering if the modem has blacklisted my local towers for some reason. I don’t recall the command to clear the blacklist, but will look into it. Should I clear that on startup? I understand that will result in longer initial connection times, but if it will connect more reliably then it is worth it.

Any other suggestions?

EDIT: I just realized I reflashed 1.2.1-rc.3 instead of 1.3.0-rc.1. I’ll test with that version and update again.

EDIT 2: Same behavior with 1.3.0-rc.1.

EDIT 3: Here is the log from the Boron

...Reconnected to /dev/cu.usbmodem142301 ...
0000005491 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005492 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005492 [app] INFO: Reset Reason: 0x8c
0000005493 [app] INFO: Free Memory 61416 bytes
0000005494 [app] INFO: Initializing Display
0000005647 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011902 [hal] ERROR: Failed to power off modem
0000032903 [hal] ERROR: No response from NCP
0000103903 [app] INFO: >> AT+URAT=7
0000163903 [app] INFO: >> AT+UMNOPROF=0
0000223903 [app] INFO: >> AT+CFUN=1
0000283904 [app] INFO: Connecting to network...
0000386948 [gsm0710muxer] ERROR: The other end has not replied to keep alives (TESTs) 5 times, considering muxed connection dead
0000398599 [hal] ERROR: Failed to power off modem
0000419699 [hal] ERROR: No response from NCP
0000523904 [app] INFO: No cellular connection ...connection lost to /dev/cu.usbmodem142301 ...
...Reconnected to /dev/cu.usbmodem142301 ...
0000005487 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005488 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005489 [app] INFO: Reset Reason: 0x8c
0000005490 [app] INFO: Free Memory 61480 bytes
0000005490 [app] INFO: Initializing Display
0000005644 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011898 [hal] ERROR: Failed to power off modem
0000032899 [hal] ERROR: No response from NCP
0000103899 [app] INFO: >> AT+URAT=7
0000163899 [app] INFO: >> AT+UMNOPROF=0
0000223900 [app] INFO: >> AT+CFUN=1
0000283900 [app] INFO: Connecting to network...
0000396939 [gsm0710muxer] ERROR: The other end has not replied to keep alives (TESTs) 5 times, considering muxed connection dead
0000408590 [hal] ERROR: Failed to power off modem
0000429691 [hal] ERROR: No response from NCP
0000523900 [app] INFO: No cellular connection af...connection lost to /dev/cu.usbmodem142301 ...
...Reconnected to /dev/cu.usbmodem142301 ...
0000005487 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005488 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005489 [app] INFO: Reset Reason: 0x8c
0000005490 [app] INFO: Free Memory 61480 bytes
0000005490 [app] INFO: Initializing Display
0000005644 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011899 [hal] ERROR: Failed to power off modem
0000032900 [hal] ERROR: No response from NCP
0000103900 [app] INFO: >> AT+URAT=7
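
To put numbers on the delays in a log like this one, a short Python sketch can measure how long each AT command sat before the next application-level line appeared. The regular expression and the helper names `parse` and `at_gaps_ms` are illustrative assumptions; the sample lines are copied from the first boot above.

```python
import re

# Assumed line layout: <ms timestamp> [<category>] <LEVEL>: <message>
LINE_RE = re.compile(r"^(\d+) \[(\w+)\] (\w+): (.*)$")

SAMPLE = '''\
0000005647 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011902 [hal] ERROR: Failed to power off modem
0000032903 [hal] ERROR: No response from NCP
0000103903 [app] INFO: >> AT+URAT=7
0000163903 [app] INFO: >> AT+UMNOPROF=0
0000223903 [app] INFO: >> AT+CFUN=1
0000283904 [app] INFO: Connecting to network...
'''


def parse(text):
    """Return (timestamp_ms, category, level, message) for each matching line."""
    rows = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            ts, cat, level, msg = m.groups()
            rows.append((int(ts), cat, level, msg))
    return rows


def at_gaps_ms(rows):
    """For each logged AT command, the ms until the next [app] line."""
    app = [r for r in rows if r[1] == "app"]
    gaps = []
    for cur, nxt in zip(app, app[1:]):
        if cur[3].startswith(">> AT"):
            gaps.append((cur[3][3:], nxt[0] - cur[0]))
    return gaps


if __name__ == "__main__":
    rows = parse(SAMPLE)
    for command, gap in at_gaps_ms(rows):
        print(f"{gap / 1000:8.3f} s  {command}")
    print(sum(1 for r in rows if r[2] == "ERROR"), "ERROR lines")
```

On this sample the last three commands are each followed by a gap of almost exactly 60 seconds, which may point to each one waiting out a fixed timeout rather than getting a reply, though that is only an inference from the timestamps.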

Just wanted to thank everyone who is chiming in here!!! It has been really helpful to get you guys behind this thread and get it the attention it deserves. I am in contact with the Engineering Team at Particle, and for a couple weeks my Borons began to play nice. I wish I could say that was the end to this story.

The guys at Particle did send me a message on here, which I must have missed with how busy this thread got. I was getting so many messages from this thread that it was hard to keep track of them in email. They offered to step up with a refund, and I just replied this morning because, of course, in the last 72 hrs these issues began to reappear for me, overnight while I was asleep.

I woke up to a bunch of green flashing lights. My initial suspicion this time was a cell tower issue or that the Particle backend was down, as ATT did have some reported outages here in San Diego. But after waiting, and waiting some more, and then updating from 1.2.1RC1 to 1.2.1GM, and then on to 1.3.0RC1, I can confirm that this issue has not been fixed in either one. It’s been 3 days, which is way too long for these to be dropping connection in my book.

The issue does not appear to be exactly the same as it was originally for me, though. Originally, my devices were stuck green and would never connect to the cloud. Now they will connect, but for very short periods of time, and then go back to flashing green. I was barely able to get flashes to them before they disconnected again. As this issue is still occurring with new bootloaders and Tinker installed, I know it is not related to my code. Just to note, as I saw someone list it above, I am using Semi Automatic and the System Thread is enabled. Turning off Semi Automatic did not make a difference for me, just FYI.

While I hope the team at Particle will take me up on my offer to work for them, even if they don’t, this is the sad truth of this story: if the team at Particle cannot get the huge stability issues with their Borons fixed, and their entire backend up to snuff, and I mean like RIGHT NOW, I will have to scrap their controllers and go another route, which will not only be a mess but may also sink this product completely. I have been very patient, and continue to be, but at this point I don’t care who you need to hire or fire; this needs to be fixed, NOW.

I am not going to expose the company I am putting my blood, sweat and tears into to legal risk for selling unreliable or defective devices because I was shipped bad hardware, or bad firmware updates to good working hardware. The exact same Boron hardware that started this thread used to be rock solid in the locations where I am having issues now.

So, Mr. CEO, if you are keeping an eye on this thread: I don’t care if it means that NFC and Bluetooth support have to stay in beta a whole month longer. You absolutely need to dedicate every single engineer you can possibly muster to fixing these issues, as, quite literally, the fate of multiple companies is now in your hands. That has been made abundantly clear in this thread, not just by me but by others too!!!

Please don’t take that for granted!!! If you want to market your business as helping creators get products to market, and want people to sign contracts for priority service, then at least show us that you can tackle the small stuff, like making your own stuff work right!!!

Thank you again, to everyone for jumping on this, to help confirm I’m not alone here!!! Best of luck with your own products and projects!!!
-Spencer

Hi Spencer,

It sounds like you’ve got multiple threads going with the folks at Particle (myself included). Let’s focus on digging into the root cause of the issues you’re seeing in those threads, if that’s okay. What’s helpful in the forums are specific symptoms, device logs, code examples, schematics, and so on. I know in this case you mentioned you’re running Tinker and some specific system firmware versions, which is helpful, thanks!

What signal strength are you seeing on your devices? How are they powered? Are they in a breadboard, or foam, or in a custom circuit board? Which antenna are you using? How is it mounted? When you mentioned waiting, what did you try? Did you reset or unplug / replug your devices?

It’s worth mentioning here that while we’re happy to help dig in, there are just so so so many variables that we need to understand in your case to help debug. Your device being offline for some amount of time could be your power supply, usb cable, battery, mounting, positioning, firmware, antenna selection, pcb, workstation, local carrier, local interference, carrier to internet, etc, etc. Your device could be operating outside the environmental or power conditions in the datasheet, etc, etc.

We have an awesome support team, and an awesome engineering team that try to make the best devices and firmware and experience, and we do keep an eye on threads where folks say things aren’t working. The more specific actionable information we can gather about something like this, the better we can understand the root cause and if there is something we can fix / change to make your experience better.

Thanks,
David

Dave,
Did you receive the email from earlier today?

My setup has not changed since I began this thread and ordered the additional Borons to confirm that it was never the hardware, but rather your firmware/bootloader/modem firmware, or issues with your carriers. The issue is with all the Borons, in the exact same places where they had been working fairly stably for about a month, with only the occasional disconnect, maybe once a week, and they would reconnect on their own when I let them be. Even that is nowhere near where it was, from a stability standpoint, prior to the 1.2.+ firmwares. My power did not go out. I was asleep and had not touched the firmware or changed any code since this issue mysteriously went away a few weeks ago, with zero explanation from you guys and zero changes on my end. It was fine, but now it’s back again. These issues began again right around the time you guys put out 1.3.0RC1, even though these devices were all still running 1.2.1RC1. It wasn’t until they had been offline for 3 days that I tried rebooting them, upgraded the firmware and bootloaders, and started testing with only Tinker, again to make sure it wasn’t anything I wrote and could not be blamed on anything I changed on my end.

I have also now explained this in 3 places, counting here. The first two are private (one from earlier today), which is where these kinds of discussions should probably be had. I would really appreciate it if you would just email me, as you won’t call me, and also please touch base with Matt or have him copy you in on my conversation with him, so you have all the data. I know this is the weekend and I very much appreciate you reaching out, but I am getting conflicting instructions from you and Matt in different places, so I am trying to understand what I can do to help you make this better faster, as I feel I have given you all the data I can. If you need me to do logging, I can send you the logs. I just need to know which one of you wants to take the lead on this, so I am not having two conversations about the same thing with two different people simultaneously. Again, I appreciate the effort, but who should I send them to, or be dealing with, to get the fastest results?

My only idea at this point is to look into the modem firmware for bugs or talk to your SIM carriers, because this issue seems to be impacting primarily the LTE devices right now. If you would like, I can contact AT&T, as I believe that is the provider for the built-in SIM, to inquire about outages, but at this point I’m having a really hard time believing an outage would last this long.

Help me to help you :grin:

Hi @sdevo619

I provided more information on your concerns in private. In short, there are currently no known causative links between Device OS and the issues you are seeing.

A general idea of the impacted use cases is known. The exact circumstances leading to the bad behavior are still unknown, but are currently being investigated. There is no blame being cast for the issues here. We acknowledge that issues exist and wish only to understand them.

Both Dave and I are happy to assist you moving forward. However, for sake of simplicity, feel free to consider me the primary contact on this matter moving forward. :slight_smile:

As a follow up to Dave's input: Dave's instructions are certainly useful. It is good to understand any changes that may have occurred along the way. As your setup has not changed much, I believe we may not need to collect much in terms of additional information at this time--at least for the Borons.

From our direct message conversation, I understand there are some concerns about the Electron as well. The issues observed here, and what is known at this point, do not suggest that they are impacting the Electron. As such, Dave's suggestions are recommended, as we should be treating this as a different issue with no pre-existing context.

To my understanding of what is known at this time, the hardware is currently undergoing a deep dive by our engineering team. As hard as it is, I'm afraid the best option here is to wait patiently for updates. As we receive updates with actionable test points, we can test them as soon as possible to confirm progress has been made.

I just got one of our Borons up and running on the bench. It had a few days on the 0.9.0 firmware running fine. I just upgraded it to 1.2.1 and will see how it works. Where it is sitting I have very poor signal, so hopefully that might help trigger the issues.

@darkstar2002 What code are you actually using to confirm it is running fine? Are you getting any dropouts of 5 min or longer? How are you recording that it is actually connected 24/7?
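
One way to answer those questions from recorded data, as a rough Python sketch: if the device publishes a regular heartbeat and the receive times are stored, any gap of 5 minutes or more between consecutive heartbeats can be flagged as a dropout. The function name `find_dropouts`, the use of plain seconds for timestamps, and the 60-second heartbeat in the example are all assumptions for illustration.

```python
def find_dropouts(heartbeats_s, min_gap_s=5 * 60):
    """Return (last_seen, next_seen, gap) for every gap >= min_gap_s.

    heartbeats_s: receive times of heartbeat messages, in seconds.
    """
    beats = sorted(heartbeats_s)
    return [
        (a, b, b - a)
        for a, b in zip(beats, beats[1:])
        if b - a >= min_gap_s
    ]


if __name__ == "__main__":
    # Hypothetical receive times for a device that should report every 60 s.
    received = [0, 60, 120, 480, 540, 1200]
    for last_seen, next_seen, gap in find_dropouts(received):
        print(f"offline from {last_seen} s to {next_seen} s ({gap} s)")
```

Note that this only measures cloud-side silence, so it cannot tell a cellular dropout apart from a device that was powered off or reset.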

Hi ALL,

Also experiencing issues with Particle devices losing connection and not automatically reconnecting to the cloud. We had been running 0.9 on the device successfully for a few weeks and then this issue started. We upgraded to OS 1.2.1 and now, after 2-3 weeks, we see the same issue.

We are using reasonably simple code on the Particle and have no connection-based code; we simply loop through our application code until an RFID tag has been read, then we call a Particle publish. We are also a startup company and about to initiate a pilot programme; unfortunately we are currently entirely reliant on these Borons. Our application is fortunate in that we can tolerate large periods of no connection, but we will get nowhere if the Borons are not capable of reconnecting.

I have a few questions for the Particle engineers:

Have these issues been reproduced in problem solving exercises?
Does a complete power disconnect and reconnect resolve the issue?
I’d love to see a basic project plan or problem-solving road map for these issues. Has a root cause been identified, and at what stage are the corrective and preventative actions?
Essentially I would just like some more information on this issue.

Again, when our Borons work as intended they are great, but our entire startup is based on this device and the resolution of this issue.

Hi @StngBo,

All great questions.

There are still ongoing investigations into the issue. Flashing green itself is a rather broad symptom (quite simply, no cellular connectivity) and, as a result, can have several causes.

We are still investigating and working on reproducing all of the root causes in order to address them. Being able to reproduce each scenario and understand how they interact with each other will take some time.

With that in mind, progress is being made. We will be offering a more formal update this coming week with a Device OS release candidate to test. In the near term, I can offer you some insights as to where we stand:

For devices that are able to come online at first, but eventually degrade into blinking green, we have filed and merged pull request #1862. This request tackles a memory leak in the muxer that occurs when a device establishes connectivity. In a poor connection area, a device may re-establish connectivity several times, allocating (but never de-allocating) memory each time.
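
To illustrate the general pattern only, here is a toy Python model of a component that allocates a buffer on every connect. It is not the Device OS muxer and does not reflect the contents of the pull request; the class `ToyMuxer`, the helper `held_after`, and the 512-byte buffer size are all hypothetical.

```python
class ToyMuxer:
    """Toy stand-in for a component that allocates on every connect."""

    def __init__(self, free_on_stop):
        self.free_on_stop = free_on_stop
        self.buffers = []

    def start(self, size=512):
        # Every (re)connect allocates a fresh buffer. The size is hypothetical.
        self.buffers.append(bytearray(size))

    def stop(self):
        # The leaky variant forgets to release what start() allocated.
        if self.free_on_stop:
            self.buffers.clear()

    def held_bytes(self):
        return sum(len(b) for b in self.buffers)


def held_after(reconnects, free_on_stop):
    """Bytes still held after a number of connect/disconnect cycles."""
    mux = ToyMuxer(free_on_stop)
    for _ in range(reconnects):
        mux.start()
        mux.stop()
    return mux.held_bytes()


if __name__ == "__main__":
    print("leaky:", held_after(100, free_on_stop=False), "bytes held")
    print("fixed:", held_after(100, free_on_stop=True), "bytes held")
```

In the leaky variant the held memory grows linearly with the number of reconnects, which is why a device in a poor signal area would be expected to run out sooner than one that rarely drops.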

Devices with the above symptom are able to be recovered through reset or turning the device off and on on current device OS.

We are hoping, after the release of the next Device OS release candidate, to gather feedback and continue investigating the other types of symptoms we are seeing. At present, the other ongoing investigations are into devices that turn on in blinking green but never actually connect, and into misconfigured I2C slave devices causing Device OS to stall.

In short, progress is being made. We will have some actionable testing coming soon. :slight_smile:

Thank you mstanley for this update. That memory leak bug is extremely motivating. It is spectacular to think that that might - as in, could possibly - be the cause for why my Borons repeatedly entered a no-reconnect stage when left at my test site, and that therefore its resolution would allow me to go back to using Particle Boron there.

I have 4 useful things to say to you:

  1. Your efforts on this front are important and much appreciated. In the meantime I had to go back and use a Pycom GPy with an always-on, brute-force, catch-all restart structure (e.g., on a single connection loss or exception, restart the whole board and reobtain the cell connection). This requires tons of power and tons of data and is wasteful, but it does stay up 100% of the time, unlike the Particle Boron, so I’m currently using it. It wastefully uses 20 MB a DAY and I’m paying $21/month for a 1 GB NimbeLink plan. You better believe, as soon as the Boron reconnectivity gets fixed, I will throw the GPy in the trash and go back to Boron! It’s only actually uploading 190 kB per day (132 bytes every min).

  2. I have a second site with good cell signal which rarely drops out, unlike the other site. The Boron there (1.2.1-rc3) has been up rock-solid since 7/14 and is still uploading right now.

  3. Is it possible to have some sort of low-level watchdog feature that would accomplish the equivalent of a power cycle simply by detecting the infinite green flashing cannot reconnect condition? This would solve the issue for me, even if the root cause is not fixed. I can tolerate 99%, even 95% up-time. What I can’t tolerate is the never-reconnect without physical power cycle. I think many others’ applications are the same.

  4. Here is a specific test result which may be helpful. I have one certain Boron that I took to the test location. It connects to the cloud perfectly at home, but at the test location it connects to cell and then fails to connect to the Particle cloud (it flashes orange three times in the middle of the cyan flashing sequence, and repeats the cyan sequence indefinitely). So this is different from the primary issue with my other Borons, which is the primary topic here (i.e., connects, stays connected, then loses the connection at some point and remains permanently in flashing green). This is a different but related issue where a certain Boron was permanently inoperable at a certain location, not because it couldn’t connect to cell, but because, once connected, the separate Particle Cloud connection consistently failed, i.e., it was stuck on flashing cyan instead of flashing green. I have no idea why that is, or why it works at home but not at the test site. Just wanted to send this data point along.
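
The data figures in point 1 can be checked with a few lines of Python. The decimal units (1 MB = 1,000,000 bytes, 1 GB = 1000 MB) and the 30-day month are assumptions; the other numbers come from the post.

```python
PAYLOAD_BYTES = 132          # size of one upload, from point 1
UPLOADS_PER_DAY = 24 * 60    # one upload every minute
ACTUAL_MB_PER_DAY = 20       # reported daily usage with the restart approach
PLAN_MB = 1000               # 1 GB plan in decimal units (assumption)
DAYS_PER_MONTH = 30          # assumption


def payload_bytes_per_day():
    """Bytes of real payload uploaded per day."""
    return PAYLOAD_BYTES * UPLOADS_PER_DAY


def overhead_factor():
    """How many times larger the actual usage is than the payload."""
    return ACTUAL_MB_PER_DAY * 1_000_000 / payload_bytes_per_day()


def plan_fraction_used():
    """Share of the monthly plan consumed at the reported daily usage."""
    return ACTUAL_MB_PER_DAY * DAYS_PER_MONTH / PLAN_MB


if __name__ == "__main__":
    print(payload_bytes_per_day(), "payload bytes per day")
    print(round(overhead_factor()), "x overhead")
    print(plan_fraction_used(), "of the monthly plan")
```

Under those assumptions the payload works out to 190,080 bytes a day, which matches the quoted 190 kB, while the actual usage is roughly 105 times that and takes about 60% of the 1 GB plan each month.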

@mstanley, @Dave,

Just read through this thread, and I’d like to know if the Electron LTE is affected by this issue. It’s hard to determine with all the back and forth. I’ve got 4 Electron LTEs in testing, with none of them showing this issue.