Boron LTE Stuck Flashing Green

boron

#61

Interesting. I’m now having a reconnection problem on my main development Boron LTE sitting on my desk. It has a high-gain external antenna attached, and has been powered up and connected for about two weeks, including periodic resets. I pulled the USB connection to test battery life (2000 mAh LiPo) and let it run until the battery drained. It was offline for a day or so before I reconnected power. Now it will not connect with or without the battery attached. I’m using a Hologram SIM. I have tried firmware 1.2.1-rc.3 and 1.3.0-rc.1. Same behavior on both.

Behavior I’m seeing:

  • At startup, AT commands may complete as expected, or they may take 30+ seconds (e.g. AT+CGDCONT=2,"IP","hologram")
  • The modem does not connect, and after several minutes my code resets the Boron to try again
  • I now see two items in the serial log. “ERROR: Failed to power off modem” and “ERROR: No response from NCP”. This occurs even after a full power cycle.

The location, antenna, circuitry, etc. did not change from the time it connected successfully to now.

I’m wondering if the modem has blacklisted my local towers for some reason. I don’t recall the command to clear the blacklist, but will look into it. Should I clear that on startup? I understand that will result in longer initial connection times, but if it will connect more reliably then it is worth it.

Any other suggestions?

EDIT: I just realized I reflashed 1.2.1-rc.3 instead of 1.3.0-rc.1. I’ll test with that version and update again.

EDIT 2: Same behavior with 1.3.0-rc.1.

EDIT 3: Here is the log from the Boron

...Reconnected to /dev/cu.usbmodem142301 ...
0000005491 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005492 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005492 [app] INFO: Reset Reason: 0x8c
0000005493 [app] INFO: Free Memory 61416 bytes
0000005494 [app] INFO: Initializing Display
0000005647 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011902 [hal] ERROR: Failed to power off modem
0000032903 [hal] ERROR: No response from NCP
0000103903 [app] INFO: >> AT+URAT=7
0000163903 [app] INFO: >> AT+UMNOPROF=0
0000223903 [app] INFO: >> AT+CFUN=1
0000283904 [app] INFO: Connecting to network...
0000386948 [gsm0710muxer] ERROR: The other end has not replied to keep alives (TESTs) 5 times, considering muxed connection dead
0000398599 [hal] ERROR: Failed to power off modem
0000419699 [hal] ERROR: No response from NCP
0000523904 [app] INFO: No cellular connection ...connection lost to /dev/cu.usbmodem142301 ...
...Reconnected to /dev/cu.usbmodem142301 ...
0000005487 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005488 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005489 [app] INFO: Reset Reason: 0x8c
0000005490 [app] INFO: Free Memory 61480 bytes
0000005490 [app] INFO: Initializing Display
0000005644 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011898 [hal] ERROR: Failed to power off modem
0000032899 [hal] ERROR: No response from NCP
0000103899 [app] INFO: >> AT+URAT=7
0000163899 [app] INFO: >> AT+UMNOPROF=0
0000223900 [app] INFO: >> AT+CFUN=1
0000283900 [app] INFO: Connecting to network...
0000396939 [gsm0710muxer] ERROR: The other end has not replied to keep alives (TESTs) 5 times, considering muxed connection dead
0000408590 [hal] ERROR: Failed to power off modem
0000429691 [hal] ERROR: No response from NCP
0000523900 [app] INFO: No cellular connection af...connection lost to /dev/cu.usbmodem142301 ...
...Reconnected to /dev/cu.usbmodem142301 ...
0000005487 [app] INFO: picsil Sense is running! ID e00fce6864c5f1a4f2c0606a
0000005488 [app] INFO: Firmware version 2.0.118, OS version 1.3.0-rc.1
0000005489 [app] INFO: Reset Reason: 0x8c
0000005490 [app] INFO: Free Memory 61480 bytes
0000005490 [app] INFO: Initializing Display
0000005644 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011899 [hal] ERROR: Failed to power off modem
0000032900 [hal] ERROR: No response from NCP
0000103900 [app] INFO: >> AT+URAT=7
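
The timing in the log above can be read off mechanically. The following is a minimal sketch, not part of the original post, that parses lines in this log format and lists every step followed by at least 30 seconds of silence. The helper names `parse` and `slow_steps` and the 30-second threshold are assumptions chosen for illustration. On the excerpt below, each AT command after the NCP error is followed by a gap of about 60 seconds, which would be consistent with each command waiting out a timeout, although the application's actual timeout value is not shown in the thread.

```python
import re

# Matches lines such as: 0000103903 [app] INFO: >> AT+URAT=7
LINE_RE = re.compile(r"^(\d{10}) \[(\w+)\] (\w+): (.*)$")


def parse(log_text):
    """Return (millis, category, level, message) tuples for well-formed lines."""
    entries = []
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            entries.append((int(m.group(1)), m.group(2), m.group(3), m.group(4)))
    return entries


def slow_steps(entries, threshold_ms=30_000):
    """Return (message, elapsed_ms) for entries followed by a long silence."""
    result = []
    for (t0, _, _, msg), (t1, _, _, _) in zip(entries, entries[1:]):
        if t1 - t0 >= threshold_ms:
            result.append((msg, t1 - t0))
    return result


# Excerpt copied from the first boot in the log above.
SAMPLE = """\
0000005647 [app] INFO: >> AT+CGDCONT=2,"IP","hologram"
0000011902 [hal] ERROR: Failed to power off modem
0000032903 [hal] ERROR: No response from NCP
0000103903 [app] INFO: >> AT+URAT=7
0000163903 [app] INFO: >> AT+UMNOPROF=0
0000223903 [app] INFO: >> AT+CFUN=1
0000283904 [app] INFO: Connecting to network...
"""

for msg, elapsed in slow_steps(parse(SAMPLE)):
    print(f"{elapsed / 1000:6.1f} s after: {msg}")
```

Timestamps restart after each reset, so a gap that spans a reboot comes out negative and is ignored by the threshold check.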

#62

Just wanted to thank everyone who is chiming in here!!! It has been really helpful to get you guys behind this thread and get it the attention it deserves. I am in contact with the Engineering Team at Particle, and for a couple weeks my Borons began to play nice. I wish I could say that was the end to this story.

The guys at Particle did send me a message on here, which I must have missed given how busy this thread got; I was getting so many email notifications from it that it was hard to keep track. They offered to step up with a refund, and I only replied this morning because, of course, in the last 72 hours these issues began to reappear overnight while I was asleep.

I woke up to a bunch of green flashing lights. My initial suspicion this time was a cell tower issue or that the Particle backend was down, as AT&T did have some reported outages here in San Diego. But after waiting, and waiting some more, and then updating from 1.2.1RC1 to 1.2.1GM and then on to 1.3.0RC1, I can confirm that this issue has not been fixed in either one. It has been 3 days, which in my book is way too long for these devices to be dropping their connection.

The issue does not appear to be exactly the same as it was originally for me, though. Originally, my devices were stuck green and would never connect to the cloud. Now they will connect, but only for very short periods of time, and then go back to flashing green. I was barely able to get flashes to them before they disconnected again. As this issue is still occurring with new bootloaders and Tinker installed, I know it is not related to my code. Just to note, as I saw someone list it above, I am using Semi Automatic mode with the System Thread enabled. Turning off Semi Automatic did not make a difference for me, just FYI.

While I hope the team at Particle will take me up on my offer to work for them, even if they don’t, this is the sad truth of this story. If the team at Particle cannot get the huge stability issues with their Borons fixed, and their entire backend up to snuff, and I mean like RIGHT NOW, I will have to scrap their controllers and go another route, which will not only be a mess but may also sink this product completely. I have been very patient, and continue to be, but at this point I don’t care who you need to hire or fire: this needs to be fixed, NOW.

I am not going to legally expose the company I am putting my blood, sweat, and tears into by selling unreliable or defective devices because I was shipped bad hardware, or bad firmware updates for good working hardware. The exact same Boron hardware that started this thread used to be rock solid in the locations where I am having issues now.

So, Mr. CEO, if you are keeping an eye on this thread: I don’t care if it means that NFC and Bluetooth support have to stay in beta a whole month longer. You absolutely need to dedicate every single engineer you can possibly muster to fixing these issues, as quite literally the fate of multiple companies is now in your hands, as has been made abundantly clear in this thread, not just by me but by others too!!!

Please don’t take that for granted!!! If you want to market your business as helping creators get products to market, and want people to sign contracts for priority service, then at least show us that you can tackle the small stuff, like making your own stuff work right!!!

Thank you again, to everyone for jumping on this, to help confirm I’m not alone here!!! Best of luck with your own products and projects!!!
-Spencer


#63

Hi Spencer,

It sounds like you’ve got multiple threads going with the folks at Particle (myself included). Let’s focus on digging into the root cause of the issues you’re seeing in those threads, if that’s okay. What’s helpful in the forums are specific symptoms, device logs, code examples, schematics, and so on. I know in this case you mentioned that you’re running Tinker and which system firmware versions you tried, which is helpful, thanks!

What signal strength are you seeing on your devices? How are they powered? Are they in a breadboard, or foam, or in a custom circuit board? Which antenna are you using? How is it mounted? When you mentioned waiting, what did you try? Did you reset or unplug / replug your devices?

It’s worth mentioning here that while we’re happy to help dig in, there are just so many variables that we need to understand in your case to help debug. Your device being offline for some amount of time could be down to your power supply, USB cable, battery, mounting, positioning, firmware, antenna selection, PCB, workstation, local carrier, local interference, the carrier’s link to the internet, etc. Your device could also be operating outside the environmental or power conditions in the datasheet.

We have an awesome support team, and an awesome engineering team that try to make the best devices and firmware and experience, and we do keep an eye on threads where folks say things aren’t working. The more specific actionable information we can gather about something like this, the better we can understand the root cause and if there is something we can fix / change to make your experience better.

Thanks,
David


#64

Dave,
Did you receive the email from earlier today?

My setup has not changed since I began this thread. I ordered the additional Borons to confirm that the problem was never the hardware but rather your firmware, bootloader, or modem firmware, or issues with your carriers. The issue affects all the Borons, in the exact same places where they had been working fairly stably for about a month with only the occasional disconnect, maybe once a week, from which they would reconnect on their own when I let them be. Even that is still nowhere near where it was from a stability standpoint prior to the 1.2.+ firmwares. My power did not go out. I was asleep, and I had not touched the firmware or changed any code since this issue mysteriously went away a few weeks ago, with zero explanation from you guys and zero changes on my end. It was fine, but now it’s back again. These issues began again right around the time you guys put out 1.3.0RC1, even though these devices were all still running 1.2.1RC1. It wasn’t until they had been offline for 3 days that I tried rebooting them, upgraded the firmware and bootloaders, and started testing with only Tinker, again to make sure it wasn’t anything I wrote and that it could not be blamed on anything I changed on my end.

I have now explained this in 3 places, counting here. The first two are private, one of them from earlier today, which is where these kinds of discussions should probably be had. I would really appreciate it if you would just email me, as you won’t call me, and please also touch base with Matt or have him copy you in on my conversation with him, so you have all the data. I know this is the weekend and I very much appreciate you reaching out, but I am getting conflicting instructions from you and Matt in different places, so I am trying to understand what I can do to help you make this better faster, as I feel I have given you all the data I can. If you need me to do logging, I can send you the logs. I just need to know which one of you wants to take the lead on this, so I am not having two conversations about the same thing with two different people simultaneously. Again, I appreciate the effort, but who should I send them to, or be dealing with, to get the fastest results?

My only idea at this point is to look into the modem firmware for bugs or to talk to your SIM carrier providers, because this issue seems to be primarily impacting the LTE devices right now. If you would like, I can contact AT&T, as I believe that is your SIM provider for the built-in SIM, to inquire about outages, but at this point I am having a really hard time believing an outage would last this long.

Help me to help you :grin:


#65

Hi @sdevo619

I provided more information on your concerns in private. In short, there are currently no known causative links between Device OS and the issues you are seeing.

A general idea of the impacted use cases is known. The exact circumstances leading to the bad behavior are still unknown, but are currently being investigated. There is no blame being cast for the issues here. We acknowledge that issues exist and wish only to understand them.

Both Dave and I are happy to assist you moving forward. However, for sake of simplicity, feel free to consider me the primary contact on this matter moving forward. :slight_smile:

As a follow up to Dave’s input: Dave’s instructions are certainly useful. It is good to understand any changes that may have occurred along the way. As your setup has not changed much, I believe we may not need to collect much in terms of additional information at this time–at least for the Borons.

From our direct message conversation, I understand there are some concerns about the Electron as well. The issues observed here, and what is known at this point, do not suggest that they are impacting the Electron. As such, Dave’s suggestions are recommended there, as we should be treating that as a different issue with no pre-existing context.

To my understanding of what is known at this time, the hardware is currently undergoing a deep dive by our engineering team. As hard as it is, I’m afraid the best option here is to wait patiently for updates. As we receive updates with actionable test points, we can test them as soon as possible to confirm progress has been made.


#66

I just got one of my Borons up and running on the bench. It had a few days on the 0.9.0 firmware, running fine. I just upgraded it to 1.2.1 and will see how it works. Where it is sitting I have very poor signal, so hopefully that will help trigger the issues.


#67

@darkstar2002 What code are you actually using to confirm it is running fine? Are you getting any dropouts of 5 minutes or longer? How are you recording that it is actually connected 24/7?
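
One way to answer the dropout question is to have the device publish a heartbeat at a fixed interval and then scan the received timestamps for holes. The snippet below is only a sketch of that idea: the function name `find_dropouts`, the one-minute heartbeat interval, and the sample data are hypothetical and not something anyone in this thread reported using.

```python
from datetime import datetime, timedelta


def find_dropouts(heartbeats, min_gap=timedelta(minutes=5)):
    """Return (start, end, duration) for each silence of at least min_gap."""
    ordered = sorted(heartbeats)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev >= min_gap:
            gaps.append((prev, cur, cur - prev))
    return gaps


# Hypothetical data: one heartbeat per minute with a 12-minute hole.
start = datetime(2019, 8, 1, 0, 0)
beats = [start + timedelta(minutes=m) for m in range(0, 10)]
beats += [start + timedelta(minutes=m) for m in range(21, 30)]

for begin, end, length in find_dropouts(beats):
    print(f"offline from {begin:%H:%M} to {end:%H:%M} ({length})")
```

Running this server-side on the received events, rather than on the device, means an outage is still recorded even if the device never recovers.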


#68

Hi ALL,

We are also experiencing issues with Particle devices losing connection and not automatically reconnecting to the cloud. We had been running 0.9 on the device successfully for a few weeks and then this issue started. We upgraded to OS 1.2.1, and after 2-3 weeks we now see the same issue.

We are using reasonably simple code on the Particle and have no connection-based code; we simply loop through our application code until an RFID tag has been read, then we call a Particle publish. We are also a startup company and about to initiate a pilot programme, and unfortunately we are currently entirely reliant on these Borons. Our application is fortunate in that we can tolerate long periods of no connection, but we will get nowhere if the Borons are not capable of reconnecting.

I have a few questions for the Particle engineers:

Have these issues been reproduced in problem-solving exercises?
Does a complete power disconnect and reconnect resolve the issue?
I’d love to see a basic project plan or problem-solving road map for these issues. Has a root cause been identified, and at what stage are the corrective and preventative actions?
Essentially I would just like some more information on this issue.

Again, when our Borons work as intended they are great, but our entire startup is based on this device and the resolution of this issue.


#69

Hi @StngBo,

All great questions.

There are still ongoing investigations into the issue. Flashing green itself indicates a rather broad error (quite simply, no cellular connectivity) and as a result can have several causes.

We are still investigating and working on reproducing all of the root causes in order to address them. Being able to reproduce each scenario and understand how they interact with each other will take some time.

With that in mind, progress is being made. We will be offering a more formal update this coming week with a Device OS release candidate to test. In the near term, I can offer you some insights as to where we stand:

For devices that are able to come online at first, but eventually degrade into blinking green, we have filed and merged pull request #1862. This request tackles a memory leak in the muxer that occurs when a device establishes connectivity. In a poor connection area, a device may re-establish connectivity several times, allocating (but never de-allocating) memory each time.

Devices with the above symptom can be recovered on the current Device OS by resetting them or by turning them off and on again.
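
To illustrate why a per-reconnect leak would show up mainly at poor-signal sites, and why a reset clears it, here is a toy model. It is not the Device OS muxer code: the `Heap` class and `reconnects_until_failure` function are invented for this sketch, and the 2,048-byte per-session allocation is a made-up number. The heap size reuses the "Free Memory 61416 bytes" figure from the log earlier in the thread purely as a plausible order of magnitude.

```python
class Heap:
    """Toy heap that only tracks how many bytes are in use."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, size):
        if self.used + size > self.capacity:
            raise MemoryError("out of heap")
        self.used += size

    def free(self, size):
        self.used -= size


def reconnects_until_failure(heap, session_bytes, release_old, limit=10_000):
    """Count successful reconnects before an allocation fails (None if never)."""
    for attempt in range(limit):
        if release_old and attempt > 0:
            heap.free(session_bytes)   # drop the previous session's buffers
        try:
            heap.alloc(session_bytes)  # buffers for the new session
        except MemoryError:
            return attempt
    return None


HEAP_BYTES = 61_416     # free memory reported in the log earlier in the thread
SESSION_BYTES = 2_048   # hypothetical per-connection allocation

leaky = reconnects_until_failure(Heap(HEAP_BYTES), SESSION_BYTES, release_old=False)
fixed = reconnects_until_failure(Heap(HEAP_BYTES), SESSION_BYTES, release_old=True)
print("leaky:", leaky, "fixed:", fixed)
```

In this model the leaky variant runs out of heap after a bounded number of reconnects, while the variant that frees the old session never does. A reset corresponds to starting again with an empty `Heap`, so a device with good signal that rarely reconnects might never reach the limit.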

After we release the next Device OS release candidate, we hope to gather feedback and continue investigating the other types of symptoms we are seeing. At present, the other ongoing investigations are devices that turn on in blinking green but never actually connect, and misconfigured I2C slave devices causing Device OS to stall.

In short, progress is being made. We will have some actionable testing coming soon. :slight_smile:


#70

Thank you mstanley for this update. That memory leak bug is extremely motivating. It is spectacular to think that that might - as in, could possibly - be the cause for why my Borons repeatedly entered a no-reconnect stage when left at my test site, and that therefore its resolution would allow me to go back to using Particle Boron there.

I have 4 useful things to say to you:

  1. Your efforts on this front are important and much appreciated. In the meantime I had to go back to using a Pycom GPy with an always-on, brute-force, catch-all restart structure (i.e., on a single connection loss or exception, restart the whole board and re-obtain the cell connection). This requires tons of power and tons of data and is wasteful, but unlike the Particle Boron it does stay up 100% of the time, so I’m currently using it. It wastefully uses 20 MB a DAY and I’m paying $21/month for a 1 GB NimbeLink plan, even though it’s only actually uploading 190 KB per day (132 bytes every minute). You better believe that as soon as the Boron reconnectivity gets fixed, I will throw the GPy in the trash and go back to the Boron!

  2. I have a second site with good cell signal which rarely drops out, unlike the other site. The Boron there (1.2.1-rc3) has been up rock-solid since 7/14 and is still uploading right now.

  3. Is it possible to have some sort of low-level watchdog feature that would accomplish the equivalent of a power cycle simply by detecting the infinite green flashing cannot reconnect condition? This would solve the issue for me, even if the root cause is not fixed. I can tolerate 99%, even 95% up-time. What I can’t tolerate is the never-reconnect without physical power cycle. I think many others’ applications are the same.

  4. Here is a specific test result which may be helpful. I took one particular Boron to the test location. It connects to the cloud perfectly at home, but at the test location it connects to cell and then fails to connect to the Particle cloud (it flashes orange three times in the middle of the cyan flashing sequence, and repeats the cyan sequence indefinitely). In other words, it is stuck on flashing cyan instead of flashing green. This is different from the primary issue with my other Borons, which is the main topic here (i.e., the device connects, stays connected, then loses the connection at some point and remains permanently in flashing green). It is a different but related issue in which a certain Boron was permanently inoperable at a certain location, not because it couldn’t connect to cell, but because once it was connected, the Particle cloud connection consistently failed. I have no idea why that is, or why it works at home but not at the test site. Just wanted to send this data point along.
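
The watchdog idea in point 3 boils down to a small piece of timing logic: remember when the current outage started and request a reset once it has lasted too long. The sketch below shows only that logic, in Python for readability. The class name `ConnectionWatchdog`, the 10-minute limit, and the simulated timeline are assumptions for illustration. What "reset" means on real hardware (a system reset, a modem power cycle, or an external watchdog cutting power) is left to the caller and is not something this sketch implements.

```python
class ConnectionWatchdog:
    """Request a full reset when the device stays offline for too long."""

    def __init__(self, max_offline_ms):
        self.max_offline_ms = max_offline_ms
        self.offline_since = None   # time at which the current outage began

    def update(self, now_ms, connected):
        """Return True when the caller should power-cycle or reset the device."""
        if connected:
            self.offline_since = None
            return False
        if self.offline_since is None:
            self.offline_since = now_ms
        return now_ms - self.offline_since >= self.max_offline_ms


# Simulated timeline: (time in ms, connected?)
wd = ConnectionWatchdog(max_offline_ms=10 * 60 * 1000)
timeline = [(0, True), (60_000, False), (300_000, False), (660_000, False)]
for now, up in timeline:
    if wd.update(now, up):
        print(f"t={now} ms: offline too long, reset requested")
```

Because any successful connection clears the timer, brief dropouts that recover on their own never trigger a reset; only a continuous outage longer than the limit does.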


#71

@mstanley, @Dave,

Just read through this thread, and I’d like to know if the Electron LTE is affected by this issue. It’s hard to determine with all the back and forth. I’ve got 4 Electron LTEs in testing, with none of them showing this issue.


#72

Hi Heath.

The muxer issue in #1862 is known to impact both the Boron 2G/3G and the Boron LTE, but it is not something that impacts E Series devices.

For LTE devices you should be running Device OS v1.2.1 or later to have all the latest LTE fixes. As far as I am aware, there are no known LTE issues past v1.2.1 at this time.


#73

@mstanley, thanks for the update. Just for clarity, we are testing and planning to deploy the Electron LTE, not E-Series LTE.


#74

Hi Heath.

My apologies for misspeaking there. I had intended to speak of the Electron LTE specifically. The Electron LTE and E Series LTE are very similar in nature. No Gen 2 device (the platform the Electron LTE is based on) will be impacted by this issue, so there are no worries to be had. It still stands that no known issues exist for the Electron LTE or any other Gen 2 devices past v1.2.1 at this time.


#75

Thanks for the clarification!


#76

I can attest to coming across the same issue recently: the device disconnects on a weak signal and never connects again, staying in the blinking green mode. It has been running OS 1.1.0 for a few months. I also observed slightly different behavior that I would like to draw attention to, which I described in another post: Boron thinks it's online, console disagrees.

I’m about to place an order for 100 Borons for our pilot run of connected devices, and it’s important to know we can proceed with the launch confidently, as the product’s successful reception depends on it.


#77

I would look at this issue: Boron LTE Fails to Connect before purchasing any more Borons.

In our test of 20 Borons in the field, 3 have failed this way. We’d like to put another 40-60 in the field, but I think we will hold on that until Particle gets to root cause on these issues.

For some context, our carrier board can sit either a Boron or an Electron – we have not had an Electron failure and have similar numbers of Electrons in the field.


#78

Hi Paul,

The connectivity symptom you are describing is in line with an understood issue we have identified for all Boron devices. A release candidate will be out this upcoming week to address the issue. I certainly encourage you to update to it as soon as it’s out and let us know if it helps alleviate the connectivity issues you’re seeing here.

The memory leak issue identified on the Boron is specific to the Boron. It would make sense that your Electrons are not exhibiting similar behavior. Our upcoming v1.3.1 release candidate should alleviate this issue for your devices as well.


#79

Thanks, Matthew. Will be looking out for the upcoming release.


#80

@mstanley,

If this is a memory leak issue, wouldn’t a full power cycle resolve it?

My devices cannot connect even after a full power cycle. I’ve power cycled them in person, AND my carrier board has a hardware watchdog that power cycles the device if there is no communication.