Boron 2g/3g doesn't reconnect to cloud after a disconnect event

boron
cellular

#1

I have a Particle Boron 2G/3G running Device OS 1.1.1 here in Canada. The following two scenarios happen quite frequently:

1 - The device runs fine for a few hours and then randomly disconnects from device cloud (goes from breathing cyan to flashing green) requiring a reset to reconnect.
2 - The device status shows connected to cloud (breathing cyan) but it is unreachable from the cloud (functions cannot be called, it cannot be pinged…)

The first case occurs even when it’s just running an empty loop(), so I suspect there’s an issue in Device OS. I also doubt that it is a network coverage issue, since it will keep trying to connect to cellular (flashing green) for hours without success, yet when I reset it, it connects in 20 seconds.
The second case happens randomly and I don’t know how to reproduce it. Is there a way to enable log output from deviceOS to troubleshoot this? Any help would be appreciated.


#2

Have you read the logging section of the documentation?

https://docs.particle.io/reference/device-os/firmware/boron/#logging

SerialLogHandler is what you’ll want to use to enable the logging messages. There are a few different levels of messages. LOG_LEVEL_INFO is probably what you want to start with.

You can initialize a log handler like this:

SerialLogHandler logHandler(LOG_LEVEL_INFO);

#3

In order to get internal device OS output from the logHandler you may need to build a debug version of the device OS via Particle Workbench tho’


#4

Thank you, I was wondering why I saw no output on terminal even when using LOG_LEVEL_ALL. I will try with debug version of device OS.
Also, are the scenarios I described above, common with the 2g/3g Boron? Do you have any tips on resolving them?


#5

Without knowing your code, my first guess would be heap fragmentation (e.g. due to “overuse” of String or other dynamic memory).


#6

After cleaning up my code to avoid heap fragmentation, the problem still persists. Even when running the Boron 2G/3G without any application code (empty setup() and loop()), it randomly loses connectivity and goes from breathing cyan to blinking green. I have tested this on two separate 2G/3G modules with the latest Device OS 1.2.1. Below is the log output when it disconnects:

[comm.protocol] ERROR: Event loop error 3
[system] WARN: Communication loop error, closing cloud socket

Here’s the code I use to get the error:

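SYSTEM_THREAD(ENABLED);                      // run Device OS in its own thread so user code is not blocked
SerialLogHandler LogHandler(LOG_LEVEL_ALL);  // send all log levels to USB serial

void setup() {
    pinMode(D7, OUTPUT);
    // Block until the first cloud connection is established
    while(Particle.connected() == false){;}
    blinkLED();
    // Send a cloud keep-alive every 15 minutes (argument is in seconds)
    Particle.keepAlive(15*60);
}

void loop() {
    // Intentionally empty: the disconnect occurs without any application logic
}

// Pulse the on-board D7 LED once to signal that the cloud connection is up
void blinkLED(){
    digitalWrite(D7, HIGH);
    delay(200);
    digitalWrite(D7, LOW);
}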

I also occasionally get the following error:

[system] ERROR: failed to load session data from persistent storage

I recently got a Boron LTE however and it recovers from the aforementioned error and is back to breathing cyan within 30 seconds of disconnecting.
Could this be a bug in deviceOS that’s only affecting the 2g/3g Borons?


#7

Hi @Starships,

A bug was found recently that impacts all Boron devices where a memory leak occurs and can lead to seemingly random disconnecting and getting stuck in a blinking green/flashing cyan state. This occurs far more frequently in poor connectivity areas. It’s possible you may soon see the behavior on your Boron LTE, especially if it is in a poor connectivity area.

Device OS v1.3.1-rc.1 addresses this memory leak issue and is planned for release tomorrow (8/21), provided all testing goes well. A topic will be posted about this release, with instructions on how to run further testing to understand if the issue is resolved.


#8

I have updated to v1.3.1-rc.1. but the issue persists on my 2G/3G Borons. Device is stuck in “blinking green” with no user code. The LTE Boron is still running v1.2.1 with user code and hasn’t had this issue yet.

After some digging I found the thread “Boron LTE stuck flashing green”. Based on the information found there, I have downgraded to v0.9.0 and haven’t had a disconnect event yet (~2 hours). I will be monitoring the device over a few days to see if it can recover from a disconnect event with this old firmware. I will be posting updates.

For completeness, the Boron 2G/3G I am testing is on an empty breadboard, is using the antenna and battery that shipped with it, and is connected to a Windows 10 machine via the supplied USB cable.


#9

Hi @Starships, to clarify, is the device able to initialize fine and falls into a blinking green state, or is the device turning on and immediately blinking green, never connecting?

In either scenario, if you could provide logs as per “Status Update for cellular connectivity issues | 8/21/19” using the new v1.3.1-rc.1 binary, that would be immensely helpful in pinpointing the cause so that we may address it.


#10

@mstanley, the device initializes fine and then falls into blinking green. The time it takes to fall to green can range from 5 minutes to several days.
I will collect the logs using v1.3.1-rc.1 and share them with you asap.

Update from last comment: It happened with v0.9.0 as well, so that wasn’t helpful.


#11

@mstanley The cloud debugger has been running for a couple of days now and a disconnect event hasn’t occurred yet. Interestingly, I had run a monolithic debug build (v1.2.1) for over 3 days when I was trying to debug the issue myself, and the event never occurred then either. It seems like monolithic builds are immune to the problem. However, I have dedicated one device to the cloud debugger and will keep it running for a long time and update you if I catch the event.


#12

Monolithic builds shouldn’t have any impact, but that is interesting.

It should be noted that the muxer memory leak isn’t a static issue that occurs over a fixed period. It’s an intermittent issue that has to do with new cellular session negotiations. The rate at which a device falls into the bad state on an older Device OS version depends on the number of times the device has to renegotiate a cellular connection. Devices with poor connectivity were far more likely to see this issue (and to see it sooner). It’s possible the v1.2.1 monolithic build you were running just happened to run during a stretch where cellular connectivity was that much more reliable.


#13

So I ran the cloud debugger for over a week on my 2G/3G Boron and had no connectivity issues. As soon as I switched to 1.3.1-rc.1, the device got stuck in blinking green after being connected for a few minutes. Same issue with 1.4.0-rc.1. In the meantime, the LTE Boron is running fine and is still connected to the cloud after 5 days and 55 disconnect events. It seems like the issue isn’t fixed on the 2G/3G models. Have you heard back from other 2G/3G users with the same issue?

One improvement I have noticed after the update is that the 2g/3g boron does recover from blinking green sometimes (which never happened before) but eventually gets stuck.
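As a stopgap for the "eventually gets stuck" case, a firmware-side disconnect watchdog is one option. Below is a minimal sketch of just the decision logic in plain C++, so it can be checked off-device. The class name `DisconnectWatchdog` and the timeout are my own placeholders, not anything from Device OS, and this only works around the symptom rather than fixing the cause.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Device OS): decides when the cloud link
// has been down long enough that a reset is worth trying.
class DisconnectWatchdog {
public:
    explicit DisconnectWatchdog(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {
        assert(timeoutMs > 0);  // a zero timeout would reset on every disconnect
    }

    // Call regularly with the current connection state and a millisecond clock.
    // Returns true once the link has been down for at least timeoutMs.
    bool shouldReset(bool connected, uint32_t nowMs) {
        if (connected) {
            offline_ = false;          // link is up: clear any pending outage
            return false;
        }
        if (!offline_) {
            offline_ = true;           // first sample of a new outage
            offlineSinceMs_ = nowMs;
        }
        // Unsigned subtraction stays correct when the clock wraps around.
        return (nowMs - offlineSinceMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t offlineSinceMs_ = 0;
    bool offline_ = false;
};
```

On a device, I would expect to call `shouldReset(Particle.connected(), millis())` from loop() and call `System.reset()` when it returns true. That assumes SYSTEM_THREAD(ENABLED), so that loop() keeps running while the device is blinking green. The timeout should be comfortably longer than a normal reconnect.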

Also, why is the LTE disconnecting so frequently? Both devices are in the same location about 20cm apart.


#14

Hi Starships.

I’ve not heard of the issue coming back for any customer on v1.3.1-rc.1.

I encourage you to run https://github.com/rickkas7/boron-clouddebug on your device to capture debug logs so that we can see what may be causing issues.

It’s unclear to me why your units are disconnecting so frequently. Feel free to direct message me your device IDs and I can investigate on our client side why they may be disconnecting so frequently.

Boron 2G/3G and Boron LTE make use of different operators, technologies, and towers. It could be a coverage issue or it could be something else. I can potentially provide more insight once I am able to view cloud side logs for these devices.


#15

I’m still having issues. I think there needs to be more testing done with SIM cards that are not Particle SIMs. From my understanding, all the testing is with a Particle SIM?
I kinda have just given up to be honest. I should have never changed from the Electron to the Boron. :frowning:


#16

Hi ric.

Correct, we do not do testing on third party SIMs. The number of SIM configurations and regions are vast, making this difficult to test for. As such, we only do formal testing on our own first party SIMs.

With that said, the primary cause of third party SIM failure is likely keep alive values not being properly set. I highly encourage you to experiment with the keep alive for your device to ensure it is appropriately set. A keep alive that is too high is liable to make the device fall in and out of flashing green as the operator closes the connection.

I highly suspect you would see this issue as well on the Electron and that this is not a Boron specific issue.

For preliminary testing, I encourage you to set the keep alive to a very low value, such as 30 seconds, and monitor activity over the course of an hour. Do keep in mind that this will consume more data than usual so do not let it run indefinitely.

To figure out the keep alive value for your third party SIM more deterministically, it may be worth exploring my keep-alive-tester tool. It is a two part tool (user app and Node application) that sets up a simple user application with an intentionally high keep alive timeout. In that time, you can use a pub-sub model that is set up with the Node application to send and receive data from the device. The tool can be run several times to help determine what the keep alive value for your device should be for its particular SIM in that particular region.
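If you end up running repeated trials, one way to organize them is a bisection over candidate keep alive values. The sketch below is plain C++ and is not how the keep-alive-tester itself works. The function name `largestWorkingKeepAlive` and the `probe` callback are hypothetical stand-ins for "run one trial at this interval and report whether the device stayed reachable". It also assumes the behavior is monotonic: if an interval works, every shorter interval works too.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns the largest interval in [loSec, hiSec] for
// which probe() reports success, assuming probe is true up to some limit and
// false beyond it. Returns 0 if even loSec fails.
unsigned largestWorkingKeepAlive(unsigned loSec, unsigned hiSec,
                                 const std::function<bool(unsigned)>& probe) {
    assert(loSec > 0 && loSec <= hiSec);
    if (!probe(loSec)) {
        return 0;                      // nothing in the range works
    }
    unsigned best = loSec;             // longest interval known to work
    unsigned lo = loSec + 1;
    unsigned hi = hiSec;
    while (lo <= hi) {
        unsigned mid = lo + (hi - lo) / 2;
        if (probe(mid)) {
            best = mid;                // mid works: look for something longer
            lo = mid + 1;
        } else {
            hi = mid - 1;              // mid fails: the limit is shorter
        }
    }
    return best;
}
```

With a range of 30 seconds to 15 minutes (30 to 900), this needs at most about ten trials. Each trial is a real-world test that can take a while, and operator timeouts may vary, so it is probably wise to set the final keep alive somewhat below the value found.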


#17

Just want to point out that I’m having the issue with the particle SIM in Canada. @mstanley I will dm you my device IDs shortly. As I mentioned before, I ran the cloud debugger for over a week and the connection was solid throughout. The flashing green bug only happens in normal build for some reason.


#18

Understood. That suggests there may be something going on either in your user application or API usage that could be having issues then. We can certainly take a look once I have device IDs. :slight_smile: