I have a system in the field that worked well for at least a year. Over the last 6 months, however, the device connects and sends a message to Ubidots every 20 minutes for roughly 4 days, then drops offline and does not reconnect for roughly 2 days. It then reconnects, works fine for another 4 days, and the cycle repeats.
Notes + what I have tried:
- AT&T Carrier
- Cell tower: 310-410-8328-39356682
- Using internal sim on Boron
- 5V wall power + 2000 mAh battery connected
- Automatic Mode
- Thread enabled
- OS 4.2.0
- Ubidots Library 3.2.3
- About 82K of RAM remains available
- The device is not resetting. It continues to run the local control program the entire time, with or without cell communication. When online, I can retrieve reset reason and confirm that it was not due to watchdog, code, SOS, or error, etc.
- The antenna has been upgraded to enhance signal strength. Now running at 90% strength and 50% quality.
- Was using the Ubidots library and the integration method through Particle, which is when it would regularly cycle as described.
- Changed to Ubidots library and TCP method to reach Ubidots. This caused the system to run for 9 days before disconnect. It has stayed disconnected since.
- Message rate average is about 1 message every 20 minutes, sending 1 variable. Sometimes 2 variables will be sent at the same time.
- This problem seemed to appear when a local I2C display was added.
- However, tests were run with all I2C disabled, with the same results. So I am unsure whether the timing was simply coincidental.
- I installed a second system BRN404X nearby to see how it acted in similar conditions running the same code. It was not attached to any hardware but exhibited a similar issue. When it would drop offline, it stayed that way until reset.
- I carefully looked for any arrays being overrun, etc.
- Next step: upgrade OS to 6.0.0.
- It was noticed that there would be only 1 or 2 disconnects for a few days and then there might be 5 to 8 disconnects on the day that service cuts off.
It is possible that I still have some kind of program bug or a former bug caused some kind of corruption. However, I have exhausted everything that I can think about.
Next step is to get to site and run device doctor.
I would welcome any advice. Have you seen similar?
Respectfully,
Brad
If you can attach a laptop to the device to receive the USB serial logs, the best way to troubleshoot this is to enable
SerialLogHandler logHandler(LOG_LEVEL_TRACE);
and monitor the logs after the problem occurs.
Make sure you have an out of memory handler. If the device starts resetting after adding one, you have a memory leak or heap fragmentation.
When the device is not uploading to Ubidots, is it responsive to the Particle cloud? You may want to add a Particle.function that you can query remotely to see if the cloud is working when the problem is occurring.
Thank you. I will have access Saturday and will try these ideas.
SerialLogHandler is repeating the report below. The device is flashing green.
Neither a reset nor a full power-down and power-up recovers communication.
0000282147 [net.pppncp] TRACE: NCP event 3
0000282147 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_DOWN
0000282148 [system.nm] TRACE: Interface 4 power state changed: 1
0000282148 [ncp.client] TRACE: Deinit modem serial.
0000282149 [net.pppncp] ERROR: Failed to initialize cellular NCP client: -210
0000282249 [ncp.client] TRACE: Powering modem on, ncpId: 0x44
0000282249 [net.pppncp] TRACE: NCP event 3
0000282250 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0000282250 [system.nm] TRACE: Interface 4 power state changed: 4
0000282400 [net.pppncp] TRACE: NCP event 3
0000282400 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_UP
0000282401 [system.nm] TRACE: Interface 4 power state changed: 2
0000282401 [ncp.client] TRACE: Modem powered on
0000282401 [ncp.client] TRACE: Setting UART voltage translator state 1
0000282502 [ncp.client] TRACE: Setting UART voltage translator state 0
0000282602 [ncp.client] TRACE: Setting UART voltage translator state 1
0000283603 [ncp.at] TRACE: > AT
0000284603 [ncp.at] TRACE: > AT
0000285603 [ncp.at] TRACE: > AT
0000286603 [ncp.at] TRACE: > AT
0000287603 [ncp.at] TRACE: > AT
0000288603 [ncp.at] TRACE: > AT
0000289603 [ncp.at] TRACE: > AT
0000290603 [ncp.at] TRACE: > AT
0000291603 [ncp.at] TRACE: > AT
0000292603 [ncp.at] TRACE: > AT
0000293603 [ncp.at] TRACE: > AT
0000294604 [ncp.at] TRACE: > AT
0000295604 [ncp.at] TRACE: > AT
0000296604 [ncp.at] TRACE: > AT
0000297604 [ncp.at] TRACE: > AT
0000298604 [ncp.client] ERROR: No response from NCP
0000298604 [ncp.client] TRACE: Setting UART voltage translator state 0
0000298605 [ncp.client] TRACE: Hard resetting the modem
0000298605 [ncp.client] TRACE: Modem waiting up to 30s to power off with PWR_UC...
0000339605 [net.pppncp] TRACE: NCP event 3
0000339605 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_DOWN
0000339605 [system.nm] TRACE: Interface 4 power state changed: 1
0000339606 [ncp.client] TRACE: Deinit modem serial.
0000339607 [net.pppncp] ERROR: Failed to initialize cellular NCP client: -210
0000339707 [ncp.client] TRACE: Powering modem on, ncpId: 0x44
0000339707 [net.pppncp] TRACE: NCP event 3
0000339708 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_POWERING_UP
0000339708 [system.nm] TRACE: Interface 4 power state changed: 4
0000339858 [net.pppncp] TRACE: NCP event 3
0000339858 [net.pppncp] TRACE: NCP power state changed: IF_POWER_STATE_UP
0000339859 [system.nm] TRACE: Interface 4 power state changed: 2
0000339859 [ncp.client] TRACE: Modem powered on
0000339859 [ncp.client] TRACE: Setting UART voltage translator state 1
0000339960 [ncp.client] TRACE: Setting UART voltage translator state 0
0000340060 [ncp.client] TRACE: Setting UART voltage translator state 1
0000341061 [ncp.at] TRACE: > AT
0000342061 [ncp.at] TRACE: > AT
0000343061 [ncp.at] TRACE: > AT
0000344061 [ncp.at] TRACE: > AT
0000345061 [ncp.at] TRACE: > AT
0000346061 [ncp.at] TRACE: > AT
0000347061 [ncp.at] TRACE: > AT
0000348061 [ncp.at] TRACE: > AT
0000349061 [ncp.at] TRACE: > AT
0000350061 [ncp.at] TRACE: > AT
0000351061 [ncp.at] TRACE: > AT
0000352061 [ncp.at] TRACE: > AT
0000353061 [ncp.at] TRACE: > AT
0000354061 [ncp.at] TRACE: > AT
0000355061 [ncp.at] TRACE: > AT
This looks like a failure of the cellular modem module. This could be TAN004 or possibly TAN001.
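If it helps to spot this pattern in a longer capture, here is a rough sketch in plain desktop C++ (not Particle firmware) that counts how many outgoing AT commands went unanswered before each "No response from NCP" error. The function name unansweredAtRuns is made up for illustration, and it assumes that modem replies are logged as "[ncp.at] TRACE: < ..." lines, which is worth checking against a healthy trace from your own device.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Device OS): for each
// "No response from NCP" error in a captured log, report how many
// outgoing AT commands were sent since the last modem reply.
std::vector<int> unansweredAtRuns(const std::string& log) {
    std::vector<int> runs;
    std::istringstream in(log);
    std::string line;
    int unanswered = 0;
    while (std::getline(in, line)) {
        if (line.find("[ncp.at] TRACE: > AT") != std::string::npos) {
            ++unanswered;                 // command sent to the modem
        } else if (line.find("[ncp.at] TRACE: < ") != std::string::npos) {
            unanswered = 0;               // assumed format of a modem reply
        } else if (line.find("No response from NCP") != std::string::npos) {
            runs.push_back(unanswered);   // modem never answered this run
            unanswered = 0;
        }
    }
    return runs;
}
```

Run against the trace above, the first power-on attempt gives a run of 15 unanswered commands. A modem that is merely struggling to register would normally answer the basic AT probe, so a run with no replies at all points at the modem or its power/UART path rather than at the network.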
OK. At least that is something I can work on.
Also implemented out of memory handler. Nothing unusual reported so far.
I will work on replacing hardware.
Maybe this is related to BRN402....
I had this problem a year ago. My Boron BRN404X with a SARA-R510S-01B-00 modem locked up the same way you're showing in your serial trace (I reported that problem here too: https://community.particle.io/t/boron-brn404x-modem-lockup/67618). My unit had been running OK for several months before this happened. My firmware called System.reset(RESET_NO_WAIT); if it could not connect to the Particle cloud for more than an hour.
This was not fixing it.
My unit was remote and I couldn't visit it for another 3 months during which it continued to firmware reset every hour. Once I got there I cycled power to it and it connected successfully within 1 minute.
Next, I added an external device that cycles power to the Boron once a day. I'm OK/functional should it lock up for a day, but I can't have it lock for longer than that.
As suggested by Gus at the time, I changed the System.reset to a deliberate watchdog timeout should this issue happen. Since last summer, my Boron has been disconnected from the Particle cloud for over an hour only once, so I can't tell if this type of reset actually fixes the sudden modem lockup, since I don't actually communicate with the Boron every day.
(If I am able to verify with certainty that it does, I'll report it here.)
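For anyone wanting to try the same "offline for more than an hour, then escalate" logic, here is a minimal sketch of just the timing part in plain C++, with the clock and connection state passed in so it can be tested on a desktop. OfflineTimer and Action are hypothetical names, not Particle APIs. On a real Boron you would presumably feed it the device uptime and Particle.connected(), and map Action::Escalate to whatever recovery you have chosen (a reset, or no longer servicing the watchdog); that wiring is an assumption and is not shown here.

```cpp
#include <cstdint>

enum class Action { None, Escalate };

// Hypothetical helper: tracks how long the device has been continuously
// offline and signals when a configured limit has been reached.
class OfflineTimer {
public:
    explicit OfflineTimer(uint64_t limitMs) : limitMs_(limitMs) {}

    // Call periodically with the current uptime and cloud connection state.
    Action update(uint64_t nowMs, bool connected) {
        if (connected) {
            offline_ = false;             // any connection clears the timer
            return Action::None;
        }
        if (!offline_) {
            offline_ = true;              // first offline sample starts timing
            sinceMs_ = nowMs;
        }
        return (nowMs - sinceMs_ >= limitMs_) ? Action::Escalate : Action::None;
    }

private:
    uint64_t limitMs_;
    uint64_t sinceMs_ = 0;
    bool offline_ = false;
};
```

Using a 64-bit millisecond counter avoids the roughly 49-day rollover of a 32-bit one, which matters for a device that is expected to stay up for months.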