One of 4 Xenons repeatedly crashing with SOS Code 7


#1

One of my four Xenons has been continuously crashing with a red SOS code that seems to indicate code 7, then rebooting every couple of minutes or so, and I am not sure why. All I have running on all of my devices is the Tinker code, and my other devices (2 Argons, 3 Xenons) don’t exhibit this behavior.

This has been happening since I first got it. I have tried factory resetting it and re-adding it to the mesh, and I have updated the firmware on all of my devices (this happened on rc.25 and is still occurring on rc.26). I have also tried removing it and re-adding it since the firmware update.

Does mine have some kind of hardware defect?

Video: https://imgur.com/a/fUrD3r9


#2

Probably not a hardware defect. Others have been having this problem with certain Xenons. As suggested by @avtolstoy, you can flash some special diagnostic binaries that output debugging information on Serial1. However, in the 2 or 3 logs others have already posted, the failure looks to be the same: something about RTC/jitter correction. You may still want confirmation from someone at Particle, but see the following post:


#3
0000088498 [net.th] TRACE: OpenThread state changed: 64
0000088498 [net.th] TRACE: Candidate preferred prefix: FD52:EF72:CE0::/64, preference = Medium, RLOC16 = 5800, preferred = 1, stable = 1
0000088508 [net.th] TRACE: OT_CHANGED_IP6_ADDRESS_ADDED
0000088514 [net.th] TRACE: OT_CHANGED_THREAD_ROLE
0000088514 [net.th] TRACE: OT_CHANGED_THREAD_RLOC_ADDED
0000088519 [net.th] TRACE: OT_CHANGED_THREAD_RLOC_REMOVED
0000088525 [net.th] TRACE: OT_CHANGED_IP6_MULTICAST_SUBSRCRIBED
0000088530 [system.ot] INFO: Role changed: router
0000088536 [system.ot] TRACE: RLOC was added
0000088541 [system.ot] TRACE: RLOC was removed
0000096819 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000096820 [comm.protocol] INFO: message id 27 complete with code 0.00
0000096825 [comm.protocol] INFO: rcv'd message type=13
0000120109 [hal] ERROR: Assertion failed: openthread/third_party/NordicSemiconductor/drivers/radio/raal/softdevice/nrf_raal_softdevice.c:236 timer_jitter_adjust (cc_margin > rtc_tick~
!0000000236 [system] INFO: Device e00fce68b411cab3576f1d65 started
0000000001 [system] TRACE: Last reset reason: 130 (data: 0x0a)
0000000010 [net.ifapi] TRACE: LwIP started
0000000018 [ot.api] INFO: OpenThread version: OPENTHREAD/0.01.00; Xenon; Dec 14 2018 16:58:07
0000000019 [ot.api] INFO: Max transmit power: 8
0000000026 [ot.api] INFO: Network name: Hyperion
0000000027 [ot.api] INFO: 802.15.4 channel: 11
0000000030 [ot.api] INFO: 802.15.4 PAN ID: 0x7b0e
0000000037 [net.th] INFO: Creating new LwIP OpenThread interface
0000000041 [net.ifapi] INFO: Netif th1 added
0000000049 [net.ifapi] INFO: Netif dm2 added
0000000050 [net.ifapi] INFO: Netif dm2 deleted
0000000053 [hal] TRACE: Heap: 57/171 Kbytes used
0000000060 [net.th] TRACE: OpenThread state changed: 17f333b
0000000067 [system.ctrl.ble] INFO: Device name: Xenon-KE37SB
0000000071 [system.nm] INFO: State changed: NONE -> DISABLED
0000000076 [system.nm] INFO: State changed: DISABLED -> IFACE_DOWN
0000000082 [system.nm] INFO: State changed: IFACE_DOWN -> IFACE_REQUEST_UP
0000000086 [net.ifapi] INFO: Netif th1 state UP
0000000093 [net.th] INFO: Bringing OpenThreadNetif down
0000000098 [net.th] INFO: Bringing OpenThreadNetif up
0000000104 [net.th] INFO: Network name: Hyperion
0000000104 [net.th] INFO: 802.15.4 channel: 11
0000000108 [net.th] INFO: 802.15.4 PAN ID: 0x7b0e
0000000230 [system.nm] INFO: State changed: IFACE_REQUEST_UP -> IFACE_UP
0000000342 [net.th] TRACE: OpenThread state changed: 4
0000000342 [net.th] TRACE: OT_CHANGED_THREAD_ROLE
0000000343 [system.ot] INFO: Role changed: detached
0000000653 [net.th] TRACE: OpenThread state changed: 10e4
0000000653 [net.ifapi] INFO: Netif th1 link UP
0000000653 [system.nm] INFO: State changed: IFACE_UP -> IFACE_LINK_UP
0000000659 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000669 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000670 [net.th] TRACE: Added FE80::CC5F:755F:41B8:D538 0
0000000680 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000681 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000691 [net.th] TRACE: Added FDF4:A03D:754E:0:7ED2:827:DCC7:CD9A 0
0000000692 [net.th] TRACE: OT_CHANGED_IP6_ADDRESS_ADDED
0000000702 [net.th] TRACE: OT_CHANGED_THREAD_ROLE
0000000702 [net.th] TRACE: OT_CHANGED_THREAD_RLOC_ADDED
0000000713 [net.th] TRACE: OT_CHANGED_THREAD_RLOC_REMOVED
0000000713 [net.th] TRACE: OT_CHANGED_THREAD_PARTITION_ID
0000000724 [net.th] TRACE: OT_CHANGED_IP6_MULTICAST_SUBSRCRIBED
0000000725 [system.ot] INFO: Role changed: router
0000000735 [system.ot] TRACE: RLOC was added
0000000736 [system.ot] TRACE: RLOC was removed
0000000736 [system.ot] TRACE: Partition ID changed
0000000746 [system.ot] TRACE: Subscribed to IPv6 multicast address
0000000751 [comm] INFO: channel inited
0000000760 [net.th] TRACE: OpenThread state changed: 200
0000000761 [net.th] TRACE: OT_CHANGED_THREAD_NETDATA
0000000761 [net.th] TRACE: Candidate preferred prefix: FD52:EF72:CE0::/64, preference = Medium, RLOC16 = 5800, preferred = 1, stable = 1
0000000777 [net.th] INFO: Switched over to a new preferred prefix: FD52:EF72:CE0::/64, preference = Medium, RLOC16 = 5800, preferred = 1, stable = 1
0000000788 [hal] INFO: DNS server list changed
0000000794 [net.th] INFO: DNS server on mesh network: FD52:EF72:CE0::1
0000000799 [system.ot] TRACE: Thread network data changed
0000000805 [net.th] TRACE: OpenThread state changed: 1
0000000810 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000816 [net.ifapi] TRACE: Netif th1 ipv6 addr state changed
0000000822 [system.nm] INFO: State changed: IFACE_LINK_UP -> IP_CONFIGURED
0000000827 [net.th] TRACE: Added FD52:EF72:CE0:0:F1FC:D97A:9609:2CA5 0
0000000832 [net.th] TRACE: OT_CHANGED_IP6_ADDRESS_ADDED
0000000838 [system.ot] TRACE: IPv6 address was added
0000000844 [system] INFO: Cloud: connecting
0000000845 [system] INFO: Read Server Address = type:1,domain:$id.udp-mesh.particle.io
0000000855 [system] INFO: Loaded cloud server address and port from session data
0000000860 [system] TRACE: Address type: 1
0000000866 [system] TRACE: Cloud socket=0, family=10, type=2, protocol=17
0000000871 [system] INFO: Cloud socket=0, connecting to 64:FF9B::36E3:ABAB#5684
0000000877 [system] TRACE: Cloud socket=0, connected to 64:FF9B::36E3:ABAB#5684
0000000883 [system] TRACE: Updating cloud keepalive for AF_INET6: 30000 -> 30000
0000000893 [system] TRACE: Applying new keepalive interval now
0000000899 [system] INFO: Cloud socket connected
0000000900 [system] INFO: Starting handshake: presense_announce=0
0000000905 [comm.protocol.handshake] INFO: Establish secure connection
0000001128 [comm.dtls] INFO: (CMPL,RENEG,NO_SESS,ERR) restoreStatus=0
0000001128 [comm.dtls] INFO: out_ctr 0,1,0,0,0,0,0,46, next_coap_id=1b
0000001133 [comm.dtls] INFO: app state crc: cached: 81a6cfb2, actual: 8e5d04f9
0000001140 [comm.dtls] INFO: restored session from persisted session data. next_msg_id=27
0000001145 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 2
0000001151 [comm.protocol.handshake] INFO: Sending HELLO message
0000005654 [comm.protocol.handshake] INFO: Handshake completed
0000005655 [system] INFO: Send spark/hardware/max_binary event
0000005657 [system] INFO: Send spark/device/last_reset event
0000005662 [system] INFO: Send subscriptions
0000005665 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000005671 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000005677 [comm] INFO: Sending TIME request
0000005685 [comm.protocol] INFO: Sending 'M' describe message
0000005688 [comm.protocol] INFO: rcv'd message type=1
0000005694 [system] INFO: Cloud connected
0000005698 [system] TRACE: Updating cloud keepalive for AF_INET6: 30000 -> 30000
0000005704 [system] TRACE: Applying new keepalive interval now
0000005710 [comm] INFO: Forcing a cloud ping
0000005789 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000005790 [comm.protocol] INFO: message id 29 complete with code 0.00
0000005795 [comm.protocol] INFO: rcv'd message type=13
0000005803 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000005807 [comm.protocol] INFO: message id 30 complete with code 0.00
0000005811 [comm.protocol] INFO: rcv'd message type=13
0000005841 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000005842 [comm.protocol] INFO: message id 31 complete with code 0.00
0000005848 [comm.protocol] INFO: rcv'd message type=13
0000005849 [comm.protocol] TRACE: Reply recieved: type=2, code=69
0000005858 [comm.protocol] INFO: message id 32 complete with code 2.05
0000005863 [comm.protocol] INFO: Received TIME response: 1544843495
0000005871 [comm.protocol] INFO: rcv'd message type=12
0000005874 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000005881 [comm.protocol] INFO: message id 33 complete with code 0.00
0000005885 [comm.protocol] INFO: rcv'd message type=13
0000007088 [comm.protocol] INFO: Sending 'S' describe message
0000007092 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000007301 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000007301 [comm.protocol] INFO: rcv'd message type=1
0000007302 [comm.protocol] INFO: Sending 'A' describe message
0000007309 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 4
0000007318 [comm.dtls] INFO: session cmd (CLS,DIS,MOV,LOD,SAV): 3
0000007319 [comm.protocol] INFO: rcv'd message type=1
0000037554 [comm.protocol] TRACE: Reply recieved: type=2, code=0
0000037554 [comm.protocol] INFO: message id 34 complete with code 0.00
0000037560 [comm.protocol] INFO: rcv'd message type=13



#4

Seems like each time it’s this error:

0000120109 [hal] ERROR: Assertion failed: openthread/third_party/NordicSemiconductor/drivers/radio/raal/softdevice/nrf_raal_softdevice.c:236 timer_jitter_adjust (cc_margin > rtc_tick~
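One way to check that it really is this error every time is to save the Serial1 capture to a text file and list the last line logged before each boot banner. Below is a minimal Python sketch, assuming the capture follows the line format of the log posted above (a 10-digit timestamp, then a [category] tag, with reboots marked by a "[system] INFO: Device <id> started" line). The function name `last_lines_before_reboot` is hypothetical.

```python
import re

# Boot banner as it appears in the captured log, e.g.
# "0000000236 [system] INFO: Device e00fce68b411cab3576f1d65 started".
# search() is used so a stray leading character (the "!" seen after the
# crash) does not prevent a match.
BOOT_RE = re.compile(r"\d{10} \[system\] INFO: Device \w+ started")


def last_lines_before_reboot(lines):
    """Return the last non-empty line logged before each boot banner.

    `lines` is any iterable of strings from a Serial1 capture. The very
    first boot banner (with nothing logged before it) is skipped.
    """
    previous = None
    result = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines between log entries
        if BOOT_RE.search(line) and previous is not None:
            result.append(previous)
        previous = line
    return result
```

If every entry in the returned list is the same `timer_jitter_adjust` assertion, the resets share a single cause rather than being a mix of unrelated faults.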


#5

Yep, at timestamp 0000120109 it’s the same timer_jitter_adjust crash as everyone else found.
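The assertion line can also be broken into its source location, function, and uptime, which makes crashes from different units or different posts easy to compare. This assumes the leading 10-digit field is milliseconds since boot; that is an assumption here, though 0000120109 ≈ 120 s lines up with a crash "every couple of minutes". A minimal Python sketch; `parse_assertion` and `AssertionInfo` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Matches lines such as:
# 0000120109 [hal] ERROR: Assertion failed: path/to/file.c:236 func (expr~
ASSERT_RE = re.compile(
    r"(?P<ms>\d{10}) \[hal\] ERROR: Assertion failed: "
    r"(?P<file>\S+):(?P<line>\d+) (?P<func>\w+)"
)


class AssertionInfo(NamedTuple):
    uptime_s: float  # seconds since boot, assuming the timestamp is in ms
    file: str        # source file named in the assertion
    line: int        # line number within that file
    func: str        # function in which the assertion fired


def parse_assertion(line: str) -> Optional[AssertionInfo]:
    """Return the parsed assertion, or None if the line is not one."""
    m = ASSERT_RE.search(line)
    if m is None:
        return None
    return AssertionInfo(
        uptime_s=int(m["ms"]) / 1000.0,
        file=m["file"],
        line=int(m["line"]),
        func=m["func"],
    )
```

Comparing `file`, `line`, and `func` across several crashes shows whether they all hit the same check (here `nrf_raal_softdevice.c:236` in `timer_jitter_adjust`), while `uptime_s` shows how long the device ran before each one.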


#6

I still wonder if it might be hardware. Out of four units, this is the only one that’s been doing it, and it’s been doing it since I first got it.


#7

It’s a firmware issue as mentioned and being investigated. :slight_smile:


#8

As stated in the Gen 3 improvements update (12/17), the cause of the SOS has been found, and a new Device OS build that addresses it, along with a few other low-level issues, is being tested now. I cannot give an estimated release date, though.


#9

Hello folks,

Wanted to follow up on this thread to let everyone know that a fix for the SOS-7 issue has been released with v0.8.0-rc.27. The issue was traced to a problem with the Nordic 802.15.4 driver. Instructions for upgrading are available below. We’d love to know if applying the release fixes the SOS-7 issue that you’re experiencing.


Note that we have seen some reports of a change in behavior in rc.27 when users call the Mesh.subscribe() function within setup(), which can result in a separate SOS-10 code that we’re currently investigating. If this issue affects you, please note the following workaround.


#10

I’ll give it a shot tonight. I haven’t tried anything but the tinker code thus far.


#11

I didn’t have a chance to run the updates last night, as I was helping my SO assemble a gift for her niece, so I started them remotely just now from work.

The updates on the Argons went fine, and they’re responding as expected. However, only two of the Xenons seem to have taken it, and the other two don’t seem happy. Since I can’t create a single network with two gateways yet, I’ve made two separate mesh networks. I’ve tried to keep it simple and logical until I can make one network.

Argon_1 connects to Xenon_1a and Xenon_1b in one network
Argon_2 connects to Xenon_2a and Xenon_2b in the other network.


Xenon_1a claims the flash was successful, but it still has the slowly fading purple icon on the devices screen in the Web IDE. All diagnostics in the console pass, but it just will not update. This is the one I have been having the SOS issues with, so I’ll probably have to remove it and re-add it to the network so it can update via my phone.

Also, I found it slightly humorous that it’s currently reporting -75 KB of 32 KB RAM used in the console, because those numbers seem wrong on two counts.


Xenon_1b accepted the update, but I am seeing this in the console now:


Xenon_2a took the update but gives the same error above as Xenon_1b.


Xenon_2b wouldn’t accept the update. It claims to be connected in the Web IDE and the console (at least I assume that’s what the slowly pulsing blue dot on those pages means), but I can’t ping it, and all diagnostic attempts in the console fail. I won’t be able to look at this until later tonight or tomorrow.


Now that the technical issues have been outlined here are my impressions of the Gen 3 devices thus far:

Overall, I can say I am not really happy with how things have gone so far. I absolutely was not expecting a pre-release product when these shipped, but I feel like that is what I have been given. Right out of the box I had issues getting them connected: I had to go online, find an older APK file, and side-load it onto my phone just to get started.

I don’t really like the idea of having to use a phone to get these connected to begin with. It is a slow, cumbersome process because you have to add them one at a time, and had I known this would be required to deploy them at release, I most likely would have delayed my purchase until the product line had matured. I can’t imagine a company that might need to set up a hundred or so of these devices would find this acceptable, and yet that seems to be the true market for these. For makers like myself this isn’t a huge issue, but it is somewhat annoying.

As it stands, I haven’t even had a chance to try coding on these because of the various connectivity issues. Even the ones that haven’t given me any obvious trouble (based on the LED codes) since rc.26 show numerous disconnect events in the console logs, which is frankly absurd. All of these devices are within 4 feet of each other on my desk, which itself is about 10 feet from my router. I can connect to this router with my phone from across the street, so how is it that these Xenons can’t maintain a connection to their Argons less than a foot away? Again, I haven’t run anything but Tinker code on these. Any traffic they are passing is purely whatever the mesh network and cloud connection generate.

The only thing I have liked about these devices so far is that they have the Feather form factor. That was why I bought them. I liked what I saw of the Particle ecosystem when I had it running on a Pi, but at the time the Pi’s library support wasn’t where I needed it to be, so I wiped the Pi and used it for other purposes (and it seems development of the Pi version has ceased).

I understand product launches aren’t easy, but for me this has been an absolutely abysmal experience. I have spent zero time developing, and most of the time I have spent with these devices has been split between troubleshooting and documenting that data here.


#12

Finally, after multiple (at least 6) DFU flashes, I seem to have managed to get Xenon_1a out of safe mode and updated. For the first time since I got these devices, this particular unit, which has been the most problematic of the four, seems to be stable.

I just got Xenon_2b reconnected, but I had to factory reset it, unclaim it, and re-add it. At least the app seems to be working properly now, and it didn’t take ages to get everything set up like it did last time.

Still, I am hesitant to even start programming these with all the issues they’ve had and the mesh issues mentioned above.