HardFault in FreeRTOS/uxListRemove

photon


I’m working with a customer who is seeing sporadic HardFault crashes in the field on Device OS 1.2.1. Since I can’t hook up a debugger, I hacked the HardFault handler to save the stacked registers and log them on reboot. Here’s what I got back:

HardFault Info:
r0  = 08002d0d
r1  = b2685ad1
r2  = 066f01f1
r3  = 5842583b
r12 = a5a5a5a5
lr  = 08084f85
pc  = 080841ce
psr = 21000000
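For anyone who wants to capture the same information, here is a minimal sketch of the general technique, not the exact handler used here. It assumes a Cortex-M3 (the Photon's STM32F205), a fault record placed in RAM that the startup code does not clear on a soft reset, and the memory-map constants shown (1 MB flash, 128 KB SRAM). All names (`fault_record_t`, `fault_capture`, `fault_take`, `classify`, `fault_c`) and the magic value are hypothetical, not Device OS APIs. Run against the dump above, the decoding half shows that psr = 0x21000000 has exception number 0, so the fault occurred in Thread mode (inside a task, not an ISR), and that r12 matches FreeRTOS's 0xa5 stack-fill byte, which is likely a leftover from stack initialization rather than a meaningful value.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* The 8 words a Cortex-M3 pushes on exception entry, in stack order. */
typedef struct {
    uint32_t r0, r1, r2, r3, r12, lr, pc, psr;
} exception_frame_t;

/* Hypothetical record; on the device it must live in RAM that is not
 * zeroed at startup (how to arrange that depends on the linker script). */
typedef struct {
    uint32_t magic;
    exception_frame_t frame;
} fault_record_t;

#define FAULT_MAGIC          0xFA17C0DEu /* arbitrary "record is valid" marker */
#define FREERTOS_STACK_FILL  0xA5A5A5A5u /* tskSTACK_FILL_BYTE (0xa5) repeated */
#define PHOTON_FLASH_START   0x08000000u /* assumed STM32F205 memory map */
#define PHOTON_FLASH_END     0x08100000u /* 1 MB flash */
#define PHOTON_SRAM_START    0x20000000u
#define PHOTON_SRAM_END      0x20020000u /* 128 KB SRAM */

fault_record_t g_fault_record;

/* Called from the fault handler with a pointer to the stacked frame. */
void fault_capture(fault_record_t *rec, const uint32_t *stacked)
{
    memcpy(&rec->frame, stacked, sizeof rec->frame);
    rec->magic = FAULT_MAGIC;
}

/* Called once after reboot: copies out and clears a saved fault, if any. */
bool fault_take(fault_record_t *rec, exception_frame_t *out)
{
    if (rec->magic != FAULT_MAGIC)
        return false;
    *out = rec->frame;
    rec->magic = 0; /* consume it so the same fault is not logged twice */
    return true;
}

/* IPSR field (xPSR bits 8:0): 0 means Thread mode, i.e. task context. */
uint32_t psr_exception_number(uint32_t psr)
{
    return psr & 0x1FFu;
}

/* Rough label for a register value, to spot pointers that cannot be valid. */
const char *classify(uint32_t v)
{
    if (v == FREERTOS_STACK_FILL) return "stack fill";
    if (v >= PHOTON_FLASH_START && v < PHOTON_FLASH_END) return "flash";
    if (v >= PHOTON_SRAM_START && v < PHOTON_SRAM_END) return "sram";
    return "other";
}

#if defined(__ARM_ARCH_7M__)
/* Target-only part. Replacing the existing HardFault_Handler this way is
 * an assumption about how your build is set up. */
void fault_c(const uint32_t *stacked)
{
    fault_capture(&g_fault_record, stacked);
    for (;;) { } /* reset here instead, e.g. NVIC_SystemReset() on CMSIS */
}

/* EXC_RETURN bit 2 in lr tells whether the frame is on MSP (0) or PSP (1). */
__attribute__((naked)) void HardFault_Handler(void)
{
    __asm volatile(
        "tst lr, #4    \n"
        "ite eq        \n"
        "mrseq r0, msp \n"
        "mrsne r0, psp \n"
        "b fault_c     \n");
}
#endif
```

On reboot, calling `fault_take(&g_fault_record, &frame)` once and logging each register together with `classify()` gives the same listing as above plus a hint about which values look like plausible pointers.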

I’m not sure what to make of r0-r3 and r12, but lr, pc, and psr look reasonable. The pc corresponds to:

080841c4 <uxListRemove>:
 80841c4:	6841      	ldr	r1, [r0, #4]
 80841c6:	6882      	ldr	r2, [r0, #8]
 80841c8:	6903      	ldr	r3, [r0, #16]
 80841ca:	608a      	str	r2, [r1, #8]
 80841cc:	6882      	ldr	r2, [r0, #8]
 80841ce:	6051      	str	r1, [r2, #4]
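Those six instructions line up with the two unlink statements at the start of FreeRTOS's `uxListRemove()`. The sketch below mirrors them in C with a simplified `list_item_t` (a hypothetical name; the field order follows FreeRTOS 8.x's `ListItem_t` with the optional integrity-check fields disabled, which is an assumption about this build). The offsets in the comments assume the 32-bit target and match the #4, #8 and #16 offsets in the listing; on a 64-bit host they differ.

```c
#include <stdint.h>

/* Simplified mirror of FreeRTOS 8.x ListItem_t (no integrity-check bytes).
 * Offsets are for the 32-bit Cortex-M3 target. */
typedef struct list_item {
    uint32_t          xItemValue;  /* +0                          */
    struct list_item *pxNext;      /* +4  : ldr r1, [r0, #4]      */
    struct list_item *pxPrevious;  /* +8  : ldr r2, [r0, #8]      */
    void             *pvOwner;     /* +12                         */
    void             *pvContainer; /* +16 : ldr r3, [r0, #16]     */
} list_item_t;

/* The first two stores of uxListRemove(), written out in C. */
void unlink_item(list_item_t *item)
{
    /* str r2, [r1, #8]: pxNext->pxPrevious = pxPrevious */
    item->pxNext->pxPrevious = item->pxPrevious;

    /* ldr r2, [r0, #8]; str r1, [r2, #4]: pxPrevious->pxNext = pxNext.
     * pxPrevious is reloaded because the store above might have changed it.
     * The stacked pc (0x080841ce) points at this store. */
    item->pxPrevious->pxNext = item->pxNext;
}
```

Read against the dump, the item pointer itself already looks wrong: r0 = 0x08002d0d points into flash, where no writable list item can live, and the neighbours loaded from it (r1 = 0xb2685ad1, r2 = 0x066f01f1) are not RAM addresses either. If the fault was imprecise, the stacked pc can be past the access that actually failed, so the first store through pxNext cannot be ruled out. Either way, this suggests uxListRemove() was handed a bogus item rather than misbehaving on its own.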

The link register corresponds to:

08084f74 <xTaskRemoveFromEventList>:
 8084f74:	b538      	push	{r3, r4, r5, lr}
 8084f76:	68c3      	ldr	r3, [r0, #12]
 8084f78:	68dc      	ldr	r4, [r3, #12]
 8084f7a:	f104 0518 	add.w	r5, r4, #24
 8084f7e:	4628      	mov	r0, r5
 8084f80:	f7ff f920 	bl	80841c4 <uxListRemove>
 8084f84:	4b10      	ldr	r3, [pc, #64]	; (8084fc8 <xTaskRemoveFromEventList+0x54>)
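The lr value can be tied back to this listing by hand. In Thumb code bit 0 of lr is set, and a Thumb-2 BL is 4 bytes long, so the call site is (0x08084f85 & ~1) - 4 = 0x08084f80, the `bl` shown above. The `add.w r5, r4, #24` also shows that `uxListRemove()` received an address 24 bytes into the unblocked task's TCB (its event list item), so the r0 in the dump implies the TCB pointer taken from the head of the event list was 0x08002cf5, again a flash address. The helper names below are hypothetical; the 24-byte offset comes straight from the listing.

```c
#include <stdint.h>

/* Thumb return addresses carry bit 0 = 1; clear it to get the real address. */
uint32_t return_address(uint32_t lr)
{
    return lr & ~1u;
}

/* Address of the call instruction, assuming it was a 32-bit Thumb-2 BL as in
 * the listing (a 16-bit BLX Rm would be 2 bytes back instead). */
uint32_t bl_call_site(uint32_t lr)
{
    return return_address(lr) - 4u;
}

/* From "add.w r5, r4, #24": r0 for uxListRemove() is tcb + 24 on this build. */
#define EVENT_ITEM_OFFSET 24u

/* Invert that to recover the TCB pointer read from the event list head. */
uint32_t tcb_from_event_item(uint32_t event_item_addr)
{
    return event_item_addr - EVENT_ITEM_OFFSET;
}
```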

These routines are very low-level and are called from all over the place. Here is one call chain (inverted, with callers indented further):

080841c4 <uxListRemove>:
    08084f74 <xTaskRemoveFromEventList>:
        08084276 <prvUnlockQueue>:
            0808438c <xQueueGenericSend>:
                080766c8 <os_queue_put>:
                08076770 <os_mutex_unlock>:
                0807680e <os_semaphore_give>:
                08084468 <xQueueCreateMutex>:
                080844b2 <xQueueGiveMutexRecursive>:
                080852ec <xTimerGenericCommand>:
                0808d748 <host_rtos_set_semaphore>:
                0808d7dc <host_rtos_push_to_queue>:
                0808d978 <wiced_rtos_unlock_mutex>:
                080929dc <sys_mbox_post>:
                08092a52 <sys_sem_new>:

Googling “HardFault uxListRemove” turns up some threads discussing synchronization bugs, but I’m not sure I can take this much further.
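One synchronization bug often linked to this symptom on FreeRTOS Cortex-M ports is an ISR that calls the FreeRTOS API (or a non-`...FromISR()` function) while running at a priority above `configMAX_SYSCALL_INTERRUPT_PRIORITY`. Kernel critical sections on this port only mask interrupts up to that level, so such an ISR can modify a queue's event lists mid-update, and the damage may only surface later in task context, which would be consistent with the Thread-mode psr above. The check below is a minimal sketch of the rule: the 4 priority bits match the STM32F2, but the `configMAX_SYSCALL_INTERRUPT_PRIORITY` value is a hypothetical placeholder, not the Photon's actual setting, and the function name is made up.

```c
#include <stdbool.h>
#include <stdint.h>

#define PRIO_BITS 4u /* priority bits implemented on the STM32F2 (__NVIC_PRIO_BITS) */

/* Hypothetical placeholder. FreeRTOSConfig.h expresses this value already
 * shifted into the top PRIO_BITS bits of the 8-bit priority field. */
#define MAX_SYSCALL_PRIO_EXAMPLE (5u << (8u - PRIO_BITS))

/* nvic_prio is the unshifted value you would pass to NVIC_SetPriority().
 * Lower numbers are more urgent on Cortex-M, so an ISR that calls
 * ...FromISR() APIs needs a number at or above the syscall limit. */
bool isr_may_use_freertos_api(uint32_t nvic_prio, uint32_t max_syscall_prio)
{
    uint32_t hw_prio = (nvic_prio << (8u - PRIO_BITS)) & 0xFFu;
    return hw_prio >= max_syscall_prio;
}
```

With the placeholder limit of 5, an ISR at NVIC priority 5 through 15 may use the `...FromISR()` APIs, while one at 0 through 4 must not touch FreeRTOS at all.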

Has anyone else run into this? Our current thinking is that something spurious is tripping up the WICED driver, but realistically the root cause could be anything.

I also looked through the various versions of FreeRTOS, and there are lots of bug fixes and improvements in more recent releases. However, the version used for the Photon appears to be tied to the WICED driver and is many years old (8.2.1 in hal/src/photon/lib, versus the recently released 10.2.1 in the top-level third_party directory).

Are there any plans to sync the Photon with the latest FreeRTOS?