I'm still looking for an explanation here. The latest update is that the PMIC code above does not seem to be the issue. After testing for almost a week, we found that some of the boards without the PMIC code also hit the bug. It seems to be some kind of latched state in the boot code.
A few days ago we put devices in the field, so I had to come up with a workaround fast. The way I get these devices out of this state is by pulling the battery out and plugging it back in. In the field we obviously can't do that, so I use a FET (that I drove all over to find) as a low-side switch driven from another pin on the master MCU. If the Boron doesn't respond after a while, I cut its power for 30 seconds. It's a workaround that 'should' work based on my knowledge of the issue, but we didn't have time to test it, so I'm not sure it does. I'd like to actually figure out what's happening here. I found one other forum member who seems to have the same issue. I'm including it mainly to show that I'm not insane: