detachInterrupt() leads to locked up Photon

I seem to have stumbled upon a scenario where you can lock up the Photon’s system loop. What you see when this happens:

  • Cyan LED is solid (not breathing). Photon unresponsive to any subsequent attempt to OTA flash the device. You have to power cycle the Photon. FYI, I’m using the latest device firmware (0.4.7).
  • Main application code (i.e. loop() ) stops executing. I’ve confirmed it doesn’t seem to lock up mid-execution of loop(), it just never gets called again by the system. Unclear if the RTOS has crashed.
  • ISR code runs. I have a timer on TIM5, and it’s still calling my timer ISR on a regular basis and executing it just fine. So at least the interrupt table is working okay.
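One way to confirm the second and third symptoms together is a heartbeat check: loop() bumps a counter and the timer ISR watches for it to stop advancing. This is only a minimal sketch of that idea. The names `onLoopPass`, `onTimerTick`, `loopStalled` and the threshold `kStallTicks` are hypothetical and not part of the Particle firmware.

```cpp
#include <cstdint>

// Hypothetical names throughout; none of these come from the Particle firmware.
volatile uint32_t loopHeartbeat = 0;  // advanced once per pass through loop()
volatile bool loopStalled = false;    // set by the timer ISR when loop() stops advancing

static uint32_t lastSeen = 0;              // heartbeat value seen on the previous tick
static uint32_t ticksWithoutProgress = 0;  // consecutive ticks with no new heartbeat
const uint32_t kStallTicks = 1000;         // assumed threshold; tune to the timer rate

// Call this at the top of loop().
void onLoopPass() {
    loopHeartbeat = loopHeartbeat + 1;
}

// Call this from the timer ISR (TIM5 in my case).
void onTimerTick() {
    uint32_t now = loopHeartbeat;
    if (now != lastSeen) {
        // loop() ran since the last tick, so reset the stall counter.
        lastSeen = now;
        ticksWithoutProgress = 0;
        loopStalled = false;
    } else if (++ticksWithoutProgress >= kStallTicks) {
        // The ISR is still firing but loop() has not been called for a while.
        loopStalled = true;
    }
}
```

If `loopStalled` becomes true while the ISR keeps ticking, that matches what I described above: interrupts are still being serviced but the system has stopped calling loop().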

I narrowed the issue down to detachInterrupt(). I’m calling it from inside a timer-driven ISR, with interrupts fully disabled on the Photon at the time I call it. I have a 2nd ISR driven by a radio module, with the IRQ line coming in on D2. It’s originally set up with attachInterrupt(D2, theISRfunc, RISING). I’ll spare you the WHY, but I periodically do spectrum scanning, which means putting the radio into RSSI scan mode and re-attaching the radio ISR as attachInterrupt(D2, theISRfunc, CHANGE). When returning to regular radio mode, I need to re-attach the ISR in RISING mode only. So what the relevant code really does is change the trigger “mode” of the radio ISR.

I had been using detachInterrupt() on the way to re-attaching. I realize now that it was totally unnecessary to do that, since any subsequent call to attachInterrupt() just replaces the previous ISR mapping. Taking out detachInterrupt() solved the problem, but in my view, using it should NOT have locked up the Photon’s system core.
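A minimal sketch of the simplified mode switch, re-attaching with the new edge and never calling detachInterrupt(). The helper `setRadioIrqMode` and the `RadioIrqMode` enum are hypothetical names for illustration. The `#if` section is a host-side stand-in so the sketch compiles off-device; it assumes the Particle build defines `PLATFORM_ID`, in which case the real attachInterrupt() is used instead.

```cpp
#include <cstdint>

#if !defined(PLATFORM_ID)
// Host-side stand-ins (assumptions) so the sketch compiles off-device.
// On a Photon build these names come from the system firmware instead.
enum InterruptMode { CHANGE, RISING, FALLING };
const uint16_t D2 = 2;
typedef void (*IsrFn)();
struct FakeLine { IsrFn handler; InterruptMode mode; int attachCalls; };
static FakeLine fakeD2 = { nullptr, RISING, 0 };
bool attachInterrupt(uint16_t pin, IsrFn fn, InterruptMode mode) {
    if (pin != D2) return false;
    fakeD2.handler = fn;   // a new attach simply overwrites the old mapping
    fakeD2.mode = mode;
    fakeD2.attachCalls++;
    return true;
}
#endif

// Placeholder radio ISR; the real one queries the radio over SPI.
void theISRfunc() {}

// Hypothetical names for the two radio modes described above.
enum RadioIrqMode { RADIO_NORMAL, RADIO_RSSI_SCAN };

// Switch the trigger edge by re-attaching; no detachInterrupt() in between.
bool setRadioIrqMode(RadioIrqMode m) {
    InterruptMode edge = (m == RADIO_RSSI_SCAN) ? CHANGE : RISING;
    return attachInterrupt(D2, theISRfunc, edge);
}
```

Usage would be `setRadioIrqMode(RADIO_RSSI_SCAN)` before a spectrum scan and `setRadioIrqMode(RADIO_NORMAL)` afterwards.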

My best guess on the pathology: Something goes wrong inside detachInterrupt(), which hoses the system. The call does seem to return, since it’s being called inside my timer ISR, which continues running after the Photon is locked up. My guess is detachInterrupt(), and potentially the subsequent call to attachInterrupt(), together put things into a bad state for the D2-linked IRQ channel and it can never recover. Subsequent calls to this code (again from the timer ISR which continues running) never seem to “reset” the radio ISR back into operation. It’s like attachInterrupt() never works again after that, so maybe that’s where things are broken? Interrupts ARE disabled (see below), but I’m wondering if something happens to trigger this, such as the radio module raising/lowering the IRQ line just as detachInterrupt() or attachInterrupt() are executing, causing some kind of mess at a lower level of the system code?

BTW, I did read in the Photon firmware docs re: attachInterrupt() that D2 is shared on EXTI among (D2, A0, A3), and I’m using SPI0. So it’s possible that A3 (SCK) is somehow interfering. The only scenario I can imagine is that the radio module sends an IRQ on D2, RIGHT as detachInterrupt() is executing; so, my radio ISR responds (since detachInterrupt() hasn’t finished un-mapping it), does an SPI transaction to determine why the IRQ happened, so then perhaps detach() and this SPI transaction (which would use A3) collide? But again, my radio ISR also runs with interrupts masked. So … seems like a stretch.
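For reference, as far as I understand the STM32, the sharing comes from how it routes external interrupts: EXTI line n can be connected to bit n of only one GPIO port at a time. The lookup below is a small sketch of that rule. The port/bit assignments are my reading of the Photon datasheet pin table, so treat them as assumptions to double-check, and `extiLine` and `sharesExtiLine` are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>

// Port/bit assignments for a few Photon pins (assumed from the datasheet pin table).
struct PinInfo {
    const char* name;  // Wiring pin name
    char port;         // STM32 GPIO port letter
    uint8_t bit;       // bit within the port, which is also the EXTI line number
};

static const PinInfo kPins[] = {
    {"D2", 'B', 5}, {"A0", 'C', 5}, {"A3", 'A', 5},
    {"D3", 'B', 4}, {"D4", 'B', 3},
};

// Returns the EXTI line for a pin name, or -1 if the pin is not in the table.
int extiLine(const char* name) {
    for (const PinInfo& p : kPins) {
        if (std::strcmp(p.name, name) == 0) return p.bit;
    }
    return -1;
}

// True when both pins are known and would compete for the same EXTI line.
bool sharesExtiLine(const char* a, const char* b) {
    int la = extiLine(a);
    return la >= 0 && la == extiLine(b);
}
```

If that mapping is right, D2, A0 and A3 all sit on line 5, and A3 toggling as SCK should not raise that line while it is routed to D2’s port. That is consistent with this theory being a stretch.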

Code where this was happening:

–inside timer ISR code, with all interrupts masked–

uint32_t mask_val = __get_PRIMASK();  // Save the current interrupt mask state
__disable_irq();                      // Mask interrupts

... (execute a few quick SPI transactions w/the radio) ...
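detachInterrupt(D2);                      // The call that led to the lock-up; removing it fixed the problem
attachInterrupt(D2, theISRfunc, RISING);  // Re-attach the radio ISR in RISING mode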

if (!mask_val) { __enable_irq(); }  // Restore interrupt state

–exit timer ISR code–
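The save/mask/restore pattern above can also be wrapped in a small scope guard so the restore cannot be forgotten on an early return. This is a sketch only: `IrqMaskGuard`, the `irqMask*` wrappers and `reconfigureRadioIrq` are hypothetical names, and the host-side branch models PRIMASK as a plain variable so the code compiles off-device. It assumes the Particle build defines `PLATFORM_ID`.

```cpp
#include <cstdint>

#if defined(PLATFORM_ID)
// On-device: thin wrappers over the CMSIS intrinsics used in the snippet above.
static inline uint32_t irqMaskGet() { return __get_PRIMASK(); }
static inline void irqMaskAll()     { __disable_irq(); }
static inline void irqUnmaskAll()   { __enable_irq(); }
#else
// Host-side stand-ins (assumptions) that model PRIMASK as a plain variable.
static uint32_t fakePrimask = 0;
static inline uint32_t irqMaskGet() { return fakePrimask; }
static inline void irqMaskAll()     { fakePrimask = 1; }
static inline void irqUnmaskAll()   { fakePrimask = 0; }
#endif

// Masks interrupts for the lifetime of the object and restores the previous
// state on destruction, so nested guards do not unmask too early.
class IrqMaskGuard {
public:
    IrqMaskGuard() : wasMasked_(irqMaskGet()) { irqMaskAll(); }
    ~IrqMaskGuard() { if (!wasMasked_) irqUnmaskAll(); }
    IrqMaskGuard(const IrqMaskGuard&) = delete;
    IrqMaskGuard& operator=(const IrqMaskGuard&) = delete;
private:
    uint32_t wasMasked_;  // non-zero if interrupts were already masked on entry
};

// Hypothetical wrapper for the section shown above.
void reconfigureRadioIrq() {
    IrqMaskGuard guard;
    // ... SPI transactions with the radio and attachInterrupt() would go here ...
}   // interrupts are restored here if they were enabled on entry
```

This does not change the behavior of the original snippet; it only ties the restore to scope exit.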

The ultimate fix was to take out detachInterrupt(); after that, the lock-ups stopped.

Issue reproducibility: The lock-up appears after anywhere from a few minutes to ~1 hour of code execution. It doesn’t happen immediately, which is what I’d expect from a race condition.

Hope this helps @mdma and the other Particle firmware folks look into this.

I am having a very similar issue: the Photon locks up after anywhere from a few minutes to a couple of hours of code execution. I am using the NeoPixel library, and I believe the lock-up may be caused by the library detaching and reattaching interrupts, which it does to ensure signal integrity of the one-wire communication protocol. Unfortunately, I cannot remove the interrupt detachment without causing visual errors on the LEDs.

Does anyone have any information on what may be causing this, or on potential fixes?