Thanks @jagor for all the interesting hints!
Using DMA with SPI for WS2812 would need 8 bytes per bit. Multiplied by 24 bits per LED and 240 LEDs for one strip, that comes to 46,080 bytes of buffer space, which is too much. I had hoped it would be possible to install a transfer-complete interrupt (TCI) routine with a higher IRQ priority than the Spark Core IRQs that take > 50 µs, so the data could be pre-processed in smaller chunks.
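To put rough numbers on this, here is a small sketch of the buffer arithmetic. The constants are the figures from above (8 bytes per bit, 24 bits per LED, 240 LEDs). The double-buffer helper and the chunk size of 4 LEDs are purely hypothetical and only illustrate how much smaller a chunked scheme could be; this is not tested on the Spark Core.

```cpp
#include <cstddef>

// Figures taken from the post; treat them as assumptions.
constexpr std::size_t kBytesPerBit  = 8;    // buffer bytes per WS2812 bit
constexpr std::size_t kBitsPerLed   = 24;   // 8 bits each for G, R, B
constexpr std::size_t kLedsPerStrip = 240;

// Buffer needed to pre-encode `leds` LEDs in one go.
constexpr std::size_t encodedBytes(std::size_t leds) {
    return kBytesPerBit * kBitsPerLed * leds;
}

// Hypothetical double buffer: one half is sent by DMA while the
// interrupt routine refills the other half with the next chunk.
constexpr std::size_t doubleBufferBytes(std::size_t ledsPerChunk) {
    return 2 * encodedBytes(ledsPerChunk);
}

// Compile-time checks of the arithmetic.
static_assert(encodedBytes(kLedsPerStrip) == 46080, "full strip buffer");
static_assert(doubleBufferBytes(4) == 1536, "double buffer, 4 LEDs per half");
```

So pre-encoding the whole strip takes 46,080 bytes, while a double buffer with 4 LEDs per half would take only 1,536 bytes, provided the refill interrupt is always serviced in time.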
The OctoWS2812 approach is interesting, but it only makes sense if one really has multiple LED strips. For a single strip we’d be back at the 8 bytes per WS2812 bit problem.