Hi,
There seems to be a lot of library latency when reading data in a loop over SPI.
On the scope, I’m seeing each byte take 0.25 us, which is correct. However, there’s a huge gap between transmissions, which should be a lot shorter given that the only thing happening is (presumably) a couple of register writes.
It’s a bit beyond my scope’s bandwidth, but it’s still clear:
The code is below. I need to have interrupts off here because the device I’m talking to is very sensitive to timing mismatches (eventually there’ll be other things going on here, but I’m trying to keep it to a minimal example).
noInterrupts();
for (int i = 0; i < 1024; i++) {
    myArray[2*i] = SPI.transfer(0);     // first byte of the pair
    myArray[2*i + 1] = SPI.transfer(0); // second byte of the pair
}
interrupts();
I’ve tried using DMA, but then there’s an even longer delay (regardless of whether I remove the callback or use an empty function). In this case both bytes are sent with no latency between them (good!), but there’s a huge delay after.
noInterrupts();
for (int i = 0; i < 1024; i++) {
    // tx buffer NULL, rx into myArray, 2 bytes, NULL callback
    SPI.transfer(NULL, &myArray[2*i], 2, NULL);
}
interrupts();
In the case where I have lots of 1-byte transactions back to back, there is a significant delay between each transaction, but there’s not much total overhead for each iteration. In the DMA case, the 2-byte transaction is as expected, but a massive delay is introduced after the transaction is complete. I wondered if this is because I’m blocking interrupts, but even if I don’t, the synchronous case still takes ages to return. Or is it because the code is setting up DMA every single time when the only thing that needs changing is the rx buffer offset?
How do I get the best of both worlds? It’s not as simple as doing all 1000-odd transactions in one go over DMA, otherwise that would probably be the solution. Ideally I’d like a non-blocking transfer that lets me do a few pin toggles, after which I can manually wait until the transaction is complete. Synchronous DMA is fine, but the dead time afterwards is far too long for my application at the moment.
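Roughly the shape I have in mind is a split-phase transfer: start the byte, do other work while it clocks out, then spin briefly until it has arrived. The sketch below only illustrates that structure. `FakeSpi`, `start()`, `done()`, `read()` and `readSplitPhase()` are all hypothetical names, not the Particle or STM32 API. On real hardware I would expect them to map to a data-register write, a poll of the receive-ready flag, and a data-register read, but that is an assumption.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the SPI peripheral so the sketch runs anywhere.
struct FakeSpi {
    uint8_t next = 0;      // value the fake device returns next
    int busyPolls = 0;     // polls remaining until the byte has "arrived"

    void start(uint8_t /*tx*/) { busyPolls = 2; }  // kick off one byte, return at once
    bool done() {                                  // true once the byte is in
        if (busyPolls > 0) --busyPolls;
        return busyPolls == 0;
    }
    uint8_t read() { return next++; }              // fetch the received byte
};

// Start a byte, run caller-supplied work while it clocks out, then wait for
// completion and store the result. Repeats for `count` bytes.
template <typename Spi, typename Work>
void readSplitPhase(Spi& spi, uint8_t* dst, std::size_t count, Work whileBusy) {
    for (std::size_t i = 0; i < count; ++i) {
        spi.start(0x00);         // non-blocking start
        whileBusy();             // e.g. a few pin toggles
        while (!spi.done()) { }  // short spin instead of a fixed delay
        dst[i] = spi.read();
    }
}
```

In my loop this would be called once per iteration with `&myArray[2*i]` and a count of 2, so the only per-iteration cost is the start, the poll and the read.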
I assume I could drop into STM32 library calls to set everything up, but that’s a bit of a faff. Am I right in thinking I can initialise SPI + DMA so that each transaction automatically increments the pointer into the rx buffer, while still issuing the reads in 2-byte chunks with a delay in between? E.g.
setupDMA(); // all the streams/channels/etc
for (int i = 0; i < 1024; i++) {
    doTransaction(2); // read 2 bytes
    delayMicroseconds(1);
}
For comparison, and to show what I’m trying to achieve, I can do something like this in AVR and know that there’s basically no latency at all.
for (int i = 0; i < 1024; i++) {
    SPDR = 0x00; // Initiate an SPI transaction, should take 0.8 us at f_sys = 20MHz
    _delay_us(1); // Could do other stuff here
    myArray[2*i] = SPDR;
    SPDR = 0x00;
    _delay_us(1);
    myArray[2*i + 1] = SPDR;
}
Any suggestions?