SPI on the Photon, library latency?

Hi,

There seems to be a lot of library latency when reading data in a loop over SPI.

On the scope I’m seeing each byte go out in 0.25 us, which is correct; however, there’s a huge gap between transmissions that should be a lot shorter, given that the only thing happening in between is (presumably) a couple of register writes.

It’s a bit beyond my scope’s bandwidth, but the gap is still clearly visible.

The code is below. I need interrupts off here because the device I’m talking to is very sensitive to timing mismatches (eventually there’ll be other things going on, but I’m trying to keep this to a minimal example).

noInterrupts();
for(int i = 0; i < 1024; i++){
    myArray[2*i]   = SPI.transfer(0);  // read one byte
    myArray[2*i+1] = SPI.transfer(0);  // and the next
}
interrupts();

I’ve tried using DMA, but then there’s an even longer delay (regardless of whether I remove the callback or use an empty function). In this case both bytes are sent with no latency between them (good!), but there’s a huge delay after.

noInterrupts();
for(int i = 0; i < 1024; i++){
    SPI.transfer(NULL, &myArray[2*i], 2, NULL); // 2-byte DMA transfer; NULL callback makes it blocking
}
interrupts();

With lots of 1-byte transactions back to back, there is a significant delay between each transfer, but the total overhead per loop iteration isn’t too bad. In the DMA case, the 2-byte transaction itself is as expected, but a massive delay is introduced after the transaction is complete. I wondered if this was because I’m blocking interrupts, but even if I don’t, the synchronous call still takes ages to return. Or is it because the code is setting up DMA from scratch every single time, when the only thing that needs changing is the rx buffer offset?
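
To put a rough number on that overhead, a quick timing sketch (micros() is the Wiring microsecond counter; same buffer as above) would be:

uint32_t t0 = micros();
for(int i = 0; i < 1024; i++){
    myArray[2*i]   = SPI.transfer(0);
    myArray[2*i+1] = SPI.transfer(0);
}
uint32_t elapsed = micros() - t0;
// 2048 bytes at 0.25 us/byte is ~512 us of bus time;
// anything on top of that is per-call overhead in the HAL/driver.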

How do I get the best of both worlds? It’s not as simple as doing all 1000-odd transactions in one go over DMA, otherwise that would probably be the solution. Ideally I’d like a non-blocking transfer that lets me do a few pin toggles and then delay manually until the transaction is complete. Synchronous DMA is fine, but the dead time afterwards is far too long for my application at the moment.
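
The pattern I’m after would look roughly like the sketch below, assuming the callback form of SPI.transfer() returns straight away and fires the callback from the DMA completion interrupt (that’s my reading of the docs, not something I’ve verified). It also means the DMA interrupt has to stay enabled, so it can’t sit inside noInterrupts()/interrupts():

volatile bool spiDone = false;
void onSpiDone(void) { spiDone = true; }            // runs from the DMA completion interrupt

for(int i = 0; i < 1024; i++){
    spiDone = false;
    SPI.transfer(NULL, &myArray[2*i], 2, onSpiDone); // should return immediately
    // toggle pins / do other short work here
    while(!spiDone) ;                                // or delayMicroseconds(...) then check
}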

I assume I could drop down to STM32 library calls to set everything up, but that’s a bit of a faff. Am I right in thinking I can arrange things so that I initialise SPI + DMA once, and then each transaction automatically increments the pointer into the rx buffer while reading in 2-byte chunks with a delay in between, e.g.

setupDMA(); // all the streams/channels/etc
for(int i=0; i < 1024; i++){
    doTransaction(2); // read 2 bytes
    delayMicroseconds(1);
}

For comparison, and to show what I’m trying to achieve, I can do something like this in AVR and know that there’s basically no latency at all.

SPDR = 0x00;   // initiate an SPI transaction, should take 0.8 us at f_sys = 20 MHz
_delay_us(1);  // could do other stuff here instead of just waiting
myArray[2*i] = SPDR;       // read the received byte

SPDR = 0x00;
_delay_us(1);
myArray[2*i + 1] = SPDR;   // read the next byte
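
(The textbook-safe version polls the transfer-complete flag instead of relying on the fixed delay; a sketch:)

SPDR = 0x00;                     // start the transfer
while(!(SPSR & (1 << SPIF))) ;   // wait for the SPIF flag
myArray[2*i] = SPDR;             // read the received byte (the SPSR/SPDR access sequence clears SPIF)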

Any suggestions?

Ok, so it does look like the DMA call is reconfiguring every time. That’s a bum.

The transfer is done in the config function, and every call sets everything up from scratch. I’m not sure how that adds up to 10 us of extra dead time, but there we go.

Were you able to speed up SPI.transfer()? I’m running into the same issue with long delays after each SPI1.transfer() call.

@joshvm @garrett
Some possible thread necromancy here, but a simple solution to your problem could be something like adding a flag to SPI_State (I added DMA_ALREADY_CONFIGURED). Note that you have to recompile the base firmware for changes to the HAL to take effect.

typedef struct SPI_State
{
    SPI_InitTypeDef SPI_InitStructure;

    bool SPI_Bit_Order_Set;
    bool SPI_Data_Mode_Set;
    bool SPI_Clock_Divider_Set;
    bool SPI_Enabled;

    HAL_SPI_DMA_UserCallback SPI_DMA_UserCallback;

    uint16_t SPI_SS_Pin;
    SPI_Mode mode;
    HAL_SPI_Select_UserCallback SPI_Select_UserCallback;
    uint32_t SPI_DMA_Last_Transfer_Length;
    uint32_t SPI_DMA_Current_Transfer_Length;
    volatile uint8_t SPI_SS_State;
    uint8_t SPI_DMA_Configured;
    volatile uint8_t SPI_DMA_Aborted;
    bool DMA_ALREADY_CONFIGURED;   // added: set once the DMA streams/NVIC have been set up

    __attribute__((aligned(4))) uint8_t tempMemoryRx;
    __attribute__((aligned(4))) uint8_t tempMemoryTx;

} SPI_State;

and then gating the setup inside the HAL_SPI_DMA_Config function:

static void HAL_SPI_DMA_Config(HAL_SPI_Interface spi, void* tx_buffer, void* rx_buffer, uint32_t length)
{
    DMA_InitTypeDef DMA_InitStructure;
    NVIC_InitTypeDef NVIC_InitStructure;

    if(spiState[spi].DMA_ALREADY_CONFIGURED){
      // Don't reconfigure, just carry on
    }else{
      spiState[spi].DMA_ALREADY_CONFIGURED=true;
      /* Deinitialize DMA Streams */
      DMA_DeInit(spiMap[spi].SPI_TX_DMA_Stream);
      DMA_DeInit(spiMap[spi].SPI_RX_DMA_Stream);

// Cut for brevity

      NVIC_InitStructure.NVIC_IRQChannelCmd = ENABLE;
      NVIC_Init(&NVIC_InitStructure);
    }

    /* Enable the DMA Tx/Rx Stream */
    DMA_Cmd(spiMap[spi].SPI_TX_DMA_Stream, ENABLE);
    DMA_Cmd(spiMap[spi].SPI_RX_DMA_Stream, ENABLE);

    /* Enable the SPI Rx/Tx DMA request */
    SPI_I2S_DMACmd(spiMap[spi].SPI_Peripheral, SPI_I2S_DMAReq_Rx, ENABLE);
    SPI_I2S_DMACmd(spiMap[spi].SPI_Peripheral, SPI_I2S_DMAReq_Tx, ENABLE);

    spiState[spi].SPI_DMA_Current_Transfer_Length = length;
    spiState[spi].SPI_DMA_Last_Transfer_Length = 0;
    spiState[spi].SPI_DMA_Aborted = 0;
    spiState[spi].SPI_DMA_Configured = 1;
}

This removes about 8 us for me, for a total delay of 1-2 us from when SPI.transfer() is called. I only have one type of SPI DMA interrupt… so I just care whether the DMA has been configured once for that SPI resource. If you want more complicated behaviour, there would be more switching based on the DMA_ALREADY_CONFIGURED flag.
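
For example, the fast path above only re-enables the streams, so it assumes the buffer address and length are the same on every call. If the rx buffer moves between calls (as in the &myArray[2*i] loop earlier in the thread), the already-configured branch would also need to retarget the stream and reload the byte count before the common DMA_Cmd(..., ENABLE) re-arms it — roughly something like this (untested sketch using the same StdPeriph calls; it assumes non-NULL tx/rx buffers and ignores the dummy-byte path the full config handles):

    if(spiState[spi].DMA_ALREADY_CONFIGURED){
      /* Skip the stream/NVIC init, but refresh the parts that change per call. */
      DMA_MemoryTargetConfig(spiMap[spi].SPI_RX_DMA_Stream, (uint32_t)rx_buffer, DMA_Memory_0);
      DMA_SetCurrDataCounter(spiMap[spi].SPI_RX_DMA_Stream, length);
      DMA_MemoryTargetConfig(spiMap[spi].SPI_TX_DMA_Stream, (uint32_t)tx_buffer, DMA_Memory_0);
      DMA_SetCurrDataCounter(spiMap[spi].SPI_TX_DMA_Stream, length);
    }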

Yeah, that looks like one way around it. In the end, I seem to remember I avoided the Particle layer and wrote it directly against the STM32 libraries (including the DMA), which did the job. It wasn’t particularly onerous, and at the time it was a good exercise in learning more about the ARM side of things.

If I have the code somewhere, I’ll dig it out and put up a gist.
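
From memory, the core of it is just the register-level transfer below (a rough sketch; SPI1 here is illustrative rather than checked against the Photon’s pin mapping, and clock/pin setup lives elsewhere):

static inline uint8_t spi_xfer(uint8_t out)
{
    while(!(SPI1->SR & SPI_SR_TXE)) ;   // wait until the TX buffer is empty
    SPI1->DR = out;                     // start the transfer
    while(!(SPI1->SR & SPI_SR_RXNE)) ;  // wait for the received byte
    return SPI1->DR;                    // read it back
}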
