@mtnscott, the current SPI implementation is a simple port of the Arduino version for the sake of compatibility. However, the STM32 does support SPI DMA as implemented by @SaratogaDude here:
So there is definitely a possibility of using DMA. The Due solution writes out the screen one horizontal line at a time, using a much smaller line buffer (640 bytes). From what I see, it is a very nice implementation. I will absolutely look at porting it to the Core, since this is a very popular display, especially now that it is available with a touch overlay. Nice find!
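
To make the line-at-a-time idea concrete, here is a rough sketch in C. It is not the Due code or the Core SPI API. I am assuming a 320x240 panel with 16-bit pixels (which is where 640 bytes per line would come from), and `spi_dma_send()` and `pixel_at()` are hypothetical stand-ins I made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SCREEN_W   320              /* assumed width: 320 px * 2 bytes = 640 bytes */
#define SCREEN_H   240              /* assumed height */
#define LINE_BYTES (SCREEN_W * 2)   /* one line of 16-bit pixels */

static uint8_t line_buf[LINE_BYTES]; /* the only pixel buffer we need */
static size_t  bytes_sent;           /* bookkeeping for this sketch only */

/* Hypothetical stand-in for a SPI DMA transfer. A real version would
 * start the DMA and must finish (or use a second buffer) before
 * line_buf is refilled. Here it only counts bytes. */
static void spi_dma_send(const uint8_t *data, size_t len)
{
    assert(data != NULL && len <= LINE_BYTES);
    bytes_sent += len;
}

/* Hypothetical pixel source: a simple 16-bit color gradient. */
static uint16_t pixel_at(int x, int y)
{
    return (uint16_t)(((x & 0x1F) << 11) | ((y & 0x3F) << 5));
}

/* Render the screen one horizontal line at a time. */
static void draw_screen(void)
{
    for (int y = 0; y < SCREEN_H; y++) {
        for (int x = 0; x < SCREEN_W; x++) {
            uint16_t c = pixel_at(x, y);
            line_buf[2 * x]     = (uint8_t)(c >> 8);   /* high byte first */
            line_buf[2 * x + 1] = (uint8_t)(c & 0xFF);
        }
        spi_dma_send(line_buf, LINE_BYTES);
    }
}
```

The point is the memory footprint: under these assumptions a full frame is 153,600 bytes, but only the 640-byte line buffer ever has to live in RAM.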