Photon 2 Slow UART Serial TX Speeds

Hi,

I've been using Serial1 on the Photon at 115200 baud and it has been fine, and moving over to the Photon 2 it has also been fine. However, when I tried increasing the baud rate, I found large inter-byte gaps in the transmitted data. I had put this down to Serial1 using the RTL872x's LP_UART0 in interrupt mode, so I tried Serial2 to make use of the HS_UART0 peripheral with (maybe) DMA, but I am seeing the same issue. Attached is a scope plot of TX and RX at 1 Mbps, showing large 48-66 µs gaps between bytes on TX. The RX from another microcontroller shows no inter-byte gaps.

This effectively slows the baud rate to ~137 kbps (10 bits of information every 73 µs), which is about where I started on Serial1. The write also happens in a thread with a priority one level below OS_THREAD_PRIORITY_CRITICAL. I'm calling write() with a character buffer and a length: Serial2.write(send_buffer, bytes). The D7 signal shows my read and write thread durations.
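
For reference, the arithmetic behind that estimate can be sketched as follows. This is only an illustration: it assumes 8N1 framing (10 bits on the wire per byte) and a 63 µs gap, which is simply the value that gives the 73 µs byte period quoted above, and effectiveBitRate is a made-up helper name.

```cpp
#include <cstdio>

// Effective throughput in bits per second when every frame is followed by an idle gap.
// bitsPerFrame is the number of bits on the wire per byte (10 for 8N1: start + 8 data + stop).
inline double effectiveBitRate(double baud, double gapMicros, double bitsPerFrame = 10.0) {
    const double frameMicros = bitsPerFrame / baud * 1e6;  // time the frame itself occupies
    return bitsPerFrame / ((frameMicros + gapMicros) * 1e-6);
}

// Prints the estimate for the case in this post (1 Mbps, assumed 63 us gap).
inline void printEstimate() {
    std::printf("%.0f bit/s\n", effectiveBitRate(1e6, 63.0));
}
```

With no gap the function returns the nominal baud rate, so the difference between the two results is entirely the idle time between bytes.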

Any ideas?

How many bytes are you writing before yielding the thread? Since the FreeRTOS thread scheduler uses a 1 millisecond tick, if you are writing a very small number of bytes you can empty the transmit FIFO before the next write.

Conversely, how are you handling a full FIFO? If you write blindly, the thread will block when the FIFO is full unless you have discard-when-full enabled. This can result in poor performance in other threads, especially since you are writing from a high priority thread, which will block all other threads while it busy-waits for the FIFO to clear.
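
One way to avoid blocking on a full FIFO is to check the free space before writing. The sketch below is an illustration rather than Device OS code: writeIfRoom is a hypothetical helper, and it assumes the port object exposes availableForWrite() and write(buffer, length) the way the Wiring serial objects do. It is written as a template so it can be tried off-device with the stand-in FakePort, which is also made up for this example.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: write the whole message only if the TX buffer can take it
// without blocking; otherwise write nothing and let the caller retry later.
// Port is assumed to provide availableForWrite() and write(const uint8_t*, size_t).
template <typename Port>
size_t writeIfRoom(Port& port, const uint8_t* data, size_t len) {
    const int room = port.availableForWrite();
    if (room < 0 || static_cast<size_t>(room) < len) {
        return 0;  // not enough space: skip instead of busy-waiting in a high priority thread
    }
    return port.write(data, len);
}

// Off-device stand-in for a serial port, used only to exercise the helper.
struct FakePort {
    int freeBytes;
    size_t written;
    explicit FakePort(int room) : freeBytes(room), written(0) {}
    int availableForWrite() const { return freeBytes; }
    size_t write(const uint8_t*, size_t len) {
        written += len;
        freeBytes -= static_cast<int>(len);
        return len;
    }
};
```

On a device the call would look something like writeIfRoom(Serial2, send_buffer, bytes), with the caller deciding what to do when 0 comes back.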

Usually around 8 bytes. I don't think the write thread is being preempted, unless it is by a critical thread. That D7 signal stays high during the duration of the bytes being sent out of the FIFO, and the entire duration of the write thread here is 550 µs.

void serialWriteThread(void) {
  int bytes;

  while (true) {
    // Take the number of bytes to send from the queue, blocking until there are bytes to send.
    // os_queue_take() returns 0 on success.
    if (!os_queue_take(writeQueue, &bytes, CONCURRENT_WAIT_FOREVER, nullptr)) {
      Serial2.write(send_buffer, bytes);
    }
  }
}

We only send a handful of bytes at a time, and wait for a reply before sending any more. FIFO should never get full. It seems as though the write thread is not being interrupted, but it just takes a long time to either load or empty the FIFO, even though the bytes themselves go out on the wire at the right speed.

To better rule out my configuration, I made a new project that only sends a few random bytes on serial, then waits 50 ms.

I see the same issue. Byte spacing is still delayed.
I also tried removing the acquireBuffer() call, but no dice.

When I put the write() in a SINGLE_THREADED_BLOCK(), the gaps do come down a bit, but there are still delays. (In my previous post the write was called from a very high priority thread, but it may still have seen some preemption?)

The interesting thing is when I put everything in an ATOMIC_BLOCK(), because there is still a large delay after the first byte, but then all subsequent bytes are sent immediately after one another.

So clearly interrupts are slowing us down byte to byte, but also we see that large initial delay. Possibly from loading up the FIFO before the transmit? A similar delay was seen in the first example.

I guess my questions from all of this are,

  • Is there a way to reduce that initial delay between the first and second byte?
  • Is ATOMIC_BLOCK() ever really safe to call anywhere, and is it safe with calls to Serial.write()?
  • Given that these delays are clearly caused by interrupts, is there a way to better utilize the hardware UART peripheral in the RTL872x? I'm not intimately familiar with the platform, but surely we should be able to tell the hardware to send a few bytes back to back without software intervention? It is as if we're loading one byte at a time in software and greatly restricting the benefit of the hardware itself.

So here's a tl;dr of what I've found:

  • SerialN.write(buf, len) on USARTSerial falls back to Print::write(const uint8_t*, size_t), which iterates byte-by-byte through write(uint8_t).
  • On P2/Photon 2 that causes the UART TX path to be fed one byte at a time, introducing software-created inter-byte gaps and making write(buf, len) take much longer than expected.
  • Adding a native USARTSerial::write(const uint8_t*, size_t) overload that routes to hal_usart_write_buffer() removes those gaps and restores expected buffered UART behavior.
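
To make the fallback concrete, here is a minimal off-device model of the dispatch described in the bullets above. It is not the Device OS source: PrintModel, UnpatchedSerial, PatchedSerial, and the halCalls counter are all stand-ins invented for this illustration, and each increment of halCalls represents one trip into the UART HAL.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified stand-in for the Wiring Print base class.
class PrintModel {
public:
    virtual ~PrintModel() {}
    virtual size_t write(uint8_t byte) = 0;
    // Default buffer write: loops over the single-byte write.
    virtual size_t write(const uint8_t* buffer, size_t size) {
        size_t n = 0;
        while (size--) {
            n += write(*buffer++);
        }
        return n;
    }
};

// Models the unpatched class: only the single-byte write is overridden,
// so a buffer write falls back to the base-class loop.
class UnpatchedSerial : public PrintModel {
public:
    int halCalls = 0;
    using PrintModel::write;  // keep the inherited buffer overload visible
    size_t write(uint8_t) override { ++halCalls; return 1; }
};

// Models the patched class: the buffer overload hands the whole span over in one call.
class PatchedSerial : public PrintModel {
public:
    int halCalls = 0;
    size_t write(uint8_t) override { ++halCalls; return 1; }
    size_t write(const uint8_t*, size_t size) override { ++halCalls; return size; }
};
```

Writing an 8 byte buffer through UnpatchedSerial reaches the modeled HAL 8 times, while PatchedSerial reaches it once, which is the difference the overload is meant to make.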

When the following overload is added to spark_wiring_usartserial.cpp/.h (and the symbol is exported in the dynalib), the issues go away.

spark_wiring_usartserial.h
virtual size_t write(const uint8_t* buffer, size_t size) override;

spark_wiring_usartserial.cpp
size_t USARTSerial::write(const uint8_t* buffer, size_t size)
{
  if (!buffer || !size) {
    return 0;
  }
  // attempt a write if blocking, or for non-blocking if there is room.
  if (_blocking || hal_usart_available_data_for_write(_serial) > 0) {
    ssize_t written = hal_usart_write_buffer(_serial, buffer, size, sizeof(*buffer));
    return written > 0 ? written : 0;
  }
  return 0;
}

hal_dynalib_usart.h
DYNALIB_FN(BASE_IDX2 + 7, hal_usart, hal_usart_write_buffer, ssize_t(hal_usart_interface_t serial, const void* buffer, size_t size, size_t elementSize))
//May also want to link hal_usart_read_buffer and hal_usart_peek_buffer here

For 8 byte writes on an unmodified v6.4.0, Serial2.write(buffer,size) took ~420 µs to run and showed the inter-byte delays. In the scope shots, D7 is high for the duration of each individual Serial.write() call.


[Scope capture: Serial2.write(buffer,size) on an unmodified v6.4.0]

For 8 byte writes after patching spark_wiring_usartserial.cpp, Serial2.write(buffer,size) took ~50 µs to run and runs without inter-byte delays.


[Scope capture: Serial2.write(buffer,size) after adding write(const uint8_t* buffer, size_t size) to spark_wiring_usartserial.cpp]

I also looked at the other Serials for 8 and 64 byte writes using SerialN.write(buffer,size):

                                  Serial1  Serial2  Serial3
8 byte write, v6.4.0 (µs)             379      397      367
8 byte write, mod v6.4.0 (µs)          71       27       64
64 byte write, v6.4.0 (µs)           2826     3302     2840
64 byte write, mod v6.4.0 (µs)         80       30       64
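
As a rough cross-check of the table, the per-byte cost of write() can be estimated from the 8 and 64 byte rows. This is a sketch that assumes the call time grows linearly with payload size (a straight line through two points, nothing more rigorous), and microsPerByte is a hypothetical helper name.

```cpp
#include <cstdio>

// Slope of write() duration versus payload size, in microseconds per byte,
// taken from two measurements.
inline double microsPerByte(double shortMicros, int shortLen, double longMicros, int longLen) {
    return (longMicros - shortMicros) / (longLen - shortLen);
}

// Prints the slope for the unmodified v6.4.0 rows of the table.
inline void printUnmodifiedSlopes() {
    std::printf("Serial1: %.1f us/byte\n", microsPerByte(379, 8, 2826, 64));
    std::printf("Serial2: %.1f us/byte\n", microsPerByte(397, 8, 3302, 64));
    std::printf("Serial3: %.1f us/byte\n", microsPerByte(367, 8, 2840, 64));
}
```

Under that assumption the unmodified build costs roughly 44 to 52 µs per byte, the same order as the gaps seen on the scope, while the modified rows work out to well under 1 µs per byte.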

[Scope capture: 64 byte send on Serials 1, 2, and 3 for unmodified v6.4.0]

[Scope capture: 64 byte send on Serials 1, 2, and 3 for modified v6.4.0]

So with these changes, all 3 UART interfaces on the P2 are able to transmit without inter-byte delays, and write() no longer waits for the entire send.

@rickkas7 let me know what you think. If this should be rolled into the develop branch I don't mind making a PR but I'm not sure what tests should be added or changed.