Serial RX bytes dropped on Muon, Photon 2, or any RTL872x device

Looping back to a topic that got automatically closed on me:


At the time, it was thought that Device OS 6.3.1 would improve things; however, that is not the case. We are still seeing bytes routinely being “discarded” during long packet transfers on the UART peripheral.

The only workaround we have found thus far is to reduce the length of the individual packets sent over our network, which reduces the probability of the Particle Device OS thread being blocked from servicing the UART RX DMA interrupt.

This is a pain and a lot of technical debt is accumulating in our apps as a result.

I asked Claude to analyze the situation and it came up with this report:

https://claude.ai/share/699f7ead-b0f5-4b16-905d-9e201e3afb6a

From that report:

The Critical DMA Limitation

There's a significant hardware quirk: the RTL872x DMA peripheral won't transfer data from its internal FIFO into the destination buffer if the FIFO contains fewer than 4 bytes, regardless of burst size or FIFO settings. This is a known issue that has required workarounds in the firmware: when fewer than 4 bytes remain, the driver must force-flush the data by suspending the DMA channel.

Buffer Architecture

The RTL872x UART implementation has multiple buffering stages:

  1. UART Hardware FIFO - The exact size isn't explicitly documented in what I found, but based on typical implementations it's likely 16-64 bytes

  2. DMA Internal FIFO - Holds data transferred from UART FIFO

  3. Software Ring Buffer - The default size is 64 bytes for older devices, though Gen 3 devices use 128 bytes. Starting with Device OS 3.2.0, you can increase this using acquireSerial1Buffer().

Your Problem at 230400 Baud

At 230400 baud, you're receiving approximately 23,040 bytes per second (assuming 10 bits per byte with start/stop bits). This means:

  • One byte every ~43 microseconds

  • A 64-byte buffer fills in ~2.7 milliseconds
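
A quick way to sanity-check those numbers (a standalone C++ snippet, not Particle-specific; the 230400 baud rate, 10 bits per byte, and 64-byte default buffer are taken from the discussion above):

#include <cstdio>

int main(){
    const double bytesPerSec = 230400.0 / 10.0;        // ~23040 bytes/s at 10 bits per byte
    const double usPerByte   = 1e6 / bytesPerSec;      // ~43.4 us between bytes
    const double bufFillMs   = 64.0 * usPerByte / 1e3; // ~2.78 ms to fill a 64-byte buffer
    printf("%.0f B/s, %.1f us/byte, 64-byte buffer fills in %.2f ms\n",
           bytesPerSec, usPerByte, bufFillMs);
    return 0;
}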

When multiple threads are running, higher-priority interrupts can delay the UART RX DMA ISR, causing bytes to be lost, especially at higher baud rates. With only a 64-byte buffer, there is very little margin for error before losing data at 230400 baud, making a larger buffer essentially a necessity.

How big is your Device OS serial buffer?

And are you reading the UART serial buffer from its own dedicated thread, or as part of loop() or some other thread?

Using the hal_usart_buffer_config_t acquireSerial1Buffer() function, we allocate approximately a 22 kB RX buffer and a 1200-byte TX buffer (see snippet below).

This should not matter, though, in terms of the underlying issue of Particle Device OS not unloading the DMA buffer fast enough to prevent overwrite (bytes getting dropped), right?

hal_usart_buffer_config_t acquireSerial1Buffer()
{
#if (PLATFORM_ID == PLATFORM_MSOM)
    //Give Particle Device OS enough of an RX buffer to hold the largest possible LinkNet packet
    const size_t rxBufSize = MAX_LINKNET_PACKET_SIZE_BYTES_TO_HOST;
    const size_t txBufSize = MAX_LINKAPP_LINKNET_TX_SIZE;
#else
    const size_t rxBufSize = 2048;
    const size_t txBufSize = 129;
#endif
    //Note: if 9-bit mode were supported and in use (HAL_PLATFORM_USART_9BIT_SUPPORTED,
    //e.g. on Gen 2 platforms), the buffers would need to hold 2x the number of bytes:
    //const size_t rxBufSize = MAX_LINKNET_PACKET_SIZE_BYTES * sizeof(uint16_t);
    hal_usart_buffer_config_t config = {
        .size = sizeof(hal_usart_buffer_config_t),
        .rx_buffer = new (std::nothrow) uint8_t[rxBufSize],
        .rx_buffer_size = rxBufSize,
        .tx_buffer = new (std::nothrow) uint8_t[txBufSize],
        .tx_buffer_size = txBufSize
    };
    return config;
}

where:

#define MAX_LINKNET_PACKET_SIZE_BYTES_TO_HOST (sizeof(linknet_packet_header_simple_t) + SIMPLENET_CRC_SZ + 1024*22)	//22 KB plus overhead
#define MAX_LINKAPP_LINKNET_TX_SIZE ((BOSS_PAGE_SIZE_BYTES * 2U) + sizeof(linknet_packet_header_simple_t)+ 4 + 4 + 128) //1168 bytes - down from 1024*14 = 14336 bytes  - saved 13168 bytes

We are reading the UART from the application thread, but there is no other work going on during this time (it is a blocking call). Here is that code snippet:

	do{
		if(
			millis() - start_ms > timeout 
			// || ( millis() - last_byte_ms > 100 && last_byte_ms )
		){
			break;
		}
		//Serial reception on Particle, best practices: https://community.particle.io/t/particle-serial-interrupt/43342
		while(Serial1.available()){
			//Check for space before writing, so an oversized packet can never write past the end of the buffer
			if(link_snet.rx_buf.sz >= link_snet.rx_buf_sz_max){
				myLog.error("RX buffer overflow!  Received %u bytes - max is %u", link_snet.rx_buf.sz, link_snet.rx_buf_sz_max);
				#ifdef VERBOSE_SIMPLENET_RX
				dump_link_rx_buf();
				#endif
				return LINKNET_ERR_RX_BUF_OVERFLOW;
			}
			link_snet.rx_buf.data[link_snet.rx_buf.sz++] = Serial1.read();
		}
		//Check if the packet size info is available to be decoded
		if(
			!link_snet.rx_expected_sz
			&& link_snet.rx_buf.sz >= sizeof(simplenet_header_t)
		){
			link_snet.rx_expected_sz = header->sz;
			branch_checkpoint(BCHECK_LINKNET_RX_HEADER_RX_DONE);
		}
		//Check if we have received all the bytes expected
		if(link_snet.rx_buf.sz && link_snet.rx_expected_sz == link_snet.rx_buf.sz){
			link_snet.rx_pending = true;
			#ifdef VERBOSE_SIMPLENET_RX
			dump_link_rx_buf();
			#endif
			branch_checkpoint(BCHECK_LINKNET_RX_PACKET_DONE);
			//Check the CRC
			if(!simplenet_rx_packet(&link_snet)){
				myLog.error("CRC check failed!");
				return LINKNET_ERR_BAD_CRC_RESPONSE;
			}
			branch_checkpoint(BCHECK_LINKNET_RX_CRC_OK);
			#ifdef VERBOSE_LINKNET_DEBUG_HEAVY
			myLog.info("Received %s response", linknet_rsp_name(header->rsp));
			#endif
			sram.last_linknet_rsp = header->rsp;
			return header->rsp;
		}
	}while(1);

We are using one custom dedicated thread in our app - the one that is allocated to the BackgroundPublishRK process:

void BackgroundPublishRK::start()
{
    if(!thread)
    {
        os_mutex_create(&mutex);

        // use OS_THREAD_PRIORITY_DEFAULT so that application, system, and
        // background publish thread will all run at the same priority and
        // be able to preempt each other
        thread = new Thread("BackgroundPublishRK",
            [this]() { thread_f(); },
            OS_THREAD_PRIORITY_DEFAULT);
    }
}

There are no places in our code base where we use SINGLE_THREADED_BLOCK() or anything like that.

I do notice in the BackgroundPublishRK.cpp file that background publishing only delays by one time slice while it is waiting. Would increasing the delay(1) calls to delay(100) or some such be an advisable way to get around these issues, or would that cause other problems of its own?

void BackgroundPublishRK::thread_f()
{
    while(true)
    {
        while(state == BACKGROUND_PUBLISH_IDLE)
        {
            // yield to rest of system while we wait
            // a condition variable would be ideal but doesn't look like
            // std::condition_variable is supported
            delay(1);
        }

        if(state == BACKGROUND_PUBLISH_STOP)
        {
            return;
        }

        // temporarily acquire the lock
        // this allows a calling thread to block the publish thread if it needs
        // additional synchronization around a publish request and acts as a
        // memory barrier around publish arguments to ensure all updates
        // are complete
        lock();
        unlock();

        // kick off the publish
        // WITH_ACK does not work as expected from a background thread
        // use the Future<bool> object directly as its default wait
        // (used by WITH_ACK) short-circuits when not called from the
        // main application thread
        #if SYSTEM_VERSION >= SYSTEM_VERSION_DEFAULT(6,2,0) && defined(FEATURE_BIGGER_CBOR_VARIANT_WEBHOOKS)
        EventData local_ev = Variant::fromJSON(event_data);
        auto ok = Particle.publish(event_name, local_ev, event_flags);
        #else
        auto ok = Particle.publish(event_name, event_data, event_flags);
        #endif

        // then wait for publish to complete
        while(!ok.isDone() && state != BACKGROUND_PUBLISH_STOP)
        {
            // yield to rest of system while we wait
            delay(1);
        }

        if(completed_cb)
        {
            completed_cb(ok.isSucceeded(),
                event_name,
                event_data,
                event_context);
        }

        WITH_LOCK(*this)
        {
            if(state == BACKGROUND_PUBLISH_STOP)
            {
                return;
            }
            event_context = NULL;
            completed_cb = NULL;
            state = BACKGROUND_PUBLISH_IDLE;
        }
    }
}

By the way, Serial1 on both the M-SoM and Photon 2 does not use DMA; it uses interrupt mode.

DMA mode is used on the cellular modem UART on M-SoM and Serial2 on Photon 2.

I was able to solve the issue by:

  1. increasing the yield delay() calls in the BackgroundPublishRK library from 1 ms to 100 ms -> delay(100)
  2. refactoring my app’s code where it waits for a serial response to use a doubly nested while(Serial1.available()) loop with delay(1) between evaluations of the outer loop (see the sketch below). This ensures that when my application’s thread finishes unloading Serial1, it waits for more serial bytes to come in (deferring to the Particle Device OS thread) before moving on to higher-order packet processing chores. This effectively reduces my application thread to just unloading the Serial1 class via read() calls, yielding to the Particle Device OS thread in between.
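
For reference, a minimal sketch of that doubly nested loop, reusing the rx_buf names from the earlier snippet (the real code also keeps the timeout and expected-size checks shown above; this only shows the drain/yield structure):

	//Drain whatever the Device OS ring buffer currently holds, then yield one tick.
	//Only when a full tick passes with no new bytes do we fall through to the
	//higher-order packet processing (header decode, CRC check, etc.)
	while(Serial1.available()){
		while(Serial1.available()){
			if(link_snet.rx_buf.sz >= link_snet.rx_buf_sz_max){
				return LINKNET_ERR_RX_BUF_OVERFLOW;
			}
			link_snet.rx_buf.data[link_snet.rx_buf.sz++] = Serial1.read();
		}
		delay(1); //defer to the Device OS thread before re-checking for bytes
	}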

The net result is that the Particle Device OS thread now likely runs for the vast majority of the time that we are waiting for bytes (likely > 75% of the time?), whereas before it was only running about 1/3 of the time (sharing runtime equally with the other two threads).

