Is the Mesh devices' UART implementation buggy?

I have spent hours trying to find out why my Xenon stops working (locks up) when connected to some serial sensors. It works fine for about 20 minutes, then suddenly locks up.

I have isolated the culprit to this statement:

while (Serial1.available()) Serial1.read();

I would appreciate the community’s input and their experience for a workaround.

Apart from the -260 bug, which I already confirmed and filed an issue for, there is another quirk that I started discussing with Particle a while ago (3/12/2019).
It revolves around the way the circular RX buffer is handled and, consequently, how USARTSerial::available() calculates the distance between its head and tail.
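
To make the head/tail distance concrete, here is a minimal ring-buffer sketch in plain C++. It is not the Device OS code: the type RxRing, its 64-byte capacity, and the keep-one-slot-empty convention are assumptions made purely for illustration. The only behaviour borrowed from the real API is that read() returns -1 when nothing is waiting.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical ring buffer for illustration; NOT the Device OS implementation.
struct RxRing {
    static constexpr std::size_t kSize = 64;  // assumed capacity
    std::uint8_t data[kSize] = {};
    std::size_t head = 0;  // next slot the writer (e.g. an RX interrupt) fills
    std::size_t tail = 0;  // next slot the reader consumes

    // Bytes waiting to be read: distance from tail to head, modulo capacity.
    // Adding kSize first keeps the subtraction from underflowing after head wraps.
    std::size_t available() const {
        return (head + kSize - tail) % kSize;
    }

    // Store one byte; returns false (byte dropped) when the buffer is full.
    bool push(std::uint8_t b) {
        const std::size_t next = (head + 1) % kSize;
        if (next == tail) return false;  // full: one slot is always kept empty
        data[head] = b;
        head = next;
        return true;
    }

    // Return the next byte, or -1 when the buffer is empty.
    int read() {
        if (head == tail) return -1;
        const std::uint8_t b = data[tail];
        tail = (tail + 1) % kSize;
        return b;
    }
};
```

Because read() reports an empty buffer on its own, a drain loop that only tests the return value of read() never has to rely on the available() arithmetic.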

@avtolstoy and @rickkas7 are aware of the “issue”, but no public issue has been filed yet AFAICT.

As a temporary workaround, I’d rewrite your code above in a similar way to what we already discussed when we were talking about the -260 bug you found.

  while(Serial1.read() >= 0); // flush all available RX bytes

BTW, what baudrate are you running that with?
Is your client device permanently sending or only on demand?

I have just built an app using serial and serial1 and initially had issues like this.

I now use the serialEvent() structure to get the data in and then inside my serialEvent1() function…

  while (Serial1.available()) {
        char inByte = Serial1.read();

Initially I had lockups as well, but then found I had used Serial1.begin(9600, SERIAL_FLOW_CONTROL_RTS_CTS) in error. Changing that resolved my issue. I have an external serial device sending data every 2s (with a checksum), and after a week of testing I have had no issues so far.


Thank you @Scruffr.

I tried:

while(Serial1.read() >= 0); // flush all available RX bytes

While I did not get a freeze, I did get red flashes and the Xenon restarted.

Thank you @shanevanj.

Are you saying I need to use the following?

Serial1.begin(9600, SERIAL_FLOW_CONTROL_RTS_CTS)

I still think there is a UART implementation error. You are not seeing it because my sensor updates at 500Hz, which makes it 1000 times faster than the one you are using.

No, don't use Serial1.begin(9600, SERIAL_FLOW_CONTROL_RTS_CTS) - it will definitely cause an issue.

If you are at 500Hz, how big are the messages in bytes? If you multiply it out, do you have enough time to empty the buffer?

The messages are rather short (about 10 bytes). There is also no problem on an Arduino Mega, which is a lesser processor …

I have tried reducing the baud rate, which helped but did not eliminate the problem.

I also found out that even though the device is frozen, it continues to give the impression that it is reachable by the cloud.

Asking for Particle variables returns values even though the processor is frozen. In an IoT device, this should not happen.

In my case, the only way to tell the processor is frozen is to execute a Particle function, which never returns, indicating a problem.

Okay - I will crank my message rate up and see where it breaks … it broke at 50Hz (20ms intervals of 40 bytes at 9600bps) - not great - I had red SOS of doom flashing briefly and back to breathing cyan. So it means the serial subsystem is broken, since my other (non-serial based) functions are working.

So I would tend to agree that there is a UART issue - maybe related to the -260 bug ?
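
The "multiply it out" question can be answered with a few lines of arithmetic. The sketch below assumes 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 bits on the wire per byte); the thread does not state the framing, and the function names are made up for this example.

```cpp
#include <cstdint>

// Assumed 8N1 framing: 1 start + 8 data + 1 stop bit per payload byte.
constexpr std::uint32_t kBitsPerByte8N1 = 10;

// Minimum baud rate needed to carry the given message rate and size.
constexpr std::uint32_t requiredBaud(std::uint32_t messagesPerSecond,
                                     std::uint32_t bytesPerMessage) {
    return messagesPerSecond * bytesPerMessage * kBitsPerByte8N1;
}

// True when the configured baud rate can carry the traffic at all.
constexpr bool fitsLink(std::uint32_t messagesPerSecond,
                        std::uint32_t bytesPerMessage,
                        std::uint32_t baud) {
    return requiredBaud(messagesPerSecond, bytesPerMessage) <= baud;
}
```

Under that assumption, requiredBaud(500, 10) is 50000 and requiredBaud(50, 40) is 20000. The second figure is more than twice 9600, so in the 50Hz test the line itself would have been saturated; whether that explains the SOS is not established here.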


That doesn't sound like it could be caused by that line of code.

That was the only change I made in the code.

@shanevanj just reported similar behavior when increasing his message rate.

That doesn't necessarily mean it was the cause of the SOS. If your previous code was blocking at that line and never reached the actually offending code, unblocking that one line would explain that behaviour just the same.

How would you know the processor is frozen?
With a multithreaded device OS your application thread may be blocked (not frozen), but other threads may still do their job as expected.

Hence this assertion is ignoring the actual facts.

I found that the rest of my code did not freeze as I was using the serialEvent1() system function - this means that the UART subsystem was no longer calling my function, so the rest of my code carried on and did not block on a while (Serial1.available()) call.

With a multithreaded device OS your application thread may be blocked (not frozen), but other threads may still do their job as expected.

You are correct. That may explain why Particle functions are not returning.

However, I am not finding where my code could be blocking ....

@shanevanj, it is important to note that serialEvent1() is not "called" by the UART subsystem as in "interrupt driven". Instead, it is called by the DeviceOS at the end of each loop(). From the docs:

The serialEvent functions are called in between calls to the application loop(). This means that if loop() runs for a long time due to delay() calls or other blocking calls the serial buffer might become full between subsequent calls to serialEvent and serial characters might be lost. Avoid long delay() calls in your application if using serialEvent.

Since serialEvent functions are an extension of the application loop, it is ok to call any functions that you would also call from loop(). Because of this, there is little advantage to using serial events over just reading serial from loop().

Understanding this behaviour can impact how you use it and how you design your loop() code to be non-blocking.
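
One way to keep loop() non-blocking while still reading serial data there is to cap how many bytes are consumed per pass. The following is a desktop-compilable sketch, not tested on a Xenon: FakeSerial is a hypothetical stand-in for Serial1, drainBounded is a made-up helper name, and the only thing assumed about the real port is that read() returns a negative value when no byte is waiting.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical stand-in for Serial1 so the sketch compiles off-device.
struct FakeSerial {
    std::deque<int> rx;  // bytes "received" so far
    int read() {
        if (rx.empty()) return -1;  // same empty-buffer contract as read()
        const int b = rx.front();
        rx.pop_front();
        return b;
    }
};

// Consume at most maxBytes per call, so a sender that never pauses cannot
// keep a single pass of loop() busy indefinitely. Returns the bytes consumed.
template <typename SerialT>
std::size_t drainBounded(SerialT& port, std::string& out, std::size_t maxBytes) {
    std::size_t n = 0;
    while (n < maxBytes) {
        const int b = port.read();
        if (b < 0) break;  // nothing left right now; return to the caller
        out.push_back(static_cast<char>(b));
        ++n;
    }
    return n;
}
```

Called once per loop() with a modest limit (64 is an arbitrary choice), the helper returns promptly whether the buffer is empty or the sender is flooding the line, and any leftover bytes are picked up on the next pass.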


If you are using Serial1.available() or Serial1.read(), then I think the underlying UART functions are blocking due to buffer overflows or the -260 issue previously mentioned, and hence your code freezes.

My code does not use these to detect whether serial data is waiting to be read, but instead relies on the system to call serialEvent1(). So my theory is that once the underlying UART functions are blocked due to -260 or overflow (or something), the system never calls my serialEvent1() function, so my code doesn't freeze - the serial data is just never received.

Makes sense and explains what we are seeing.

I do not know, though, why we are both getting restarts (after red flashes) …

I have the following in my loop() and no delay statements in any sections of my code.

//------------------------------------------------------------------
void loop()
//------------------------------------------------------------------
{
    wd.checkin(); // resets the AWDT count
    
    checkTimers(); //
    
    ser1Receive(); // does panel want to send a message?
    
}

ser1Receive() checks if an I/O pin is HIGH or LOW and sets a boolean flag accordingly.

checkTimers() is the millisecond polledTimers library and has no interrupts or delays in use.

Based on this I really can't see anything causing this lockup when receiving data - at worst I would expect to see dropped characters?

We seem to be talking past each other.
I didn't say that your while (Serial1.available()) Serial1.read(); isn't a problem, but I also didn't say that that blocking issue must be the only issue in your code.
While my suggested alternative while(Serial1.read() >= 0); should solve the blocking issue, it can't be expected to solve any other issue that is still present and causing your red flashes.

You were clear and I understood you correctly. I meant to say that two very different programs (mine and @shanevanj's) are getting red flashes.

So in a general way, yes, that can be caused by different reasons, but it would be an unlikely coincidence ....