Is the Mesh devices UART implementation buggy?


#22

So in order to pin it down - I created (copied) this

void setup() {
  Serial.begin(9600);
  Serial1.begin(9600);
  pinMode(D19, OUTPUT);
  digitalWrite(D19, HIGH);
}

void loop() {
  if (Serial.available()) {      // If anything comes in Serial (USB),
    Serial1.write(Serial.read());   // read it and send it out Serial1 (pins 0 & 1)
  }

  if (Serial1.available()) {     // If anything comes in Serial1 (pins 0 & 1)
    Serial.write(Serial1.read());   // read it and send it out Serial (USB)
  }
}

The device sending can only do messages at around 250Hz.

At 10 bytes per message - no problems. Even when I send “TheQuickBrownFoxJumpedOverTheLazyDog” it’s still okay and doesn’t crash.

So this kinda shows that at a simple level - it all works.

@Jimmie perhaps you can run this simple code above and connect to your sensor and see what you get?


#23

Does SYSTEM_THREAD(ENABLED) make any difference?


#24

May I suggest a slight modification?

void loop() {
  while (Serial.available() > 0) {      // If anything comes in Serial (USB),
    Serial1.write(Serial.read());   // read it and send it out Serial1 (pins 0 & 1)
  }

  while (Serial1.available() > 0) {     // If anything comes in Serial1 (pins 0 & 1)
    Serial.write(Serial1.read());   // read it and send it out Serial (USB)
  }
}

#25

Agreed - I copied that code from the Arduino IDE examples just as a quick 'n dirty test :smiley:


#26

This is interesting - I looked at task switching and see that Device OS switches every 1ms in non-threaded mode - so if there is any application code that adds a further 1ms, then @Jimmie’s code will potentially struggle with a 500Hz (2ms gap) serial stream?


#27

In a quick, bare-bones test I did a few weeks ago, the loop() ran about 5-7 times faster with SYSTEM_THREAD(ENABLED). As I said, that was just a super-simple test case, not real-world code that was trying to do other things. But it’s definitely something to keep in mind when speed is essential.


#28

Mesh devices have a 128 byte buffer so if you can ensure to read up to 127 bytes while the application thread is active you should be good.

Task switching would take place in 1ms time slices, but when a thread has no desire to use that 1ms slice it will (should) yield and the next thread in line will gain access to the core.
E.g. with SYSTEM_THREAD(ENABLED), a call to delay() will immediately surrender control over the core (yield), and hence it may take longer for a delay(1) to regain control and finish, but at the same time currently active threads are serviced more quickly.

@shanevanj, in addition to the above, when you see a cycle time of 1ms in non-threaded mode you should not add the 1ms time slices to that time, since the 1ms between iterations of loop() in non-threaded mode is mainly caused by the cloud tasks. Shifting these to an independent system thread will remove that extra time between iterations, not add on top.
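
To put a number on the 128 byte buffer mentioned above, here is a rough back-of-the-envelope helper. It is only a sketch: the function name is made up for illustration, and the 250Hz / 10 bytes per message figures are the ones from the test earlier in the thread, not measurements from the actual sensor.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Device OS): milliseconds until an RX
// buffer of bufferBytes is full if nothing reads it, for a sender that
// emits messagesPerSecond messages of bytesPerMessage bytes each.
inline double millisUntilFull(uint32_t bufferBytes,
                              uint32_t messagesPerSecond,
                              uint32_t bytesPerMessage) {
    assert(messagesPerSecond > 0 && bytesPerMessage > 0);
    const double bytesPerSecond =
        static_cast<double>(messagesPerSecond) * bytesPerMessage;
    return 1000.0 * bufferBytes / bytesPerSecond;
}
```

With 128 bytes, 250 messages per second and 10 bytes per message this gives 51.2ms, so under those assumptions the application has to get back to reading Serial1 at least that often to avoid losing data.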


#29

I will test with System Thread.

There is a very high probability that the problem has to do with the cloud functions. The reason is because a few days ago, with @ScruffR’s help, I ran the same code overnight without problems.

In checking that code, I realized I had SYSTEM_MODE(MANUAL) set.

So I will try again and report.

Thanks.


#30

Cool - thanks for the detail around thread execution - even though the documentation is great, I missed that.


#31

Just saw this serial buffer for the Electron discussed in another thread. It may be a way to deal with your sensor data issue?


#32

Thank you very much @shanevanj, definitely worth investigating.

Update:

Unfortunately, after switching to the library, the Xenon is still restarting after a couple of hours (not locking up). This is the same behavior as before using the buffer library, but this time I could not trap the error. Before, I could trap an error from the Serial1 port before it led to a restart.

Will try increasing buffer size from 4096 and see if this helps.

The code changes recommended by @ScruffR were a big improvement since the Xenon restarts after a red flash (rather than a lock up). This makes the system useable until the firmware is fixed.


#33

Update 2:

Increasing the buffer to 8192 made for a stable connection. I left it overnight and it is still working (never lasted that long before).
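
For readers unfamiliar with the idea, the software buffer approach boils down to a larger ring buffer that sits between the UART and the application. The following is only a minimal single-threaded sketch of such a buffer, not the library's actual code; the class name ByteRing is hypothetical, and on a real device the producer side would run in a separate thread or interrupt and need proper synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-capacity byte ring buffer (illustrative only). One slot is kept
// empty to tell "full" from "empty", so N slots hold at most N - 1 bytes.
template <size_t N>
class ByteRing {
    static_assert(N >= 2, "need at least two slots");

public:
    // Store one byte; returns false (and counts a drop) when full.
    bool push(uint8_t b) {
        const size_t next = (head_ + 1) % N;
        if (next == tail_) {
            ++dropped_;
            return false;
        }
        data_[head_] = b;
        head_ = next;
        return true;
    }

    // Number of bytes waiting to be read; never negative.
    size_t available() const { return (head_ + N - tail_) % N; }

    // Next byte as 0..255, or -1 when the buffer is empty.
    int read() {
        if (head_ == tail_) return -1;
        const uint8_t b = data_[tail_];
        tail_ = (tail_ + 1) % N;
        return b;
    }

    // How many bytes were rejected because the buffer was full.
    size_t dropped() const { return dropped_; }

private:
    uint8_t data_[N] = {};
    size_t head_ = 0;
    size_t tail_ = 0;
    size_t dropped_ = 0;
};
```

Counting dropped bytes is a design choice for debugging: if dropped() stays at zero overnight, the buffer size is large enough for the sensor's data rate.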

The strange thing is the same code does not run on a slower sensor. On a slower sensor (< 100Hz), the serial readout is stable without the need for using the SerialBuffer approach by @rickkas7. So it appears there are two issues:

  1. -260 issue reported earlier.

  2. Serial Buffer Overflow on faster sensors.

Here is the code:

void readDist()
{
  if (Serial1Buf.available()) {  //check if serial port has data input
    if (Serial1Buf.read() == HEADER2) {  
      uart[0] = HEADER2;
      if (Serial1Buf.read() == HEADER2) {  
        uart[1] = HEADER2;
        for (i = 2; i < 9; i++) { //save data in array
          uart[i] = Serial1Buf.read();
        }

        myCS = uart[0] + uart[1] + uart[2] + uart[3] + uart[4] + uart[5] + uart[6] + uart[7];
        if (uart[8] == (myCS & 0xff)) { //verify the received data as per protocol

          distance = uart[2] + uart[3] * 256;      
        }
        else
        {
          distance = 0;
        }
      }
    }
  }
}
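
One thing to note about readDist() above is that it calls read() nine times after a single available() check, so a frame that has only partly arrived gets filled with "no data" values. A possible alternative is to feed bytes one at a time into a small state machine. The sketch below assumes the same 9-byte frame layout as the code above (two header bytes, checksum over the first eight bytes, distance in bytes 2 and 3). The class name FrameParser is hypothetical, and the header value is passed in because HEADER2 is not shown in the post.

```cpp
#include <cassert>
#include <cstdint>

// Incremental parser for a 9-byte frame: two identical header bytes,
// six payload bytes, one checksum byte (low 8 bits of the sum of the
// first eight bytes). Layout taken from readDist() above.
class FrameParser {
public:
    explicit FrameParser(uint8_t header) : header_(header) {}

    // Feed one received byte. Returns true when this byte completed a
    // frame with a valid checksum; distance() is then up to date.
    bool feed(uint8_t b) {
        if (count_ < 2) {
            // Still hunting for the two header bytes.
            if (b == header_) {
                frame_[count_++] = b;
            } else {
                count_ = 0;
            }
            return false;
        }
        frame_[count_++] = b;
        if (count_ < kFrameSize) return false;

        count_ = 0;  // look for the next header, valid frame or not
        uint8_t sum = 0;
        for (int i = 0; i < kFrameSize - 1; ++i) sum += frame_[i];
        if (sum != frame_[kFrameSize - 1]) return false;
        distance_ = static_cast<uint16_t>(frame_[2] | (frame_[3] << 8));
        return true;
    }

    // Last distance from a frame that passed the checksum.
    uint16_t distance() const { return distance_; }

private:
    static const int kFrameSize = 9;
    uint8_t header_;
    uint8_t frame_[kFrameSize] = {};
    int count_ = 0;
    uint16_t distance_ = 0;
};
```

In loop() this would be driven by something like `while (Serial1Buf.available() > 0) { int c = Serial1Buf.read(); if (c >= 0 && parser.feed((uint8_t)c)) { /* use parser.distance() */ } }`. Unlike the original, this sketch keeps the last good distance on a checksum failure instead of setting it to 0.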

#34

I also have big problems with the reliability of the UART implementation on the mesh devices. I read about these two bugs:

  • Serial.available() can also return negative values if no data is available, should return 0
  • Serial.read() can return other negative values than -1 if no data is available
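
A workaround for just these two bugs could look like the sketch below. The helper names are made up for illustration; they only wrap the raw return values and do not touch the UART itself.

```cpp
#include <cassert>

// Clamp a raw available() result: anything below zero means "no data".
inline int sanitizeAvailable(int raw) {
    return raw > 0 ? raw : 0;
}

// Map a raw read() result to the expected contract:
// 0..255 is a data byte, everything else becomes -1 ("no data").
inline int sanitizeRead(int raw) {
    return (raw >= 0 && raw <= 255) ? raw : -1;
}
```

Usage would be along the lines of `int n = sanitizeAvailable(Serial1.available());` and `int c = sanitizeRead(Serial1.read());`.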

If that were everything, working around it would be easy. But I think there are more and stranger problems. After a lot of searching and testing, I made this small program, running on an Argon (0.9.0) with the Rx and Tx pins connected:

void setup() {
    Serial1.begin(115200);
    Log.info("ok");
    pinMode(D7, OUTPUT);
}

void loop() {
    Serial1.print("abc");
    delay(10);
    digitalWrite(D7, HIGH);
    
    int ava = 0;
    int rd = 0;
    
    ava = Serial1.available();
    rd = Serial1.read();
    if(ava != 3) Log.warn("1a %i", ava);
    if(rd != 'a') Log.warn("1b %i", rd);
    
    ava = Serial1.available();
    rd = Serial1.read();
    if(ava != 2) Log.warn("2a %i", ava);
    if(rd != 'b') Log.warn("2b %i", rd);
    
    ava = Serial1.available();
    rd = Serial1.read();
    if(ava != 1) Log.warn("3a %i", ava);
    if(rd != 'c') Log.warn("3b %i", rd);
    
    ava = Serial1.available();
    rd = Serial1.read();
    if(ava > 0) Log.warn("4a %i", ava);
    if(rd >= 0) Log.warn("4b %i", rd);
    digitalWrite(D7, LOW);
}

If everything is fine, there should be no Log outputs. But there are Log outputs:

0000009913 [app] WARN: 1a 2
0000009914 [app] WARN: 2a 1
0000009968 [app] WARN: 1a 2
0000009968 [app] WARN: 2a 1
0000010073 [app] WARN: 1a 1
0000010127 [app] WARN: 1a 1
0000010226 [app] WARN: 1a 2
0000010226 [app] WARN: 2a 1
0000010275 [app] WARN: 1a 2
0000010276 [app] WARN: 2a 1

This goes on like that forever.
Please have a look at the timestamps in the logs. The delay(10) causes the loop to be executed about 100 times a second. The warnings appear much less often. Warning 2a always comes after warning 1a, but 1a is not always followed by 2a. 3a never appears.
Changing the loop speed by adjusting the delay changes the error frequency accordingly.

What is going on here? I cannot explain this with the known bugs of read() and available(). These problems are causing a lot of strange effects in my production firmware.


#35

Thank you @nils. I am glad I am not imagining things.

Indeed, as you mentioned there is a lot more wrong with the UART implementation in the firmware. I do not have the knowledge to debug it but successive “panic” failures in my programs seem to lead primarily to serial buffer overflow.

I do not know this for sure but this is my best guess.

I hope Particle will fix this serious problem as the platform is useless if it cannot interface reliably to serial sensors.


#36

Hey folks! Just spent some time catching up on this thread. Reminder that we are reachable via ping if you’d like to call our attention to an issue.

@avtolstoy, who has the most context on this issue, is presently out of office, but I will connect with him when he returns to better understand what might be a root cause and suggest a timeline for the delivery of a fix.


#37

Hi, did you make any progress on this topic by now? Is there already a timeline?
The bug prevents reliable use of the UART. If there is no fix yet, at least some kind of workaround would be important. Unfortunately, I have no idea how to work around it in this case.

Thank you.


#38

Regrettably, I do not think that any progress has been made. I am using the 1.2.0-beta.1 release and the same problems persist.

Frankly this is concerning and I hope that Particle is not running against any platform limitations as it is taking too long to fix a very serious problem.

One questions the value of a superior communications architecture (which Particle certainly has) when there are problems interfacing to fast serial sensors.


#39

@nils, @Jimmie, it looks like there is a possible fix tagged for 1.2.0-beta.3:

There is also this issue being discussed:


#40

Thank you @peekay123 but the problem is more serious than this.

I am testing a loop with identical code on both a Xenon and an Arduino MEGA. The serial sensor is running at 57,600 baud with an update frequency of 194Hz.

Here is a summary:

  1. 20% erroneous readings on Xenon. Of 1000 readings, about 202 had no read. On MEGA, not a single read was missed.
  2. Read frequency on the MEGA is 194Hz (i.e. the full rate). On the Xenon, the read frequency is 80Hz.

So as you can see, something much more problematic is going on …


#41

Have you tried decoupling the cloud/mesh communication from your application thread via SYSTEM_THREAD(ENABLED)?

The MEGA doesn’t do a lot between iterations of loop() - when connected Particle devices do a lot.