Is the Mesh devices UART implementation buggy?

Have you tried decoupling the cloud/mesh communication from your application thread via SYSTEM_THREAD(ENABLED)?

The MEGA doesn’t do a lot between iterations of loop(). Particle devices, when connected, do a lot.

@Jimmie, the Mega is not running a DeviceOS on top of FreeRTOS so things are a lot simpler. The UART issues are very much related to the issue I posted along with other problems which I, along with other Elites, will continue to pester Particle about.


Yes,

SYSTEM_THREAD(ENABLED);

On the first line in my code …

My loop is “local”, and not running in the loop() routine. It is running in setup()

You should try this library to get around the "slow" loop speed. @shanevanj mentioned it with a link as well. As others stated, it's a matter of giving each thread a slice of time, and the loop is just another thread (so to speak). Even without the library, if you put together a test program that blocks loop() and does nothing but serial reads, the read rate should be quite high. Blocking loop() is not good coding practice, but it may confirm that the serial read rate is high enough to accomplish your goals. The serial buffer library will help you avoid losing bytes while time slices are given to other threads. Hopefully it works for your sensor:

I haven't used this library with the Mesh devices but it does work with Photon quite well. It should work with mesh but if it doesn't, rickkas7 may be able to help. With his library, rickkas7 achieved reads at 230400 baud.
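To illustrate the general idea behind buffering serial data between time slices, here is a minimal ring buffer sketch in plain C++. It is not rickkas7's implementation, and the name ByteRing is made up for this example. It also assumes a single thread; a real producer/consumer split across threads would need atomic or otherwise protected indices.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size byte ring buffer (illustration only).
// One slot is kept empty so that head == tail always means "empty",
// which gives a usable capacity of N - 1 bytes.
template <size_t N>
class ByteRing {
public:
    // Store one byte; returns false (and drops the byte) when full.
    bool push(uint8_t b) {
        size_t next = (head_ + 1) % N;
        if (next == tail_) return false;
        data_[head_] = b;
        head_ = next;
        return true;
    }

    // Fetch the oldest byte; returns false when the buffer is empty.
    bool pop(uint8_t &out) {
        if (tail_ == head_) return false;
        out = data_[tail_];
        tail_ = (tail_ + 1) % N;
        return true;
    }

    // Number of bytes currently stored.
    size_t size() const { return (head_ + N - tail_) % N; }

private:
    static_assert(N >= 2, "need at least one usable slot");
    uint8_t data_[N] = {};
    size_t head_ = 0;  // next write position
    size_t tail_ = 0;  // next read position
};
```

The point of the larger buffer is simply that the reader can fall behind for longer before push() starts failing, which is what happens to bytes that arrive while another thread has the processor.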


Thank you @ninjatill.

I did use that library and while it did work in the sense that the code did not crash with a red blinking light, it was slow. This was with an earlier firmware.

Without that library, the current Beta 1.2 firmware does not crash but as mentioned is slow and has read errors. Also, my loop is a blocking loop within setup() to make sure that nothing else is taking the processor’s time …

I will try the library again per your suggestion.

Hi @ninjatill

I tried @rickkas7's SerialBuffer library again with the latest firmware (1.2 Beta). Now the library causes panic/reset errors. When it works, the read frequency is exactly the same as without it.

This leads me to think that 1.2 Beta has improved UART stability, but there are still occasional buffer losses, and it is almost 50% slower than a Mega.

Update:

I moved the blocking loop in my code from setup() to the loop() routine. When I did that, I found that I am reading at the full sensor update rate. So it looks like the OS is doing something during setup() which takes some processor time, even with SYSTEM_THREAD(ENABLED);. I want to thank Particle for the 1.2 Beta, which is a major improvement.

I am still getting a panic (hard_fault) about every 15-25 minutes (sometimes only after a couple of hours). There remain issues (apparently due to UART buffer overflow) during setup() where the sensor readings are missing or incorrect. In my application this is serious, because some major operational parameters are recalculated during each restart. I would appreciate the community’s help with a workaround.

Hopefully these remaining issues will be resolved as 1.2 matures.

Just wanted to give feedback that the bug described in

was fixed by

which is tagged for 1.2.0-rc.1.


I’m checking progress on 1.2.0-rc.1 every day! Can’t wait for this fix as well!

I suspended my development on the Xenon in May due to the ongoing problems with fast serial sensors causing a reset.

When I resumed development last week, I was prematurely encouraged by the number of firmware revisions that have come out since May.

However, the problem still exists. I am getting frequent panic/hardware resets on my system, which is supposed to keep an accurate count of incidents. Every time a reset occurs, the data is lost, rendering the system useless.

Over months, I have isolated the problem to UART sensors with high frequencies. The system works for a while (sometimes a few hours) but then the reset eventually occurs. This problem does not happen with slower sensors (updates < 100Hz)

Since the mesh devices have now been out for a while, I am really interested in the community’s input as to whether this persistent issue is “fixable”, or whether the memory devoted to the UART stack in mesh devices is so limited as to preclude its use in my application.

BTW, the SAME sensor works fine on an Arduino MEGA. I realize that mesh devices do many more things which occupy firmware space but what is the utility of an advanced micro-controller which cannot keep up with an older and much slower device?
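On the question of whether the UART buffer is simply too small, a back-of-the-envelope calculation can help frame it. The sketch below assumes 8N1 framing (10 bits on the wire per byte) and takes the buffer size and baud rate as parameters, because neither the device's actual receive buffer size nor the sensor's baud rate is stated in this thread. The function names are made up for illustration.

```cpp
#include <cstdint>

// Bytes per second on an asynchronous serial link.
// bitsPerFrame is 10 for 8N1 framing: 1 start + 8 data + 1 stop bit.
constexpr double bytesPerSecond(uint32_t baud, uint32_t bitsPerFrame = 10) {
    return static_cast<double>(baud) / bitsPerFrame;
}

// Milliseconds until an initially empty receive buffer of bufferBytes
// is full, if nothing reads it while bytes arrive back to back.
constexpr double msUntilFull(uint32_t bufferBytes, uint32_t baud,
                             uint32_t bitsPerFrame = 10) {
    return 1000.0 * bufferBytes / bytesPerSecond(baud, bitsPerFrame);
}
```

As an illustration with assumed numbers, msUntilFull(64, 115200) comes out at roughly 5.6 ms, so any stretch longer than that without a read would lose data on a 64-byte buffer at that rate. Plugging in the real buffer size and baud rate shows how long the application can afford to be away from the port.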

As always, posting sample compilable code that reproduces the issue will really enable folks to help you. I use serial on Mesh devices with no issue, but there will of course be completely different approaches that only code will reveal!

Thank you @shanevanj. I am also using several UART devices with Mesh. The problem is with faster sensors.

These are lidar sensors. I am also using i2c lidar sensors with the same code base (only difference is in the distance acquisition code below) and there have been no problems since December 2018 (running 24/7 with a 24-hr daily reset).

Here is the code on one of those sensors. The code is executed in the loop().

void readLW20()
{
    if (lw20GetStreamResponse())
    {
      float distanceM = getNumberFromResponse(responseData);
      
      distanceCM = distanceM * 100;
      distIN = distanceM * 100 * 0.394;
      
      distance = distanceCM;
      distanceP = distance;
    }
    else
    {
      Serial.println("No streamed data within timeout");
    }
    delay(2);
}

bool lw20GetStreamResponse() {
  // Only have 1 second timeout.
  unsigned long timeoutTime = millis() + 1000;

  responseDataSize = 0;

  // Wait for the full response.
  while (millis() < timeoutTime) {
    if (Serial1.available()) {
      int c = Serial1.read();
      if (c == '\n') {
        responseData[responseDataSize] = 0;
        return true;
      }
      else if (c != '\r') {
        if (responseDataSize == sizeof(responseData) - 1) {
          responseDataSize = 0;
        }

        responseData[responseDataSize++] = (char)c;
      }
    }
  }

  return false;
}


float getNumberFromResponse(char* ResponseStr) {
  // Find the ':' character.
  int index = 0;

  while (true) {
    if (ResponseStr[index] == 0)
      break;

    if (ResponseStr[index] == ':') {
      return atof(ResponseStr + index + 1);
    }

    ++index;
  }

  return 0.0f;
}

OK, I have a test rig set up for this kind of testing. Send me a few message streams you would typically get over the Serial1 port, and I will use them to do some tests and let you know what rates I get.

Thank you for your time. This is a really kind offer and certainly beyond the call…

The sensor is located a long distance away and I only have one sensor as I did not want to buy more until I have solved these problems. It is this sensor:

The manual is here:

Okay - nice device !

What commands are you sending to set up the streaming mode?

Hello Shane,

This device is a bit complicated! In the default operational mode, one needs to issue a command each time a measurement is needed.

When I wrote to technical support a few months ago, they sent me a rather “involved” way of changing the sensor to streaming mode. I will try to dig up the instructions as it has been a while.

From your code I gather each “packet” is terminated with "\r\n" (or is it \n\r which would be less common).
With that info, I’d rewrite your lw20GetStreamResponse() so that it first syncs on the trailing byte in a tight loop and then uses Serial1.readBytesUntil().

BTW, timeoutTime = millis() + 1000; isn’t the best way to set up your timeout check, because the addition can overflow when millis() gets close to its maximum value.
The better way would be

const unsigned long READ_TIMEOUT = 1000;
...
  unsigned long timeoutTime = millis();
  ...
  while (millis() - timeoutTime < READ_TIMEOUT) {
    ...
  }

But with Serial1.readBytesUntil() the timeout comes free.
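To see why the subtraction form is safer, here is a small host-side sketch that compares the two styles with plain 32-bit values standing in for millis(). The function names are made up for illustration; only the arithmetic matters.

```cpp
#include <cstdint>

// Deadline style, as in "millis() + 1000": when start + timeout wraps
// past the 32-bit maximum, the deadline becomes a small number and the
// check reports "expired" immediately.
bool expiredDeadlineStyle(uint32_t now, uint32_t start, uint32_t timeout) {
    uint32_t deadline = start + timeout;  // may wrap around
    return now >= deadline;
}

// Elapsed style, as in "millis() - start < READ_TIMEOUT": unsigned
// subtraction wraps the same way the counter does, so (now - start)
// is the true elapsed time even across the rollover.
bool expiredElapsedStyle(uint32_t now, uint32_t start, uint32_t timeout) {
    return (now - start) >= timeout;
}
```

With start = 0xFFFFFF00 and a 1000 ms timeout, the deadline style already reports a timeout 10 ms after the start, while the elapsed style reports it only once 1000 ms have passed. A 32-bit millisecond counter wraps after about 49.7 days, so the bug only shows up on long-running devices.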

Can you confirm you are using 1.4.2 or 1.4.3?
What is your maximum packet size (can’t guess that from your code snippet)?

getNumberFromResponse() should take a const char*.
Instead of while(true) ... break, I’d rather use for (int i=0; i < strlen(ResponseStr); i++) or, even better, not do that yourself and let strchr() do it for you:

float getNumberFromResponse(const char* ResponseStr) {
  const char* value = strchr(ResponseStr, ':');
  if (value) return atof(value+1);
  return NAN;
}
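Returning NAN only helps if the caller checks for it, so here is a sketch of how that could look. The helper updateDistanceCm() and the sample strings in the note below are hypothetical and are not the LW20's documented response format.

```cpp
#include <cmath>
#include <cstdlib>
#include <cstring>

// Returns the number following the first ':' or NAN when there is no ':'.
float getNumberFromResponse(const char* responseStr) {
    const char* value = strchr(responseStr, ':');
    if (value) return static_cast<float>(atof(value + 1));
    return NAN;
}

// Hypothetical caller: keep the last good reading when parsing fails.
// Returns true and updates distanceCm only for a response with a ':'.
bool updateDistanceCm(const char* responseStr, float& distanceCm) {
    float metres = getNumberFromResponse(responseStr);
    if (std::isnan(metres)) return false;
    distanceCm = metres * 100.0f;
    return true;
}
```

For example, updateDistanceCm("x:1.5", cm) sets cm to 150, while a string without a colon leaves cm untouched. Note that atof() returns 0 when the text after the colon is not a number, so a response like "x:" still parses as 0 rather than NAN.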

Thank you very much @ScruffR for the detailed information. I will test it and will also ask about the packet size.

I am using 1.4.2 (I was not aware that 1.4.3 is out). Did 1.4.3 have any changes to serial interfacing?

Nope, that was virtually a single-bug fix release.

Shouldn't impact mesh devices at all.

Hello @ScruffR:

I just heard from the sensor’s technical support. They agree that your suggestion for timeoutTime is a good one as “… it takes care of a nasty overflow bug”. They were not sure why you had recommended the remaining changes but I will try them.

For your questions:

1. What is the maximum packet size from the sensor?

40 bytes.

2. Is each packet terminated with "\r\n" (or is it \n\r)?

Packets are terminated with ‘\r\n’.
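With those two answers (40 bytes maximum, "\r\n" terminator), a non-blocking way to assemble packets can be sketched as follows. This is an alternative illustration, not the Serial1.readBytesUntil() approach suggested above, and the class name LineAssembler is made up. It assumes the 40-byte figure covers the payload, and it drops overlong lines entirely instead of returning their tail.

```cpp
#include <cstddef>
#include <cstring>

// Assembles '\n'-terminated lines from a byte stream, one byte per call.
// '\r' is ignored, so "\r\n" and plain "\n" endings are both accepted.
class LineAssembler {
public:
    static constexpr size_t kMaxLine = 40;  // packet size quoted in the thread

    // Feed one received byte. Returns true when a complete, non-empty line
    // is available from line(). A line longer than kMaxLine is discarded
    // as a whole once its terminator arrives.
    bool feed(char c) {
        if (c == '\r') return false;
        if (c == '\n') {
            bool ok = !overflowed_ && len_ > 0;
            buf_[ok ? len_ : 0] = '\0';
            len_ = 0;
            overflowed_ = false;
            return ok;
        }
        if (len_ < kMaxLine) {
            buf_[len_++] = c;
        } else {
            overflowed_ = true;
        }
        return false;
    }

    // The most recently completed line; valid until the next data byte is fed.
    const char* line() const { return buf_; }

private:
    char buf_[kMaxLine + 1] = {};
    size_t len_ = 0;
    bool overflowed_ = false;
};
```

In firmware, this would be driven by calling feed(Serial1.read()) for as long as Serial1.available() is greater than zero, and parsing line() whenever feed() returns true. Because it never waits, it can be called from loop() without blocking other work.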

Thanks again for your time and help.