Red Flash SOS when SYSTEM_MODE is automatic?

I have an extremely simple app reading data from an RPlidar module. I honestly assumed the crash was caused by their library.

I believe i’m seeing the SOS + 1 error (Hard Fault) on a Particle Argon. It happens after about 30-60 seconds of uptime.

When I set SYSTEM_MODE(MANUAL) and DO NOT INITIALIZE Particle cloud in any way ( disable Particle.connect(), etc. ) the crash does not happen. Is this a known possible issue? I have also tried SYSTEM_THREAD(ENABLED), it doesn’t improve the situation.

Without seeing the code it’s difficult to guess :wink:
In particular, which library are you using?

Sure thing, there is the library I am using:

Here is the code.

During my testing 2 things will fix the issue:

  • enabling SYSTEM_MODE(MANUAL);
  • or removing the RPLIDAR code

Its strange to me that it’s a one or the other but not both situation. I thought that something could be eating up all the memory, but freememory() is fine, in fact it only reduces over time. Any thoughts?

#include <RPLidar.h>

//SYSTEM_MODE(MANUAL);

byte LED = D0;
byte RPLIDAR_MOTOR = A3;
RPLidar lidar;
int points = 0;
uint32_t next_second = millis() + 1000;

void setup() {
  Serial.begin();
  pinMode(LED, OUTPUT);
  pinMode(RPLIDAR_MOTOR, OUTPUT);
  //settle
  delay(3000);
  lidar.begin(Serial1);
  digitalWrite(LED, HIGH);
}

void rplidar_loop()
{
  if (IS_OK(lidar.waitPoint())) {
    //float distance = lidar.getCurrentPoint().distance; //distance value in mm unit
    //float angle    = lidar.getCurrentPoint().angle; //anglue value in degree
    bool  startBit = lidar.getCurrentPoint().startBit; //whether this point is belong to a new scan
    //byte  quality  = lidar.getCurrentPoint().quality; //quality of the current measurement
    points += 1;

    if (startBit) {
      Serial.print("new frame\n");
    }

    if (millis() > next_second) {
      Serial.printf("%d\n", points);
      points = 0;
      next_second = millis() + 1000;
    }
    
  } else {

    // reconnect
    analogWrite(RPLIDAR_MOTOR, 0); //stop the rplidar motor
    
    // try to detect RPLIDAR... 
    rplidar_response_device_info_t info;
    if (IS_OK(lidar.getDeviceInfo(info, 100))) {
       // detected...
       lidar.startScan();
       // start motor rotating at max allowed speed
       analogWrite(RPLIDAR_MOTOR, 255);
       delay(1000);
    }

  }
}

void loop() {
  rplidar_loop();
}

System.freeMemory() is only half the story. You may have enough memory but your heap may be fragmented (no offering a big enough contiguous block of data or you might exhaust the stack size.

But both of these conditions would not cause a hard fault but have their own distinct SOS code.

However, this …

… would indicate some sort of memory leak.

This is not the best way to do this.
The better approach would be this

    if (millis() - lastTime > 1000) {
      ...
      lastTime = millis();
    }

You may also want to consider replacing the delay(1000) with a similar construct.

Another thing to consider (which should also not cause a hard fault tho’) is that you should either call Particle.process() within any longish running loop in the library or alternatively add SYSTEM_THREAD(ENABLED) to have the system take care of that.

To hone in on the offending instruction you could add a bunch of Serial.print() statements (with some short delay to allow for the data to be sent before moving on) to see where exactly the SOS+1 is triggered.

1 Like

Thanks for the feedback, I tested your idea, first by simply printing numbers between sections of code to narrow down where it is crashing. Interestingly enough, with this code, it will not crash… I’ve left it for 10 minutes and nothing. It gets weirder.

  • With the new Serial.println()'s, it does not crash DURING “particle serial monitor” (up to 10 minutes) A notable observation is that my lidar sampling is reduced by 50% when doing this.

  • With the new Serial.println()'s, if I ctr+c “particle serial monitor” it will crash within 30-60 seconds… So essentially I can’t watch it crash.

  • If I comment the Serial.println()'s out, it crashes within 30 seconds DURING “particle serial monitor” like it used to.

This has become some kind of schrodinger’s cat situation.


#include <RPLidar.h>

byte LED = D0;
byte RLY1 = D1;
byte RPLIDAR_MOTOR = A3;
RPLidar lidar;
int points = 0;
uint32_t lastTime  = millis() + 1000;
bool startBit;

void setup() {
  Serial.begin();
  pinMode(LED, OUTPUT);
  pinMode(RLY1, OUTPUT);
  pinMode(RPLIDAR_MOTOR, OUTPUT);
  //settle
  delay(3000);
  lidar.begin(Serial1);
  digitalWrite(LED, HIGH);
  digitalWrite(RLY1, HIGH);
}

void rplidar_loop()
{
  Serial.print("2");
  if (IS_OK(lidar.waitPoint())) {
    Serial.print("3");
    //float distance = lidar.getCurrentPoint().distance; //distance value in mm unit
    //float angle    = lidar.getCurrentPoint().angle; //anglue value in degree
    startBit = lidar.getCurrentPoint().startBit; //whether this point is belong to a new scan
    Serial.print("4");
    //byte  quality  = lidar.getCurrentPoint().quality; //quality of the current measurement
    points += 1;

    Serial.print("5");

    if (startBit) {
      Serial.print("new frame\n");
    }

    Serial.print("6");

    if (millis() - lastTime > 1000) {
      Serial.printf(" p%d\n", points);
      points = 0;
      lastTime  = millis();
    }

    Serial.print("7");
    
  } else {

    Serial.print("r");

    // reconnect
    analogWrite(RPLIDAR_MOTOR, 0); //stop the rplidar motor
    
    // try to detect RPLIDAR... 
    rplidar_response_device_info_t info;
    if (IS_OK(lidar.getDeviceInfo(info, 100))) {
       // detected...
       lidar.startScan();
       // start motor rotating at max allowed speed
       analogWrite(RPLIDAR_MOTOR, 255);
       delay(1000);
    }

  }
}

void loop() {
  Serial.print("1");
  rplidar_loop();
}

.

So in an attempt to see when the error is happening, I removed all of the random printing and only print the angle every time a new point is read. This also prevented the argon from crashing…

.

/*
 * Project test
 * Description:
 * Author:
 * Date:
 */

#include <RPLidar.h>

byte LED = D0;
byte RLY1 = D1;
byte RPLIDAR_MOTOR = A3;
RPLidar lidar;
int points = 0;
uint32_t lastTime  = millis() + 1000;
bool startBit;

void setup() {
  Serial.begin();
  pinMode(LED, OUTPUT);
  pinMode(RLY1, OUTPUT);
  pinMode(RPLIDAR_MOTOR, OUTPUT);
  //settle
  delay(3000);
  lidar.begin(Serial1);
  digitalWrite(LED, HIGH);
  digitalWrite(RLY1, HIGH);
}

void rplidar_loop()
{
  if (IS_OK(lidar.waitPoint())) {
    //float distance = lidar.getCurrentPoint().distance; //distance value in mm unit
    float angle    = lidar.getCurrentPoint().angle; //anglue value in degree
    startBit = lidar.getCurrentPoint().startBit; //whether this point is belong to a new scan
    //byte  quality  = lidar.getCurrentPoint().quality; //quality of the current measurement
    points += 1;


    if (startBit) {
      Serial.print("new frame\n");
    }


    if (millis() - lastTime > 1000) {
      Serial.printf(" p%d\n", points);
      points = 0;
      lastTime  = millis();
    }

    Serial.printf("%f\n", angle);
    
  } else {

    // reconnect
    analogWrite(RPLIDAR_MOTOR, 0); //stop the rplidar motor
    
    // try to detect RPLIDAR... 
    rplidar_response_device_info_t info;
    if (IS_OK(lidar.getDeviceInfo(info, 100))) {
       // detected...
       lidar.startScan();
       // start motor rotating at max allowed speed
       analogWrite(RPLIDAR_MOTOR, 255);
       delay(1000);
    }

  }
}

void loop() {
  rplidar_loop();
}

.

Based on the discovery that printing something every “point” prevents a crash, I removed the angle print() code and added a simple delay(1) during every point and it’s perfectly stable, no crashes (after 30 minutes of testing). Essentially, I’m doing something too fast.

This makes me think about how disabling Particle.cloud with SYSTEM_MODE(MANUAL) fixed the issue. It honestly feels like Particle cloud doesn’t like how fast I’m doing (something?) and is crashing. Maybe it has something to do with slamming Serial1 with data? Correlation does not imply causation, but all arrows are pointing in the direction of the cloud system.

It may be notable that commands like “particle serial mac” and “particle serial identify” timeout/error when running this code even though the void loop() is running & printing properly.

Little update,

  • delayMicroseconds(200) allows for maximum samples from the sensor, but causes a crash.
  • delayMicroseconds(250) reduces to 90% maximum samples and does not crash.

So something about the sensor reading at maximum speed is causing the issue (but only when SYSTEM_MODE is automatic). Consider me stumped, but at least I have some kind of fix for the issue.

@ccc, did you modify the sensor library in any way? Is your code on the Web IDE so it can be shared?

@peekay123 I have not modified the code, I just checked again to be sure. I did have to modify the folder structure to import the library since I’m using VScode (not Web IDE).

Here is the link to the library again:

https://github.com/robopeak/rplidar_arduino

Here is the folder structure for proper import. Essentially just copying the files into a “src” folder:

lib --> RPLidar --> src --> inc
                        --> keywords.txt
                        --> RpLidar.cpp
                        --> RpLidar.h
src --> test.ino
    --> test.cpp

Here is the exact .ino file I’m using (recently cleaned up for clarity).

#include <RPLidar.h>

byte RPLIDAR_MOTOR = A3;
RPLidar lidar;
int points = 0;
uint32_t lastTime  = millis() + 1000;
bool startBit;

void setup() {
  Serial.begin();
  pinMode(RPLIDAR_MOTOR, OUTPUT);
  //settle
  delay(3000);
  lidar.begin(Serial1);
}

void rplidar_loop()
{
  if (IS_OK(lidar.waitPoint())) {

    startBit = lidar.getCurrentPoint().startBit; //whether this point is belong to a new scan
    points += 1;

    if (millis() - lastTime > 1000) {
      Serial.printf(" p%d\n", points);
      points = 0;
      lastTime  = millis();
    }

    delayMicroseconds(250);
    
  } else {

    // reconnect
    analogWrite(RPLIDAR_MOTOR, 0); //stop the rplidar motor
    
    // try to detect RPLIDAR... 
    rplidar_response_device_info_t info;
    if (IS_OK(lidar.getDeviceInfo(info, 100))) {
       // detected...
       lidar.startScan();
       // start motor rotating at max allowed speed
       analogWrite(RPLIDAR_MOTOR, 255);
       delay(1000);
    }

  }
}

void loop() {
  rplidar_loop();
}

The key to success right now is the delayMicroseconds(250). I’m seeing about 90-95% efficiency with this technique, and most important no crashes. I’m planning on changing this to a (micros() - lastmicros) check that wraps rplidar_loop(). Still not sure if it’s going to work, but it would obviously be cleaner and more reliable in the long term.

Obviously the long term solution is probably to rewrite the RPLidar library and find the root cause.

Just some random guess based on your observations:
With a cloud connection the cadence of loop() will be about 1kHz without it will be much higher.
Hence the Serial1 RX buffer will probably be read more often without cloud than with and hence not overflow.
This in combination with a possible lack of resilience of the library against such cases this might explain the crashes.

BTW, have you tried this suggestion at all?

@ScruffR Possible. If an overflow is possibly responsible do you know of any documentation detailing how to detect and handle serial overflow properly?

I did try SYSTEM_THREAD(ENABLED), it didn’t work. I just tried it again with the old configuration and it instantly crashed.

What is strange to me is where the delay is located (and working). Cloud being enabled would still limit loop() to 1kHz, but every time a new point is read, we delay 250 microseconds which is basically an eternity. The RX buffer should be filling up during that delay right? So shouldn’t the delay exacerbate the issue?

@ScruffR Interesting update SYSTEM_THREAD(ENABLED) crashes the argon with the new functional code. When SYSTEM_THREAD(ENABLED) is disabled it will run seemingly forever. I would expect the opposite.