Debugging a locked up Boron

Hello Readers,

I’ve got a Boron 2G/3G device hooked up to another device via Serial connection. The goal of this is to monitor and control this other device via serial (no I2C or SPI sadly) since it does not have 2G/3G capabilities.

I have a FSM and when running the Boron for a long period of time (4-24h) I find that the device locks up (i.e. solid Cyan) in its idle phase randomly (i.e. random time). The device never leaves the idle phase and during its idle phase it does the following:

  • Ping the device via Serial1
    • if the device response with data (i.e. data != “”), keep the LED breathing Cyan
    • if the device does not response and times out (5s wait), blink the LED Red
  • This state exits and ends the loop() really quick
  • This state loops around on idle until a web call happens where the handler changes the next state. The other states will send different serial commands to the device but the dead lock happens in the idle state so I won’t go into details about the other states

One recent discovery is that monitoring the device via serial I see an error message. I’m rerunning some tests to see if this message appears consistently. Could this be the reason for a lock up?

0006072005 [comm.protocol] ERROR: Event loop error 1
0006072005 [system] WARN: Communication loop error, closing cloud socket

Some questions I have for readers

  • Any tips for debugging a locked up device?
  • Anyone know what “Event loop error 1” means?
  • Anyone have a Boron running in the field for 1-2days + ?
  • A simplified version of my code is posted below, if anyone can spot anything it would be much appreciated.

I’ve been debugging this issue for the last 2 weeks so any help or suggestions would be greatly appreciated =(.

I also tried to include a 5min application watch dog in attempt for the system to recover. If the system locks up, I expected the watchdog to kick in and reset the system. But this did not work as intended. I guess a side question is, does the watchdog also get locked up during this mode?

ApplicationWatchdog wd(300000, System.reset); // Watchdog set at 5mins 

Set up

  • Boron 2G/3G, DeviceOS 0.9.0, powered by 5V 2.4A
  • 3rd party device connected to Serial 1 via…
    • 3.3V for power
    • GND pin
    • P0.08 for RX
    • P0.06 for TX

Stripped down version of my code

// Global Variables
unsigned int gBaudRate = 9600;
States gNextState, gCurrentState;
unsigned int gWebTimeout = 500;
unsigned int gDeviceCheckPeriod = 5000; // Orginal 1min
unsigned int gSerialTimeout = 5000;
unsigned int gMonitorPeriod = 10000;
unsigned int gSendDataPeriod = 31000;
int gRedPin = D8;                             // Pins for RGB LED
int gGreenPin = D6;
int gBluePin = D5;

// Declarations
SYSTEM_MODE(SEMI_AUTOMATIC);                  // Do not automatically connect to cloud
SerialLogHandler logHandler(LOG_LEVEL_ALL, {  // Logging level for non-application messages
    { "app", LOG_LEVEL_ALL }                  // Default logging level for all application messages
});
WebHooks* web;
SerialController* serialController;
LED* led;
PMIC pmic;
ApplicationWatchdog wd(300000, System.reset); // Watchdog set at 5mins 

// States Initialization
WaitingForAuth waitingForAuth(gSerialTimeout, gDeviceCheckPeriod);
...

void setup() {
  gNextState = WAITING_FOR_AUTH;
  gCurrentState = WAITING_FOR_AUTH;
  web = new WebHooks();  // This registers all cloud components
  serialController = new SerialController(gBaudRate);
  led = new LED(gRedPin, gGreenPin, gBluePin);

  // Disable charging
  pmic.begin();
  pmic.disableCharging();
  pmic.disableWatchdog();

  // Switch to Roger's IOT SIMs before connecting
	Cellular.setActiveSim(EXTERNAL_SIM);
	Cellular.setCredentials("m2minternet.apn");
	Particle.keepAlive(30); // This is needed if we are using Roger's network
        Particle.connect(); // Make sure connect is alway last since we want to register all cloud components first
}

void loop() {
  switch (gCurrentState) {
    case WAITING_FOR_AUTH:
      gCurrentState = waitingForAuth.run(gWebTimeout, serialController, led);   
      break;
    case STATE_1:
      ...
      break;
    case STATE_2:
      ...
      break;
    case State_3:
      ...
      break;
    default:
      ...
      break;
States WaitingForAuth::run(unsigned int timeout, SerialController* serialController, LED* led) {
    if (millis() > mDeviceCheckTime) {
        // sendCMD just sends a serial command, waits for response and returns the String
        String temp = serialController->sendCMD("Some serial command", mSerialTimeout);
        mDeviceCheckTime = millis() + mDeviceCheckPeriod;
        Log.trace("mDeviceCheckTime is: " + String(mDeviceCheckTime));
        if (temp.equals("")) {
            Log.error(memberName + "Device is not responding!");
            led->blinkRed();
        } else {
            Log.trace("Device is alive:");
            Log.trace("\t" + temp);
            led->idleColor();
        }
    }
    return gNextState;
}
LED::LED(int redPin, int greenPin, int bluePin) {
    this->mBlinkOrange = new LEDStatus(RGB_COLOR_ORANGE, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mBlinkRed = new LEDStatus(RGB_COLOR_RED, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mBlinkGreen = new LEDStatus(RGB_COLOR_GREEN, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mSolidGreen = new LEDStatus(RGB_COLOR_GREEN, LED_PATTERN_SOLID, LED_PRIORITY_IMPORTANT);
    this->mRedPin = redPin;
    this->mGreenPin = greenPin;
    this->mBluePin = bluePin;
    pinMode(mRedPin, OUTPUT);
    pinMode(mGreenPin, OUTPUT);
    pinMode(mBluePin, OUTPUT);
    RGB.onChange(ledHandler);
    Log.info("LED: Initialized...");
}

void LED::blinkRed() {
    Log.trace("Blinking red...");
    if (!mBlinkRed->isActive()) mBlinkRed->setActive(true);
}

void LED::idleColor() {
    Log.trace("Going back to system default...");
    if (mBlinkRed->isActive()) mBlinkRed->setActive(false);
}

void LED::ledHandler(uint8_t r, uint8_t g, uint8_t b) {
    Log.trace("Mirroring to external LED...");
    analogWrite(mRedPin, r);
    analogWrite(mGreenPin, g);
    analogWrite(mBluePin, b);
}

Just an update, in another debugging attempt I tried to strip down all my code as well as swap out the 3rd party device for an RPi. After 24-25h of run time, the Boron locks up at solid Cyan again. The RPi is also not powered by the Boron so this rules out the 3rd party device being the cause of the issue. Using an RPi also makes it easier if anyone wants to try replicating this.

In this scenario:

  • Particle will periodically send a ping to the RPI (e.g. “I am Particle!”)
  • RPi mimicking the third party device will send a reply back (e.g. "I am RPi!) upon getting a ping from the Particle
  • Particle will receive the reply and then reset the timer again.

Boron Serial Logs - Runtime ~13.6h

0048913185 [app] TRACE: Recieved 'I am RPi!'...
0048913186 [app] TRACE: RPi is alive
0048913187 [app] TRACE: LED::idleColor: Going back to system default...
0048974220 [app] TRACE: Recieved 'I am RPi!'...
0048974221 [app] TRACE: RPi is alive
0048974222 [app] TRACE: LED::idleColor: Going back to system default...
0049035255 [app] TRACE: Recieved 'I am RPi!'...
0049035256 [app] TRACE: RPi is alive
0049035257 [app] TRACE: LED::idleColor: Going back to system default...
0049096290 [app] TRACE: Recieved 'I am RPi!'...
0049096291 [app] TRACE: RPi is alive
0049096292 [app] TRACE: LED::idleColor: Going back to system default...
<Boron is frozen here solid Cyan, RPi is still responsive>

Set up for the Boron is as follows

  • Boron 2G/3G, DeviceOS 0.9.0, powered by 5V 2.4A
  • 3rd party device connected to Serial 1 via…
    • GND pin to RPi
    • P0.08 for RX to RPi Tx
    • P0.06 for TX to RPi Rx
      Note: No Vdd for power, RPi is powered via USB

Boron - Main .ino File

#include "LED.h"

// Global Variables
unsigned int gBaudRate = 9600;
unsigned int gRPiCheckPeriod = 60000;
unsigned int gRPiCheckTime = 0;
unsigned int gSerialTimeout = 5000;
unsigned int gMonitorPeriod = 10000;
int gRedPin = D8;                             // Pins for RGB LED
int gGreenPin = D6;
int gBluePin = D5;

// Declarations
SYSTEM_MODE(SEMI_AUTOMATIC);                  // Do not automatically connect to cloud
SerialLogHandler logHandler(LOG_LEVEL_ALL, {  // Logging level for non-application messages
    { "app", LOG_LEVEL_ALL }                  // Default logging level for all application messages
});
LED* led;
PMIC pmic;

void setup() {
  led = new LED(gRedPin, gGreenPin, gBluePin);

  // Serial to talk to RPi
  Serial1.begin(gBaudRate);

  // Disable charging
  pmic.begin();
  pmic.disableCharging();
  pmic.disableWatchdog();

  // Switch to Roger's IOT SIMs before connecting
	Cellular.setActiveSim(EXTERNAL_SIM);
	Cellular.setCredentials("m2minternet.apn");
	Particle.keepAlive(30); // This is needed if we are using Roger's network
  Particle.connect(); // Make sure connect is alway last since we want to register all cloud components first
}

void loop() {
  if (millis() > gRPiCheckTime) {
      Serial1.println("I am Particle!");
      unsigned int timeout = millis() + gSerialTimeout;
      String parsedLine = "";
      while (millis() < timeout) {
          parsedLine = Serial1.readStringUntil('\n');
          if (parsedLine != "") {
            Log.trace("Recieved '" + parsedLine + "'...");
            break;
        }
      }
      if (parsedLine == "I am RPi!") {
          Log.trace("RPi is alive");
          led->idleColor();
      } else {
          Log.error("RPi not responding!");
          led->blinkRed();
      }
      gRPiCheckTime = millis() + gRPiCheckPeriod;
  }
}

Boron - LED.h

#pragma once
#include "application.h"

class LED {
    public:
        LED(int redPin, int greenPin, int bluePin);
        void blinkOrange();
        void blinkRed();
        void blinkGreen();
        void solidGreen();
        void idleColor();
    private:
        static void ledHandler(uint8_t r, uint8_t g, uint8_t b);
        static String mClassName;
        LEDStatus* mBlinkOrange;
        LEDStatus* mBlinkRed;
        LEDStatus* mBlinkGreen;
        LEDStatus* mSolidGreen;
        static int mRedPin;
        static int mGreenPin;
        static int mBluePin;
};

Boron - LED.cpp

#include "LED.h"

String LED::mClassName = "";
int LED::mRedPin = 0;
int LED::mGreenPin = 0;
int LED::mBluePin = 0;

LED::LED(int redPin, int greenPin, int bluePin) {
    this->mClassName = "LED::";
    this->mBlinkOrange = new LEDStatus(RGB_COLOR_ORANGE, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mBlinkRed = new LEDStatus(RGB_COLOR_RED, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mBlinkGreen = new LEDStatus(RGB_COLOR_GREEN, LED_PATTERN_BLINK, LED_PRIORITY_IMPORTANT);
    this->mSolidGreen = new LEDStatus(RGB_COLOR_GREEN, LED_PATTERN_SOLID, LED_PRIORITY_IMPORTANT);
    this->mRedPin = redPin;
    this->mGreenPin = greenPin;
    this->mBluePin = bluePin;
    pinMode(mRedPin, OUTPUT);
    pinMode(mGreenPin, OUTPUT);
    pinMode(mBluePin, OUTPUT);
    RGB.onChange(ledHandler);
    Log.info(mClassName + "LED: Initialized...");
}

void LED::blinkOrange() {
    String memberName = mClassName + "blinkOrange: ";
    Log.trace(memberName + "Blinking orange...");
    mBlinkOrange->setActive(true);
    mBlinkRed->setActive(false);
    mBlinkGreen->setActive(false);
    mSolidGreen->setActive(false);
}

void LED::blinkRed() {
    String memberName = mClassName + "blinkRed: ";
    Log.trace(memberName + "Blinking red...");
    if (mBlinkOrange->isActive()) mBlinkOrange->setActive(false);
    if (!mBlinkRed->isActive()) mBlinkRed->setActive(true);
    if (mBlinkGreen->isActive()) mBlinkGreen->setActive(false);
    if (mSolidGreen->isActive())  mSolidGreen->setActive(false);
}
void LED::blinkGreen() {
    String memberName = mClassName + "blinkGreen: ";
    Log.trace(memberName + "Blinking green...");
    if (mBlinkOrange->isActive()) mBlinkOrange->setActive(false);
    if (mBlinkRed->isActive()) mBlinkRed->setActive(false);
    if (!mBlinkGreen->isActive()) mBlinkGreen->setActive(true);
    if (mSolidGreen->isActive())  mSolidGreen->setActive(false);
}

void LED::solidGreen() {
    String memberName = mClassName + "solidGreen: ";
    Log.trace(memberName + "Solid green...");
    if (mBlinkOrange->isActive()) mBlinkOrange->setActive(false);
    if (mBlinkRed->isActive()) mBlinkRed->setActive(false);
    if (mBlinkGreen->isActive()) mBlinkGreen->setActive(false);
    if (!mSolidGreen->isActive())  mSolidGreen->setActive(true);
}

void LED::idleColor() {
    String memberName = mClassName + "idleColor: ";
    Log.trace(memberName + "Going back to system default...");
    if (mBlinkOrange->isActive()) mBlinkOrange->setActive(false);
    if (mBlinkRed->isActive()) mBlinkRed->setActive(false);
    if (mBlinkGreen->isActive()) mBlinkGreen->setActive(false);
    if (mSolidGreen->isActive())  mSolidGreen->setActive(false);
}

void LED::ledHandler(uint8_t r, uint8_t g, uint8_t b) {
    String memberName = mClassName + "ledHandler: ";
    Log.trace(memberName + "Mirroring to external LED...");
    analogWrite(mRedPin, r);
    analogWrite(mGreenPin, g);
    analogWrite(mBluePin, b);
}

RPi Code - Python 3 Script

import sys
from time import sleep
import RPi.GPIO as GPIO
import serial

# Setup GPIO using BCM numbering
GPIO.setmode(GPIO.BCM)

# Set pins as output
pin = 24
GPIO.setup(pin, GPIO.OUT)

blinkTime = 1
dataPeriod = 1
ser = serial.Serial()
ser.port = '/dev/ttyS0'
ser.baudrate = 9600
ser.timeout = 1
ser.open()
cmd = "I am RPi!"

try: 
    while True:

        # Blink
        print("Turning pin " + str(pin) + " high...")
        GPIO.output(pin, GPIO.HIGH) # Open circuit
        print ("GPIO pin " + str(pin) + " is " + str(GPIO.input(pin)))
        sleep(blinkTime)

        print("Turning pin " + str(pin) + " low...")
        GPIO.output(pin, GPIO.LOW) # Closed circuit
        print ("GPIO pin " + str(pin) + " is " + str(GPIO.input(pin)))
        sleep(blinkTime)

        # Read serial
        output = ""
        parsedLine = ""
        while output == "": # String is NULL
            output = output + ser.readline().decode('ascii')
        print("output:")
        print(output)
        for line in output.splitlines():
            if "I am Particle!" in line:
                print("Correct message recieved! Sending..")
                print(cmd)
                ser.write(str(cmd).encode())

except KeyboardInterrupt:
    print("Terminiated by keyboard interrupt...")

finally:

    # Close serial and turn off relay
    print("Finally: Closing serial connection....")
    serialConnection.close()

    # Free RPi GPIO pins
    GPIO.cleanup()

@ScruffR I’m working with Dave from tech support on this to reproduce this on his side. He pointed me to a new thread below:

Do you think this could be the same issue? I read through the thread and I do not use Serial1.availble() and instead of Serial1.read() I use Serial1.readStringUntil().

1 Like

I haven’t looked, but it might be related.
If you use Serial1.read() (and maybe readStringUntil() internally) and check for -1 to learn whether there are any more bytes in the RX buffer, then that would fail due to that -260 bug too.

So end conclusion from working with Dave is that he also managed to reproduced the issue. It seems they acknowledged that this is a problem and a particle engineer is now investigating the issue. This issue also seems that this occurs on all gen 3 devices.

Decreasing the period in which the Boron pings via Serial1 also seems to shorten the time taken to reproduce the issue. So i guess its something software related?

Hopefully this gets fixed soon. For now, I guess any serial/UART sensors are a no go on Gen 3 devices. In parallel I’ll be working with an electron for now, just to get things going on my side. So far electron seems stable and I’ve been able to run it for 48h+.

I’ll keep this posted updated if anything comes up.

1 Like

So, after weeks of debugging with Dave we think that the issue is not serial1 but probably calling a Log.trace() in the RGB.onChange() handler. :pensive: . When debugging I intentionally left in all print statements in an attempt to see where the Boron device would freeze on, I guess its counter intuitive that removing a print statement would actually fix the issue. Hence I never found the problem.

The reason why RGB.onChange() was used is as a work around for RGB.mirrorTo() not working in OS 0.9.0 as mentioned in RGB mirror not working on Argon - #4 by slacker89 . This also explains why my electron did not freeze since I was using RGB.mirrorTo(). This should be fixed in the next up coming OS.

Dave also pointed out that this is mentioned on their document which I missed out on: https://docs.particle.io/reference/device-os/firmware/boron/#rgb

The onChange handler is called 1000 times per second so you should be careful to not do any lengthy computations or functions that take a long time to execute. Do not call functions like Log.info, Serial.print, or Particle.publish from the onChange handler. Instead, save the values from onChange in global variables and handle lengthy operations from loop if you need to do lengthy operations.

Note: We tried a few suggestions like adding an application watchdog to reboot the device but when the Boron locks up, it really locks up. I guess this is an issue Particle should fix anyway since nothing should ever lock the device up this badly.

Long and painful journey but at least its solved.

3 Likes