Boron RC26 Stability - Random reboots - How to debug?


#1

I have a remote site installation of one Boron and 2 Xenons. Each Xenon sends a short sentence (200 bytes) about every 5 minutes. The Boron sends a short sentence (200 bytes) to ThingSpeak about every 15 minutes. I have upgraded the wiring to eliminate any power issues and have flashed and updated all 3 micros.

I have tested the following scenarios and the results are below:

  1. Boron running with no Xenons (very stable, no restarting in 24 hours).

  2. When Xenons are present, the Boron has random restarts. By restarts I mean rebooting (not loss of mesh/cloud connection). Sometimes it works for 15-30 minutes without restarting, and sometimes it works for a few hours before restarting. In my latest test, the system was powered on at 12:00 am, and the Boron restarted at 1:21 am, 2:26 am, 6:21 am, 6:41 am, 6:50 am, 7:06 am, and 8:28 am. The cloud indicates the 2 Xenons have been working fine (no restarts), which was confirmed by checking the Xenons' Particle Variables.

I would appreciate the community’s help in finding out what is causing this problem. Is this a potential RC26 stability issue or something else?


#2

You probably heard that before: “One crucial part in assessing the potential causes would be - as always - seeing your code.”


#3

Good morning @Scruffr:

Yes, indeed, I did. The code is long, so I was reluctant to post it. Here it is:

==============================================================================


//==============================================================================
//SYSTEM_THREAD(ENABLED);
//==============================================================================

//=======================================================================================================
// ThingSpeak.h, ThingSpeak.cpp, QueueList.h, Utilities.cpp, Utilities.h

#include "ThingSpeak.h"

// This #include statement was automatically added by the Particle IDE.
#include <PublishManager.h>
PublishManager<> publishManager;
//=======================================================================================================
//   Main Settings
//=======================================================================================================
String verNum = "1.15";
int baseNum = 4444;          //Electron #
int myTimeZone = -4;

int showLCD = 1;

int myPlatform;             //=6 Photon, =10 Electron

int a, b, c;

String myData;
//========================================================================================================
// This #include statement was automatically added by the Particle IDE.
#include <ThingSpeak.h>
TCPClient client;

// This #include statement was automatically added by the Particle IDE.
#include <Adafruit_SSD1306.h>
#define OLED_RESET 4
Adafruit_SSD1306 display(OLED_RESET);

// This #include statement was automatically added by the Particle IDE.
#include "Utilities.h"

int count = 0;

//========================================================================================================
/*  Data Format
  ---------------------------------------------------
  1) All messages start with "$$" and end with "^"
*/



//===========================================================================================================
//Screen
//===========================================================================================================
int countLCD = 0;
int refreshLCD = 20;

//===========================================================================================================
int unitNum;                      //Xenon Unit sending the data
int inform = 0;                   //=1 if message was sent
int congChanged = 0;               //=1 congestion changed from congested to clear or vice versa
int goPublish = 0;                 //=1 Particle Publish, =0 do NOT

int numSensors = 0;                         //Total number of sensors sending data
int numNuSensors = 0;                       //Total number of sensors after zeroing

int mySensors[] = {0, 0, 0, 0};
int myPsuSpeed[] = {0, 0, 0, 0};
int myCongestion[] = {0, 0, 0, 0};

int psuSpeed;                            //average psuedo speed from sensors

int numBatch = 99;                       //Total number of cars + trucks per sensor before sending

bool allCongested = false;
int numCongested = 0 ;                   //Number of congested lanes
int congestion = 0;

int sensorInd;                            //Index of base within array
int myTrucks[] = {0, 0, 0, 0};
int myCars[] = {0, 0, 0, 0};
int totalTrucks[] = {0, 0, 0, 0};         //Total Trucks for each sensor
int totalCars[] = {0, 0, 0, 0};           //Total Cars for each sensor
int myTime[] = {0, 0, 0, 0};
int myTimeInt = 0;
int oldTraffic;

int number_of_truck = 0, number_of_car = 0;
int total_Trucks = 0, total_Cars = 0;
int curTrucks = 0, curCars = 0;

String lastStat = "None";
String lastComm = "None";

int numOcc = 0;
//===========================================================================================================
int writeEach = 0;                          //=0 do not write each record to disk, =1 write each vehicle

NonBlockDelay d;
//===========================================================================================================
int currentHour;
int currentMinute;

unsigned long startTime;                      //Start operations millis()
unsigned long dayMS = (86400 * 1000);         //milliseconds in a day
unsigned long lastMS;                         //last time in milliseconds

unsigned long resetTime = 60 * (60 * 1000);   //Reset if no info in 60 minutes
unsigned long transmitTime = 0;               //last successful transmission (after checking connected)
unsigned long infoTime;                       //last message time received from Xenon

unsigned long updateTime = 60 * (60 * 1000);   //Heartbeat 60 minutes if (goPublish ==1 )

//===========================================================================================================
/* Thingspeak */
// https://www.mathworks.com/help/thingspeak/use-photon-client-to-publish-to-a-channel.html

char writeAPIKey[] = "XENUOR3";                // Change this to your channel Write API Key.
long channelID = 632;                                // Change this to your channel number.
unsigned long thingConTime = 0;              //last time data was sent to Thingspeak

struct thingMessageS {
  int baseNum;
  int unitNum;
  String number_of_truck;
  String number_of_car;
  String total_Trucks2;
  String total_Cars2;
  int psuSpeed;
  int congestion;
  unsigned long timeInterval;
};

thingMessageS thingMessage;
//========================================================================================================
//https://playground.arduino.cc/Code/QueueList
#include "QueueList.h"
QueueList <thingMessageS> queue;
int queueCount = 0;
const unsigned long transmitInt = 20 * 1000UL;

//===========================================================================================================
//https://forum.arduino.cc/index.php?topic=246654.0

union cvt {
  float val;
  unsigned char b[4];
} x;

//===========================================================================================================
String tMessage;

//void myHandler(const char *event, const char *data)
void myHandler(const char *event, const char *data)
{
  //Serial.printlnf("event=%s data=%s", event, data ? data : "NULL");
  
  infoTime = millis();

  myData = data ;
  parseRecData(data);
  if (goPublish == 1 ) publishManager.publish("RECEIVED", "$$,33," + String(baseNum) + ", Data ="  +  String(data)  + ","  +  String(numSensors) + ","  +  String(numNuSensors)  + ","  +  String(Time.timeStr()) + "," + "^");
}

//===========================================================================================================

#4

A quick look at your code and I again found something else you may have heard before (frequently) already: "Don’t use String" :wink:

With these usual suspects, we are somewhat reluctant to search for other causes before these well known No-Nos are removed.


#5

@scruffr,

I hear you. I have read about it after your comments and understand how it fragments the heap, but I find it hard to believe that this is the cause, given that the reboots sometimes occur within a rather short period of time.

I have turned on publishing and am getting:

spark/device/last_reset panic, 7

Can this be caused by using String? If using “String” can cause such extreme behavior, I am thinking it should not be allowed by the compiler?

I would appreciate your help in a link on how to use Publish without using String? In the Particle docs, they show the example code:

Particle.publish(String eventName, **String data**);

Thanks in advance for your time.


#6

The SOS 7 panic is a known issue which will be addressed in rc.27, but that has mainly hit Argons and Xenons and then mostly within the first few minutes.
For a panic that happens after some longish but unpredictable time, heap fragmentation is one of the easiest causes to rule out before needing to dig any deeper.

The common answer to this is use C strings (aka character arrays - char s[16]) and snprintf()


#7

OK, thanks.

Will go through my code and change Strings.


#8

Whatever function takes a String also takes a C string.

It’s not the mere use of String that has this disruptive effect, but the way it’s used,
e.g. when you write code like this:

"$$,33," + String(baseNum) + ", Data ="  +  String(data)  + ","  +  String(numSensors) + ","  +  String(numNuSensors)  + ","  +  String(Time.timeStr()) + "," + "^"

You are creating loads of independent String objects (each at least 16 bytes long) and feeding them one by one into a “target” object, which consequently outgrows its original 16 byte memory and hence needs to be relocated and resized (each time doubling in size), just to destroy most of these temporary objects in the next moment and start the same procedure again.
That’s the cause for heap fragmentation. When String is used right, it’s not that much of a problem, but because it is so convenient and easy to use (without needing to know what happens behind the scene) most users don’t use it right.
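
To illustrate what "using String right" could look like, here is a minimal sketch that reserves the buffer once and appends in place instead of chaining temporaries with +. It uses std::string as a stand-in so that it compiles off-device; Particle's String has a similar reserve(), but buildReceived() and the 128-byte size are assumptions for illustration, not part of the original code.

```cpp
#include <string>

// Hypothetical helper: builds the "RECEIVED" message with one up-front allocation.
// std::string stands in for Particle's String so the sketch compiles anywhere.
std::string buildReceived(int baseNum, const char *data, int numSensors)
{
    std::string msg;
    msg.reserve(128);                  // assumed worst-case length, allocated once

    // Each += appends into the already reserved buffer, so the target
    // does not have to be reallocated while the result fits in 128 bytes.
    msg += "$$,33,";
    msg += std::to_string(baseNum);
    msg += ", Data = ";
    msg += data;
    msg += ",";
    msg += std::to_string(numSensors);
    msg += "^";
    return msg;
}
```

Calling buildReceived(4444, "abc", 2) yields "$$,33,4444, Data = abc,2^". The fixed char array with snprintf() avoids the heap altogether and is usually the simpler choice on these devices.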


#9

Thank you @scruffr. It is clear now what is going on.

A small request, for a simple statement like:

"$$,33," + String(baseNum) + ", Data = " + String(data) + "," + String(numSensors) + "^"

baseNum and numSensors are int, data is const char *data

what is the recommended replacement equivalent?


#10

Assuming baseNum and numSensors to be integers and data to be a string;

char str[64];
snprintf(str, sizeof(str), "$$,33,%d, Data = %s,%d^", baseNum, data, numSensors);

#11

The ThingSpeak Library also uses Strings.
The decreased stability while the Xenons are running would make sense (since I assume they are responsible for collecting the data).

I eventually stopped using the ThingSpeak Library in favor of a Webhook and snprintf (which I assume @ScruffR is about to suggest).

That method is more stable and uses less data.

(Edit) Yup, he beat me to it :grin:


#12

Thank you.


#13

Hello @Rftop,

Thank you. You are correct, the Xenons collect data and then send to the Boron which aggregates it. The problem is that the Boron updates ThingSpeak about once in 10 minutes which is not that often …

Can you share a link for how to use Webhooks in lieu of the ThingSpeak library?


#14

Sure,
Here’s my “Starter Code” when @ScruffR was teaching me.

https://go.particle.io/shared_apps/5bffec808bf964444d00150f

This is pretty old, so the Webhook instructions in the Comments may have changed slightly (but not the JSON for the actual Webhook).

The Sample code is long. The actual usage is pretty simple:

    snprintf(msg, sizeof(msg),  "{\"1\":\"%.0f\", \"2\":\"%.2f\", \"3\":\"%.0f\", \"k\":\"%s\"}", SOC, lipoVoltage, tempF,  myWriteAPIKey)  ;
     Particle.publish(eventName, msg, PRIVATE, NO_ACK);

Where do I place the API key in the Webhook settings?


#15

@Rftop, can you also share the webhook definition (I suspect a screenshot may be needed)?


#17

New Integration, Webhook:

Select CUSTOM TEMPLATE


#18

Just to make sure: @Rftop has the actual copy/pastable JSON in the comment header of his shared project - so no need to type it off of the image :wink:


#19

I have made all the changes, but still getting reboots. I suspect the reboots were due to this:

spark/device/last_reset panic, 7

===========================================
That said, thank you again @scruffr. It is good to use recommended programming practices for Strings. I am not a C programmer, hence the challenge :-).

Also what threw me off is that the same code was running on an Electron for a couple of months without a reboot (except intentionally once a day via a relay). I realize that it is a different platform …


#20

Hello folks,

Wanted to follow up on this thread to let everyone know that a fix for the SOS-7 issue has been released with v0.8.0-rc.27. The issue was traced to a problem with the Nordic 802.15.4 driver. Instructions for upgrading are available below. We’d love to know if applying the release fixes the SOS-7 issue that you’re experiencing.


Note that we have seen some reports of a change in behavior in rc.27 when users call the Mesh.subscribe() function within setup(), which can result in a separate SOS-10 code that we’re currently investigating. If this issue affects you, please note the following workaround.