Red flashes - Xenon 1.3.0-rc.1 - Assertion Error?

I have a Xenon with firmware 1.3.0-rc.1. The firmware has a loop() function that checks mesh connectivity every 60 seconds and calls a single sub (GetSensorReadings). This sub gets readings from a serial sensor and returns if the sensor does not detect the presence of something in front of it.

void loop()
{
    ..... code to check mesh is connected 

   GetSensorReadings();
    delay(1);
}

For an unknown reason, the program is crashing every 5 minutes or so (sometimes longer) with what appears to be an “Assertion Error” - 10 flashes. I am not really sure about the exact number of flashes because they happen so fast!

I initially thought there was a processing error within that sub, but upon watching the Xenon, the crash occurs even though there is nothing in front of the sensor. When the sensor does not detect something, the sub returns.

I have looked at the code for hours and could not find anything that should cause a heap error.

I would appreciate the community’s insight as to how one would debug such an error?

The actual number of significant flashes is not fast. You may see fast flashes as part of the SOS (morse code ... --- ...) signals before and after the significant blinks.

Just showing the function call but not what the function does isn't really helpful.
Also, how do you know the problem isn't in the "..... code to check mesh is connected" part?

"Heap error" and Assertion Fault (or I'd rather suppose a Hard Fault SOS+1) are different things.

  1. It would make it much easier if you posted your code
  2. Comment out all of your startup() and loop() code - does it work?
  3. Uncomment startup() code - does it work? (if not comment out sections of startup() until it does work)
  4. Then on to loop()
  5. Rinse and repeat !

I am also seeing an SOS 14 times with the latest 1.3.0-rc.1 firmware on code that used to run fine on 1.2.1. I am using the DHT-22 example code with a Xenon board. The unit resets about every 45 seconds. It does connect to my mesh network and shows a cyan breathing prior to the red SOS. I have not weeded through the code yet, but I don’t see a reason why something like this should break. Any ideas?
Thanks
Thomas

This is a known issue with some new “safety” mechanism as outlined here


and here

Hi,
I have been peeking and poking in the DHT library. I was able to stop the red SOS issue by eliminating the delayMicroseconds function and using a simple while loop, listed below. The unit now seems to breathe normally, but does not respond to the interrupt to acquire. I suspected the delay functions were causing issues with the ISR routines. This is a common mistake high-level programmers make when working in the embedded world. I will continue to dig deeper to try and get the ISR working again. Keep in mind I am just working a hunch. Here is what I tweaked:

int PietteTech_DHT::acquire() {
    // Check if sensor was read less than two seconds ago and return early
    // to use last reading
    unsigned long currenttime = millis();
   
    if (currenttime < _lastreadtime) {
        // there was a rollover
        _lastreadtime = 0;
    }
    if (!_firstreading && ((currenttime - _lastreadtime) < 2000 )) {
        // return last correct measurement, (this read time - last read time) < device limit
        return DHTLIB_ACQUIRED;
    }

    if (_state == STOPPED || _state == ACQUIRED) {
        /*
         * Setup the initial state machine
         */
        _firstreading = false;
        _lastreadtime = currenttime;
        _state = RESPONSE;

#if defined(DHT_DEBUG_TIMING)
        /*
         * Clear the debug timings array
         */
        for (int i = 0; i < 41; i++) _edges[i] = 0;
        _e = &_edges[0];
#endif

        /*
         * Set the initial values in the buffer and variables
         */
        for (int i = 0; i < 5; i++) _bits[i] = 0;
        _cnt = 7;
        _idx = 0;
        _hum = 0;
        _temp = 0;

        /*
         * Toggle the digital output to trigger the DHT device
         * to send us temperature and humidity data
         */
        pinMode(_sigPin, OUTPUT);
        digitalWrite(_sigPin, LOW);
        uint32_t start = millis();
        uint32_t finish = 1500; // Value for DHT-22
        
        if (_type == DHT11)
            while (start < start + 1800) // Added this
            {
            }
            //delayMicroseconds(1800);   // DHT11 Spec: 18ms min (removed this)
        else
            while (start < start + 1500) // Added this
            {
            }
            //delayMicroseconds(1500);   // DHT22 Spec: 0.8-20ms, 1ms typ (removed this)
        pinMode(_sigPin, INPUT);        // Note Hi-Z mode with pullup resistor
                                        // will keep this high until the DHT responds.
        /*
         * Attach the interrupt handler to receive the data once the DHT
         * starts to send us data
         */
        _us = micros();
        attachInterrupt(_sigPin, &PietteTech_DHT::_isrCallback, this, FALLING);

        return DHTLIB_ACQUIRING;
    } else
        return DHTLIB_ERROR_ACQUIRING;
}

Thank you @ScruffR.

I will try to get the correct number of flashes and will also post the code per your recommendation and @shanevanj.

I realize that a heap error is different, but since the crash occurs after repeated execution of the loop, I suspected that something gets fragmented …

Thank you @shanevanj.

I will post the code. It is just complicated, as the hardware is located at a remote site and not reachable for testing for the next 3 weeks.

In my case, the same code is running on 2 Xenons on the same mesh network. The second Xenon uses an interrupt that is fired upon receiving a pin change from the first Xenon.

In my case, both Xenons are crashing, but not at the same time.

Also, both Xenons are running the same code except that one has a pin change and the other has the ISR.

Also, the system does not crash due to the ISR or the pin change. It often will work for 10 minutes or more before they crash.

Here is the code.

Notes:

Code crashes after a while of vehEntry = false and curHeight < resetHeight

Once vehEntry = true, the last if statement is executed and the loop continues.

I also forgot to mention that almost the IDENTICAL code (sans the ISR and SendMessage) works on the Photon and is very stable (never crashed).

I have been having so many problems with the Xenons and I wish I could just go back to the Photon. The challenge is finding an easy way to communicate from one Photon to another which does not require a serial port (softSerial will not work because an interrupt is needed for the system to work due to timing).

void loop()
{
 //=========================================================================
  if  ((bConnectMesh) && ((millis() -  mesh_check_time) > 60000)) {                                   //every 60 seconds
    if (!Mesh.ready())
    {
      Mesh.disconnect();
      delay(2000);

      Mesh.connect();
      numMeshRecon ++;
      
      display.clearDisplay();
      display.display();
      display.setCursor(0, 0);
      display.println("Mesh");
      display.setCursor(0, 25);
      display.println("Reconnect");
      display.setCursor(0, 50);
      display.println(numMeshRecon);
      display.display();
      
      mesh_check_time = millis();
    }
    else
    {
      mesh_check_time = millis();  
    }
  }
//==============================================================
GetSensorReadings();
delay(1);
//==============================================================

void GetSensorReadings()
{
char result = Leddar1.getDetections();

  if (result >= 0)  //valid reading
  {
    if ((vehEntry) && (scanNum >= 1))   scanNum ++; 
    int upperB = Leddar1.NbDet;

    for (int i = 0; i < upperB; i++)
    {
      mySegment = (Leddar1.Detections[i].Segment);
      myDistance = (Leddar1.Detections[i].Distance);
      //---------------------------------------------------------------------------------------------
      double myCurDV = myDistance * cos (curAngle[mySegment] * 3.143 / 180.0 );
      curDist2Veh = myCurDV;
      curHeight = vertDist2GND - curDist2Veh;

      if (curDist2Veh > 100)   
      {
        if (curHeight < resetHeight) curHeight = 0;
        //==============================================================================================================================================
        if ((!vehEntry) && (curHeight >= resetHeight))  //done ONCE only
        {
          //--------------------------------------------------------
          if (sensorPos == 0) digitalWrite(A1, HIGH);
          if (sensorPos == 0) delay(2);
          if (sensorPos == 0) sendMessage(0);   
          if (sensorPos == 0) digitalWrite(A1, LOW);
          //--------------------------------------------------------
          vehEntry = true;
          vehArrTime = millis();
          
          scanNum = 1;     //start recording scans
          
        }    // ((!vehEntry) && (curHeight > resetHeight)
        //-------------------------------------------------------------------------------------------
        if (vehEntry)
        {
          cloudPoints[scanNum][mySegment][0] = curHeight;
        }  //vehEntry
      }    //curDist2Veh > 100
    }      //for loop - each Detection/Segment
    //---------------------------------------------------------------------------------------------------------------------------
    if ((vehEntry) && (scanNum > 0) && (cloudPoints[scanNum][(sensorSegments / 2)][0] < resetHeight)) resetCount ++;
    //---------------------------------------------------------------------------------------------------------------------------
  } 

if ((vehEntry) && (resetCount >= 5))       
{
  vehEntry = false;  
  numVeh ++;
  resetCount = 0;
}

}

Thanks. Will you also post (DM if sensitive) the rest, esp. the declarations of the various vars?

Where in your while loops are you updating start?
You are also setting start = millis(), but 1800ms would be 1.8 seconds, not 18ms as the comment suggests.
It seems you have created two infinite loops, and that would explain why you don't get the SOS+14 and no readings either :wink:

The delayMicroseconds() calls are not inside the ISR and hence won't interfere with an interrupt. Both delay() and delayMicroseconds() run as normal application code and hence are interruptible.

BTW, few microsecond delays are even acceptable in an ISR when they are used with consideration :wink:

@Jimmie, in your code above your loop() seems to have a missing closing curly brace but have an extra one after GetSensorReadings() :wink:
Or do you really intend to define GetSensorReadings() as "local" function inside of loop() - which would be unorthodox to say the least :see_no_evil:

Also, why?

A single if() spanning over all four statements would be more logical IMHO.
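
The four statements in question are the repeated if (sensorPos == 0) lines in GetSensorReadings(). A minimal sketch of the grouped form follows. The stub functions and pin constants are stand-ins invented here so the snippet compiles off-device; on the Xenon they would be the real digitalWrite(), delay() and the poster's sendMessage().

```cpp
#include <cassert>
#include <string>
#include <vector>

// Host-side stand-ins (assumptions, not Device OS calls): each one only
// records that it was called so the order of operations can be inspected.
static std::vector<std::string> callLog;
constexpr int A1_PIN = 1;                 // placeholder for pin A1
constexpr int LEVEL_LOW = 0, LEVEL_HIGH = 1;

void digitalWriteStub(int pin, int level)
{
    callLog.push_back("write " + std::to_string(pin) + "=" + std::to_string(level));
}
void delayStub(unsigned ms) { callLog.push_back("delay " + std::to_string(ms)); }
void sendMessageStub(int msgVal) { callLog.push_back("send " + std::to_string(msgVal)); }

// One guard around all four statements instead of four identical tests.
void signalVehicleEntry(int sensorPos)
{
    if (sensorPos == 0)
    {
        digitalWriteStub(A1_PIN, LEVEL_HIGH);
        delayStub(2);
        sendMessageStub(0);
        digitalWriteStub(A1_PIN, LEVEL_LOW);
    }
}
```

Behaviour is the same as the four separate tests; the grouping only makes it obvious that the pin pulse and the message belong together.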

As @shanevanj said, you are omitting important parts of the code that may well be home of the issue (e.g. sendMessage() and the Leddar lib).
Without knowledge about the size of your arrays and the range of your indices and still no info about the actual SOS code I'd assume the crash might most likely point to some index outside your array.
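
One way to test the out-of-range index theory is to route every array write through a guard that refuses bad indices. The sketch below is only an illustration: storeHeight() is a hypothetical helper, and the dimensions are taken from the cloudPoints declaration that appears later in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Dimensions as in the declaration posted later in the thread.
constexpr std::size_t MAX_SCANS = 100;
constexpr std::size_t MAX_SEGMENTS = 16;
constexpr std::size_t VALUES_PER_POINT = 3;

static unsigned int cloudPoints[MAX_SCANS][MAX_SEGMENTS][VALUES_PER_POINT];

// Hypothetical helper: writes the height only when both indices are inside
// the array, and reports whether the write happened.
bool storeHeight(int scanNum, int segment, unsigned int height)
{
    if (scanNum < 0 || static_cast<std::size_t>(scanNum) >= MAX_SCANS) return false;
    if (segment < 0 || static_cast<std::size_t>(segment) >= MAX_SEGMENTS) return false;
    cloudPoints[scanNum][segment][0] = height;
    return true;
}
```

Logging or counting the rejected writes on the device would show whether scanNum or the segment number ever leaves its valid range before the SOS.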

I thought that each time the DHT.acquire() routine was called I defined a finish time. I then define start as the current millis() time. The while loop is supposed to take the start time in the next line of code and wait until it is greater than start + finish before exiting the loop. This should delay approximately 1.5 ms. I just realized I had a decimal point issue, but it should still exit the while loop unless I am missing a simple mistake I made. I will go through the code later in the week and post a change when I get it verified.

Take this section

        uint32_t start = millis();
        uint32_t finish = 1500; // Value for DHT-22        
        if (_type == DHT11)             
            while (start < start + 1800) // Added this
            {
            }

You are defining a local variable start = millis(), which just writes an "arbitrary" number of milliseconds since system start into that variable. Then you enter the while() loop, which runs as long as start is less than itself + 1800. How would that loop ever break? It just cannot: start never changes inside the loop, so the condition has the same value on every iteration, and that value is true (short of the 32-bit addition wrapping around after roughly 49 days of uptime). The loop never bails out.

Just replace start with 0 and see when this loop will finish

          while (0 < 0 + 1800) // Added this
            {
            }

Nope, a tight loop like that will run considerably faster than 1µs per iteration, even if you actually incremented start and hadn't got a circular reference in the condition.

I've done both - high level and embedded - and in neither world should a tight loop with a fixed iteration count be chosen over proper timing routines :wink:
Counting clock cycles might be something for machine code (which I'm acquainted with too), but interrupts will disrupt this kind of timing even more than a timed function like delay() or delayMicroseconds().

Sorry for the confusion in my earlier post. I have added a simple routine that replaces the delayMicroseconds routine, just to test a theory. When I add the code below to the DHT.acquire() function and comment out the delay routine, it should now delay approximately 1.5ms in the while loop and then proceed. The code I posted the other day was indeed flawed, as ScruffR so graciously pointed out. This is a very common timer loop used in the Arduino world, and works just fine. After compiling and loading the code my Xenon does indeed breathe again, but I am still not getting an interrupt to sample the DHT sensor. The point I wanted to make the other day was that modifying the acquire() routine seemed to change how the module worked. I have not dug into the OS code at all, so I guess I should take a peek. I will keep digging.

Added this instead of delayMicroseconds in two places of the DHT.acquire() function

      time_now = millis();
      while(millis() < time_now + period)
      {
    //wait approx. [1.5] ms
      }

Added this at the top of DHT.acquire() to initialize the variables

unsigned long period = 1.5; // Delay period fixed to approx. 1.5ms
unsigned long time_now = 0; // Get time before entering while loop

That makes more sense now :wink:

However, it is actually safer to write a delay loop slightly differently

  while (millis() - previousTime < period);

The rationale for that is outlined in this post
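
As a rough illustration of why the subtraction form is safer, the sketch below models millis() as a 32-bit counter (an assumption that matches a 32-bit tick type) and compares the two conditions around the rollover point. Both function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Rollover-safe form: unsigned subtraction wraps around, so the difference
// is the true elapsed time even when `now` has already rolled over.
constexpr bool stillWaitingSafe(uint32_t now, uint32_t previousTime, uint32_t period)
{
    return static_cast<uint32_t>(now - previousTime) < period;
}

// Form used in the earlier post: the addition itself can wrap, which makes
// the deadline look like a small number that `now` has already passed.
constexpr bool stillWaitingNaive(uint32_t now, uint32_t startTime, uint32_t period)
{
    return now < static_cast<uint32_t>(startTime + period);
}

// Away from the rollover both forms agree.
static_assert(stillWaitingSafe(1050, 1000, 100) && stillWaitingNaive(1050, 1000, 100),
              "50 ms into a 100 ms wait");

// Start 16 ms before the counter wraps: only 1 ms has elapsed, yet the
// naive form already reports that the wait is over.
static_assert(stillWaitingSafe(0xFFFFFFF1u, 0xFFFFFFF0u, 100), "safe form keeps waiting");
static_assert(!stillWaitingNaive(0xFFFFFFF1u, 0xFFFFFFF0u, 100), "naive form exits early");
```

The naive form only misbehaves when the wait happens to straddle the rollover, which is why it can appear to work for weeks.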

Thanks for the update. I have been playing more with the code, and I think you are right: changing the delay function did nothing. It seems that if I flash the unit while it is breathing, it appears to update but nothing works. If I put it in safe mode and then flash it, I get the SOS message back. I have tried to downgrade the OS for now by flashing after selecting a lower version, but it does not do it. Is this an issue with Windows 7 64-bit? Also, I get a long hash string as well. Is there a definition somewhere of what all this data means?
Thanks
String:

{
  "p": 14,
  "m": [
    {"s": 49152, "l": "m", "vc": 30, "vv": 30, "f": "b", "n": "0", "v": 311, "d": []},
    {"s": 671744, "l": "m", "vc": 30, "vv": 30, "f": "s", "n": "1", "v": 1301,
     "d": [{"f": "b", "n": "0", "v": 311, "_": ""}]},
    {"s": 131072, "l": "m", "vc": 30, "vv": 30,
     "u": "2FEDD691415462ED55A1F4A8EC2DEA16B4B12CB0E6D4F75DCAEC995CD98F6240",
     "f": "u", "n": "1", "v": 6,
     "d": [{"f": "s", "n": "1", "v": 1104, "_": ""}]}
  ],
  "f": [],
  "v": {}
}

An integer variable cannot have a decimal component. Also, how can you get 1.5ms when one millisecond is the minimum resolution of millis()? Using micros() would allow you to time 1.5ms, since that is 1500 microseconds. You need to review and fix your code.
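
A small sketch of both points, written so it can be tried off-device. microsStub() is a stand-in built on std::chrono and is an assumption of this example; on the Xenon the real micros() would take its place. waitMicros() is likewise a made-up name.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Stand-in for the device's micros() (assumption): microseconds since the
// first call, truncated to 32 bits.
uint32_t microsStub()
{
    using namespace std::chrono;
    static const steady_clock::time_point t0 = steady_clock::now();
    return static_cast<uint32_t>(
        duration_cast<microseconds>(steady_clock::now() - t0).count());
}

// Storing 1.5 in an integer type drops the fraction: the period becomes 1.
constexpr unsigned long truncatedPeriodMs = static_cast<unsigned long>(1.5);

// 1.5 ms expressed in a unit that an integer can hold exactly.
constexpr uint32_t periodUs = 1500;

// Busy-waits for at least `us` microseconds using the rollover-safe
// subtraction form and returns the elapsed time that was measured.
uint32_t waitMicros(uint32_t us)
{
    const uint32_t start = microsStub();
    uint32_t elapsed = 0;
    while ((elapsed = microsStub() - start) < us)
    {
        // spin
    }
    return elapsed;
}
```

This only illustrates the units problem. The library's own delayMicroseconds() call already does the same job on the device.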

Just flashing an application targeted at a lower version will not downgrade the device OS.
To do that you could use particle update in DFU Mode (which will currently put you back on the latest official release, 1.2.1) or download the system binaries from the GitHub repo and flash them.

Thank you @ScruffR. The Leddar library is the one you helped me fix earlier. It runs without errors for hours on a Photon.

https://playground.arduino.cc/Code/Leddar/

The comment on the code below is valid of course :slight_smile: but should not cause a crash.

if (sensorPos == 0)

@shanevanj's comment is valid of course, and thank you for helping. The code (that I could share) and variable definitions are below:

unsigned int cloudPoints[100][16][3];
unsigned long  vehArrTime = 0;
void sendMessage(int msgVal)
{
  char data[48];

  char comType[2];
  char ID[20];
  char numBuf1[4];
  char numBuf2[4];
  char numBuf3[4];
  char numBuf4[4];

  if (msgVal == 0)                     
  {
    sprintf(comType, "%i", 0);
    sprintf(ID, "%07i", lTruckID);

    strcpy(data, "$,");
    strcat(data, comType);
    strcat(data, ",");
    strcat(data, ID);
    strcat(data, ",0,0,0,0,0,^");

    Mesh.publish("ABC", data);         
  }

  if (msgVal == 1)                      
  {
    sprintf(comType, "%i", 1);
    sprintf(ID, "%07i", lTruckID);
    sprintf(numBuf1, "%03i", tIn);
    sprintf(numBuf2, "%03i", tWIn);
    sprintf(numBuf3, "%03i", scanNum);
    sprintf(numBuf4, "%03i", offCV);

    strcpy(data, "$,");
    strcat(data, comType);
    strcat(data, ",");
    strcat(data, ID);
    strcat(data, ",");
    strcat(data, numBuf1);
    strcat(data, ",");
    strcat(data, numBuf2);
    strcat(data, ",");
    strcat(data, numBuf3);
    strcat(data, ",");
    strcat(data, numBuf4);
    strcat(data, ",");
    strcat(data, "^");

    Mesh.publish("ABC", data);     
  }
}
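
One thing that may be worth ruling out in sendMessage(): "%03i" sets a minimum width, not a maximum, so any value above 999 needs more room than a char[4] buffer provides. The sketch below builds the msgVal == 1 message with a single bounded snprintf() call instead. buildStatusMessage() is a hypothetical helper, and all fields are assumed to be plain int since their declarations are not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Hypothetical helper: formats the whole message in one bounded call and
// returns true only if it fit into `out`, terminator included.
bool buildStatusMessage(char *out, std::size_t outSize,
                        int truckId, int tIn, int tWIn, int scanNum, int offCV)
{
    const int written = std::snprintf(out, outSize,
                                      "$,1,%07i,%03i,%03i,%03i,%03i,^",
                                      truckId, tIn, tWIn, scanNum, offCV);
    // snprintf never writes past outSize; a return value >= outSize means
    // the text was cut short.
    return written >= 0 && static_cast<std::size_t>(written) < outSize;
}
```

With a 48-byte buffer and the values 42, 1, 2, 3, 4 this produces the same text as the strcat chain, "$,1,0000042,001,002,003,004,^", and a four-digit scanNum simply makes the message one character longer instead of writing past a small intermediate buffer.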