Electron went silent after 10 days

We have been testing the Electron board for remote temperature monitoring. It runs a simple sketch that publishes temperature data to the Particle cloud every hour. Everything was fine until this morning, when I noticed that the unit had not reported a temperature since 6pm yesterday. I checked the webhook logs thinking it might be an API error, but the log showed no activity and no errors. I restarted the Electron by unplugging the battery and it is working fine again.

My question is: is there a way to remotely reset the Electron in case it freezes? Perhaps something to build into the code or hardware to force a reset if the publish event doesn’t occur after n cycles?
Once we deploy the units to the field we won’t be able to reset them manually. How can I troubleshoot this issue? Any ideas?

Thank you!

Otto Krauth
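
One way to approach the "reset after n missed publishes" idea is a small counter of consecutive failures. This is only a sketch under assumptions: `PublishFailureTracker` is a hypothetical helper, not part of the Particle firmware, and the threshold is an arbitrary choice.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Particle API): counts consecutive
// failed publishes and reports when a chosen threshold is reached.
class PublishFailureTracker {
public:
    explicit PublishFailureTracker(uint8_t maxFailures)
        : maxFailures_(maxFailures), failures_(0) {}

    // Record the result of one publish attempt.
    // Returns true once maxFailures consecutive attempts have failed.
    bool record(bool success) {
        if (success) {
            failures_ = 0;  // any success clears the streak
            return false;
        }
        if (failures_ < maxFailures_) {
            ++failures_;    // saturate at the threshold, never overflow
        }
        return failures_ >= maxFailures_;
    }

    uint8_t failures() const { return failures_; }

private:
    uint8_t maxFailures_;
    uint8_t failures_;
};
```

In the firmware, the bool returned by Particle.publish() could be passed to record(), with System.reset() called when it returns true. This only helps while loop() is still running; a complete freeze still needs a watchdog.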

This might be of interest https://docs.particle.io/reference/firmware/electron/#application-watchdog

Yes… thank you!!!

I noticed that the article mentions: “The application watchdog requires interrupts to be active in order to function. Enabling the hardware watchdog in combination with this is recommended, so that the system resets in the event that interrupts are not firing.”

How would one an interrupt? Are there any examples of the Hardware Watchdog?

[quote="okrauth, post:3, topic:26713"]
How would one an interrupt?
[/quote]
Not sure what you are asking. However, ideally your code should not disable interrupts anywhere. For the hardware watchdog, have a look here:

:smile:

In addition to the application watchdog there are a couple of things you can add. One issue is that a reset doesn't reset the modem, so going into deep sleep for several seconds or using the command to reset the modem is also a good idea. This post has some more tips as well:

Thank you!!.. I will check it out.

I’ve added the watchdog code, "ApplicationWatchdog wd(60000, System.reset);" and "wd.checkin();", to my sketch… but today the Electron crashed again, showing a solid red light instead of the breathing cyan. I plugged the USB cable in hoping to see if the sketch was still running, but the serial port is not available.

Before I look at other solutions (hardware watchdogs)… is there anything else I can try? I was under the impression that the software watchdog routine ran on a safe stack and that if no check-in occurred it would force a System.reset()…

Thanks for your help.

Otto

@okrauth, if the watchdog did not fire, that means your (or some) code is crashing FreeRTOS entirely, e.g. a stack overrun, an out-of-bounds memory write, a divide by zero, etc. I believe system firmware 0.6.0-rc2 implements a System Reset Reason but I haven't seen any documentation yet. Some information on codes can be found here:

Getting to the root of the problem would be good.

@peekay I will simplify my code and update the firmware to 0.6.0-rc2 with the resetReason as suggested and report back.

Before I do the firmware upgrade from 0.5.0 to 0.6.0-rc2 I will copy the exact same code I have running on a Photon (for several weeks with no issue) over to the Electron and see if the crash happens again… The only difference between the Photon code and the Electron code is the fuel gauge call for battery voltage and cellular signal strength… so perhaps that could be the culprit.

UPDATE: The Electron again went silent after 12 days or so… for comparison I flashed a Photon with the same code and it is still going. I am wondering why the watchdog did not recover the Electron… I did notice the LED going through a rapid flashing pattern, but it never went back to its normal breathing cyan. I am wondering if I am implementing the watchdog wrong… Here is my code:



 //SETUP THE SOFTWARE WATCHDOG
ApplicationWatchdog wd(30000, System.reset);

#include "OneWire/OneWire.h"
 int lcount = 0;
 int xcount =0;
 char temp[129];
String cpost;
String SAD;

OneWire ds = OneWire(D2);  // sensor data line on pin D2 (a 4.7K pull-up resistor is necessary)
unsigned long lastUpdate = 0; 
void setup() {
  Serial.begin(9600);
}

void loop() {
 byte data[12];
 byte type_s;

   double celsius, fahrenheit;
   double bat = 0.00;
   bool success;


        //lastUpdate = now;
        byte i;
        byte present = 0;
        byte addr[8];
        lcount++;
        xcount++;

      if ( !ds.search(addr)) {
          lcount=0;
         
    
    Serial.print(" POST STRING: ");
    // remove the last character
    cpost = cpost.substring(0, cpost.length() - 1);
    Serial.println(cpost);
    //cpost=""; // for test only
    
    //temp.toCharArray(cpost, 125);
       // HERE should we send to Cloud Publish
        if(xcount>300 ){ // 600 is one hour
           success = Particle.publish("TA", cpost, PRIVATE);
            if (!success) {
                    // get here if event publish did not work
                    Serial.println("POST ERROR");
                    // Maybe add a counter and publish to a log
                } else {
                    Serial.println("*** POST ***");
                }
            xcount=0;
     
          
            
        }
        
        cpost="";
    
        ds.reset_search();
        
        return;
      }
            // the first ROM byte indicates which chip
      switch (addr[0]) {
        case 0x10:
          Serial.println("Chip = DS18S20");  // or old DS1820
          type_s = 1;  // 9-bit family: needs the shifted conversion below
          break;
        case 0x28:
          Serial.println("Chip = DS18B20");
          type_s = 0;
          break;
        case 0x22:
          Serial.println("Chip = DS1822");
          type_s = 0;
          break;
        default:
          Serial.println("Device is not a DS18x20 family device.");
          return;
      }

//
 ds.reset();
 delay(600); // Otto: this delay seems to stabilize reading at least 3 sensors continuously
  ds.select(addr);
  ds.write(0x44, 1);        // start conversion, with parasite power on at the end
 
  delay(1000);     // maybe 750ms is enough, maybe not

 
  present = ds.reset();
  ds.select(addr);    
  ds.write(0xBE);         // Read Scratchpad
 
 
   Serial.print("Data: ");
  
  for ( i = 0; i < 9; i++) {           // we need 9 bytes
    data[i] = ds.read();
  
  }

        
     SAD = String(String(addr[7],HEX)); // Get the last byte of the address (addr[7]) as hex characters
      //Serial.println();
     Serial.print("CRC: ");
     Serial.print(OneWire::crc8(data, 8), HEX );
     Serial.print(" / ");
     Serial.println(data[8], HEX);
     
      if (OneWire::crc8(addr, 7) != addr[7]) {
          Serial.println("CRC is not valid!");
          return;
      }
      
      // Check data integrity by CRC
        if (OneWire::crc8(data, 8) != data[8]) {
          Serial.println("**** Data packet was not valid ****");
           xcount=xcount-2; // to allow for an extra loop count
          return;
      }

  // Convert the data to actual temperature
  // because the result is a 16 bit signed integer, it should
  // be stored to an "int16_t" type, which is always 16 bits
  // even when compiled on a 32 bit processor.
  int16_t raw = (data[1] << 8) | data[0];
  
  
  if (type_s) {
    raw = raw << 3; // 9 bit resolution default
    if (data[7] == 0x10) {
      // "count remain" gives full 12 bit resolution
      raw = (raw & 0xFFF0) + 12 - data[6];
    }
  } else {
    byte cfg = (data[4] & 0x60);
    // at lower res, the low bits are undefined, so let's zero them
    if (cfg == 0x00) raw = raw & ~7;  // 9 bit resolution, 93.75 ms
    else if (cfg == 0x20) raw = raw & ~3; // 10 bit res, 187.5 ms
    else if (cfg == 0x40) raw = raw & ~1; // 11 bit res, 375 ms
    //// default is 12 bit resolution, 750 ms conversion time
  }
  celsius = (float)raw / 16.0;
  fahrenheit = celsius * 1.8 + 32.0;
  
  sprintf(temp, "%.2f", celsius );
  
 
  
 // build the post string to send to cloud
 cpost+=SAD.toUpperCase()+","+temp+",";


    ds.reset(); // reset sensor loop
    delay(6000);    

 
wd.checkin(); // resets the AWDT count
}

Any ideas?

thanks

Otto
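
For reference, the raw-to-temperature step in the sketch above can be isolated and checked on a desktop compiler, which helps rule out the arithmetic as a source of trouble. A minimal sketch, assuming a DS18B20 at its default 12-bit resolution; the function names are hypothetical and not part of the OneWire library:

```cpp
#include <cstdint>

// Hypothetical helper: converts the first two scratchpad bytes of a
// 12-bit DS18B20 reading (lsb = data[0], msb = data[1]) to degrees Celsius.
double ds18b20ToCelsius(uint8_t lsb, uint8_t msb) {
    // Combine into a signed 16-bit value so that negative temperatures
    // keep their sign on a 32-bit processor.
    int16_t raw = static_cast<int16_t>((static_cast<uint16_t>(msb) << 8) | lsb);
    return raw / 16.0;  // each count is 1/16 of a degree Celsius
}

// Same conversion the sketch uses for Fahrenheit.
double celsiusToFahrenheit(double celsius) {
    return celsius * 1.8 + 32.0;
}
```

The DS18B20 datasheet lists 0x0191 as +25.0625 °C and 0xFF5E as -10.125 °C, which makes convenient spot checks for the helper.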