Software Timers stop working after a while

I have a peculiar problem that I am unable to track down. I have set up 3 Timers that execute callbacks getting sensor data. They work fine initially, but after a few hours they completely stop working. The rest of the program works fine. I have schedules inside the program using TimeAlarms and they work fine. I can call the individual callbacks the timers are supposed to run and they return the right data. I have not called Timer.stop(), nor is there any blocking code.

Are these RTOS software timers, or TimeAlarm timers or Nextion display timers?

RTOS timers. The TimeAlarm based ones are working fine. I am not calling any display timers from the MCU.

What exactly is happening in those timer callbacks, and how frequent are they?
Since you’ve only got a pool of four (I think) timers and they run on one thread, you might somehow build up too much lag between all of them and “deadlock” them altogether (just a hypothesis :blush:)

10 seconds: reads 4 DHT22s and an I2C pressure sensor.
15 seconds: updates variables on the touchscreen. Does not call any of the sensors, just reads the values from previous calculations.
20 seconds: performs an I2C write through the Tentacle Shield. This activates a TimeAlarms timer that waits 2 seconds between the write to and the read from the Atlas sensors. Once the read and write are complete, the alarm is freed and set to invalid. Everything is supposed to repeat every 20 seconds.

Since this seems to be a separate issue, I split this off into its own thread. Maybe some RTOS geeks like @peekay123 or @mdma might chime in that way.

@Ali, Software timers are run in a single separate thread with all timers running round-robin style (one after the other). So if any timer callback stalls, ALL timers will stall. You need to look at each timer callback to look for any blocking code.

Another thing to remember is that threads don’t have infinite stack allocated to them. You should look at the local vars in each callback to ensure you don’t have excessive allocation requirements (all local vars go on the stack unless declared as static).

There is no blocking code in the callbacks. While these Timers are failing, my TimeAlarms-based ones are working fine, and I can call any function from the cloud and get sensor readings. I have set up a few diagnostic statements inside each callback and set up IFTTT with a Google Sheet to see if I can find more info on when this happens. I will let you know if I find out anything.

Thank you for the information on Timers!

@Ali, TimeAlarms are run in the user thread and are based on millis(). Software Timers are run in a separate thread and operate entirely differently. Calling cloud functions within timers is not recommended. Can you post your callback code?

@peekay123
Declared as

Timer valueUpdate(15000, updatePageValues); // updates variables on page
Timer updateSensors(10000, getSensorData); // timer for reading sensor data
Timer updateEC_pH(20000, getTDS_EC); // timer for reading sensor data

callbacks

void updatePageValues() {

    if (pageNumber >= HOME_PAGE) {
        formatDateTime();
        dateText.setText(Time.format("%F"));

        if (whichLevel == LEVEL1_PAGE && pageNumber == LEVEL_PAGE) {
            t1Level.setText(formatTemperature(&valuesDHT1));
            t3Level.setText(formatHumidity(&valuesDHT1));
            getWaterTime(&levelOneObj);
            t2Level.setText(waterTime);
        } else if (whichLevel == LEVEL2_PAGE && pageNumber == LEVEL_PAGE) {
            t1Level.setText(formatTemperature(&valuesDHT2));
            t3Level.setText(formatHumidity(&valuesDHT2));
            getWaterTime(&levelTwoObj);
            t2Level.setText(waterTime);
        } else if (whichLevel == LEVEL3_PAGE && pageNumber == LEVEL_PAGE) {
            t1Level.setText(formatTemperature(&valuesDHT3));
            t3Level.setText(formatHumidity(&valuesDHT3));
            getWaterTime(&levelThreeObj);
            t2Level.setText(waterTime);
        } else if (whichLevel == LEVEL4_PAGE && pageNumber == LEVEL_PAGE) {
            t1Level.setText(formatTemperature(&valuesDHT4));
            t3Level.setText(formatHumidity(&valuesDHT4));
            getWaterTime(&levelFourObj);
            t2Level.setText(waterTime);
        } else if (pageNumber == RESERVOIR_PAGE) {
            convertPressureReadingToLevel();
            t0Reservoir.setText(waterLevelPercentage);
            t2Reservoir.setText(waterTemperatureCels);
            t3Reservoir.setText(pH);
            t1Reservoir.setText(tds);
        } else if (pageNumber == HOME_PAGE) {
            convertPressureReadingToLevel();
            setHomePagePics();
        }
    }
}

void getSensorData() {
    getDHTdata();
    getPressureSensorData();
}

DHT data is polled from 4 DHT22 sensors. It’s a copy-paste of the code provided.

void getPressureSensorData() {
    unsigned int data[4] = {0, 0, 0, 0};

    // Request 4 bytes of data
    Wire.requestFrom(pressureSensorAddress, 4);

    // Read 4 bytes of data
    // pressure msb, pressure lsb, temp msb, temp lsb
    if (Wire.available() == 4) {
        data[0] = Wire.read();
        data[1] = Wire.read();
        data[2] = Wire.read();
        data[3] = Wire.read();
    }

    // Convert the data
    pressure = ((data[0] & 0x3F) * 256.0 + data[1]);
    rawPressure = pressure;
    waterTemp = ((data[2] & 0xFF) * 256.0 + (data[3] & 0xE0)) / 32;

    pressure = ((pressure - 1638.0) / ((13107.0) / 10.0));
    waterTempC = ((waterTemp * 200.0) / 2048) - 50.0;
    waterTempF = (waterTempC * 1.8) + 32;

    // Output data to dashboard
    debugSensors(String::format("mbar %.3f, oC: %.2f, oF:%.2f", pressure, waterTempC, waterTempF));
    waterTemperatureCels = String::format("%.f C", waterTempC);
    waterTemperatureFah = String::format("%.f F", waterTempF);
    convertPressureReadingToLevel();
}

void convertPressureReadingToLevel() {
    if (settingsObj.zeroSetVal != 0 && rawPressure >= settingsObj.zeroSetVal) {
        double diff = rawPressure - settingsObj.zeroSetVal;
        double upperLimit = 100.0 / 140.0;
        waterLevelValue = diff * upperLimit;
    }
    else if(rawPressure < settingsObj.zeroSetVal)
    {
        waterLevelValue = 0;
    }
    waterLevelPercentage = String::format("%.f %%", waterLevelValue);
}

AlarmID_t readECpH = INVALID_ALARM; // alarm used for the delay between readings

void stopWaitingForWaterSensors() {
    Alarm.disable(readECpH);
    Alarm.free(readECpH);
    readECpH = INVALID_ALARM;
}

/**
 * Delay workaround for the sensor
 * @param waitSecs int with seconds of delay
 * @param function that runs after the delay
 */
void waitForWaterSensors(int waitSecs, callBack_t function) {
    stopWaitingForWaterSensors();
    readECpH = Alarm.timerOnce(waitSecs, function);
}

/**
 * This will query the EC and PH.
 * @param command is a String command
 * @param i2cAddress is the address of the sensor
 */
void requestFromWaterSensors(String command, int i2cAddress) {
    Wire.beginTransmission(i2cAddress); //call the circuit by its ID number.
    Wire.write(command); //transmit the command that was sent through the serial port.
    Wire.endTransmission(); //end the I2C data transmission.    
}

/**
 * This will receive data from the sensors
 * @param command is a String command
 * @param i2cAddress is the address of the sensor
 * @param sensorData is the char[] that holds data from the sensor
 */
void receiveFromWaterSensors(String command, int i2cAddress, char* sensorData) {
    Wire.requestFrom(i2cAddress, 48, 1); //call the circuit and request 48 bytes (this is more than we need).
    code = Wire.read(); //the first byte is the response code, we read this separately.

    while (Wire.available()) { //are there bytes to receive.
        in_char = Wire.read(); //receive a byte.
        sensorData[i] = in_char; //load this byte into our array.
        i += 1; //increment the counter for the array element.
        if (in_char == 0) { //if we see that we have been sent a null command.
            i = 0; //reset the counter i to 0.
            Wire.endTransmission(); //end the I2C data transmission.
            break; //exit the while loop.
        }
    }

    switch (code) { //switch case based on what the response code is.
        case 1: //decimal 1.
            Serial.println("Success"); //means the command was successful.
            //debug("Success");
            break; //exits the switch case.

        case 2: //decimal 2.
            Serial.println("Failed"); //means the command has failed.
            //debug("Failed");
            break; //exits the switch case.

        case 254: //decimal 254.
            Serial.println("Pending"); //means the command has not yet been finished calculating.
            //debug("Pending");
            break; //exits the switch case.

        case 255: //decimal 255.
            //Serial.println("No Data"); //means there is no further data to send.
            //debug("No Data");
            break; //exits the switch case.
    }

    if (i2cAddress == EC_ADDRESS) {
        convertECstring();
    } else if (i2cAddress == PH_ADDRESS) {
        pH = sensorData;
    }
    //debug(sensorData);
}

/**
 * receives the ph readings
 */
void getpH() {
    receiveFromWaterSensors(READ, PH_ADDRESS, pH_data);
    stopWaitingForWaterSensors();
}

/**
 * Receives the EC readings and queries for pH readings
 */
void getEC() {
    receiveFromWaterSensors(READ, EC_ADDRESS, ec_data);
    requestFromWaterSensors(READ, PH_ADDRESS);
    waitForWaterSensors(2, getpH);
}

/**
 * Takes Ec and pH readings based on the timer
 */
void getTDS_EC() {
    requestFromWaterSensors(READ, EC_ADDRESS);
    waitForWaterSensors(2, getEC);
}

What I meant by calling cloud functions was calling the sensor read functions through Particle.function() over the cloud, to see if the sensors were functioning when the timers were not.

@Ali, “polling” the DHT sensors suggests there may be blocking code waiting on data. The catch with your code is the “hidden” actions performed by the functions being called. For example, updatePageValues() calls the Nextion display functions, which then send data to (most likely) Serial1. Any user code sending to Serial1 at the same time will cause conflicts and possibly stall the timer (and all others). Timers won’t conflict with each other (since they run one after the other), but they could conflict with user code on unique resources (e.g. Serial, GPIO, etc.) since they run in a separate thread.

If you have three Particle.variable() slots available, I suggest adding a global unsigned long variable per callback that you set to one when entering the callback and set to zero when exiting it. You can then monitor the variables to see which timer stalls (does not return to zero). This approach is not foolproof, but it’s a start.
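
The entry/exit flag idea can be sketched in plain C++ roughly as follows. This is only an illustration, not the code used in the thread: `InFlight`, `inSensorCb`, `observedDuringCall`, `readSensorsStub()` and `getSensorDataTraced()` are hypothetical names, and the cloud-variable registration is left out so the snippet compiles off-device.

```cpp
// Hypothetical in-flight marker for one timer callback. On the device it
// would be registered as a cloud variable; that step is omitted here.
volatile int inSensorCb = 0;

// Sets the flag on entry and clears it on every exit path, including
// early returns, so a flag stuck at 1 points at a stalled callback.
struct InFlight {
    volatile int &flag;
    explicit InFlight(volatile int &f) : flag(f) { flag = 1; }
    ~InFlight() { flag = 0; }
};

// Records what the flag looked like while the callback body was running.
int observedDuringCall = -1;

// Stand-in for the real sensor reads (assumption: the real body goes here).
void readSensorsStub() {
    observedDuringCall = inSensorCb;
}

// Traced version of a timer callback.
void getSensorDataTraced() {
    InFlight guard(inSensorCb);
    readSensorsStub();
}
```

One guard per callback keeps the bookkeeping out of the callback body; if a flag is still 1 long after its period has elapsed, that callback is the one that never returned.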

Maybe “polling” is the wrong wording for calling the DHT get function:

while (DHT1.acquiring());
printSensorData(&DHT1);
getDHTvalues(&DHT1, &valuesDHT1);

The updatePageValues callback has a condition that works on page numbers. The page number is automatically zero when the touchscreen is not in use, and the timers stop responding when I leave them running overnight.

I will also do the Particle.variable suggestion. I have my doubts about the Atlas sensors though. Maybe calling them on separate timers causes a clash. I just had everything on I2C fail. I had to unplug everything to get it to work again.

@Ali, software timers are great but you may be overloading them. You may want to consider using classic millis() timers in loop() instead (a proven method that has been around since Arduino first came to life). Same idea: callbacks become plain functions and almost everything else stays the same. However, thread conflicts now disappear and it is easier to pin down issues. If you still want timers, simply set flags in those timers that you sample in loop() to call what would normally be the callbacks.
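
A minimal sketch of the millis()-style interval pattern, written as portable C++ so it can be run off-device. The clock value is passed in rather than read from millis(), and `IntervalTimer`, `loopOnce()`, `simulateLoop()`, `readSensors()` and `updatePage()` are hypothetical names standing in for the real loop() and callbacks.

```cpp
#include <stdint.h>

// Runs a plain function every periodMs, driven by a millisecond counter.
struct IntervalTimer {
    uint32_t periodMs;
    uint32_t lastRunMs;
    void (*task)();

    // Returns true if the task ran on this poll.
    bool poll(uint32_t nowMs) {
        // Unsigned subtraction stays correct when the counter wraps.
        if (nowMs - lastRunMs < periodMs) return false;
        lastRunMs = nowMs;
        task();
        return true;
    }
};

int sensorRuns = 0;
int pageRuns = 0;
void readSensors() { ++sensorRuns; }  // stand-in for getSensorData()
void updatePage() { ++pageRuns; }     // stand-in for updatePageValues()

IntervalTimer sensorTimer{10000u, 0u, readSensors};
IntervalTimer pageTimer{15000u, 0u, updatePage};

// What loop() would do on each pass, with nowMs taken from millis().
void loopOnce(uint32_t nowMs) {
    sensorTimer.poll(nowMs);
    pageTimer.poll(nowMs);
}

// Test helper: drives loopOnce() from 0 to endMs in stepMs increments.
void simulateLoop(uint32_t endMs, uint32_t stepMs) {
    for (uint32_t t = 0; t <= endMs; t += stepMs) loopOnce(t);
}
```

Because everything now runs in the user thread, the tasks can no longer race with loop() code over Serial1 or I2C, though a task that blocks will still hold up loop() itself.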

That’s a nice example of possible hang code!
If DHT1 ever fails to return (stop acquiring) for whatever reason, you’re doomed.
Adding a timeout, or even better just checking once and bailing out if it is still acquiring, then collecting the result next time round, might be worth considering.
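
A bounded wait could look roughly like this. It is a sketch under assumptions: `FakeSensor` is a hypothetical stand-in that only mimics an acquiring() call, `fakeMillis()` replaces millis() with a counter that advances one tick per call so the example runs anywhere, and `waitForSensor()` is an illustrative helper, not part of any DHT library.

```cpp
#include <stdint.h>

// Hypothetical sensor stand-in: reports "still acquiring" for a fixed
// number of polls, or forever when pollsUntilReady is negative.
struct FakeSensor {
    int pollsUntilReady;
    bool acquiring() {
        if (pollsUntilReady < 0) return true;    // never finishes
        if (pollsUntilReady == 0) return false;  // conversion done
        --pollsUntilReady;
        return true;
    }
};

// Stand-in for millis(): advances by one "millisecond" per call.
uint32_t fakeNowMs = 0;
uint32_t fakeMillis() { return fakeNowMs++; }

// Waits for the sensor, but gives up after timeoutMs instead of
// spinning forever. Returns true only if the sensor finished in time.
template <typename Sensor>
bool waitForSensor(Sensor &sensor, uint32_t timeoutMs) {
    const uint32_t start = fakeMillis();
    while (sensor.acquiring()) {
        if (fakeMillis() - start >= timeoutMs) return false;
    }
    return true;
}
```

The caller would skip printSensorData() and getDHTvalues() when the wait returns false and simply try again on the next timer tick.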

Yes, that seems like the logical thing to do. Great idea on the flags. I will set up another TimeAlarms-based timer since that is millis() based. Let’s see if that fixes the issue.

Will change the while to an if and see if that changes anything.

@Ali, changing the while to an if means you will only sample when the timer fires. You may want to step back and reconsider the overall architecture of your code before changing stuff. For example, is blocking in loop() acceptable or not? If not, then you will most likely need to use one or more state machines to sample sensors in order to prevent blocking.
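
A rough sketch of what such a state machine might look like for one sensor, in portable C++. Everything here is illustrative: `StubSensor` is a hypothetical stand-in (its acquire() and read() methods are assumptions; only acquiring() appears in the thread), and `Sampler` and `runSampler()` are made-up names. The idea is that each call to step() does a tiny amount of work and returns immediately.

```cpp
#include <stdint.h>

// Hypothetical sensor stand-in: a conversion takes a few polls to finish.
struct StubSensor {
    int pollsPerConversion = 2;
    int pollsLeft = 0;
    void acquire() { pollsLeft = pollsPerConversion; }  // start a conversion
    bool acquiring() {
        if (pollsLeft > 0) { --pollsLeft; return true; }
        return false;
    }
    float read() const { return 21.5f; }  // fixed dummy reading
};

enum class SampleState { Idle, Acquiring };

// Non-blocking sampler: call step() on every pass through loop().
struct Sampler {
    StubSensor &sensor;
    uint32_t periodMs;
    uint32_t timeoutMs;
    SampleState state = SampleState::Idle;
    uint32_t lastStartMs = 0;
    int samples = 0;
    int timeouts = 0;
    float lastValue = 0.0f;

    Sampler(StubSensor &s, uint32_t period, uint32_t timeout)
        : sensor(s), periodMs(period), timeoutMs(timeout) {}

    void step(uint32_t nowMs) {
        switch (state) {
        case SampleState::Idle:
            if (nowMs - lastStartMs >= periodMs) {
                sensor.acquire();
                lastStartMs = nowMs;
                state = SampleState::Acquiring;
            }
            break;
        case SampleState::Acquiring:
            if (!sensor.acquiring()) {  // finished: collect the value
                lastValue = sensor.read();
                ++samples;
                state = SampleState::Idle;
            } else if (nowMs - lastStartMs >= timeoutMs) {  // give up
                ++timeouts;
                state = SampleState::Idle;
            }
            break;
        }
    }
};

// Test helper: calls step() from 0 to endMs in stepMs increments.
void runSampler(Sampler &sm, uint32_t endMs, uint32_t stepMs) {
    for (uint32_t t = 0; t <= endMs; t += stepMs) sm.step(t);
}
```

A sensor that never finishes only costs one missed sample per period here, rather than freezing loop() or the timer thread.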

Actually, sampling only when the timer fires is acceptable for this purpose. If new values are obtained every 10 seconds (when the timer fires), it still works for the application. I would not want anything blocking the loop since there are a bunch of critical functions running in there. One of them uses your clickButton library.

For now I have set the code to use just one Timer and put all the functions in that callback. I have set up 3 variables, one in each of the individual callbacks, to see if I can find out whether something blocks. I will leave it running this weekend and see.

Thank you for the suggestions.

Update:-

I reduced the number of Timers to just 1 and put all the sensor reading functions in that callback. The other change I made was to the while statement in the DHT sensor code, which is now an if. It has been running fine. I will try changing back from if to while and see if that was the real problem.