Communications locks up after a few hours

I have an irritating problem and I hope one of the boffins can point me in the right direction.
I am using a photon to control 2 output pins which have relays connected to them - that all works correctly. I also have an I2C BME280 connected - that works correctly.
I am communicating to particle cloud when a relay is activated (on or off). I am also communicating with the board using mqtt both to control the relays and to get data from bme280 to my own mqtt server (I also tried an online mqtt server for testing). Everything seems to work correctly until the board has been powered for a good few hours - that’s when the trouble starts!

Communication to and from the board seems to stop: no message sent to the cloud, no mqtt messages published or subscribed to. The program in the photon still works because I programmed the led on D7 to flash at 1s intervals for testing and it still does. I can ping the photon but just no other comms. I cannot pinpoint when this happens - it works for hours and then suddenly I send a signal and no action or response.

Is there anything else I could check for? Obviously I have missed something critical??

(Just some background: I have been using them for the last 3 years and currently own 40 in many projects so am not a total newbie :wink: I also use many esp8266 based projects and they have no problem with comms using my same mqtt server.)

At this stage any help is appreciated :crossed_fingers:

@peter_howells, any chance you can post your code? The part about “a good few hours” tends to suggest a possible memory issue. Which DeviceOS version are you using?

Thanks for the response. Here is a copy of the current code running - I’ve now tried to simplify it completely but it still exhibits the same problem. Using the latest version 1.5.2
I forgot to mention - resetting the device brings it back online straight away but this is a poor solution - I suppose I could reset it in code every 12 hours but would rather not.

#include <MQTT.h>
#include <CE_BME280.h>

#define SEALEVELPRESSURE_HPA (1013.25)
CE_BME280 bme; // I2C

MQTT mqtt("192.168.0.14", 1883, callback);

int relay1State = 0;
int relay2State = 0;

float bmeTemperature;
float bmeHumidity;
float bmePressure;
float Dewpoint;
float currentTime;
float sensorValue;
float tempMin = 40;
float tempMax = 0;

long timerPublishData;
long timerFlash;

bool ledToggle;

bool D3State = false;
bool lastD3State = false;
bool D4State = false;
bool lastD4State = false;
bool flagOnce = false;

String mqttStr;
String tempStr;
String humidityStr;
String pressureStr;
String dewpointStr;

String DeviceID = "MQTT Relay Board v1";
String TimeNow;

int RELAY1_ADDR = 1;
int RELAY2_ADDR = 5;

void callback(char* topic, byte* payload, unsigned int length)  // mqtt receive message
{
    char p[length + 1];
    memcpy(p, payload, length);
    p[length] = '\0';  // terminate the copied payload (NULL is a pointer constant, not a char)
    
    Particle.publish("mqtt_rec", p);
    
    if (!strcmp(p, "relay1/off")) relay1(0);
    else if (!strcmp(p, "relay1/on")) relay1(1);
    else if (!strcmp(p, "relay2/off")) relay2(0);
    else if (!strcmp(p, "relay2/on")) relay2(1);
    else if (!strcmp(p, "tempReset")) reset_min_max();

    delay(200);
}

void reset_min_max()
{
    tempMin = bmeTemperature;
    tempMax = bmeTemperature;
    mqtt.publish("howells/home/rb1/tempMin", String(tempMin), true);
    mqtt.publish("howells/home/rb1/tempMax", String(tempMax), true);
    Particle.publish("Temperature", "min/max reset");
}

void relay1(int state1)
{
    digitalWrite(D3, state1);
    EEPROM.put(RELAY1_ADDR, state1);
    if (state1)
    {
        mqtt.publish("howells/home/rb1/relay1/status", "on");
        if (Particle.connected()) Particle.publish("relay1", "on");
    }
    else
    {
        mqtt.publish("howells/home/rb1/relay1/status", "off");
        if (Particle.connected()) Particle.publish("relay1", "off");
    }
}

void relay2(int state2)
{
    digitalWrite(D4, state2);
    EEPROM.put(RELAY2_ADDR, state2);
    if (state2)
    {
        mqtt.publish("howells/home/rb1/relay2/status", "on");
        if (Particle.connected()) Particle.publish("relay2", "on");
    }
    else
    {
        mqtt.publish("howells/home/rb1/relay2/status", "off");
        if (Particle.connected()) Particle.publish("relay2", "off");
    }
}

void temperature_pub()
{
    if (bme.begin()) getBME280Data();
    
    tempStr = String(bmeTemperature);
    humidityStr = String(bmeHumidity);
    pressureStr = String(bmePressure);
    dewpointStr = String(Dewpoint);

    if (mqtt.isConnected())
    {
        mqtt.publish("howells/home/rb1/temperature", tempStr, true);
        mqtt.publish("howells/home/rb1/tempMin", String(tempMin), true);
        mqtt.publish("howells/home/rb1/tempMax", String(tempMax), true);
        //mqtt.publish("howells/home/rb1/humidity", humidityStr, true);
        //mqtt.publish("howells/home/rb1/pressure", pressureStr, true);
        //mqtt.publish("howells/home/rb1/dewpoint", dewpointStr, true);
    }
    else
    {
        if (Particle.connected()) Particle.publish("MQTT", "not connected");
        mqtt.connect("rVd6K8onGMGQ0ZJgnaaW");
    }
}

void setup()
{
    //Particle.disconnect();

    pinMode(D3, OUTPUT);
    pinMode(D4, OUTPUT);
    pinMode(D7, OUTPUT);
    
    mqtt.connect("rVd6K8onGMGQ0ZJgnaaW"); //connect to mqtt server
    
    if (mqtt.isConnected()) //subscribe to messages
    {
        mqtt.subscribe("howells/home/rb1/power");
        if (Particle.connected()) Particle.publish("MQTT", "subscribed");
    }
    
    // initial state of relays same as before power was lost
    EEPROM.get(RELAY1_ADDR, relay1State);
    EEPROM.get(RELAY2_ADDR, relay2State);
    if (relay1State) relay1(1);
    if (relay2State) relay2(1);
    
    if(!bme.begin()) 
    {
        if (Particle.connected()) Particle.publish(DeviceID, "BME280 Not found");
    }
    
    timerPublishData = millis();
}

void loop()
{
    if (millis() >= timerFlash + 1000)
    {
        timerFlash = millis();
        ledToggle = !ledToggle;
        digitalWrite(D7, ledToggle);
    }
    
    if (millis() > timerPublishData + 60000)
    {
        timerPublishData = millis();
        temperature_pub();
    }
    
    if (mqtt.isConnected()) {
        mqtt.loop();
    }
}

void getBME280Data()
{
    bmeTemperature = bme.readTemperature();
    
    if (bmeTemperature > tempMax) tempMax = bmeTemperature;
    if (bmeTemperature < tempMin) tempMin = bmeTemperature;

    bmeHumidity = bme.readHumidity();
    bmePressure = (bme.readPressure()/100);
    Dewpoint = bmeTemperature - ((100 - bmeHumidity)/5);
}

I kinda forget the details, but every once in a while there’s some mqtt glitch and it disconnects… or you could have a cellular dropout… that’s fine, but you never reconnect. It won’t reconnect by itself…

if (mqtt.isConnected()) // this line should have an else where it resubscribes
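
A rough sketch of what that else branch could look like. `FakeMqtt` is a made-up stand-in so the snippet compiles on a PC; on the Photon you would pass the real `mqtt` object instead. `serviceMqtt` and the 5 s retry interval are my own assumptions, not part of the MQTT library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the MQTT client, only so this compiles off-device.
// It mirrors the calls used in the posted code: connect(), isConnected(),
// subscribe() and loop().
struct FakeMqtt {
    bool connected = false;
    int connectCalls = 0;
    std::vector<std::string> subscriptions;

    bool connect(const char* /*clientId*/) {
        ++connectCalls;
        connected = true;
        subscriptions.clear();  // assumed: a fresh session starts with no subscriptions
        return connected;
    }
    bool isConnected() const { return connected; }
    bool subscribe(const char* topic) {
        if (!connected) return false;
        subscriptions.push_back(topic);
        return true;
    }
    bool loop() { return connected; }
    void dropConnection() { connected = false; subscriptions.clear(); }
};

const uint32_t RETRY_INTERVAL_MS = 5000;  // assumed value, tune to taste

// Call from loop() in place of the bare "if (mqtt.isConnected()) mqtt.loop();".
// nowMs would be millis() on the device.
template <typename Client>
void serviceMqtt(Client& client, uint32_t nowMs, uint32_t& lastAttemptMs,
                 const char* clientId, const char* topic) {
    assert(clientId != nullptr && topic != nullptr);
    if (client.isConnected()) {
        client.loop();
        return;
    }
    // Unsigned subtraction stays correct when the millisecond counter wraps.
    if (nowMs - lastAttemptMs < RETRY_INTERVAL_MS) return;
    lastAttemptMs = nowMs;
    client.connect(clientId);
    if (client.isConnected()) {
        client.subscribe(topic);  // subscribe again after every successful reconnect
    }
}
```

On the device the call would be something like `serviceMqtt(mqtt, millis(), lastAttempt, "<client id>", "howells/home/rb1/power");` with `lastAttempt` being a global `uint32_t`. The retry interval is only there so a dead broker does not stall loop() on every pass.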


One of the comments around issues like this will be (often mentioned by myself :blush:) to replace String with character arrays (aka C/C++ strings).

oh yea, there’s a bunch of global strings that get changed here… if you make all those strings local inside the function call, you might just see this go away :slight_smile: i think you can do this too… using c strings is more efficient but kinda clunky sometimes…

Not necessarily, since the underlying issue with String is heap fragmentation, and that still happens with local variables: the String object’s character buffer is allocated on the heap rather than on the stack, where “normal” local/automatic variables live.
Also creating/deleting lots of temporary/intermediate String objects will still contribute to the issue.

Thanks chaps for all the suggestions/pointers. I will make a few changes and keep monitoring. Damned if I’ll let this get the better of me.

Thanks for this suggestion. I was not aware that using type String had a problem. I have done a search and looked at the docs but cannot find any reference to this. Are you able to point me to the information regarding Strings and char arrays so that I can read up on it to better understand what is going on?
Thanks

You can browse the forum for String and “heap fragmentation” to find plenty of posts.
But snprintf() will often pop up in these. Reading up on that and also its siblings (printf(), strcpy(), strcmp(), …) should get you set on replacing String with C-strings.


Thanks for the links @ScruffR . Hopefully I now have a better understanding although I still don’t understand the full reasoning behind why Strings should not be used.

Just to be sure, is the following code in temperature_pub() correct in the use of char - the code compiles and works but I need to be sure that what I have done is correct.

float bmeTemperature = 22.5;
long timerPublishData;

char temp[5];

void temperature_pub()
{
    snprintf(temp, sizeof(temp), "%.1f", bmeTemperature);
    Particle.publish("temperature", temp);
}

void setup()
{
    if (Particle.connected()) Particle.publish("IP_Address", WiFi.localIP().toString().c_str());
}

void loop()
{
    if (millis() > timerPublishData + 60000)
    {
        timerPublishData = millis();
        temperature_pub();
    }
}

String objects start with a 16-byte buffer for their contents. If the string outgrows that, the object acquires a buffer double the previous size, which may mean relocating and leaving behind a 16-byte chunk of “available” space. If some other object happened to claim the memory right next to it before that chunk was freed, the chunk is now hemmed in and you have a confined block (aka fragment).
When this happens you may have plenty of free space in total but no consecutive block big enough for other objects that may require a few KB at once (e.g. the network stack).

That’s why this only happens after some time as the fragmentation is a gradual process.
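
To make that concrete, here is a toy model in plain C++ (runs on a PC, nothing Particle-specific). The heap is an array of 16-byte slots with a first-fit allocator. `ToyHeap`, `fragmentedHeap` and all the sizes are invented for illustration, and the real Device OS allocator works differently in detail, so treat this as a sketch of the effect rather than of the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy first-fit heap made of fixed 16-byte slots. Purely illustrative.
class ToyHeap {
public:
    explicit ToyHeap(std::size_t slots) : used_(slots, false) {}

    // Reserve `count` consecutive slots; returns the first slot index or -1.
    int alloc(std::size_t count) {
        assert(count > 0);
        std::size_t run = 0;
        for (std::size_t i = 0; i < used_.size(); ++i) {
            run = used_[i] ? 0 : run + 1;
            if (run == count) {
                std::size_t start = i + 1 - count;
                for (std::size_t j = start; j <= i; ++j) used_[j] = true;
                return static_cast<int>(start);
            }
        }
        return -1;  // no contiguous run is long enough
    }

    void release(int start, std::size_t count) {
        for (std::size_t j = 0; j < count; ++j)
            used_[static_cast<std::size_t>(start) + j] = false;
    }

    std::size_t freeSlots() const {
        std::size_t n = 0;
        for (bool u : used_) if (!u) ++n;
        return n;
    }

    std::size_t largestFreeRun() const {
        std::size_t best = 0, run = 0;
        for (bool u : used_) {
            run = u ? 0 : run + 1;
            if (run > best) best = run;
        }
        return best;
    }

private:
    std::vector<bool> used_;
};

// Fill the heap with 1-slot blocks, alternating short-lived "String"
// buffers with long-lived objects, then free only the short-lived ones.
inline ToyHeap fragmentedHeap(std::size_t slots) {
    ToyHeap heap(slots);
    std::vector<int> temporaries;
    for (std::size_t i = 0; i < slots; ++i) {
        int block = heap.alloc(1);
        if (i % 2 == 0) temporaries.push_back(block);  // every other block is temporary
    }
    for (int block : temporaries) heap.release(block, 1);
    return heap;
}
```

With `fragmentedHeap(64)`, half the heap (32 slots, i.e. 512 bytes) is free afterwards, yet `alloc(4)` for a mere 64 consecutive bytes fails because no two free slots are adjacent.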

Thanks for that explanation - makes total sense. However, your explanation also leads me to think that this is then not the problem in my original code: the few strings I had created were specifically for conversion of float values to strings for transmission via mqtt, and none of them were ever more than 16 bytes in size. The strings would never outgrow the allocated 16 bytes, so no fragmentation (or am I being a bit optimistic in this outlook?).

Anyway I have started rewriting my original code removing all “Strings” just in case :roll_eyes:

This still applies to temporary objects that come and go (as local variables do).
They will still leave 16 byte fragments behind.
Also any String() conversion will create temporary objects that will potentially leave fragments behind - and you have lots of them :wink:
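
For the float conversions, something along these lines avoids the temporaries altogether. `formatReading` is just a name made up for this sketch; the point is the fixed buffer and checking what snprintf() returns. Note that `char temp[5]` in the test snippet above only fits four characters plus the terminating '\0', so "22.5" fits but "-10.5" or "100.0" would be cut short. A 16-byte buffer leaves headroom.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Formats `value` with one decimal place into `out`.
// Returns true only if the whole text (plus the '\0') fitted.
bool formatReading(char* out, std::size_t outSize, float value) {
    assert(out != nullptr && outSize > 0);
    int needed = std::snprintf(out, outSize, "%.1f", value);
    // snprintf returns the length it wanted to write, excluding the '\0';
    // a value >= outSize means the output was truncated.
    return needed >= 0 && static_cast<std::size_t>(needed) < outSize;
}
```

In temperature_pub() that would be roughly `char buf[16]; if (formatReading(buf, sizeof(buf), bmeTemperature)) Particle.publish("temperature", buf);`. The buffer lives on the stack, so nothing is taken from or returned to the heap.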


SOLVED
The code has been running for the last 24 hours and no lock-ups :grinning:
As per the suggestions in this thread, I removed all references to Strings and replaced them with char arrays - thanks @ScruffR .
I had also stupidly forgotten to resubscribe to the mqtt topics after reconnecting, as @ccunningham alluded to, and have now added that code.
Not sure which one worked but between them all is good.
Many thanks chaps :+1:
