MQTT-TLS - disconnects at broker but client.IsConnected() is TRUE

I’m using the MQTT-TLS library with AWS IoT to publish sensor data periodically.

I am finding that everything works for ~15 minutes, after which the client disconnects (according to AWS CloudWatch). However, based on the serial output, client.isConnected() still returns TRUE.

Here’s my code:

void setup() 
{
    
    Serial.begin(9600);

    WiFi.macAddress(mac);
    
    sprintf(macAddr,"%02x%02x%02x%02x%02x%02x", mac[0], mac[1], mac[2], mac[3], mac[4], mac[5]);
    
    if (bme.begin())
    {
        Serial.println("BME280 Sensor ready.");
    }
    else
    {
        Serial.println("BME280 Sensor ERROR!");
    }
    
    client.enableTls(
        amazonIoTRootCaPem, sizeof(amazonIoTRootCaPem),
        clientKeyCrtPem, sizeof(clientKeyCrtPem),
        clientKeyPem, sizeof(clientKeyPem));
      
    client.connect("macAddr");
    
    if (client.isConnected()) {
        Serial.printlnf("Client Connected");
    }
}

void loop() 
{
    int temp, pressure, humidity;
    
    getBMEValues(temp, pressure, humidity);
    
    timeStr = Time.format(Time.now(), TIME_FORMAT_ISO8601_FULL);
    
    Serial.printlnf("Pushing data to AWS"); 
    
    if (client.isConnected() == FALSE)
    {
        client.connect("macAddr");
    }

    if (client.isConnected()) 
    {
        //  temp
        sprintf(publishString,"[{\"device\": \"%s\", \"timestamp\": \"%s\", \"temperature\": \"%d\"}]",macAddr,timeStr.c_str(),temp);
        client.publish("outTopic/temperature", publishString, MQTT::QOS1);
        Serial.printlnf(publishString);
         
        //  humidity
        sprintf(publishString,"[{\"device\": \"%s\", \"timestamp\": \"%s\", \"humidity\": \"%d\"}]",macAddr,timeStr.c_str(),humidity);
        client.publish("outTopic/humidity", publishString, MQTT::QOS1);
        Serial.printlnf(publishString);
        
        //  pressure
        sprintf(publishString,"[{\"device\": \"%s\", \"timestamp\": \"%s\", \"pressure\": \"%d\"}]",macAddr,timeStr.c_str(),pressure);
        client.publish("outTopic/pressure", publishString, MQTT::QOS1);
        Serial.printlnf(publishString);
    }
    
    Serial.printlnf("AWS Connection:");
    Serial.printlnf(client.isConnected() ? "true" : "false");
    Serial.printlnf("Timestamp:");
    Serial.printlnf(timeStr);
    Serial.printlnf("----------------");
    
    delay(60000);
}

This is what I see in AWS CloudWatch:

2020-09-03 23:05:11.370 TRACEID:c49e90da-e503-664e-d350-d34f8173ff23 PRINCIPALID:d1ec65ce078d0dc12227bba247835173f4e255263c7028bd385918c54eeb6338 [INFO]  EVENT:MQTT Client Disconnect MESSAGE:Disconnect Status: SUCCESS

2020-09-03 23:05:11.370 TRACEID:c49e90da-e503-664e-d350-d34f8173ff23 PRINCIPALID:d1ec65ce078d0dc12227bba247835173f4e255263c7028bd385918c54eeb6338 [INFO]  EVENT:MQTT Client Disconnect MESSAGE: IpAddress: X.X.X.X SourcePort: 57807

(I swapped out the IP for X.X.X.X)

The serial output continues to show that client.isConnected() is TRUE though.

This library is contributed by a community member and hence not officially supported by Particle - particularly since there is no native TLS support for TCPClient in device OS.

I guess it’s best to ping the user who contributed it in the first place - in this case @hirotakaster


@00_simon_00
maybe the problem is
delay(60000);
This delay is too short. Try changing it to 30 sec, or reconnecting to AWS IoT each time may be better.

@hirotakaster, I’m not quite sure whether this makes sense :wink:

If a delay of 60 seconds is already too short, how would reducing it to 30 seconds solve the issue :confused: ?

@00_simon_00, I’m not a big fan of delay() anyhow.

In order to reduce the cadence of loop() I’d rather suggest something like this

const uint32_t msLoopCadence = 60000;

void loop() {
  static uint32_t msLoop = 0;

  // do all stuff you want done at high rate

  if (millis() - msLoop < msLoopCadence) return;
  msLoop = millis();

  // do all the stuff that only wants to be done occasionally
}

or a fully fledged FSM.
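
A side note on the `millis() - msLoop` comparison above: it keeps working when the 32-bit millisecond counter wraps (roughly every 49.7 days), because unsigned subtraction is done modulo 2^32. Here is a minimal host-side sketch in plain C++ to illustrate that. `cadenceElapsed` is a hypothetical helper name, and the `millis()` readings are passed in as arguments so it can run off-device:

```cpp
#include <cstdint>

// Hypothetical helper (not part of Device OS): returns true once at least
// `intervalMs` milliseconds have passed since `lastMs`.
// Unsigned subtraction wraps modulo 2^32, so the difference is still the
// true elapsed time even if the counter rolled over between the two readings.
bool cadenceElapsed(uint32_t nowMs, uint32_t lastMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - lastMs) >= intervalMs;
}
```

For example, with `lastMs = 0xFFFFFF00` and `nowMs = 0x00000100` the difference evaluates to 512 ms, even though `nowMs` is numerically smaller than `lastMs`.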

@ScruffR

There are various causes of TCP (MQTT/TLS) disconnection errors.
One reason for suggesting “60 -> 30 sec” is that some WiFi routers (or servers) forcibly close a TCP session after less than about 30 seconds without any data transmission.

Also, the MQTT keep-alive (ping) is sent from the client.loop() function, but this code doesn’t call that function, so the TCP traffic depends entirely on client.publish().

So I think the 60 sec delay may be one of the causes of the disconnection.

I’ve used this with various delays down to 5000 - it still disconnects.

What does client.loop() do?

I’ll try using an alternative to delay, I had something like that initially but it wasn’t publishing. I’ll try again though. An alternative (assuming a cadence of 60 seconds) would be to explicitly disconnect and connect each loop.

What’s confusing is that the broker shows it disconnected but the client.IsConnected() state is true.

@00_simon_00

client.loop() checks the MQTT keep-alive timeout and runs the callback (subscribe) function.
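
To make the keep-alive part concrete, here is a simplified model of what a client-side keep-alive check typically does. This is a sketch of the general technique in plain C++, not the MQTT-TLS library’s actual code; `KeepAliveModel`, `poll`, and `onPingResponse` are hypothetical names used only for illustration:

```cpp
#include <cstdint>

// What the caller should do after a poll.
enum class KeepAliveAction { None, SendPing, Disconnect };

// Simplified model (assumption, not the library's implementation):
// one activity timestamp and one "ping outstanding" flag.
struct KeepAliveModel {
    uint32_t keepAliveMs;          // keep-alive interval in milliseconds
    uint32_t lastActivityMs = 0;   // time of the last ping sent or response seen
    bool pingOutstanding = false;  // true while waiting for a ping response

    explicit KeepAliveModel(uint32_t intervalMs) : keepAliveMs(intervalMs) {}

    // Call when the broker answers a ping.
    void onPingResponse(uint32_t nowMs) {
        lastActivityMs = nowMs;
        pingOutstanding = false;
    }

    // Call from the main loop as often as possible.
    KeepAliveAction poll(uint32_t nowMs) {
        if (nowMs - lastActivityMs < keepAliveMs) {
            return KeepAliveAction::None;        // interval not yet elapsed
        }
        if (pingOutstanding) {
            return KeepAliveAction::Disconnect;  // previous ping never answered
        }
        pingOutstanding = true;                  // time to ping the broker
        lastActivityMs = nowMs;
        return KeepAliveAction::SendPing;
    }
};
```

In this model the local state only changes inside `poll()`. A sketch that sits in `delay(60000)` never gets there, so it neither sends a ping nor notices a missing response, which may be one way a client keeps reporting a connection the broker has already closed.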

I was just wondering, does the sample source code (a2-example) work in your environment?

I see, so you mean 60 sec may be too long rather than too short.

Yes, but this disconnection problem may have other causes.

I tried the following:

  1. Using 30000 and 10000 for the delay param - no difference, still stops after ~10-15 mins.

  2. Using a different wifi network (same internet router though) - no difference, still stops after ~10-15 mins.

  3. Disconnecting and reconnecting each loop (this is worse, it doesn’t reconnect again after the first disconnect).

  4. Added the client.loop() - no difference, still stops after ~10-15 mins.

  5. a2-example does work.

The really weird thing is, as I mentioned initially, AWS registers a disconnect but the client still thinks it’s connected and is trying to send data.

I’m going to try the alternative method that @ScruffR mentioned…

Hmmm, thank you for testing. The WiFi and router may not matter.
If a2-example works without disconnecting, it may be good to build your app up from a2-example.

BTW, when I hit a networking problem with MQTT/TLS, I debug it as follows.

  1. Enable (uncomment) MBEDTLS_DEBUG_C (src/mbedtls/config.h) and DEBUG_TLS (src/MQTT-TLS.h) so you can see the MQTT/TLS debug messages.
  2. Capture the device’s packets with Wireshark.

I’ve found that if a connection is unsuccessful I usually have to re-enable TLS before reconnecting to ensure success. Try calling client.enableTls(... before each connection attempt.

Also, I believe the default keepalive for AWS IoT is 15 seconds, which means that you have to either publish or call client.loop() at least once every 15 seconds. There is no disadvantage to calling client.loop() more often, as it will only ping once 15 seconds have passed (IIRC), and the broker allows some grace beyond the timeout (1.5x the keepalive per the MQTT 3.1.1 spec) before terminating the connection.
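
As a rough worked example of that timing, here is a small plain C++ sketch that checks whether a given silent gap would exceed the broker’s limit. `brokerWouldDrop` is a hypothetical helper, the 15-second keepalive is the value assumed above, and the grace factor is a parameter that defaults to the 1.5x from the MQTT 3.1.1 spec:

```cpp
#include <cstdint>

// Hypothetical helper: would a broker that enforces the keep-alive drop the
// session, given the longest silent gap between packets from the client?
// graceFactor is an assumption the caller can change (MQTT 3.1.1 says 1.5).
bool brokerWouldDrop(uint32_t keepAliveSec, uint32_t longestGapSec,
                     double graceFactor = 1.5) {
    return static_cast<double>(longestGapSec) > keepAliveSec * graceFactor;
}
```

With a 15-second keepalive, a 60-second gap between publishes (and no client.loop() calls in between) is over the 22.5-second limit, while a 10-second gap is not. This only covers the keep-alive cause; since shorter delays were reported above to disconnect as well, something else may also be involved.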

Use something like this to publish regularly:

//...
uint32_t last_pub = 0;
const uint32_t pub_interval = 60000;
//...

void loop() {
   client.loop();   // service keep-alive and incoming messages on every pass
   //...
   if (!client.isConnected()) {
      // re-enable TLS before every connection attempt
      client.enableTls(
         amazonIoTRootCaPem, sizeof(amazonIoTRootCaPem),
         clientKeyCrtPem, sizeof(clientKeyCrtPem),
         clientKeyPem, sizeof(clientKeyPem));

      client.connect("macAddr");
   }

   // publish only when connected and the interval has elapsed
   if (client.isConnected() && (millis() - last_pub) > pub_interval) {
      last_pub = millis();
      // publish here...
   }
   //...
}

Give it a shot and see if this approach helps you.