Electron in a weak connection area

I have some Electrons in an area with weak cellular coverage, and they sometimes stop sending data.

I have the same hardware and software at home and it runs very well for weeks or months, so my conclusion is that I am doing something wrong in managing the cellular / cloud connection.

In the latest code I send all the data directly over HTTP instead of publishing events to the cloud, but after a while it stops anyway.

When I get the chance to look at the device, sometimes it is flashing green and sometimes it is off.

I am running in semi-automatic mode and this is the main loop. Any suggestions?

void loop() {

    // Check cellular connection
    if(!Cellular.ready() && !Cellular.connecting()) {
        printLog("Cellular not ready, trying to connect..");
        Cellular.connect();
        delay(500);
        printLog("Now is " + Time.timeStr());
    }
    
    
    // Check particle connection
    if(!Particle.connected() && Cellular.ready()) {
        printLog("Not connected, trying to connect..");
        Particle.connect();
        delay(1000);
        Particle.keepAlive(30);
    }

    // Collect wind speed every 10 seconds and compute gust
    if(millis() - windMillis >= 10000) {
        windMillis = millis();
        float _windSpeed =  readWindMPH();
        if (_windSpeed > windGustMPH) windGustMPH = _windSpeed;
        printLog("Wind reading value [" + String(_windSpeed) + "] - gust value [" + String(windGustMPH) + "]");
    }
    
    // Go to sleep 4 seconds (2 loop cycles) after publishing
    if (publishMillis > 0 && (millis() - publishMillis >= 4000)) {
        printLog("Publish done, go sleep");
        sleep();    
    }

    int _nowMinute = Time.minute();
    
    // Routines when connected and never published
    if(Cellular.ready() && publishMillis == 0) {
        
        // Publish interval 
        if(_nowMinute == 0 or _nowMinute % publishInterval == 0) { 
            publishMillis = millis();
            printLog("Ready to publish...");
            httpPublishRoutine();
            thingSpeakPublishRoutine();
        }
        
        // Force publish after 2 minutes of run
        if(millis() - runMillis >= 2*60*1000) {
            publishMillis = millis();
            printLog("Two minutes publish watchdog");
            httpPublishRoutine();
            thingSpeakPublishRoutine();
        }
    }
    
    // Force reset after 5 minutes of run
    if(millis() - runMillis >= 300000) {
        printLog("Five minutes running, forcing reset");
        System.reset();
    }

    printLog("Loop cycle");
    delay(2000);
}

Why are you calling Particle.connect() if you are in Semi-Auto? The device should manage that reconnection, and I understand that if you call Particle.connect() while a connection is already in progress there can be some issues.

What System_Mode are you running? I’ve had problems in threaded operation but they may be unrelated to yours. Also, your keepAlive may not be getting set correctly: if the system firmware is the one to initiate the reconnect, that condition will never be triggered. It also won’t get set properly if you are in threaded mode, since one second may not be a long enough delay. Consider moving it to the cloud_status_connected event to be safe.

Regardless, there are a couple steps you should take to ensure reliability of a remote device. First, use the application watchdog with a long timeout just to catch a really catastrophic breakdown, at a minimum. It will be challenging to use anything under a couple minutes with your code, as Cellular.connect and Particle.connect can take a long time to return in single threaded operation.

Second, you should set a couple variables to check how long it’s been since you were connected to the cloud, and if the time exceeds a set timeout value, try resetting the cellular modem. You can start with an off and an on, and then if a secondary timeout still hits, turn off the modem and reset the electron. Below is an example of how I do that:

        if (enableAggregatorTimeoutReset && (millis() - lastInputReceived_time) > lastInputReceived_rst_timeout)
        {
            debugPrint(MSG_TYPE_DEBUG, "Aggregator Data Input Timed Out, setting reset flag");
            reset_reason_code = RESET_REASON_AG_DATA_TIMEOUT;   // reset if dataset took too long to complete
            resetNow = true;
        }

        if (enableParticleCloudTimeoutReset && ((millis() - lastPingParticleCloud_time) > lastPingParticleCloud_rst_timeout))
        {
            // Particle cloud connection lost for too long, let's do a full reset.
            debugPrint(MSG_TYPE_DEBUG, "Particle Cloud Connection Timed Out, setting reset flag");
            reset_reason_code = RESET_REASON_PARTICLE_CLOUD_TIMEOUT;
            resetNow = true;
            alsoResetModem = true;
        }

        if (enableMQTTTimeoutReset && MQTT_enabled && ((millis() - lastRespMQTT_time) > lastRespMQTT_rst_timeout))
        {
            // MQTT has timed out, let's do a full reset
            debugPrint(MSG_TYPE_DEBUG, "MQTT Connection Timed Out, setting reset flag");
            reset_reason_code = RESET_REASON_MQTT_TIMEOUT;
            resetNow = true;
            alsoResetModem = true;
        }


        if (cloudResetRequest_received)
        {
            debugPrint(MSG_TYPE_DEBUG, "Cloud Reset Request received, setting reset flag");
            reset_reason_code = RESET_REASON_CLOUD_REQUEST;
            resetNow = true;
        }


        if (resetNow)
        {
            SINGLE_THREADED_BLOCK()
            {
                setCB(CB_RESETTING);
                // A reset was triggered somewhere, handling after operations complete
                #if Wiring_Cellular
                if (alsoResetModem)
                {
                    // The reset is related to cellular connectivity, so also resetting modem first

                    debugPrint(MSG_TYPE_DEBUG, "Disconnecting from Particle cloud before resetting modem...");
                    Particle.disconnect();

                    delay(100);

                    debugPrint(MSG_TYPE_DEBUG, "Resetting Modem...");
                    // 16:MT silent reset (with detach from network and saving of NVM parameters), with reset of the SIM card
                    Cellular.command(30000, "AT+CFUN=16\r\n");

                    delay(100);

                    Cellular.off();

                    delay(200);
                }
                #endif

                // Now let's reset the electron

                debugPrint(MSG_TYPE_DEBUG, "Resetting Device...");

                reset_handler();
                System.reset(reset_reason_code);
            }
        }
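
The two-stage escalation described above (power-cycle the modem first, then do a full reset if a second, longer timeout still hits) can be boiled down to a small decision function. This is a simplified sketch in plain C++, not the firmware code shown above: `RecoveryAction`, `decideRecovery`, and the two timeout parameters are hypothetical names, and the caller would still have to map the result to something like Cellular.off() / Cellular.on() or System.reset().

```cpp
#include <cstdint>

// Hypothetical names for illustration; none of these come from the Particle API.
enum class RecoveryAction { None, CycleModem, ResetDevice };

// nowMs and lastConnectedMs are millis()-style timestamps.
// Unsigned subtraction keeps the elapsed time correct across a counter rollover.
RecoveryAction decideRecovery(uint32_t nowMs, uint32_t lastConnectedMs,
                              uint32_t modemTimeoutMs, uint32_t resetTimeoutMs) {
    const uint32_t offlineMs = nowMs - lastConnectedMs;
    if (offlineMs >= resetTimeoutMs) return RecoveryAction::ResetDevice;  // second stage
    if (offlineMs >= modemTimeoutMs) return RecoveryAction::CycleModem;   // first stage
    return RecoveryAction::None;
}
```

The function keeps returning `CycleModem` until the second timeout is reached, so the caller should remember that the modem was already cycled and not repeat it on every pass through loop().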

Third, if the device going down is unacceptable, an external hardware watchdog is a great choice. That’s a hardware change (though you can retrofit one), but it’s a clutch one for those weird cases when something truly hits the fan. Make sure to be careful how you kick it, though. I use a TPL5010 since I want some longer timeouts.

Thanks for the reply. It will take me some time to understand your code and apply some of your suggestions to mine. I am running in SEMI-AUTOMATIC mode and, as per the documentation:

The semi-automatic mode will not attempt to connect the device to the Cloud automatically.

so I call Particle.connect() in the loop. I am not using the system thread.

Regarding your points:

1st - The application watchdog sometimes ended in a panic (at least a few firmware releases ago), but it can still be an option; I will look at it. Choosing the right stack size for the callback function is still black magic to me…

2nd - I just check the time since boot with

// Force reset after 5 minutes of run
if(millis() - runMillis >= 300000) {
    printLog("Five minutes running, forcing reset");
    System.reset();
}

and I have modified this code to use System.sleep(SLEEP_MODE_DEEP, 5); instead of a simple reset.

3rd - Yes, a hardware timer is my last option, with a timeout of 2 hours, but it is a hardware change that I would like to avoid.

4th - I’ve read that version 0.8.0 will probably bring back the hardware watchdog from the controller; that would be a better option for my implementation.

I’ve also changed the connection code to keep track of the number of failed cloud connections with

STARTUP(System.enableFeature(FEATURE_RETAINED_MEMORY));
retained int ParticleConnectionFails = 0;

in setup() and

// Check particle connection
if(!Particle.connected() && Cellular.ready()) {
    printLog("Not connected, trying to connect..");
    Particle.connect();
    if (waitFor(Particle.connected, 90000)) {
        #ifdef THINGSMOBILE
        Particle.keepAlive(30);
        #endif
        Particle.publish("ALIVE", String(ParticleConnectionFails));
        ParticleConnectionFails = 0;
    } else {
        ParticleConnectionFails++;
    }
}

Maybe I will also add

Cellular.command(30000, "AT+CFUN=16\r\n");

somewhere…

So if you keep reading in the documentation it clarifies:

The semi-automatic mode will not attempt to connect the device to the Cloud automatically. However once the device is connected to the Cloud (through some user intervention), messages will be processed automatically, as in the automatic mode above.

The semi-automatic mode is therefore much like the automatic mode, except:
When the device boots up, setup() and loop() will begin running immediately.
Once the user calls Particle.connect(), the user code will be blocked while the device attempts to negotiate a connection. This connection will block execution of loop() or setup() until either the device connects to the Cloud or an interrupt is fired that calls Particle.disconnect().

All Semi-Auto functionally does is delay the first connect until you explicitly call it. Once you've told the System firmware to connect, it will then automatically manage connectivity in the background between calls to loop(). Thus if, for example, the cell network drops some time after connecting, the device will likely attempt a reconnect before it hits that line in your code.

1st - The application watchdog sometimes ended in a panic (at least a few firmware releases ago), but it can still be an option; I will look at it. Choosing the right stack size for the callback function is still black magic to me…

I've been using a stack size of 4096 to be extra safe, but 2048 should be sufficient. The black magic is simply this: let the watchdog time out, and keep increasing the stack size until it doesn't overflow.

2nd - I just check the time since boot with

So you always reset / sleep every 5 minutes? SLEEP_MODE_DEEP isn't the same as a reset, though it should reset the modem I believe. When does runMillis get set? If it's only after reset then you would hit a condition where you keep sleeping all the time and never wake up for more than one loop. I don't believe millis() is reset after sleep iirc.

3rd - Yes, a hardware timer is my last option, with a timeout of 2 hours, but it is a hardware change that I would like to avoid.

Definitely not convenient, but it's saved my butt a few times and it gives you good peace of mind if implemented correctly.

4th - I’ve read that version 0.8.0 will probably bring back the hardware watchdog from the controller; that would be a better option for my implementation.

Yep, but it could take a few different forms, and as far as I know there's no guarantee on if or when that will actually happen.

Maybe I will also add

Cellular.command(30000, "AT+CFUN=16\r\n");

somewhere…

You only want to do that before you are going to turn off the modem and reset, unless you are very confident in what that will do for you otherwise. My understanding is that resets the SIM card, and I don't know what conditions bring it back online besides a reset (might be fine, but I wouldn't just throw it anywhere).

First of all, for my application the Cloud is not mandatory; I mainly use it to upload new code during this test stage. Everything is done through HTTP calls with the httpClient library.

Can it be managed by watchdog?

Thanks for the WD stack size. What code do you have in the callback? I think just a reset after a WD trigger is not enough…

No, the code publishes every 10 minutes, then calculates the time to sleep until the next slot and goes to sleep with

System.sleep(SLEEP_MODE_DEEP, _seconds, SLEEP_NETWORK_STANDBY);

It publishes every 10 minutes (at 0, 10, 20, 30, 40 and 50) during the day and every hour on the hour during the night.
I've just realized that the result of this calculation is never checked. I will add an if to cap it at one hour if the value is larger (it can't happen, but..)
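
The slot calculation with the one-hour cap could look roughly like the following. This is only a sketch in plain C++ under my own assumptions: `secondsToNextSlot` is a hypothetical helper (not my actual code), and the interval is assumed to be between 1 and 60 minutes and to divide the hour evenly.

```cpp
#include <algorithm>

// Hypothetical helper: seconds from minute:second until the next publish slot.
// intervalMinutes is assumed to be in 1..60 and to divide 60 evenly
// (10 gives slots at :00, :10, ...; 60 gives one slot per hour).
int secondsToNextSlot(int minute, int second, int intervalMinutes) {
    const int intervalSec = intervalMinutes * 60;
    const int nowSec = minute * 60 + second;                // seconds since the top of the hour
    const int wait = intervalSec - (nowSec % intervalSec);  // 1..intervalSec for valid input
    // Safety cap: never sleep less than 1 second or more than one hour.
    return std::min(std::max(wait, 1), 3600);
}
```

For example, at 7 minutes 30 seconds past the hour with a 10 minute interval the helper returns 150, and the result would be passed as the `_seconds` argument of System.sleep().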

runMillis is set at the end of setup(), just before the main loop starts.

Let's leave the Cellular.command(30000, "AT+CFUN=16\r\n"); option aside for now; I don't want to manage even more strange conditions..

Folks, in a recent post, @ScruffR found that a stack size of 1536 bytes was the minimum required for a System.reset() only WD callback. :wink:

Can it be managed by watchdog?

Not sure what you mean by that question. The watchdog will call a function if it times out. You can do whatever you want to in that function. I would generally use it only to reset, unless you have a specific case you need to address.

Thanks for the WD stack size. What code do you have in the callback? I think just a reset after a WD trigger is not enough…

Why do you think that is not enough? You could add in the code to reset the modem into the WD trigger callback.

As far as the callback goes, just create a void function that does what you want, and then replace System.reset with its name (no parentheses) in the watchdog declaration. My callback looks like this:

void watchdogReset() {
  // Function called when software watchdog is triggered
  debugPrint(MSG_TYPE_RESET, "While at Code Block: " + resetReasonDataToString(getCB(), true));

  #if Wiring_Cellular
  if (System.resetReason() == RESET_REASON_USER && System.resetReasonData() >= CB_NULL)
  {
      // The last reset was also a watchdog reset, let's reset the modem to be safe.
      debugPrint(MSG_TYPE_DEBUG, "Disconnecting from Particle cloud before resetting modem...");
      Particle.disconnect();

      delay(100);

      debugPrint(MSG_TYPE_DEBUG, "Resetting Modem...");
      // 16:MT silent reset (with detach from network and saving of NVM parameters), with reset of the SIM card
      Cellular.command(30000, "AT+CFUN=16\r\n");

      delay(100);

      Cellular.off();

      delay(200);
  }
  #endif

  reset_handler();
  System.reset(getCB());
}

If the last reset was also a watchdog reset, I will then also reset the modem before resetting. I have some code in reset_handler that is called in all software reset conditions when possible that saves some data to my sd card and closes some files. I declare the watchdog with:

// Software Application Watchdog
//    Increased stack size from default (512) TO 4096 to fix stack overflow
//    Timeout value is in ms
//    reset with wd.checkin(), also automatically reset after loop() ends
//    Currently used only on the main loop
ApplicationWatchdog wd(120000UL, watchdogReset, 4096);  // timeout of 120 seconds

OK, I have probably reproduced something that can happen:

During a power cycle, after the cellular and cloud connections were up, I disconnected the antenna and, using some aluminium foil as a shield, got the Electron flashing green (lost cellular connection) with the main loop stopped (it stopped writing to the serial port after each loop).
With the original software it stays in this state forever (of course). What was strange was that during my test it did not reconnect even after I removed the foil and reconnected the antenna.

Now I’ve changed my code to this:

void loop() {

    // Check cellular connection
    if(!Cellular.ready() && !Cellular.connecting()) {
        printLog("Cellular not ready, trying to connect..");
        Cellular.connect();
        delay(500);
        printLog("Now is " + Time.timeStr());
    }
    
    // Check particle connection
    if(!Particle.connected() && Cellular.ready()) {
        printLog("Not connected, trying to connect..");
        Particle.connect();
        #ifdef THINGSMOBILE
        Particle.keepAlive(30);
        #endif
    }

    // Collect wind speed every 10 seconds and compute gust
    if(millis() - windMillis >= 10000UL) {
        windMillis = millis();
        float _windSpeed =  readWindMPH();
        if (_windSpeed > windGustMPH) windGustMPH = _windSpeed;
        printLog("Wind reading value [" + String(_windSpeed) + "] - gust value [" + String(windGustMPH) + "]");
    }
    
    // Go to sleep 4 seconds (2 loop cycles) after publishing
    if (publishMillis > 0 && (millis() - publishMillis >= 4000UL)) {
        printLog("Publish done, go sleep");
        sleep();    
    }

    int _nowMinute = Time.minute();
    
    // Routines when connected and never published
    if(Cellular.ready() && publishMillis == 0) {
        
        // Publish interval 
        if(_nowMinute == 0 or _nowMinute % publishInterval == 0) { 
            publishMillis = millis();
            printLog("Ready to publish...");
            httpPublishRoutine();
            thingSpeakPublishRoutine();
        }
        
        // Force publish after 2 minutes of run
        if(millis() - runMillis >= 120000UL) {
            publishMillis = millis();
            printLog("Two minutes publish watchdog");
            httpPublishRoutine();
            thingSpeakPublishRoutine();
        }
    }
    
    // Force reset after 5 minutes of run
    //if(millis() - runMillis >= 300000 || ParticleConnectionFails > 10) {
    //    printLog("Five minutes running, forcing reset");
    //    hardReset = true;
    //}

    printLog("Loop cycle");
    delay(2000);
}

using the application watchdog

ApplicationWatchdog wd(240000UL, watchdogReset, 4096);

with the routine

void watchdogReset() {
    Serial.println("++++ Application watchdog ++++");
    System.sleep(SLEEP_MODE_DEEP, 15);
}

I hope that SLEEP_MODE_DEEP also resets the modem as much as possible.

I also got this from the serial log:

1525296997 >>> Wind reading value [0.000000] - gust value [0.000000]
1525296997 >>> Loop cycle
1525296999 >>> Loop cycle
1525297001 >>> Loop cycle
1525297003 >>> Loop cycle
++++ Application watchdog ++++
1525297442 >>> Cellular not ready, trying to connect..

The watchdog fired after 439 seconds instead of 240. Is that possible?
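
For reference, the 439 seconds is the gap between the last timestamped line before the hang and the first timestamped line after the restart; the watchdog line itself carries no timestamp. A small sketch of that arithmetic in plain C++ follows. The names `GapBreakdown` and `explainGap` are made up for illustration, and it is only an assumption that the remainder is boot and setup() time after the 15 second deep sleep.

```cpp
#include <cstdint>

// Hypothetical helper that splits the gap between two log timestamps into
// the known parts (watchdog timeout, deep sleep) and whatever is left over.
struct GapBreakdown {
    int64_t totalSec;        // gap between the two timestamped log lines
    int64_t unaccountedSec;  // what remains after timeout and sleep (boot, setup(), ...)
};

GapBreakdown explainGap(int64_t lastLogEpoch, int64_t firstLogAfterEpoch,
                        int64_t watchdogTimeoutSec, int64_t deepSleepSec) {
    const int64_t total = firstLogAfterEpoch - lastLogEpoch;
    return { total, total - watchdogTimeoutSec - deepSleepSec };
}
```

With the values from the log, `explainGap(1525297003, 1525297442, 240, 15)` gives a total of 439 seconds, of which 184 seconds are not explained by the watchdog timeout and the sleep.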