Way to automatically powercycle if can't connect to mesh network?

YES, at least I've done it with Borons in the past.
A GPIO pin can pull the EN Pin low causing a complete Power Cycle (Shutdown) and immediate reboot.
Should work the same for a Xenon.

@Rftop awesome! I’ll be testing this tomorrow

I connected the EN Pin to D5 with a 47k Resistor.

You could also use the System.reset() function for a more managed approach :)

Here are some code snippets I use (with SYSTEM_THREAD(ENABLED)) to get a handle on resets. You could enable the system watchdog early and run a cloud connection test in setup() or loop() that calls the reset function below; if the cloud call itself hangs, the watchdog resets the device instead.


enum enum_rebootReasons
{
    RESET_UNKNOWN,
    RESET_NETWORK_FAILED,
    RESET_TVIEW_FAILED,
    RESET_CLOUD_FAILED,
    RESET_COMMAND,
    RESET_GRACEFULL,
    RESET_REQUEST,   // reset requested manually or by an app/firmware update
    RESET_ATTRIBUTE, // reset after attribute updates for brk or udp
    numOfResetReasons
};

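// Assumed size (not shown in the original post); it must fit the longest
// string below plus its terminating NUL.
const int sizeOfResetReasonText = 20;

const char resetReasonsText[numOfResetReasons][sizeOfResetReasonText] = {
    "Unknown",
    "Network failure",
    "tView failure",
    "Cloud failure",
    "Remote command",
    "Graceful request",
    "Reset request",  // from button or cloud
    "Attribute reset" // reset after attribute updates for brk or udp
};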

// ------------------------------------------------------------------------------
void resetDevice(int resetReason) // minimal reset called from system event or reset button
// ------------------------------------------------------------------------------
{
    Serial.printlnf("Reset reason %s", resetReasonsText[resetReason]); // print() does not take a format string; printlnf() does
    delay(200); // allow serial buffer to flush
    System.enableReset();
    System.reset(resetReason);
}

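Then call this on startup so you know what broke. Reset info may need to be enabled first with System.enableFeature(FEATURE_RESET_INFO), otherwise System.resetReason() has nothing to report.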

// ------------------------------------------------------------------------------
void prevResetReason() //
// ------------------------------------------------------------------------------
{
    const int maxSizeOfResetText = 59;
    const int maxSizeOfResetMsg = 60;
    char systemResetMsg[maxSizeOfResetMsg] = "Reset:Non defined reason"; // reset reason not defined in DeviceOS
    uint32_t data = System.resetReasonData();

    switch (System.resetReason())
    {
    case RESET_REASON_PIN_RESET: // Reset button or reset pin
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Button or reset pin");
        break;

    case RESET_REASON_POWER_MANAGEMENT: // Low-power management reset
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Low-power management");
        break;

    case RESET_REASON_POWER_DOWN: // Power-down reset
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Power-down");
        break;

    case RESET_REASON_POWER_BROWNOUT: // Brownout reset
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Brownout");
        break;

    case RESET_REASON_WATCHDOG: // Hardware watchdog reset
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Hardware watchdog");
        break;

    case RESET_REASON_UPDATE: // Successful firmware update
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Successful firmware update");
        break;

    case RESET_REASON_UPDATE_TIMEOUT: // Firmware update timeout
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Firmware update timeout");
        break;

    case RESET_REASON_FACTORY_RESET: // Factory reset requested
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Factory reset requested");
        break;

    case RESET_REASON_SAFE_MODE: // Safe mode requested
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Safe mode requested");
        break;

    case RESET_REASON_DFU_MODE: // DFU mode requested
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:DFU mode requested");
        break;

    case RESET_REASON_PANIC: // System panic
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:System panic %lu", data);
        break;

    case RESET_REASON_USER: // User-requested reset
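        // Guard the table lookup: reset data outside the enum range falls back to "Unknown"
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:User %s",
                 resetReasonsText[data < (uint32_t)numOfResetReasons ? data : (uint32_t)RESET_UNKNOWN]);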
        break;

    case RESET_REASON_UNKNOWN: // Unspecified reset reason
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Unspecified reason");
        break;

    case RESET_REASON_NONE: // Information is not available
        snprintf(systemResetMsg, maxSizeOfResetText, "Reset:Reason not available");
        break;
    default: // reset reason not defined in DeviceOS
        break;
    }
    Serial.println(systemResetMsg);
}

I have tried using the application watchdog (AWD) and System.reset(), and this didn’t work; I think you will find the device is not actually stuck in the application code.

I haven’t yet tried this as my mesh network is only 8 Xenons large! I have still noticed that there is usually one device that flashes green for a longer time than the others. But they do all connect. This is what I plan to test:

After a sleep or at startup, call Mesh.connect(), then waitFor(Mesh.ready, MESH_READY_WAIT_TIME); if Mesh.ready() is still false after waitFor() times out, either try System.reset() or toggle the EN pin. I currently call waitFor() and test Mesh.ready() before making any Mesh.publish() calls.
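
A rough, host-testable sketch of the wait-then-recover decision described above. `pollUntilReady`, `checkMesh` and `MeshRecovery` are hypothetical names, not Device OS APIs; `pollUntilReady` only stands in for the built-in waitFor() so the logic can be exercised off-device with a fake clock.

```cpp
#include <cstdint>
#include <functional>

// What the device should do once the wait for the mesh has finished.
enum class MeshRecovery { None, Reset };

// Hypothetical stand-in for Device OS waitFor(): poll `ready` until it
// returns true or `timeoutMs` has elapsed on the supplied clock.
inline bool pollUntilReady(const std::function<bool()>& ready,
                           const std::function<uint32_t()>& nowMs,
                           uint32_t timeoutMs) {
    const uint32_t start = nowMs();
    while (nowMs() - start < timeoutMs) {  // unsigned math survives rollover
        if (ready()) {
            return true;
        }
    }
    return ready();  // one last check at the deadline
}

// Maps the outcome of the wait to a recovery action.
inline MeshRecovery checkMesh(const std::function<bool()>& ready,
                              const std::function<uint32_t()>& nowMs,
                              uint32_t timeoutMs) {
    return pollUntilReady(ready, nowMs, timeoutMs) ? MeshRecovery::None
                                                   : MeshRecovery::Reset;
}

// On a Xenon the same decision would look roughly like:
//   Mesh.connect();
//   if (!waitFor(Mesh.ready, MESH_READY_WAIT_TIME)) {
//       System.reset();  // or drive the GPIO wired to EN low
//   }
```

Whether a software reset is enough, or the EN pin is needed, is exactly what still has to be tested on real hardware.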

Question - do you have a Mesh.subscribe() called before the Mesh.connect()?

I figured out a way to get those unresponsive Xenons to reset and reconnect successfully after the Boron resets its network connection or is just power cycled.

Check the string of posts starting with the one linked below to see the code I used to reset the Xenons in software, using publish and subscribe functions that still seem to work even when the Xenons are flashing green.

Electron vs. Boron – “living on the edge” – what has changed with the cellular connection?

This is pretty clever! Added, will test this.

One question, did your Xenons not naturally reconnect after the Gateway came back? So far, the vast majority of mine will reconnect after I push a Gateway update. It is only after a minute or two that 1 or 2 don’t seem to reconnect. Those are the only ones I really need to send the system reset. I’ll be circling back to this tomorrow and will share what I end up doing.

Here’s what I have so far. Is there a more elegant way to handle the initial connection than a 10 second delay?

SYSTEM_THREAD(ENABLED)

int start_time;
int interval = 30; // # of seconds

void setup() {
    Serial.begin(9600);
    delay(10000); // Put in a delay so it has a chance to connect before getting into the loop
}

void loop() {
    delay(5000);
    if (!Particle.connected()) {
        // If not connected to Particle, set timer
        if (start_time == 0) {
            start_time = Time.now();
            Serial.println("Start");
        }
        if (start_time > 1) {
            // If timer too long, reset the device
            if (Time.now() > start_time + interval) {
                // Reset
                System.reset();
            } else {
                Serial.println("Offline");
            }
        }
    } else {
        if (start_time > 1) {
            // Must reset after reconnecting
            Serial.println("Reset");
            start_time = 0;
        }
    }
}

Big delays are generally bad / indicate the wrong architecture of your code. Why do you need to give it a chance to connect before you hit loop? If you really do need to do that, simply track the time with a variable.

Why are you using Time.now()? If the device hasn’t connected to the cloud since the power cycle, it can return an invalid value, because the clock is only set AFTER Particle is connected. Use millis() instead to get the milliseconds since reset.

Also, you never explicitly initialize start_time. As a global it is zero-initialized by C++, but it is clearer to spell that out, so write int start_time = 0; or set it in your setup. But also, why are you only starting the timer in your loop to track the duration to connect? Just start it in setup and then increase your interval.

Here is what I would do based on what I think you are trying to do:

SYSTEM_THREAD(ENABLED);

const uint32_t connectivity_message_interval = 5000;  // 5 sec
uint32_t last_connectivity_message_time = 0;  // controls how often we send the offline message

const uint32_t connectivity_timeout = 180000UL;  // 3 min b/c 30sec is pretty short, but up to you.
uint32_t last_connectivity_change_time = 0;

bool is_particle_connected; // flag to handle the moment of connectivity change

void setup() {
    Serial.begin(9600);
    
    Mesh.on();  // potentially needed due to bug when mesh module is not already powered up.
    Mesh.connect();
    Particle.connect(); 
    
    is_particle_connected = Particle.connected();
    
    Serial.println("Finished setup");
}

void loop() {
  Particle.process();
  
  if (!Particle.connected()) {
    if (is_particle_connected) {
      // handle moment of disconnection
      is_particle_connected = false;
      last_connectivity_change_time = millis();
      Serial.println("Just Went Offline");
    }
    if ((millis() - last_connectivity_change_time) > connectivity_timeout) {
      // probably an issue connecting, we'll try to fix by resetting
      Serial.println("Resetting due to connectivity timeout...");
      delay(1000);  // allow serial message to get read out 
      System.reset();
    }
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Offline");
    }
  }
  else {
    if (!is_particle_connected) {
      // handle moment of connection
      is_particle_connected = true;
      last_connectivity_change_time = millis();
      Serial.println("Just Came Online");
    }
    // We are connected!  Time for normal connected stuff...
    if ((millis() - last_connectivity_message_time) > connectivity_message_interval) {
      last_connectivity_message_time = millis();
      Serial.println("Still Online");
    }
  }
}
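
The elapsed-time checks in the sketch above all have the form `millis() - start > interval`. A small standalone illustration of why that form is preferable to `now > start + interval`; the helper names are made up for this example, and the timestamps are passed in so it runs off-device.

```cpp
#include <cstdint>

// True once at least `intervalMs` milliseconds have passed since `startMs`.
// Unsigned subtraction wraps modulo 2^32, so the result stays correct when
// a 32-bit millisecond counter rolls over (roughly every 49.7 days).
inline bool hasElapsed(uint32_t nowMs, uint32_t startMs, uint32_t intervalMs) {
    return static_cast<uint32_t>(nowMs - startMs) >= intervalMs;
}

// The tempting alternative: it misfires when startMs + intervalMs overflows.
inline bool hasElapsedNaive(uint32_t nowMs, uint32_t startMs, uint32_t intervalMs) {
    return nowMs >= static_cast<uint32_t>(startMs + intervalMs);
}
```

With a start time just before rollover, the naive version reports a timeout after only a few milliseconds, while the subtraction form keeps counting correctly.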

Thanks for taking a look at my code. This looks much better than what I was doing. TBH I'm learning on the fly (Python eng tinkering in hardware), so every place you think 'Huh, that looks odd,' chalk it up to just trying to get C++ working :)

Yea this seemed odd to me too so figured I should ask if there's a smarter way.

This was just a way to track how long an action has/hasn't taken place. millis() definitely looks like the better solution. Thanks for the explanation.

This was a real headache to fight. Bc you are right, there were cases it wasn't 0 and I couldn't figure out why.

Thanks again for the help!

A lot of times the Xenons would reconnect just fine but a lot of times some or all of them would not.

It seemed to me that after the Boron reset or lost its cellular connection for a minute and then reconnected, the Xenons would not get updated on something and would start flashing green until reset manually. I created the code I shared with you to reset them without touching them, and it always worked.

I have not tested with any of the last 5 Device OS software releases so I was hoping that issue would have been fixed by now but you may be seeing the same thing.

You’re welcome. One recommendation I would make for someone moving from Python to embedded C++ is to think of your code as a process and not a script. Scripting encourages things like delays and linearly dependent paths, whereas a process needs to maximize value out of system resources (like time & memory) and needs to be able to self-manage. You can then think of your smaller system components (functions, classes, blocks of code) kind of like sub-processes. Think, “what are the conditions where this needs to be running?” and “what is the priority of doing this relative to other things?”.

Think of it like asynchronous code where some things are processed linearly but other things require checking in to see if the result is available or if a state has changed outside of your control. Doing these things will help you properly modularize your code and reduce unnecessary dependencies.

Using classes and functions to formalize interfaces where information and other dependencies are shared is a great way to keep things organized. Be careful with global variables and where you use them. Debugging complex interdependent embedded code is a PITA, so breaking things down into units and also running the units in independent ways whenever possible will save you a lot of headache.
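
As one possible illustration of that advice, here is the connectivity tracking from earlier in the thread pulled out of loop() and its globals into a small class. `LinkMonitor` and `LinkEvent` are hypothetical names for this sketch; the class takes the link state and clock value as arguments, so it can be tested on a desktop without any hardware.

```cpp
#include <cstdint>

// Events a caller can react to; the names are illustrative.
enum class LinkEvent { None, WentOffline, CameOnline, TimedOut };

// Tracks connectivity transitions without touching hardware or globals.
class LinkMonitor {
public:
    explicit LinkMonitor(uint32_t timeoutMs) : timeoutMs_(timeoutMs) {}

    // Call once per loop() pass with the current link state and clock value.
    LinkEvent update(bool connected, uint32_t nowMs) {
        if (connected != connected_) {
            connected_ = connected;
            lastChangeMs_ = nowMs;
            return connected ? LinkEvent::CameOnline : LinkEvent::WentOffline;
        }
        if (!connected_ && (nowMs - lastChangeMs_) > timeoutMs_) {
            return LinkEvent::TimedOut;  // offline for longer than allowed
        }
        return LinkEvent::None;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastChangeMs_ = 0;
    bool connected_ = false;  // assumption: the device starts offline at boot
};
```

On the device, loop() would then shrink to something like `if (monitor.update(Particle.connected(), millis()) == LinkEvent::TimedOut) System.reset();`, with the decision logic living in one unit that can be checked in isolation.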

Welcome to C++!

I may have spoken too soon.

I’m pushing mass OTA updates to all of the edge devices, and it takes 5+ minutes for them all to reconnect. That is definitely taking longer than I was expecting.

Yea, it’s always been slow to OTA Xenons.

How long does it take for your devices to reconnect after the update?

I just did another large push, and that was at least 10 minutes. I just left with some still trying to connect. I’ll come back and see if they sorted themselves.

I didn’t really test very much after an OTA update, since when I was testing the OTA success rate was nowhere close to 100%. Hoping that has improved some since then.

Does the MESH Publish & Subscribe still work even though it’s flashing green?

Good question, I need to check

Yea, usually the Xenons would connect to the MESH network just fine even with the flashing green LED status, which just means the Xenons do not have a direct Internet connection through the Boron or Argon.

As long as you have that MESH connection you should be able to send a publish to reset them via code, which for me would always bring the Xenons back online to a breathing green status.

I just wrote a simple heartbeat pub/sub to ping the edge devices.

  • Gateway publishes every 3 minutes
  • Edges listen

It appears that does get the newly updated devices to breathing/connected again dramatically faster.
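
A minimal sketch of how such a heartbeat could be timed, with the clock passed in so it can be tried off-device. The struct names, the "heartbeat" event name and the `gatewaySilent` check are assumptions for illustration; the post only says that the gateway publishes every 3 minutes and the edges listen.

```cpp
#include <cstdint>

constexpr uint32_t kHeartbeatIntervalMs = 3UL * 60UL * 1000UL;  // 3 minutes

// Gateway side: reports when the next heartbeat should be published.
struct HeartbeatSender {
    uint32_t lastSentMs = 0;

    // Returns true when a heartbeat is due and restarts the interval.
    bool due(uint32_t nowMs) {
        if (nowMs - lastSentMs < kHeartbeatIntervalMs) {
            return false;
        }
        lastSentMs = nowMs;
        return true;
    }
};

// Edge side: remembers when the gateway was last heard.
struct HeartbeatListener {
    uint32_t lastHeardMs = 0;

    void onHeartbeat(uint32_t nowMs) { lastHeardMs = nowMs; }

    // Assumed extension: true once more than `allowedMisses` whole
    // intervals have passed without a heartbeat.
    bool gatewaySilent(uint32_t nowMs, uint32_t allowedMisses) const {
        return (nowMs - lastHeardMs) > allowedMisses * kHeartbeatIntervalMs;
    }
};

// Rough on-device wiring (event name is hypothetical):
//   gateway loop():  if (sender.due(millis())) Mesh.publish("heartbeat", "1");
//   edge setup():    Mesh.subscribe("heartbeat", handler);  // handler calls onHeartbeat(millis())
```

What an edge device should do when the gateway goes silent (wait, reset, or power cycle) is left open here.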

Are you seeing timeouts when you do bulk deploys? I’m getting 1 or 2 timeouts out of the 50 when I do bulk update over this 1 gateway.