Photon crashing within 10 hours of running

I’m having issues with my code: the Photons (0.8 rc11) running it eventually end up offline within 5-10 hours. Typically they end up breathing green or flashing SOS (hard fault).

Is there anything obvious that I might be missing with this code that results in the failure? It’s set up to read current from current sensors on a separate board using I2C.

Much appreciated!

SYSTEM_THREAD(ENABLED);
STARTUP(WiFi.selectAntenna(ANT_AUTO));
SYSTEM_MODE(AUTOMATIC);

#define PUBLISH_RATE 5000
#define ADDR 0x2A

#include <vector>

static char buffer[50];
static int last_sync = 0;

unsigned int wireData[36];
static float maxCurrent = 0.0;
static int NUM_CHANNELS = 0;

void readCurrent() {
    float current = 0.0;
    for (int i = 1; i <= NUM_CHANNELS; i++) {
        Wire.beginTransmission(ADDR);
        Wire.write(0x92);
        Wire.write(0x6A);
        Wire.write(0x01);
        Wire.write(i);
        Wire.write(i);
        Wire.write(0x00);
        Wire.write(0x00);
        Wire.write((0x92 + 0x6A + 0x01 + i + i + 0x00 + 0x00) & 0xFF);
        Wire.endTransmission();

        Wire.requestFrom(ADDR, 3);
        
        current += ((Wire.read() * 65536) + (Wire.read() * 256) + Wire.read()) / 1000.0;
    }
    if (maxCurrent < current) maxCurrent = current;
}

void formatPacket() {
    sprintf(buffer, String(Time.now()) + ",current1," + String(maxCurrent, 3));
}

void setup() {
    last_sync = millis();

    Wire.begin();
    Wire.beginTransmission(ADDR);
    Wire.write(0x92);
    Wire.write(0x6A);
    Wire.write(0x02);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0xFE);
    Wire.endTransmission();
    Wire.requestFrom(ADDR, 6);
    
    if (Wire.available() == 6) {
        for (int i = 0; i < 6; i++) {
            wireData[i] = Wire.read();
        }
    }
    
    NUM_CHANNELS = wireData[2];
}

void loop() {
    readCurrent();

    if ((millis() - last_sync) > PUBLISH_RATE) {
        formatPacket();
        if (Particle.connected()) bool ack = Particle.publish("datastream", buffer, PRIVATE, WITH_ACK);
        last_sync = millis();
        maxCurrent = 0.0;
    }
    if (System.freeMemory() < 30000) System.reset();
}

The sprintf() line in formatPacket() kind of defeats the purpose of using char arrays instead of Strings, because it still builds temporary String objects. It may also be contributing to memory fragmentation. The same line can be written this way:

snprintf(buffer, sizeof(buffer), "%lu,current1,%.3f", Time.now(), maxCurrent);

Your current use of sprintf() doesn't guard against overrunning the bounds of buffer[], thus the snprintf() with the limiting sizeof(buffer) parameter. If you want to learn about the snprintf() syntax, you can read here:

http://www.cplusplus.com/reference/cstdio/printf/
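
To make the bounds check concrete, here is a minimal, hardware-free sketch in plain C++. The helper name formatPacket mirrors the function in the thread, but its signature (buffer, size, timestamp and value passed as parameters) is an assumption made so the snippet compiles off-device; on the Photon the timestamp would come from Time.now(). It relies on the standard behaviour that snprintf() never writes more than the given size and returns the length the full string would have needed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: formats "<timestamp>,current1,<amps>" into out.
// Returns true only if the whole string fit; on false the contents of
// out are truncated but still null-terminated.
bool formatPacket(char *out, std::size_t outSize,
                  unsigned long timestamp, double amps) {
    assert(out != nullptr && outSize > 0);
    // snprintf writes at most outSize bytes, including the trailing '\0'.
    int needed = std::snprintf(out, outSize, "%lu,current1,%.3f",
                               timestamp, amps);
    // needed is the untruncated length; it must be smaller than outSize.
    return needed >= 0 && static_cast<std::size_t>(needed) < outSize;
}
```

Checking the return value is optional, but it lets you skip a publish instead of sending a silently truncated packet.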

Another thing I noticed is that readCurrent() is being called on every loop without any delay. So you are hitting the I2C device at about 1000Hz or at 1ms intervals. I'm not sure this is what you wanted. Why not read it only when you want to publish it?
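
A related detail in readCurrent(): the three Wire.read() calls sit inside one arithmetic expression, and C++ does not specify the order in which the operands of + are evaluated, so the bytes may not be combined in the order they were received. Wire.read() also typically returns -1 when no byte is available. A hedged sketch of the pure arithmetic, separated from the I2C calls so it can be checked off-device, is below. The function names are hypothetical, and the byte order (most significant byte first) and the 1/1000 scaling are assumptions taken from the original code rather than from the sensor's datasheet.

```cpp
#include <cstddef>
#include <cstdint>

// Fills frame[0..7] with the read command used in readCurrent(): seven
// payload bytes followed by their sum truncated to 8 bits.
void buildReadFrame(uint8_t channel, uint8_t frame[8]) {
    const uint8_t payload[7] = {0x92, 0x6A, 0x01, channel, channel, 0x00, 0x00};
    unsigned sum = 0;
    for (std::size_t i = 0; i < 7; i++) {
        frame[i] = payload[i];
        sum += payload[i];
    }
    frame[7] = static_cast<uint8_t>(sum & 0xFF);
}

// Combines three reply bytes, passed explicitly in a fixed order, and
// scales by 1/1000 as the original sketch does.
float decodeCurrent(uint8_t msb, uint8_t mid, uint8_t lsb) {
    uint32_t raw = (static_cast<uint32_t>(msb) << 16) |
                   (static_cast<uint32_t>(mid) << 8) |
                   static_cast<uint32_t>(lsb);
    return raw / 1000.0f;
}
```

On the device you would first check that Wire.available() reports at least 3 bytes, read them into three separate variables one statement at a time, and only then call decodeCurrent().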

Thanks @peekay123!

I made the string change and added in a delay for data sampling.

One thing I’ve noticed with Photons is that using AUTO mode for the antenna causes reconnection issues. Has anyone else switched to EXTERNAL exclusively when using an antenna?

@markovchainz, did you implement the data sampling delay using millis()? If not, you should so it is non-blocking.

You will find plenty of posts where members have switched to one mode specifically for the antenna with many preferring an external antenna for the added range. Note that the antenna setting is “sticky” so it remains, regardless of firmware changes, until it is changed again.

@peekay123, I ended up using a delay of 2 seconds. I know for long delays it’s better to use millis() instead of calling the delay method but for short delays is there much of a difference between millis() and delay()?

Interestingly enough, switching from ANT_AUTO to ANT_EXTERNAL considerably reduced reconnection attempts and improved the signal of our Photons in steel enclosures in an industrial facility this morning. I’m not sure whether the Photon toggles frequently between the chip antenna and an external antenna, but we will definitely stick with ANT_EXTERNAL.

@markovchainz, if you consider the fact that FreeRTOS is slicing at 1ms intervals, 2 seconds represents 2000 time slices! So I always recommend using millis() delays to provide non-blocking loop() code.
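
For reference, the non-blocking pattern can be sketched in plain C++ like this. IntervalTimer is a hypothetical helper, not part of the Particle API; the current time is passed in as a parameter so the logic can be exercised off-device, and on a Photon you would pass millis(). Using unsigned 32-bit arithmetic keeps the elapsed-time check correct when the millisecond counter wraps around.

```cpp
#include <cstdint>

// Tracks one periodic task without blocking the caller.
struct IntervalTimer {
    uint32_t intervalMs;  // how often the task should run
    uint32_t lastRun;     // timestamp of the last run, in milliseconds

    // Returns true (and re-arms the timer) once intervalMs has elapsed
    // since lastRun; otherwise returns false immediately.
    bool due(uint32_t nowMs) {
        // Unsigned subtraction yields the true elapsed time even if
        // nowMs has wrapped past zero since lastRun was recorded.
        if (static_cast<uint32_t>(nowMs - lastRun) < intervalMs) {
            return false;
        }
        lastRun = nowMs;
        return true;
    }
};
```

In loop() this would look like one timer per task, for example a 500 ms timer guarding the sensor read and a 5000 ms timer guarding the publish, with loop() returning right away whenever neither is due.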

That is good news regarding the antenna setting. When using an external antenna, I believe it is best to lock the antenna to EXTERNAL mode.

Thanks for the heads up @peekay123. We removed the delay() and replaced with millis().

A couple of Photons still seem spotty: we’ve had disconnections since around 11am with no reconnection. Signal strength around that location was about -68 dB, so I can’t see it being a WiFi issue.

We’re going to continue to investigate this. I’m trying really hard to get a couple of IoT pilot projects off the ground using Photons, as we plan on rolling out hundreds of them across Canada/US over the next few months. I can’t imagine dealing with these issues with Photons all around North America! That wouldn’t be very fun.

For reference, here’s our updated file with all the recommended changes:

SYSTEM_THREAD(ENABLED);
STARTUP(WiFi.selectAntenna(ANT_EXTERNAL));
SYSTEM_MODE(AUTOMATIC);

#define PUBLISH_RATE 5000
#define READ_RATE 500
#define ADDR 0x2A

#include <math.h>

static char buffer[50];
static int last_sync = 0;
static int last_read = 0;

static int channel_to_read = 1;
static float maxCurrent = 0.0;
static int NUM_CHANNELS = 0;
static int publishCurrent = 0;

static const String buffer_string = "%lu,current1,%i";

void readCurrent(int channel) {
    float current = 0.0;
    
    Wire.beginTransmission(ADDR);
    Wire.write(0x92);
    Wire.write(0x6A);
    Wire.write(0x01);
    Wire.write(channel);
    Wire.write(channel);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write((0x92 + 0x6A + 0x01 + channel + channel + 0x00 + 0x00) & 0xFF);
    Wire.endTransmission();
    Wire.requestFrom(ADDR, 3);
        
    current = ((Wire.read() * 65536) + (Wire.read() * 256) + Wire.read()) / 1000.0;
        
    maxCurrent += current;
}

void formatPacket() {
    publishCurrent = round(maxCurrent);
    snprintf(buffer, sizeof(buffer), buffer_string, Time.now(), publishCurrent);
}

void setup() {
    Serial.begin();
    last_sync = millis();
    last_read = millis();
    
    unsigned int wireData[36];

    Wire.begin();
    Wire.beginTransmission(ADDR);
    Wire.write(0x92);
    Wire.write(0x6A);
    Wire.write(0x02);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0x00);
    Wire.write(0xFE);
    Wire.endTransmission();
    Wire.requestFrom(ADDR, 6);
    
    if (Wire.available() == 6) {
        for (int i = 0; i < 6; i++) {
            wireData[i] = Wire.read();
        }
        NUM_CHANNELS = wireData[2];
    }
}

void loop() {
    if ((millis() - last_read) > READ_RATE) {
        Serial.println("reading: " + String(channel_to_read));
        readCurrent(channel_to_read);
        channel_to_read++;
        if (channel_to_read > 4) channel_to_read = 1;
        last_read = millis();
    }

    if ((millis() - last_sync) > PUBLISH_RATE) {
        formatPacket();
        if (Particle.connected()) bool ack = Particle.publish("testhook", buffer, PRIVATE, WITH_ACK);
        last_sync = millis();
        maxCurrent = 0.0;
    }
    if (System.freeMemory() < 30000) System.reset();
}

One of the photons finally came back after being offline for over 5 hours. With 0.8, seeing the diagnostics has proved to be extremely helpful for debugging.

After checking the photon’s logs… it turns out it attempted to reconnect over 700 times to the network during the offline period. This to me is odd because the RSSI was between -67 and -74 which isn’t terrible.

In the past I have noticed that resetting the wifi module if there’s been no connection for over an hour has quickly brought the photon back online but I wanted to eliminate as many user bugs as possible before blaming things on the WiFi module.

Has anyone had success with toggling the WiFi module rather than letting the Photon do its own thing with reconnecting?

The "static const String buffer_string" line does not actually make the String "const" per se, since it still allocates dynamic memory. Stick with:

const char buffer_string[] = "%lu,current1,%i";

This will create a const c-string stored in flash.

As for the WiFi issue, several members have posted code to manage the reconnection issue which includes turning the WiFi off then on again. A search will help you find these.

For complete eradication of String in your code, you can rewrite the Serial.println() call in loop() like this:

Serial.printlnf("reading: %d", channel_to_read);

Thanks @peekay123, @ScruffR.

All Strings have been removed!

Quick status update:

We have been able to eliminate crashing issues and the photon has proven to be stable after implementing the suggestions from you guys.

Our issue still revolves around getting on the network, however. Resetting the Photon doesn’t seem to make it more likely to connect, and one Photon attempted over 2000 reconnects last night before it was able to get on the network. Signal strength was between -70 and -75, so it’s not horrible by any means. I thought resetting the system would prevent the exponential backoff from slowing down the reconnection, but that hasn’t helped.

Any thoughts on this? We’re getting so close to getting our sensors off the ground and running.

This is some code I’ve had some success with

enum fsmCONNECTION {
    fcCHECK,
    fcSTOP,
    fcRESTART,
    fcRESET,
};
const int maxConnectionRetry = 3;

// called from loop() via a millis() timer - calling from a Software Timer causes problems
void checkConnection() {
  static fsmCONNECTION state     = fcCHECK;
  static int           passCount = 0;
  static int           prevDay   = -1;

  Log.trace("%s -->> state: %d", __FUNCTION__, state);
  switch(state) // FSM for cloud connection
  {
    case fcCHECK:
      if (!Particle.connected()) {
        Log.warn("Connection lost - waiting for reconnect (%d)", ++passCount);
        if (passCount > maxConnectionRetry) 
        {
          passCount = 0;
          state = fcSTOP;
        }
      }
      else {
        Log.trace("Connection OK");
        passCount = 0;
        
        if (Time.day() != prevDay) 
        { // once a day sync RTC with cloud
          Particle.syncTime();
          prevDay = Time.day();
        }
        //state = fcCHECK; // superfluous as it already is
      }
      break;      

    case fcSTOP:
      if (!Particle.connected()) {
        Log.warn("Stopping connection");
        Particle.disconnect();  // redundant but for tidiness
        WiFi.disconnect();
        WiFi.off();
        state = fcRESTART;
      }
      else { // meanwhile regained connection, abort reset
        Log.trace("Regained connection");
        passCount = 0;
        state = fcCHECK;
      }
      break;      

    case fcRESTART:
      Log.warn("Restarting connection");
      WiFi.on();
      Particle.connect();
      passCount = 0;
      state = fcRESET;
      break;
      
    case fcRESET:
      passCount = 0;
      if (!Particle.connected()) {
        Log.error("DeepSleep 30sec and reset system to regain connection");
        System.sleep(SLEEP_MODE_DEEP, 30);
      }
      else { // meanwhile regained connection, abort reset
        Log.trace("Regained connection");
        state = fcCHECK;
      }
      break;

    default:
      Log.trace("Wrong state %d (*%d)", state, passCount);
      passCount = 0;
      state = fcCHECK;
      break;      
  }

  Log.trace("<<-- %s", __FUNCTION__);
}

Ah, ScruffR with some brilliant code as usual. Thanks! That’s very cool. I’m going to add that to my weather station which also suffers from occasional bouts of Nets-heimers.