User code stops running in MANUAL mode at roughly 24-hour intervals

OK, so I feel like I might be a bit like Clifford Stoll here, seeing patterns that might have other explanations, but anyway, let’s continue.

So I have two cores running nearly identical code. One is connected to the cloud in AUTOMATIC mode, and it can run for days or weeks on end.

The other is in MANUAL mode and disconnected from the cloud, and it stops running user code after 23.2 hours, although the LED keeps pulsing green.

I’m logging the elapsed milliseconds on both cores, and the one disconnected from the cloud stops running user code after 83.8 million milliseconds.


Interesting. Would you mind posting your code? Perhaps there is a cloud call or something in there that’s going unnoticed on the connected version but not the unconnected version

@mrOmatic I’m using MANUAL mode and my app has been running for 15 days. However, I use WiFi differently: I keep WiFi off and bring it up every 10 minutes for about 20-40 seconds. No cloud connection.
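
A minimal sketch of that duty cycle as pure scheduling logic, assuming a fixed 30-second window every 10 minutes (the post gives 20-40 seconds; the constants, enum and function names are hypothetical). On the core, the returned action would be what triggers WiFi.on()/WiFi.connect() or WiFi.off().

```cpp
#include <cstdint>

// Hypothetical schedule: WiFi comes up once per period and stays up for a window.
constexpr std::uint32_t kPeriodMs = 10UL * 60UL * 1000UL;  // every 10 minutes
constexpr std::uint32_t kWindowMs = 30UL * 1000UL;         // up for 30 seconds

enum class WifiAction { None, TurnOn, TurnOff };

// True while the current period's WiFi window is open (nowMs = ms since boot).
bool wifiWindowOpen(std::uint32_t nowMs) {
    return (nowMs % kPeriodMs) < kWindowMs;
}

// Compare the desired state with the current one and report what to change.
// On the device (sketch): TurnOn -> WiFi.on(); WiFi.connect();  TurnOff -> WiFi.off();
WifiAction nextAction(bool wifiIsOn, std::uint32_t nowMs) {
    const bool wantOn = wifiWindowOpen(nowMs);
    if (wantOn && !wifiIsOn) return WifiAction::TurnOn;
    if (!wantOn && wifiIsOn) return WifiAction::TurnOff;
    return WifiAction::None;
}
```

Because the schedule is derived from millis() modulo the period, there will be one irregular period when millis() rolls over after about 49.7 days.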

That’s useful information, thanks. I’d intended to post my code last night but didn’t get a chance.

My app brings up WiFi, connects to the cloud briefly to sync time, and then disconnects from the cloud and stays that way. It keeps its WiFi connection up all the time and also maintains an MQTT connection to a local broker.

The main loop of the app just maintains the MQTT connection, publishes the elapsed millis and any button presses, and subscribes to and receives a couple of topics.

The interesting thing is that the disconnected core crashes where the connected core does not.

My hunch is that there is some code in the Spark firmware, perhaps something like re-syncing the time, that just assumes the cloud connection is up and then crashes the core, but I could be way off.

Interesting observation that 83.8 million millis is close to a whole day, which is 86.4 million millis. Does it always crash at exactly that time?

It’s always just under 24 hours; it’s around 23.2 (decimal) hours.
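
For comparing the two figures, a small sketch of the unit conversion (the constant and function names are illustrative):

```cpp
#include <cstdint>

// Milliseconds per hour and per day.
constexpr std::uint32_t kMsPerHour = 60UL * 60UL * 1000UL;  // 3,600,000
constexpr std::uint32_t kMsPerDay = 24UL * kMsPerHour;      // 86,400,000

// Convert a millis() reading into decimal hours.
constexpr double msToHours(std::uint32_t ms) {
    return static_cast<double>(ms) / kMsPerHour;
}

// Fraction of a full day that a millis() reading represents.
constexpr double fractionOfDay(std::uint32_t ms) {
    return static_cast<double>(ms) / kMsPerDay;
}
```

With the reported 83,800,000 ms, msToHours() gives about 23.28 hours and fractionOfDay() about 0.97, i.e. roughly 43 minutes short of a full day.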

The other interesting thing I’ve noted, which may or may not be related, is that periodically my cloud-connected core will stop processing user code for 5 minutes and then start up again. Two examples are in the following graphs.

You can see millis continues to count; it’s just that the user code stops and then starts again.

It happens on close to a daily cycle, but I’m not sure if it’s the same 23.2-hour thing; it’s harder to spot and compare on the graph. I need to add annotations to make it easier to track.
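
One way to generate those annotations automatically is to scan the runtime values the core publishes (the study/bluesky/runtime topic, sent every 60 seconds by Repeats()) and flag any step much larger than the publish interval. A rough sketch, assuming the samples have already been collected in order; the struct, function name and slack value are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A jump between two consecutive runtime samples.
struct Gap {
    std::uint32_t afterMs;   // last runtime value before the jump
    std::uint32_t lengthMs;  // difference to the next runtime value
};

// Flag consecutive samples that are further apart than expectedMs + slackMs.
std::vector<Gap> findGaps(const std::vector<std::uint32_t>& samples,
                          std::uint32_t expectedMs, std::uint32_t slackMs) {
    std::vector<Gap> gaps;
    for (std::size_t i = 1; i < samples.size(); ++i) {
        // Unsigned subtraction, so a millis() rollover does not produce a negative step.
        const std::uint32_t delta = samples[i] - samples[i - 1];
        if (delta > expectedMs + slackMs) {
            gaps.push_back(Gap{samples[i - 1], delta});
        }
    }
    return gaps;
}
```

A reboot is flagged too, because millis() restarting makes the unsigned difference wrap to a very large number; a pause in user code shows up as a step of a few minutes instead.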

For ages I thought the core was crashing. It happened roughly every 24 hours: I’d be back at my desk, get a notification that the MQTT client had disconnected (via mqttwarn -> Pushover -> Pebble watch), and usually just reboot it. It took me a while to figure out that after 5 minutes the user code would start back up again.

I intend to do a version of the millis graphing using just the cloud APIs to see if a similar thing happens, but I haven’t had time to get to that yet.

Thanks for sharing, that’s very interesting!

I can make a version of the firmware with “accelerated” time. This will allow us to quickly fast-forward to 23.x hours to see whether the problems are related to the local time on the core or to something external. This isn’t a big job, so I hope to squeeze it in before the end of the week.
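
For code that only needs to observe a later time, a similar effect can be approximated in user code by reading time through a wrapper that adds a head start and an optional speed-up factor. This is a hypothetical sketch, not the firmware change described above: it only affects code that calls the wrapper, not the system firmware’s own timers.

```cpp
#include <cstdint>

// Hypothetical test clock: maps the real millis() value onto a fast-forwarded timeline.
class AcceleratedClock {
public:
    AcceleratedClock(std::uint32_t headStartMs, std::uint32_t speedup)
        : headStartMs_(headStartMs), speedup_(speedup) {}

    // Scaled time since boot plus the head start; wraps modulo 2^32 like millis().
    std::uint32_t now(std::uint32_t realMs) const {
        return headStartMs_ + realMs * speedup_;
    }

private:
    std::uint32_t headStartMs_;
    std::uint32_t speedup_;
};
```

For example, AcceleratedClock clock(23UL * 3600UL * 1000UL, 1) would make clock.now(millis()) start at 23 hours right after boot.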

Are you able to compile locally from a development branch?

Another thing to check is your DHCP lease time from your router.


@bko

I checked my DHCP server (I’m running a Ubiquiti EdgeRouter Lite) and my lease times are set to 24 hours, so once again it’s close but doesn’t seem to be the problem. I haven’t changed the value, as I don’t want to change more than one thing at a time. But just to open another can of worms, it looks like my Spark Cores are holding onto their IPs after the leases have expired, even between reboots. I haven’t really paid much attention to the IP layer, as it’s just worked.
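
For reference, RFC 2131 defaults a client’s renewal timer T1 to half the lease and its rebinding timer T2 to 0.875 of the lease, unless the server supplies options 58/59. A small sketch of that arithmetic (struct and function names are illustrative); whether the core’s DHCP client actually follows these defaults is an assumption, not something established in this thread.

```cpp
#include <cstdint>

// Renewal (T1) and rebinding (T2) times for a DHCP lease, in seconds.
struct DhcpTimers {
    std::uint32_t t1Sec;
    std::uint32_t t2Sec;
};

// RFC 2131 defaults: T1 = 0.5 * lease, T2 = 0.875 * lease.
// A server can override both with DHCP options 58 and 59.
constexpr DhcpTimers defaultDhcpTimers(std::uint32_t leaseSec) {
    return DhcpTimers{leaseSec / 2,
                      static_cast<std::uint32_t>(leaseSec * 7ULL / 8ULL)};
}
```

With a 24-hour lease this gives T1 = 43,200 s (12 h) and T2 = 75,600 s (21 h), so neither default lands exactly on the 23.2-hour mark; the lease itself runs out at 24 h.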

@mdma

Yup, I’m 100% local compile, so I’m happy to try your accelerated time test idea.

Here’s the code for my BlueSky (no clouds) variant of my smart home buttons.

/**
 ******************************************************************************
 * @file    application.cpp
 * @authors Justin Maynard
 * @version V0.2.1
 * @date    29-July-2014
 * @brief   Wireless SmartHome Buttons
 ******************************************************************************

Todo
  Support for orientation and remap button numbers
  Support for larger button pads

 ******************************************************************************
  Copyright (c) 2014 Omatic Labs, Inc.  All rights reserved.

  This program is free software; you can redistribute it and/or
  modify it under the terms of the GNU Lesser General Public
  License as published by the Free Software Foundation, either
  version 3 of the License, or (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
  Lesser General Public License for more details.

  You should have received a copy of the GNU Lesser General Public
  License along with this program; if not, see <http://www.gnu.org/licenses/>.
 ******************************************************************************
 */

/* Includes ------------------------------------------------------------------*/  
#include "application.h"
#include "PubSubClient_Hirotaka.h"
#include "Spark_NeoPixel.h"
#include "clickButton.h"
#include "TimeAlarms.h"

SYSTEM_MODE(MANUAL);

int number = 12345;
char string1[12] = "";

// allow us to use itoa() in this scope
extern char* itoa(int a, char* buffer, unsigned char radix);

// Configuration Section 

#define BROKER_IP_ADDRESS { 192, 168, 23, 20 }
#define MQTT_CLIENT_NAME "SparkButtons_blueSky"
#define MQTT_WILL_TOPIC "client/status"
#define MQTT_WILL_MESSAGE "Study BlueSky Down"

String buttonpad_name = "Study BlueSky";
String colourTopic = "study/buttons";   
String buttonTopic = "study/buttons";  


// Power Cable Orientation
// 1 = down 2 = up 3 = left 4 = right
#define ORIENTATION 1

// End Configuration Section 

// the Buttons
#if ORIENTATION == 1
  const int buttonPin1 = 3;
  const int buttonPin2 = 2;
  const int buttonPin3 = 4;
  const int buttonPin4 = 1;
#endif
#if ORIENTATION == 2
  const int buttonPin1 = 1;
  const int buttonPin2 = 4;
  const int buttonPin3 = 2;
  const int buttonPin4 = 3;
#endif
#if ORIENTATION == 3
  const int buttonPin1 = 4;
  const int buttonPin2 = 3;
  const int buttonPin3 = 1;
  const int buttonPin4 = 2;
#endif
#if ORIENTATION == 4
  const int buttonPin1 = 2;
  const int buttonPin2 = 1;
  const int buttonPin3 = 3;
  const int buttonPin4 = 4;
  
#endif

ClickButton button1(buttonPin1, LOW, CLICKBTN_PULLUP);
ClickButton button2(buttonPin2, LOW, CLICKBTN_PULLUP);
ClickButton button3(buttonPin3, LOW, CLICKBTN_PULLUP);
ClickButton button4(buttonPin4, LOW, CLICKBTN_PULLUP);
char buttonstatestring[5] = "";

// LEDs
#define NEOPIN D0
Adafruit_NeoPixel LED = Adafruit_NeoPixel(4, NEOPIN, WS2812);

// MQTT
void callback(char* topic, byte* payload, unsigned int length) {
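  // handle message arrived
  // Copy the payload into a local NUL-terminated buffer; writing payload[length]
  // may touch the byte just past the message in the library's receive buffer.
  char cstring[16];
  unsigned int len = (length < sizeof(cstring) - 1) ? length : sizeof(cstring) - 1;
  memcpy(cstring, payload, len);
  cstring[len] = '\0';
  long n = atol(cstring);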

  String callbackTopic = String(topic);
  String TopicString = String(colourTopic + "/1/colour");
  if (callbackTopic == TopicString) {
    LED.setPixelColor(buttonPin1-1,n);
  }
  TopicString = String(colourTopic + "/2/colour");
  if (callbackTopic == TopicString) {
    LED.setPixelColor(buttonPin2-1,n);
  }
  TopicString = String(colourTopic + "/3/colour");
  if (callbackTopic == TopicString) {
    LED.setPixelColor(buttonPin3-1,n);
  }
  TopicString = String(colourTopic + "/4/colour");
  if (callbackTopic == TopicString) {
    LED.setPixelColor(buttonPin4-1,n);
  }
  LED.show();
}


byte server[] = BROKER_IP_ADDRESS;
TCPClient tcpClient;
MQTT client(server, 1883, callback);

void printDigits(int digits)
{
  Serial.print(":");
  if(digits < 10)
    Serial.print('0');
  Serial.print(digits);
}

void digitalClockDisplay()
{
  // digital clock display of the time
  Serial.print(Time.hour());
  printDigits(Time.minute());
  printDigits(Time.second());
  Serial.println(); 
}



// Scheduled functions
void Reset(){
  client.publish("client/status","Rebooting Study Scene Switch"); 
  client.loop();
  client.disconnect();
  client.loop();
  System.reset();
}

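void Repeats(){
  // millis() is unsigned; format it as unsigned so the published value does not
  // turn negative once it exceeds the range of a signed int.
  snprintf(string1, sizeof(string1), "%lu", (unsigned long)millis());
  client.publish("study/bluesky/runtime",string1);
}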

void MQTTSubscribe() {
  String TopicString = String(colourTopic + "/1/colour");
  char TopicChar[TopicString.length()+1];
  TopicString.toCharArray(TopicChar, TopicString.length()+1);
  client.subscribe(TopicChar);
  TopicString = String(colourTopic + "/2/colour");
  TopicString.toCharArray(TopicChar, TopicString.length()+1);
  client.subscribe(TopicChar);
  TopicString = String(colourTopic + "/3/colour");
  TopicString.toCharArray(TopicChar, TopicString.length()+1);
  client.subscribe(TopicChar);
  TopicString = String(colourTopic + "/4/colour");
  TopicString.toCharArray(TopicChar, TopicString.length()+1);
  client.subscribe(TopicChar);
}
 
void setup() {

  LED.begin();
  LED.show();
  LED.setPixelColor(0,0,0,255);
  LED.setPixelColor(1,0,0,255);
  LED.setPixelColor(2,0,0,255);
  LED.setPixelColor(3,0,0,255);
  LED.show();

  delay(500);

  WiFi.connect();
  while (WiFi.ready() == false) {
    delay(100);
  }

  LED.begin();
  LED.show();
  LED.setPixelColor(0,0,255,0);
  LED.setPixelColor(1,0,255,0);
  LED.setPixelColor(2,0,255,0);
  LED.setPixelColor(3,0,255,0);
  LED.show();

  delay(500);

  Spark.connect();
  while (Spark.connected() == false) {
    delay(100);
  }

  LED.begin();
  LED.show();
  LED.setPixelColor(0,255,0,0);
  LED.setPixelColor(1,255,0,0);
  LED.setPixelColor(2,255,0,0);
  LED.setPixelColor(3,255,0,0);
  LED.show();

  Spark.syncTime();
  Spark.process();
  Spark.disconnect();
  while (Spark.connected() == true) {
    delay(100);
  }


  Time.zone(+10);
  Serial.begin(9600);

  Alarm.timerRepeat(60, Repeats);      

  // NeoPixels
  pinMode(NEOPIN,OUTPUT);
  // Buttons
  pinMode(D1, INPUT_PULLUP);
  pinMode(D2, INPUT_PULLUP);
  pinMode(D3, INPUT_PULLUP);
  pinMode(D4, INPUT_PULLUP);

  // Setup button timers (all in milliseconds / ms)
  // (These are default if not set, but changeable for convenience)
  button1.debounceTime   = 20;   // Debounce timer in ms
  button1.multiclickTime = 250;  // Time limit for multi clicks
  button1.longClickTime  = 1000; // time until "held-down clicks" register
  button2.debounceTime   = 20;   // Debounce timer in ms
  button2.multiclickTime = 250;  // Time limit for multi clicks
  button2.longClickTime  = 1000; // time until "held-down clicks" register
  button3.debounceTime   = 20;   // Debounce timer in ms
  button3.multiclickTime = 250;  // Time limit for multi clicks
  button3.longClickTime  = 1000; // time until "held-down clicks" register
  button4.debounceTime   = 20;   // Debounce timer in ms
  button4.multiclickTime = 250;  // Time limit for multi clicks
  button4.longClickTime  = 1000; // time until "held-down clicks" register

  // Setup LED's
  LED.begin();
  LED.show();
  LED.setPixelColor(0,0,0,255);
  LED.setPixelColor(1,0,255,0);
  LED.setPixelColor(2,255,0,0);
  LED.setPixelColor(3,255,255,0);
  LED.show();
  
  // Start MQTT Connections
  client.connect(MQTT_CLIENT_NAME,MQTT_WILL_TOPIC,0,0,MQTT_WILL_MESSAGE);

  MQTTSubscribe();
  
  String startString = String(buttonpad_name + " Start");
  char startChar[startString.length()+1];
  startString.toCharArray(startChar, startString.length()+1);
  client.publish(MQTT_WILL_TOPIC,startChar);
  
}

void loop() {
  // Update button state
  button1.Update();
  button2.Update();
  button3.Update();
  button4.Update();

  if (button1.clicks != 0) {
    itoa(button1.clicks, buttonstatestring, 10);
    String TopicString = String(buttonTopic + "/1/state");
    char TopicChar[TopicString.length()+1];
    TopicString.toCharArray(TopicChar, TopicString.length()+1);
    client.publish(TopicChar,buttonstatestring);
  }
  if (button2.clicks != 0) {
    itoa(button2.clicks, buttonstatestring, 10);
    String TopicString = String(buttonTopic + "/2/state");
    char TopicChar[TopicString.length()+1];
    TopicString.toCharArray(TopicChar, TopicString.length()+1);
    client.publish(TopicChar,buttonstatestring);
  } 
  if (button3.clicks != 0){
    itoa(button3.clicks, buttonstatestring, 10);
    String TopicString = String(buttonTopic + "/3/state");
    char TopicChar[TopicString.length()+1];
    TopicString.toCharArray(TopicChar, TopicString.length()+1);
    client.publish(TopicChar,buttonstatestring);
  } 
  if (button4.clicks != 0){
    itoa(button4.clicks, buttonstatestring, 10);
    String TopicString = String(buttonTopic + "/4/state");
    char TopicChar[TopicString.length()+1];
    TopicString.toCharArray(TopicChar, TopicString.length()+1);
    client.publish(TopicChar,buttonstatestring);
  } 

  Alarm.delay(0);
   
  if (client.isConnected()) {
    client.loop();
  }
  else {
    if (client.connect(MQTT_CLIENT_NAME,MQTT_WILL_TOPIC,0,0,MQTT_WILL_MESSAGE)) {
        MQTTSubscribe();
        String statusString = String(buttonpad_name + " Reconnect");
        char statusChar[statusString.length()+1];
        statusString.toCharArray(statusChar, statusString.length()+1);
        client.publish(MQTT_WILL_TOPIC,statusChar);
    }
    else {
      // Don't flood the broker with connections and it's already down so we don't need to worry about a delay disconnecting the client
      delay(15*1000);
    }
  }
}
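
The reconnect branch at the end of loop() calls delay(15*1000), which holds up the button handling and Alarm.delay(0) for 15 seconds every time the broker is unreachable. A non-blocking alternative is to rate-limit attempts against millis(); this is a sketch with a hypothetical ReconnectTimer class, not part of the MQTT library.

```cpp
#include <cstdint>

// Hypothetical helper: allows one reconnect attempt per interval without blocking loop().
class ReconnectTimer {
public:
    explicit ReconnectTimer(std::uint32_t intervalMs) : intervalMs_(intervalMs) {}

    // True if an attempt may be made now; records the attempt time when it is.
    bool shouldAttempt(std::uint32_t nowMs) {
        if (!attempted_ || nowMs - lastAttemptMs_ >= intervalMs_) {  // rollover-safe
            attempted_ = true;
            lastAttemptMs_ = nowMs;
            return true;
        }
        return false;
    }

private:
    std::uint32_t intervalMs_;
    std::uint32_t lastAttemptMs_ = 0;
    bool attempted_ = false;
};

// Device-side use (sketch, names from the application above):
//   static ReconnectTimer reconnect(15 * 1000);
//   if (!client.isConnected() && reconnect.shouldAttempt(millis())) {
//       if (client.connect(MQTT_CLIENT_NAME, MQTT_WILL_TOPIC, 0, 0, MQTT_WILL_MESSAGE)) {
//           MQTTSubscribe();
//       }
//   }
```

The buttons keep being polled between attempts, at the cost of a little extra state.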

@mrOmatic Does your user code stop, or do you just stop receiving network data?

I was around this morning when the core went through its usual 5-minute downtime. During this time the core LED was flashing cyan, leading me to conclude that the user code is being blocked while the core reconnects to the Spark Cloud. Why it takes 5 minutes and happens every 23.2 hours is anyone’s guess.
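
To confirm from the device side that loop() itself is being held up (rather than only the network traffic), the application could record the longest gap between successive loop() passes and publish it once MQTT is back. A minimal, hypothetical helper; the idea is to call tick(millis()) at the top of loop().

```cpp
#include <cstdint>

// Hypothetical helper: tracks the longest interval between successive loop() passes.
class LoopGapMonitor {
public:
    // Call once per loop() pass with the current millis() value.
    void tick(std::uint32_t nowMs) {
        if (started_) {
            const std::uint32_t gap = nowMs - lastMs_;  // unsigned, so rollover-safe
            if (gap > maxGapMs_) maxGapMs_ = gap;
        }
        lastMs_ = nowMs;
        started_ = true;
    }

    std::uint32_t maxGapMs() const { return maxGapMs_; }

    // Clear the recorded maximum (e.g. after publishing it); timing continues.
    void reset() { maxGapMs_ = 0; }

private:
    std::uint32_t lastMs_ = 0;
    std::uint32_t maxGapMs_ = 0;
    bool started_ = false;
};
```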

Can you send a message to hello@spark.io referencing this post and also include your core’s id? Thanks!

I have this issue as well, after between 23 and 24 hours of continuously running the Spark Core publishing to MQTT. Has there been any solution?

Thanks
Michael

I wonder if maybe you’re hitting some kind of math error around that time? Are you using anything time-dependent in your code? Any chance you could post the relevant code?
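
If the only time arithmetic involved is on millis(), the usual overflow points are some way from 23.2 hours: an unsigned 32-bit millisecond counter wraps after about 49.7 days, and a value squeezed into a signed 32-bit int (as happens when millis() is passed to itoa(int, ...)) turns negative after about 24.9 days. A small illustration of both, plus the rollover-safe way to measure elapsed time; the names are illustrative.

```cpp
#include <cstdint>

// Time between two millis() readings. Unsigned subtraction stays correct even
// when the 32-bit counter has rolled over between the two readings.
constexpr std::uint32_t elapsedMs(std::uint32_t nowMs, std::uint32_t startMs) {
    return nowMs - startMs;
}

// Where a millisecond counter overflows, expressed in days.
constexpr double kDayMs = 86400000.0;
constexpr double kUint32WrapDays = 4294967296.0 / kDayMs;    // 2^32 ms: about 49.7 days
constexpr double kInt32OverflowDays = 2147483648.0 / kDayMs; // 2^31 ms: about 24.9 days
```

Neither point is near 23.2 hours, so a plain millis() overflow seems unlikely to explain the timing, though other time-dependent code could still be involved.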

Thanks!
David


Another good thing to check is the DHCP lease time of your router. 24 hours is typical.

Using the IFTTT channels, with the messages published when a Core/Photon connects to the cloud and when it is declared disconnected from the cloud, I know that a Core running firmware with an empty setup() and loop() disconnects from the cloud every 24 hours on one of my routers.


I haven’t looked into the issue any further than when I last commented on the thread. I’ve been moving house and holding out for the Photon to see if that makes a difference.

It looks like there are a few users having similar issues, like this one: Code not stable

After @baatsm’s post I thought it might be related to MQTT, but if @rpsmithii’s issue is the same, then it’s probably not.

Also, in response to the DHCP suggestions: even if it is related to DHCP, the core shouldn’t reset or stop running user code for a long time just because it needs to renew its DHCP lease.

Hi @mrOmatic

Sure, and that is true, but it does depend on how you write your application. For example, I have seen a bunch of people who only open the TCP connection in setup(), and this type of code can never recover from a DHCP problem.
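
A sketch of the recoverable pattern: re-check the connection on every loop() pass and rebuild it when WiFi is up but the client is down, or when the local address has changed (for example after a DHCP problem). The class is hypothetical; on the core the inputs would come from WiFi.ready(), client.isConnected() and the bytes of WiFi.localIP() packed into 32 bits.

```cpp
#include <cstdint>

// Hypothetical supervisor: decides on each loop() pass whether the TCP/MQTT
// session must be rebuilt, instead of connecting only once in setup().
class SessionSupervisor {
public:
    // ip: current local address packed into 32 bits.
    bool needsReconnect(bool wifiReady, bool clientConnected, std::uint32_t ip) {
        if (!wifiReady) return false;  // nothing to do until the link is back
        const bool ipChanged = haveIp_ && ip != lastIp_;
        lastIp_ = ip;
        haveIp_ = true;
        // A socket opened on the old address is stale even if it still reports connected.
        return !clientConnected || ipChanged;
    }

private:
    std::uint32_t lastIp_ = 0;
    bool haveIp_ = false;
};
```

When it returns true for an IP change, the sketch assumes the client is disconnected first and then reconnected and resubscribed.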


I am using mqtt as well. I will try to reduce my code to something reproducible and post it here.
