Can the Spark Core trigger the Reset Pin?

It will reset the STM32 which should reinitialize the CC3000. There is no RESET line on the CC3000 so the only way to hard reset that is with a power cycle. So far I have seen the watchdog timer code work during CFOD though, after which the Core connects to the Cloud… so re-initializing the CC3000 seems to work in lieu of a hard reset.

No, I meant it’s automatically kicked in the main loop, found in main.cpp. It’s the loop that basically does this:

while(1) {
  runBackgroundTasks();
  if(online) {
    runUserSetupOnce();
    runUserLoop();
    
    // proposing to put watchdog reload here
    if(8 seconds elapsed) kickTheDog();
  }
}

Now to answer the question of why 8 seconds? I was initially thinking it would save time to not reload it every time we came out of the user code, basically shortening the background task time before we get back to the user code again. However it appears the reload is really just one command, wrapped in a function call… which is not really much of a time difference compared to a counter comparison. I think we should just kick the dog each time we leave the user code, but before the background tasks are run again. This gives PLENTY of time to connect to the WLAN or CLOUD (26.208 seconds).
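To make that placement concrete, here is a rough host-side sketch (it runs on a PC, not on the Core, and it is not the actual core-firmware code). The SimulatedWatchdog class and runOnePass() are hypothetical names made up for illustration, and time is passed in by hand instead of coming from millis():

```cpp
#include <cstdint>

// Hypothetical stand-in for the hardware IWDG, for illustration only.
class SimulatedWatchdog {
public:
    explicit SimulatedWatchdog(uint32_t timeoutMs)
        : timeoutMs_(timeoutMs), lastKickMs_(0) {}

    // Reload the counter ("kick the dog").
    void kick(uint32_t nowMs) { lastKickMs_ = nowMs; }

    // True once more than timeoutMs_ has passed since the last kick.
    // Unsigned subtraction keeps this correct across a millis() rollover.
    bool expired(uint32_t nowMs) const {
        return static_cast<uint32_t>(nowMs - lastKickMs_) > timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint32_t lastKickMs_;
};

// One pass of the main loop. elapsedMs is how long the background tasks plus
// user code took. Returns true if the watchdog would have reset the Core.
bool runOnePass(SimulatedWatchdog& dog, bool online, uint32_t elapsedMs, uint32_t& nowMs) {
    nowMs += elapsedMs;
    if (dog.expired(nowMs)) return true;  // hardware would already have reset us
    if (online) dog.kick(nowMs);          // proposed placement: only kick while ONLINE
    return false;
}
```

With a 26208 ms timeout, a pass that stays offline simply never kicks, so the reset fires once the offline time adds up past the timeout.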

The only thing about resetting the ORANGE/RED breathing state back to CYAN after a fixed uptime period of 6, 12, 24 hours is… if you come back and look at your core after 25 hours, you won’t know that it ever had an issue. I agree that breathing cyan is the best “look”, but somehow you need to permanently throw some kind of status that things went bad. You can always clear this in your user code after you log the condition, but if you’re not logging it… then you need to know about it. Perhaps there could be a different way to view the IWDG_RESET though… like breathe cyan but mix in a RED blip when the cyan fades out completely. It would be subtle, just enough to get the point across.

Only problem with NOT having the IWDG enabled, is if your code (any code) locks up before you enable it… you are toast. So that’s why one of the first things you enable is typically the watchdog. And if your user code locks up immediately on entry, you’ll never loop through the background tasks enough (as in not at all) to catch the OTA update.

Basically the mode button doesn’t do anything until you hold it down for 3 seconds currently… so a shorter press and release of, say, between 0.250 and 2.99 seconds could reset the USER_CODE_ENABLED flag, and make the RGB breathe magenta to indicate it’s connected to the cloud, but NOT running your user code. Magenta is associated with flashing user code already so it’s a good tie-in color, and breathing tends to indicate a cloud connection.
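As a sketch of how those press lengths could be sorted out (just an illustration: the PressAction enum and classifyPress() are hypothetical names, and the thresholds are the 0.250 and 3 second figures from above):

```cpp
#include <cstdint>

// Hypothetical outcome of a MODE button press, for illustration only.
enum class PressAction {
    Ignore,          // too short, treat as noise or an accidental bump
    ToggleUserCode,  // proposed: flip the USER_CODE_ENABLED flag
    LongHold         // the existing 3 second hold behavior, left untouched
};

// Classify a press by how long the button was held before release.
PressAction classifyPress(uint32_t heldMs) {
    if (heldMs >= 3000) return PressAction::LongHold;
    if (heldMs >= 250)  return PressAction::ToggleUserCode;
    return PressAction::Ignore;
}
```

The lower bound doubles as a crude debounce, so a quick accidental brush of the button does nothing.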

Since there is a very simple loop as depicted above, I think it makes sense to kick the dog every time through that process. But only when it’s ONLINE because of issues like CFOD, or WLAN not connecting. Once CFOD is a vague memory, we could move the kicking of the dog to just before the runBackgroundTasks(); but even if CFOD is not an issue anymore, WLAN not connecting still could be… so perhaps in the ONLINE part of the code is the best place overall. If you disable SPARK_WLAN_ENABLE, the IWDG still functions to serve as a watchdog against user code locking up. And technically it CAN be disabled in user code if need be. I think there MAY be an issue with SmartConfig here since that takes a while to work, and is not considered ONLINE at that point. I have an old timing diagram that seems to indicate SmartConfig is its own tight loop. Anything like that obviously needs its own kicking of the dog within it.

How long to wait is a good question… but right now in hardware with the IWDG the longest delay is 26.208 seconds. If we wanted to wait longer than that, I can’t currently think of a good way to add a foolproof software counter that extended it. It would be susceptible to lockups. 26.2 seconds seems like PLENTY of time to connect to your WLAN, or the CLOUD. If not, it gets reset and it can try again for 26.2 seconds.
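For anyone wondering where 26.208 comes from, here is the arithmetic as a small sketch. It assumes the IWDG is set to its largest prescaler (/256) with a 12-bit reload value of 0xFFF and a nominal 40 kHz LSI; those settings are my reading of how the figure is reached, and iwdgTimeoutMs() is a hypothetical helper, not a firmware function:

```cpp
#include <cstdint>

// Timeout in milliseconds for a watchdog that counts `reload` ticks of a
// clock running at lsiHz / prescaler. The 64-bit intermediate avoids overflow.
constexpr uint32_t iwdgTimeoutMs(uint32_t lsiHz, uint32_t prescaler, uint32_t reload) {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(reload) * prescaler * 1000u / lsiHz);
}

// 4095 ticks * 256 / 40000 Hz = 26.208 s
static_assert(iwdgTimeoutMs(40000, 256, 0xFFF) == 26208,
              "matches the 26.208 s figure");
```

Since the LSI is an uncalibrated RC oscillator, the real timeout on any given Core will drift from this nominal value.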

It’s kind of hard to know WHAT failed when you reset from IWDG… unless we are constantly writing the state of where the code is to non-volatile memory as it’s looping… but I think we would wear out the memory pretty fast. Even if we knew it was USER code that locked up… why should we automatically prevent it from running again? Maybe it just locks up intermittently once in a while… say, on the hour exactly because of some counter wrapping or a bad compare to some time variables. If we just reset and run the code again, the user’s code will run for a whole hour before it locks up again… maybe providing them with precious sensor data. We can latch an indication on the RGB that an IWDG reset has occurred, which should help a user to realize there is potentially a problem with their code, or network. If there was an easy way to check uptime, a user could see the IWDG indication and send a request like: https://api.spark.io/v1/devices/?access_token=xxxxx to check uptime to see when it reset last. Uptime is basically the millis(); counter, and could easily be implemented in user code as a variable as well, but that wastes one of your available variables.

User code should be able to block up to the IWDG timeout value of course :wink: I understand though that if user code blocks for more than 10-15 seconds the Core will drop off the Cloud. So why wait longer than that? To allow plenty of time to get the WLAN and CLOUD connected in the first place.

[quote=“zachary, post:56, topic:2693”]
A breathing red LED seems like a good signal something’s wrong. I’m not sure about whether it’s better to run the user code while the red LED breathes or to not run it. Maybe try running it with an orange breathing LED the first time, and if we fail again, breathe red and don’t run the user code.
[/quote]

I’ve experimented with orange on the RGB and it just looks yellow and/or red. It’s hard to tell it’s CLEARLY orange if you have never stared at all of the other colors. Perhaps the red blip idea woven into the cyan breathing would be best? Could even blip once, twice, thrice… for number of reset times. Anything over 3 is going to start being too many blips to count, so you can just assume it’s reset a lot. I do think we need to run the user code until the user decides not to… pretend everyone is designing a Black Rocket… mission critical stuff.

I think the watchdog code only works properly if it’s always running… i.e. the core-firmware sets it up… and user code can augment it (make it time out faster or disable it completely) … but you should not have to figure out how to setup the watchdog in your “arduino-like” code. You have better things to worry about! :smile: Understanding the codes is key though to knowing what’s working and what’s not working.

I like the red led blip idea and beep up to 3 times max to indicate reset times.

If it’s easy to implement then make a quick button press on the Spark Core clear this Watchdog error message.

Keep up the excellent work, I feel like we’re almost there and on to other exciting things LOL :smiley:

@BDub

How do you think your Watchdog would have handled this situation where the main loop is still running but the CC3000 will not reconnect to the WiFi network once it shows up again after being gone for 90 mins?

Here is a video showing the issue. I see it often when I leave with my phone/Hotspot and then come back about an hour later.

Usually the main loop will stop running when this happens and the blue LED will not flash to indicate a successful hand off to Xively. And possibly the main loop would have stopped running eventually, which would have triggered the Watchdog circuit.

Watched your video… With my watchdog code, your Spark Core would have reset itself in 26.208 seconds (assuming perfect 40kHz LSI clock) after not being connected to the WLAN, and it would have reconnected to your network :wink:

Please do give my test code a try, it’s current with the Core-Firmware and Core-Common-Lib as of right now 2/11 but you’ll need to grab Spark’s Core-Communications-Lib before you build:

I’ve seen this reboot two cases of “CFOD on power up” as well so this is promising.

https://github.com/technobly/core-firmware/tree/watchdog_timer_fix
https://github.com/technobly/core-common-lib/tree/watchdog_timer_fix

Check out the application.cpp for examples of how to detect and clear IWDG status, and an example of it resetting itself due to a hard loop in the user code.

I still would like to figure out why just resetting the IWDG_SYSTEM_RESET does not restore the CYAN breathing, you have to force it back with LED_SetRGBColor(RGB_COLOR_CYAN); I need to understand how the Timing_Decrement() function in main.cpp works better I guess. Obviously the code I put in there updates the RGB from CYAN to RED, but won’t go back… so does it only get run once somehow?

Also I’ll need to figure out a good place to put the RED blip code, and see how that will look and function.

And finally intercepting the Mode button to lock-out user code. Might have to get creative with this one. I think it will work without having to use non-volatile memory or resetting the Core.

I’m ready to try out your solution. But the only problem is that I have never flashed new firmware via my Windows 8 PC to the Spark Core yet. I can’t really find any plain English guide that helps us mere mortals figure out how to do it either.

I’m good with computers, I just need some easy to follow quick guide to get this fix loaded on the Spark Core and then I’ll know how to do it from here on out.

If you could help me with this then I would be grateful and then I could begin providing feedback on how it’s working for me also. I’ll stress test it under my mobile wifi hotspot setup.

I still haven’t gotten around to doing this, but check out this video!

With this, you could effectively still just compile via command line if you wanted to… however if you are going through the trouble you might as well take advantage of the IDE.

The Core-Firmware readme is also very useful to double check things that you are doing: device-os/README.md at master · particle-iot/device-os · GitHub

You should also know some basic GIT/GITHUB type stuff, or just grab the Download Zip’s for each Repo if you just want to grab this one version of firmware.

Ok after a few hours I have successfully flashed your code to the Spark Core! Man that was a long drawn out process LOL

So I see your code in the CPP file, I let it run in a constant loop and it did its thing.

I tried to add my previous code for sending data to Xively to your CPP code but it would not compile in Netbeans. I then tried to load just the Xively code with none of your CPP example code and it would not compile either in Netbeans.

So then I just used the Spark web IDE and loaded the same Xively main loop that I have been running for the past week, which contains none of your example code. Did this update from the Spark IDE overwrite the watchdog firmware I just flashed, or did it only update the cpp page with my main loop, so your watchdog is still working?

Thanks for any advice. I’m learning lots here :smiley:

I flashed your watchdog code to the spark.

Your code in the CPP ran fine.

I uploaded my Xively code from Spark’s online IDE because it would not compile in Netbeans. It looks like the watchdog is not running because it’s not fixing the network error messages.

So here is the code I’m trying to run with the Watchdog code. If I have to load everything via Netbeans then how do I get it to compile in Netbeans? Fun stuff!

#define FEED_ID "1234" //note: fake id here..

#define XIVELY_API_KEY "1234" //note: fake key here

TCPClient client;

int reading = 0;
int ledD = D7;

int count = 0;
int total_temp = 0;
int temp_calc = 0;
unsigned long LastUpTime = 0;
unsigned long LastCloudCheck = 0;
char whichApp[64] = "READ TEMPERATURE with XIVELY";

// This routine runs only once upon reset
void setup()
{
//Register our Spark function here
Spark.variable("whichapp", &whichApp, STRING);
Spark.variable("reading", &reading, INT);
Spark.function("degres", tempCalculation);
Spark.function("volt", analogReading);
pinMode(A7, INPUT);
pinMode(ledD, OUTPUT);
ledStatus(5, 100); //Blink
}

void loop()
{
reading = analogRead(A7);
temp_calc = (reading*3.3/4095)*100 - 50;

if (millis()-LastUpTime>1000)
{
if (count <= 5) {
total_temp += temp_calc;
count++;
}
else {
xivelyTemp(total_temp/count); //Send the average of the last 5 readings
count = 0;
total_temp = 0;
}
LastUpTime = millis();
}

if (millis()-LastCloudCheck > 1000*60*5) { //check every 5 min to see if the connection still exists
if(!Spark.connected()) Spark.connect();
LastCloudCheck = millis();
}
}

void xivelyTemp(int temperature) {

ledStatus(5, 100);
//Serial.println("Connecting to server...");
if (client.connect("api.xively.com", 8081))
{
// Connection successful, update datastreams
client.print("{");
client.print(" \"method\" : \"put\",");
client.print(" \"resource\" : \"/feeds/");
client.print(FEED_ID);
client.print("\",");
client.print(" \"params\" : {},");
client.print(" \"headers\" : {\"X-ApiKey\":\"");
client.print(XIVELY_API_KEY);
client.print("\"},");
client.print(" \"body\" :");
client.print(" {");
client.print(" \"version\" : \"1.0.0\",");
client.print(" \"datastreams\" : [");
client.print(" {");
client.print(" \"id\" : \"bedroom_temp\",");
client.print(" \"current_value\" : \"");
client.print(temperature-8); //adjustment for some weird reason…
client.print("\"");
client.print(" }");
client.print(" ]");
client.print(" },");
client.print(" \"token\" : \"0x123abc\"");
client.print("}");
client.println();

ledStatus(3, 1000);
}
else
{
// Connection failed
//Serial.println("connection failed");
ledStatus(3, 2000);
}

if (client.available())
{
// Read response
//char c = client.read();
//Serial.print(c);
}

if (!client.connected())
{
//Serial.println();
//Serial.println("disconnecting.");
client.stop();
}

client.flush();
client.stop();
}

void ledStatus(int x, int t)
{
for (int j = 0; j <= x-1; j++)
{
digitalWrite(ledD, HIGH);
delay(t);
digitalWrite(ledD, LOW);
delay(t);
}
}

int tempCalculation(String command) {
int tempCalc = (reading*3.3/4095)*100 - 50;
return tempCalc;
}

int analogReading(String command) {
return reading;
}

Where I would go:

  • change breathing cyan to breathing yellow/orange (universally accepted as warning) and put the red blinks at the cycle end to signal 1,2,3 and 3+ (4 levels) watchdog controlled resets
  • two taps on the MODE button resets the blocks counter and the breathing color goes back to cyan
  • three taps on the MODE button in an SOS like pattern (short, long, short) will set the BLOCK_USER_CODE flag, no matter if the watchdog was user enabled or not, then the led starts breathing magenta (universally accepted as danger) after cloud connection; do it again and everything goes back to normal
  • user activates the watchdog around user code explicitly through an API call which sets the appropriate flag IWDG_ENABLE and the maximum user code elapse time WATCHDOG_TIMEOUT, like for Arduino
  • leave a factory reset as the solution to the locked :spark:, you explicitly activated the watchdog and you explicitly (involuntarily maybe) flashed some code blocking your :spark:
  • the flashing firmware procedure should reset everything back to its normal values

An alternative is to calculate how fast the :spark: is self resetting trying to stay on the safe side AND to prevent a total lock:

  • because we are storing somewhere the watchdog reset count to guide the blinking, I suppose we are already using NVM so it should be possible to store an additional number BLOCK_USER_CODE_ELAPSED containing the amount of milliseconds elapsed while running user code between the occurred resets so we can stop user code execution (BLOCK_USER_CODE flag) if the last X resets (say 10) occurred within the last Y seconds (say 120, meaning a reset is occurring every 12 seconds)
  • the user API call should allow to explicitly set the BLOCK_USER_CODE_ELAPSED parameter
  • the two taps on the MODE button should reset the BLOCK_USER_CODE_ELAPSED as well
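A minimal sketch of that "last X resets within Y seconds" check, for discussion only. ResetRateTracker is a hypothetical class, and it keeps the timestamps in RAM; on the Core they would have to live in NVM to survive the resets, which is not shown here:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical tracker: remembers the cumulative user-code run time (in ms)
// at which each of the last N watchdog resets occurred.
template <std::size_t N>
class ResetRateTracker {
public:
    // Record a watchdog reset at the given cumulative time.
    void recordReset(uint64_t atMs) {
        times_[next_] = atMs;
        next_ = (next_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // True if N resets have been seen and the oldest and newest of them
    // are no more than windowMs apart.
    bool shouldBlockUserCode(uint64_t windowMs) const {
        if (count_ < N) return false;
        const uint64_t newest = times_[(next_ + N - 1) % N];
        const uint64_t oldest = times_[next_];  // next slot to be overwritten
        return newest - oldest <= windowMs;
    }

    // What the two taps on the MODE button would do.
    void clear() { next_ = 0; count_ = 0; }

private:
    std::array<uint64_t, N> times_{};
    std::size_t next_ = 0;
    std::size_t count_ = 0;
};
```

With N = 10 and a 120000 ms window, a Core resetting every 12 seconds trips the check on the tenth reset, while one that resets once a minute never does.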

@RWB just add these lines to the top of your application.cpp:

#include <application.h>

void xivelyTemp(int temperature);
void ledStatus(int x, int t);
int tempCalculation(String command);
int analogReading(String command);

and here’s how you can make your code look nice in the forum:

@rlogiacco good suggestions… here is some feedback:

  • breathing a solid color is just easier because we don’t have to touch more of the code that operates the LED, it’s sprinkled in a lot of places setting and resetting various modes (FADE vs BLINK vs TIMING of BLINK/FADE). Adding a new variation on that will be a job in itself, I’d rather focus on other parts first, and have that accepted as a pull request first.
  • you have basically 3 seconds to get something done as far as the mode button goes. I like the idea of a few taps to lock out user code, and a couple to unlock it. That’s better than just one tap, which someone might curiously do, but even if one tap only unlocked the locked user code execution, for the most part just a curious one tap wouldn’t change the current state of the spark. If it takes too long to get 3 taps reliably, then a 2 tap lock, 1 tap unlock would work as well.
  • I don’t think you want to enable watchdog through user code… you can disable it if you want, but it should be the first thing that is enabled before any code is run… in case any code at any point locks up. In my first example of modifying the watchdog, I’m clearing it inside the ONLINE loop, so if we never get online it will reset as well.
  • The STM32 has its own register that keeps track of a watchdog reset. It’s a volatile register, but when a watchdog timeout occurs the STM32 does not lose power so this internal register can be safe. IWDG is pretty much like a software reset, but due to an independent hardware timer.
  • We can definitely save the state of things in NVM, just need to negotiate a space with the spark team :slight_smile: Lots of things we could log as diagnostics here, and show examples in user code how to interpret these logs and take action. These logs should probably only be cleared through an exposed function, to make it more difficult to lose information.

Yep its finally working!!! :smiley:

With your help @BDub I got the Xively code to compile in Netbeans and was able to flash the Spark Core. I quickly forced the most common wifi states that cause the Spark Core to lock up and the watchdog automatically recovered from all of them in about 30 seconds, which is awesome!

When it recovers from a Watchdog reset the LED breathes RED. How can I set the breathing red to go back to breathing Cyan after 6 hours have passed? That’s just a personal preference for me till they get everything else fixed. Did you have issues with getting it switching back to breathing Cyan? Can’t remember.

So far it looks like we have a fix that works. I’ll keep running this for days and days without touching the reset button. I’ll have Xively graphs to back up the uptime.

Thank you so much for your help with all this!

@RWB that’s great! Keep in mind that while I like that this is working, the rest of the code should detect and more gracefully handle problems before the watchdog timeout occurs. So when that day comes… the watchdog can lie in wait of a new bug to attack!

Here’s an example of waiting 6 hours and checking and clearing the watchdog. You can add these bits to your code:

#include <application.h>

uint32_t lastReset = 0; // last known reset time

void setup()
{
  lastReset = millis(); // We just powered up 
}

void loop() {
  // Wait 6 hours before attempting to detect and reset IWDG reset
  if( (millis()-lastReset) > (6*60*60*1000) ) {
    // After 6 hours

    // This is how we can detect we are running from a IWDG (Independent Watchdog) reset
    if(IWDG_SYSTEM_RESET == 1) {
      // This is how we switch breathing RED back to CYAN
      LED_SetRGBColor(RGB_COLOR_CYAN);
      IWDG_SYSTEM_RESET = 0; // reset IWDG flag for good measure
      
      // Optionally, we can log that 
      //a watchdog timeout occurred.
    }
  }
}

@BDub I was able to get the Red Breathing back to Cyan with your code. I switched it to go back to Cyan after 1 min just to see it working and it did just fine, so I switched it back to 6 hours.

It sounds like @david_s5 has been making significant progress on solving the real underlying cause of these issues. But even when that’s fixed I feel much better that there is a working Watchdog just in case.

I’m learning so much so quickly.

I’m just happy I have a reliable Spark Core now LOL. I’ve spent a good amount of my time the last 7+ days playing with this and it’s nice to be able to start putting some projects together that I know will stay connected without any interaction from me. Awesome!

@BDub thanks for the feedback, I wasn’t aware of the complexities you were highlighting, let’s consider mine as end user suggestions :smile: Maybe those should be picked up by the Spark team for a consolidated milestone, maybe they’ll just get discarded :wink:

With regards to the user activation of the watchdog I do perfectly understand your point, but to defend my point of view:

  • Arduino doesn’t have it active by default and it looks like that is the most used platform nowadays
  • If it’s active by default and you are unaware of that you might decide to put your :spark: to sleep for 30 secs and suddenly get a soft reset… weird from a new user perspective (read this as tons of support requests by unaware users)
  • If it’s disabled by default and the user activates it then he does (should) understand what is going on and how to use the watchdog
  • if the user is aware of the watchdog and he knows he needs more time to loop then he might explicitly invoke the watchdog timeout reset to prevent unwanted resets
  • all of the above is made more important if you consider the :spark: has a significant number of code lines running within the main function: if you are unaware of the watchdog’s existence (most of the user base is) your code can easily interfere with the wifi and cloud capabilities. I believe this was one of the reasons why the folks at Spark had it disabled

Once again please consider my thinking as the end user perspective: I’m quite new to micro controller programming while being quite experienced on service and web development.

@BDub

The Watchdog has been up and running for almost 24 hours without any user interaction from me! It has reset the core several times but you couldn’t tell when you look at the data graphing it’s sending to Xively every 5 seconds.

I set the RED led to only stay on after a watchdog event for 30 mins and then revert back to breathing Cyan since I’m watching it closely and because it gives me a better idea of when and how often it’s happening vs a 6 hour RED to Cyan reset delay.

This is not really a valid argument because the Arduino does not have a complex background task set that handles a connection to the internet. Because of that, we need a hardware watchdog timer. The Arduino couldn’t care less if you go into a hard loop in your code forever.

This is a good point, but not a good reason to disable the watchdog. We need to make sure the code handles this case when you call sleep(), and either disables the watchdog… or perhaps it’s handled already in hardware because it makes sense if I put the micro to sleep, I don’t want to reset in the middle of that sleep. Not sure, but either way, it’s part of the bigger picture in working in the watchdog timer fixes with the rest of the code.

This would prompt someone to learn how to use it, but also not a valid reason to keep it off. You want it ON and protecting your application from staying offline when you DON’T know any better and do things like create long delays or hard loops in your code.

Yes, however the watchdog timeout is currently set to 26.208 seconds in my code… and if you waited longer than 10 seconds the Core is going to miss its time to handshake with the server and they will get out of sync, forcing a reset. You COULD punch the handshake code AND watchdog timer reset in your user code… but that’s really just an example of going outside of the architecture of the Spark Core for some odd reason in your code. Maybe because you don’t want to implement a proper state machine. Either way that can be written up in an example routine to demonstrate how to do that sort of thing, if you need it.

The watchdog timer is actually CURRENTLY in ALL spark cores… ENABLED. It won’t ever be used, though, based on the way it’s cleared… every second through an interrupt service handler. So Spark wants it enabled, but we must help them come up with just the right “tuning” for it that works for all cases. See zachary’s posts above.

@rlogiacco I appreciate your comments and feedback… I do hope this sheds some light on the subject for you and hope we can continue to noodle on this problem! :smile:

Thanks mate, I’ll keep speaking my mind hoping anybody reading us can benefit to a certain extent :wink: On top of that I find these convos interesting and valuable.
BTW, while I’m not at all into the insides of the :spark: I do understand the watchdog and its intrinsic value, just trying to voice the average user.

Would it make any difference if I add the fact that I always meant to have the watchdog active around internal code while letting the user control whether he wants it active around his own code?

I know the Arduino doesn’t have running code besides the user code (well, excluding the boot loader, but I wouldn’t count that) while the :spark: has plenty, but that shouldn’t be an end user concern unless he wants to.

Please consider I’m not planning to do anything which should let the watchdog kick in myself, but I’m sure somebody out there will certainly come out with some valid reason. On top of that I wasn’t aware of that 10secs limit for cloud synch… Actually that is a very good point on your perspective: if 10s is a boundary already for network code I believe the only answer I’m left with is ‘but I might want to have it while disconnected’… Corner case, I know…

@BDub Have you tested any of the new updated code that @david_s5 has worked on here: David’s latest Firmware

I’m interested in trying it, I think I loaded it on the Spark Core but your watchdog is still activated so I’m not sure if I did it right.

Can we blend his recent improvements with your watchdog feature?

The reason I ask is because my core seems to be losing the connection a lot more than it used to before I loaded your code. But since the watchdog resets it, it keeps me online anyway. I’m wondering if his new code will keep me online for longer between resets.

Let me know what you think.

I haven’t really been able to test anything that fixes CFOD because I never see CFOD… not reliably anyway.

I would wait until David works in his changes. He’s also touching the IWDG, so hopefully it all works out well! We’re discussing it in the CFOD thread.