Particle.process() vs. ApplicationWatchdog.checkin()

syrinxtech · May 2, 2018, 12:11am

From the docs:

“If the application has not exited loop, or called Particle.process() within the given timeout, or called ApplicationWatchdog.checkin(), the watchdog calls the given timeout function, which is typically System.reset.”

My question is this: Is there a fundamental difference between the two? In other words, is there a reason why I would call one vs. the other? I’m having a problem with a Photon tripping the WD timer every minute. My loop() is getting longer and longer, and I’m wondering if I need one of these somewhere near the middle.

Of course, I’m at the mercy of IFTTT, which is what I use to determine uptime. Sometimes I get 5-6 emails, each 1 minute apart, indicating the Photon went offline and immediately came back online. Because the emails are one minute apart, I believe it’s tied to the WD timer. I run in SEMI-AUTOMATIC mode, if that matters.

syrinxtech · May 2, 2018, 1:28am

This is what just showed up on IFTTT:

ScruffR · May 2, 2018, 9:32am

ApplicationWatchdog.checkin() does only that, while Particle.process() mainly does other things related to cloud housekeeping and performs the checkin as a “side effect”.
Also be aware that, unless SYSTEM_THREAD(ENABLED) is set, delay() calls Particle.process() internally as well.
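A rough way to picture the point about delay(): the sketch below is a host-side model, not Device OS source, of a “cooperative” wait that keeps calling a housekeeping hook until the requested time has passed. FakeClock, cooperative_delay, and the 1 ms slice are hypothetical; the hook only stands in for the cloud processing that Particle.process() performs.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical simulated millisecond clock so the sketch runs off-device.
struct FakeClock {
    uint32_t now_ms = 0;
    uint32_t millis() const { return now_ms; }
    void advance(uint32_t ms) { now_ms += ms; }
};

// Hypothetical model: wait `ms` milliseconds in 1 ms slices, calling
// `housekeeping` once per slice (a stand-in for cloud processing).
// Returns how many times the hook ran.
uint32_t cooperative_delay(FakeClock& clock, uint32_t ms,
                           const std::function<void()>& housekeeping) {
    assert(static_cast<bool>(housekeeping));
    const uint32_t start = clock.millis();
    uint32_t calls = 0;
    while (clock.millis() - start < ms) {  // unsigned subtraction survives wraparound
        housekeeping();
        ++calls;
        clock.advance(1);                  // stands in for a 1 ms hardware wait
    }
    return calls;
}
```

The point of the model is only that a delay written this way never starves the housekeeping work, which is why short delay() calls are not by themselves a watchdog risk in non-threaded mode.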

However, loop() should actually not take that long. Non-blocking coding would be good practice in any case.
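One common non-blocking shape, sketched here under the assumption that loop() can read a millisecond counter such as millis(): instead of waiting, each task checks whether its interval has elapsed and returns immediately otherwise. IntervalTimer is a hypothetical helper, and the clock value is passed in so the sketch also runs on a PC.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for the millis()-style non-blocking pattern:
// loop() asks "is it time yet?" and moves on at once if not.
class IntervalTimer {
public:
    explicit IntervalTimer(uint32_t interval_ms) : interval_(interval_ms) {
        assert(interval_ms > 0);
    }

    // Returns true at most once per interval. On a device `now_ms`
    // would come from millis(); here the caller supplies it.
    bool due(uint32_t now_ms) {
        if (now_ms - last_ms_ >= interval_) {  // wraparound-safe unsigned math
            last_ms_ = now_ms;
            return true;
        }
        return false;
    }

private:
    uint32_t interval_;
    uint32_t last_ms_ = 0;
};
```

On a Photon, loop() might then contain `if (sensorTimer.due(millis())) readSensors();`, where sensorTimer and readSensors() are hypothetical names. Because the comparison uses unsigned subtraction, it keeps working when the 32-bit millisecond counter wraps around after roughly 49.7 days.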

syrinxtech · May 2, 2018, 9:52am

Thanks for the clarification.

I agree on the time required for loop(), which is why I can’t explain all of the many IFTTT messages spaced a minute apart. I do have a couple of minor delay() commands in loop(), usually less than 500 ms. I’ve been over the code for days, and I can’t see any obvious examples of blocking code.

I haven’t been able to rule out a flaky Internet connection yet, so I’m trying to focus on code-related issues. I haven’t ruled out potential hardware-related issues either. The problem is that this device is sitting off-site and I don’t have physical access to make logging easier.

syrinxtech · May 2, 2018, 3:12pm

Well, I don’t know whether it matters to put either one where it isn’t needed, but I tried both wd.checkin() and Particle.process() (not at the same time) in my loop(), and both immediately crashed the whole unit and caused every Freeboard display to go entirely crazy. I lost the ability to read temp/humidity, and random variables changed frequently. I’m guessing the problem is that each command was run at least once per second, if not more often.

Even calling either of these just once per loop() caused everything to go to hell in a handbasket. Does anyone have a short example of where they have used either command successfully?
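As a rough illustration of where a mid-loop checkin would matter, here is a host-side model (plain C++, not Particle code) of one loop() pass split into steps with made-up durations. It compares relying only on the checkin between loop() passes against also checking in after every step. longest_gap and the step durations are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Host-side model, not Particle code. Each entry in step_ms is how long one
// step of a loop() pass takes. The system is assumed to check in between
// passes; if checkin_each_step is true, the sketch also checks in after each
// step (where wd.checkin() would go). Returns the longest time, in ms, that
// the watchdog goes without a checkin during one pass.
uint32_t longest_gap(const std::vector<uint32_t>& step_ms, bool checkin_each_step) {
    assert(!step_ms.empty());
    uint32_t gap = 0;      // time since the most recent checkin
    uint32_t longest = 0;
    for (uint32_t d : step_ms) {
        gap += d;
        longest = std::max(longest, gap);
        if (checkin_each_step) {
            gap = 0;       // checkin right after this step
        }
    }
    return longest;
}
```

With steps of 200 ms, 45 s and 30 s, a single pass goes 75.2 s without a checkin, longer than the 60 s timeout used in this thread, so the watchdog may fire; checking in after each step caps the gap at the slowest single step, 45 s. If no step is anywhere near that slow, extra checkins would not be expected to change anything.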

justicefreed_amper · May 2, 2018, 3:19pm

@syrinxtech wd.checkin() wouldn’t be causing that, but Particle.process() can be a blocking call, and if you are having connection issues it can take some time (I think I’ve seen it take 20 seconds once or twice, though perhaps someone more knowledgeable can comment on that figure). (EDIT: After some more tests I’ve realized this is in fact not true; it should not be blocking.) A very first step you can take is to call Particle.process() only when Particle.connected() returns true. I haven’t done extensive testing to know whether that changes anything, but it seemed to help at the time, so it’s worth a shot.

That said, random variables changing frequently is quite the error - can you explain more what you are seeing?

EDIT: Rereading your comment, you’re saying you see that issue with wd.checkin() alone, with no Particle.process()? If so, could you please share the code where you declare, initialize, and checkin the watchdog? I suspect something in the declaration or initialization perhaps.

syrinxtech · May 2, 2018, 3:27pm

Here is a screenshot of Freeboard when things are “normal”:

I first tried Particle.process() in loop(), and I saw things such as:

Temp and humidity stopped reporting.
Various indicator lights would come on, sometimes all of them. Then, randomly, they would go off and come back on.
Very weird “gibberish” characters would sometimes show in the text fields, such as the “Last Motion Detected” field.

I couldn’t even re-flash until I powered the unit off and back on, quickly re-flashing code with the command removed. Then, I replaced Particle.process() with wd.checkin() and the same thing happened. Sometimes I would get temp readings like 1519823982.0 or a humidity reading of 189982989%.

As soon as I take the commands out and reboot, everything is fine.

syrinxtech · May 2, 2018, 3:29pm

@justicefreed_amper, here are the wd statements:

Declaration:  ApplicationWatchdog wd(60000, System.reset, 1536);      // Watchdog timer - 60 seconds
Init??  - not sure what that means
Checkin:  wd.checkin();

syrinxtech · May 2, 2018, 3:35pm

This is the serial monitor output from when I had wd.checkin() in loop():

|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:53 AM|Calling wd.checkin()|
|5/2/2018 10:30:54 AM|Calling wd.checkin()|
|5/2/2018 10:30:54 AM|Calling wd.checkin()|
|5/2/2018 10:30:54 AM|Calling wd.checkin()|
|5/2/2018 10:30:54 AM|Calling wd.checkin()|
|5/2/2018 10:30:54 AM|Calling wd.checkin()|

justicefreed_amper · May 2, 2018, 3:38pm

Yeah, that all looks correct. Are those readings from an ADC on the Electron itself? A timing or code-blocking issue could cause a reading to be taken at the wrong moment, giving a garbage value. The LED and character garbage is really strange, though. I would suggest trying your code with a bunch of things commented out but leaving wd.checkin() in, and seeing if you can narrow down which other operations affect those values (keeping the variables that get changed, of course). Unfortunately, I don’t have anything else off the top of my head.

One other thought - does this happen if you declare the watchdog but never checkin? The watchdog allocates memory for itself, and if you are pushing the memory limits that could cause some weird stuff.

syrinxtech · May 2, 2018, 3:44pm

It’s a Photon, not an Electron, if that matters, running 0.7.0. I know things were much more stable under 0.6.3, but I certainly can’t prove that in a court of law. The only reason I began looking at either of these commands is the constant IFTTT “unavailable” messages (see my post at the top of this thread). I get them in batches (15-20, separated by a minute or sometimes 2 minutes). Since I believe that IFTTT tests every minute, the only explanation I could think of was that the wd timer was rebooting the device for some reason.

Regarding memory, here are the latest compile numbers:

When declaring the wd but not checking in, I don’t see some of the craziness and I don’t usually lose control over the device. When I do checkin I almost immediately lose control over the device and the craziness kicks into high gear.

syrinxtech · May 2, 2018, 3:47pm

FWIW, I’m running in SEMI-AUTOMATIC mode and I have the SRR feature enabled.

In all, the main module and related libraries total 1,711 lines of code.

justicefreed_amper · May 2, 2018, 4:18pm

Yeah, memory looks like it should be fine, though it would be good to check free memory at runtime as extra assurance, since memory is also allocated dynamically while the program runs.

FYI, here’s the code for checkin():

	/**
	 * Lifesign that the application is still working normally.
	 */
	static void checkin()
	{
		last_checkin = current_time();
	}

which calls:

static inline system_tick_t current_time()
{
	return HAL_Timer_Get_Milli_Seconds();
}

While things are running, the watchdog thread executes start(), which in turn calls the watchdog’s own loop(), below:

os_thread_return_t ApplicationWatchdog::start(void* pointer)
{
	ApplicationWatchdog& wd = *(ApplicationWatchdog*)pointer;
	wd.loop();
	os_thread_cleanup(nullptr);
}

void ApplicationWatchdog::loop()
{
	bool done = false;
	system_tick_t now;
	while (!done) {
		HAL_Delay_Milliseconds(timeout);
		now = current_time();
		done = (now-last_checkin)>=timeout;
	}

	if (timeout>0 && timeout_fn) {
		timeout_fn();
		timeout_fn = std::function<void(void)>();
	}
}

We can assume it’s not calling timeout_fn(), since your device doesn’t appear to be resetting (you can validate this by passing your own callback function to the watchdog), so it should still be in the primary while loop. I’m not seeing anything suspect here, but thought I’d share in case it sparks something for someone else.
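To make the timing of that loop concrete, here is a host-side model of the quoted ApplicationWatchdog::loop() using simulated time. It assumes the thread starts at t = 0 with last_checkin = 0 and that HAL_Delay_Milliseconds(timeout) sleeps for exactly timeout; watchdog_fire_time is a hypothetical name. Because the thread only looks at last_checkin once per sleep, in this model the timeout function runs between one and just under two timeouts after the last checkin, i.e. 60 to almost 120 s with the 60000 ms setting used here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of ApplicationWatchdog::loop() as quoted above, with
// simulated time instead of HAL_Timer_Get_Milli_Seconds().
// Assumptions: the thread starts at t = 0 with last_checkin = 0, each sleep
// lasts exactly `timeout` ms, and `checkins` lists the times (ms, ascending)
// at which the application checks in. Returns when timeout_fn() would run.
uint32_t watchdog_fire_time(uint32_t timeout, const std::vector<uint32_t>& checkins) {
    assert(timeout > 0);
    assert(std::is_sorted(checkins.begin(), checkins.end()));
    uint32_t last_checkin = 0;
    uint32_t now = 0;
    std::size_t next = 0;
    for (;;) {
        now += timeout;  // HAL_Delay_Milliseconds(timeout)
        // Record every checkin that happened while the thread was asleep.
        while (next < checkins.size() && checkins[next] <= now) {
            last_checkin = checkins[next++];
        }
        if (now - last_checkin >= timeout) {  // same test as the real loop
            return now;
        }
    }
}
```

For example, with no checkins it fires at 60000 ms, while a single checkin at 1000 ms pushes the firing to 120000 ms, almost two full timeouts later.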

Since wd.checkin() is already called between your application’s loop() calls, it must be some interaction with code that only runs during loop(). What kinds of operations have you been calling it between?

Really weird stuff, regardless.

peekay123 · May 2, 2018, 4:28pm

I agree with @justicefreed_amper regarding dynamic memory allocation. However, looking at free memory is not a good indicator of what is really happening. What is most important is the size of the largest available contiguous block of heap. Very often, folks don’t consider that a dynamic allocation may fail and, as such, don’t handle that case. A classic killer of the heap is the use of Arduino Strings or std:: functions. Have you taken this into consideration in your code?
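The point about unchecked allocations can be sketched in plain C++: the copy below returns nullptr instead of writing through a failed allocation. The allocator is injectable only so a full or fragmented heap can be simulated on a PC; heap_copy, AllocFn, and system_alloc are hypothetical names, and this does not model how Arduino String handles failure internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical allocator hook so a failing heap can be simulated off-device.
using AllocFn = void* (*)(std::size_t);

void* system_alloc(std::size_t n) { return std::malloc(n); }

// Copies `text` into a new heap buffer owned by the caller (release with
// std::free). Returns nullptr if the allocation fails, which can happen when
// no single free block is large enough even though total free memory looks fine.
char* heap_copy(const char* text, AllocFn alloc = system_alloc) {
    assert(text != nullptr);
    const std::size_t len = std::strlen(text) + 1;  // include the terminator
    char* buf = static_cast<char*>(alloc(len));
    if (buf == nullptr) {
        return nullptr;  // caller decides: skip this update, retry, or log
    }
    std::memcpy(buf, text, len);
    return buf;
}
```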

ScruffR · May 2, 2018, 5:02pm

@justicefreed_amper, regarding Particle.process() being a blocking call: where does this come from? I've never experienced any "blocking" beyond a few milliseconds.

justicefreed_amper · May 2, 2018, 5:26pm

I may be incorrect; the time I recall that happening was a long while ago, and after looking into it again, it seems I'm probably wrong. There had been a thread about it somewhere, but I just ran a few tests and confirmed that, at least in normal operation, it isn't blocking. Thanks for the correction.

syrinxtech · May 2, 2018, 9:40pm

@peekay123, I normally use the System.freeMemory() routine during setup(), after all the various initialization routines are done and the globals are allocated. Although, looking at this code, I see that I didn’t add it this time. I will add it and see what it says.

As far as I know, the only Strings I’m using are 3 Freeboard variables that rarely change. I will rewrite them using char arrays.
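A minimal sketch of that rewrite, assuming one of the Freeboard values is a short status line: a fixed global char array filled with snprintf() never touches the heap, and the return value shows whether the text fit. lastMotionText, setLastMotion(), the 64-byte size, and the message format are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>

// Fixed storage instead of an Arduino String: no heap allocation, ever.
constexpr std::size_t kMotionTextSize = 64;
char lastMotionText[kMotionTextSize] = "never";

// Rewrites the text in place. snprintf() always NUL-terminates (size > 0),
// and its return value is the length it wanted to write, so a value
// >= the buffer size means the text was truncated.
bool setLastMotion(int hour, int minute) {
    assert(hour >= 0 && hour < 24 && minute >= 0 && minute < 60);
    const int n = std::snprintf(lastMotionText, sizeof lastMotionText,
                                "Last motion at %02d:%02d", hour, minute);
    return n >= 0 && static_cast<std::size_t>(n) < sizeof lastMotionText;
}
```

Because the buffer lives for the whole program and never changes size, repeated updates cannot fragment the heap the way repeated String reassignment can.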
