Blocking Particle.connect() and publish()

Without a hardware-based watchdog it’s basically impossible to force a full system reset once the device locks up. The device has a hardware watchdog built in, but it’s not enabled or usable for some reason, which forces us to add external watchdog circuits.

I’m adding a watchdog chip to a Photon design now for this very reason.

Interesting, did you document it anywhere, or mind sharing more information about it? Would there be any differences between the Photon and Electron with this?

The blocking issue is the one major issue still remaining in the design (both HW and FW) that I worked on throughout 2018.

Check this out:

I assume we shouldn’t expect to have the STM32 hardware watchdog feature in 0.8.0 anytime soon?
And I suppose regardless of the watchdog used, an Electron in Deep Sleep won’t pet the dog and therefore needs to be woken up in time to do so? I’ll get started on re-creating this HW design today (likely going for a 5 min interval) :slight_smile:

From what I experienced, Particle.connect() locking up a device appears to be related to limited cellular connectivity? My devices that connect easily (e.g. <30s from a cold state) usually do fine, but devices that struggle to connect often are the ones locking up as well.

You can find watchdog chips with wider pet-the-dog intervals if you do not want to wake up every 5 minutes to signal it.

My understanding from the comments in your other post was that this would best be accomplished with MANUAL mode:

But you have reasons to not want to modify your code for Manual Mode.

So, if you use an external watchdog, you still have the possibility of the Electron burning through battery for the full length of your timer interval during a code crash (a 1 hour cycle would be too long, especially considering the next connection attempt could fail or lock up).

If you perform 5 min Deep Sleep with an external watchdog as @RWB suggests, the only thing the Electron would do every wake cycle would be to Pet the H/W watchdog, increment a counter, and enter another 5 min Deep Sleep until it’s time to publish. That seems realistic without requiring too much modification of your Code and shouldn’t take too many milliseconds of precious battery power.
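
A minimal sketch of the counter check in that wake cycle, written as plain C++ rather than firmware. The names `WakePlan`, `wakesPerPublish` and `shouldPublish` are hypothetical, and the 5 min wake and 1 hour publish figures are simply the numbers mentioned in this thread. On the device the counter would have to survive Deep Sleep (for example in retained memory), and toggling the watchdog pin and re-entering sleep are left out here.

```cpp
#include <cstdint>

// Hypothetical description of the two intervals discussed above.
struct WakePlan {
    std::uint32_t wakeIntervalSec;     // how often the device wakes to pet the watchdog
    std::uint32_t publishIntervalSec;  // how often it should actually connect and publish
};

// Number of short wake-ups that make up one publish period (never less than 1).
constexpr std::uint32_t wakesPerPublish(WakePlan p) {
    return (p.wakeIntervalSec == 0 || p.publishIntervalSec < p.wakeIntervalSec)
               ? 1
               : p.publishIntervalSec / p.wakeIntervalSec;
}

// wakesSinceLastPublish counts wake-ups since the last publish, including this one.
// false: pet the watchdog and go straight back to sleep.
// true:  pet the watchdog, then connect and publish before sleeping.
constexpr bool shouldPublish(WakePlan p, std::uint32_t wakesSinceLastPublish) {
    return wakesSinceLastPublish >= wakesPerPublish(p);
}
```

With a 300 s wake interval and a 3600 s publish interval, every twelfth wake-up would be a publish; the other eleven only pet the watchdog and bump the counter.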

I guess you have to decide what the power cost of waking at a short interval to kick the external watchdog is, versus the power lost during a LONG interval while your Electron is “locked up”. But either way, the external watchdog “should” eliminate the need to manually reset the Electron.
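
That trade-off can be written down as two small formulas. This is only a sketch under assumptions: the function names are hypothetical, charge is expressed in microamp-seconds, and the per-wake charge and the locked-up current have to be measured on the actual device since no figures are given in this thread.

```cpp
#include <cstdint>

// Charge spent per day on the short wake-ups that only pet the watchdog.
// chargePerWake_uAs is a measured value for one wake/pet/sleep cycle (assumption).
constexpr std::uint64_t dailyWakeCost_uAs(std::uint32_t watchdogIntervalSec,
                                          std::uint64_t chargePerWake_uAs) {
    return watchdogIntervalSec == 0
               ? 0
               : (86400u / watchdogIntervalSec) * chargePerWake_uAs;
}

// Worst-case charge burned by a single lock-up: the device stays stuck for at
// most one full watchdog interval before the external watchdog resets it.
constexpr std::uint64_t worstCaseLockupCost_uAs(std::uint32_t watchdogIntervalSec,
                                                std::uint64_t lockupCurrent_uA) {
    return static_cast<std::uint64_t>(watchdogIntervalSec) * lockupCurrent_uA;
}
```

A shorter interval raises the first number and lowers the second, so the best interval depends on how often lock-ups actually happen in the field.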

Or go Manual & Threaded using waitFor(Particle.connected, connectionFail) and avoid a hardware revision :sunglasses:

Switching to Manual mode is easier than adding more HW to my design, so I would favour that, but I wasn’t 100% sure if Manual mode is a 100% failsafe method for blocking code. Reliability is key for my application. I believe Particle.connect() and Particle.publish() are the only two blocking commands that cause issues for me right now, but it’s hard to be sure.

I give my devices up to 3 min to connect, so a 5 min timer seems appropriate to me. A timer like 1 h is too long if the code blocks more than just a few times per year. Like you said, my idea would be to have 2 timers in my FW: one that fires every x min to pet the HW watchdog, and one that fires every 1 h to publish.

But 5 min is an arbitrary number right now; I’ll have to do the calculations to see what the optimal number is power-wise.

Some of the Elites would have to speak to that, but my guess is that an external watchdog would be the only way to approach 100%. For me, though, Manual/Threading has worked for a few projects that were extremely sensitive to wasting precious battery (primary, no recharging available). But then again, I don’t have 50 Electrons running Manual/Threaded for a decent sample size.

I don’t know what all your code does, but what impact would using Manual/Threading with no timers, no ApplicationWatchdog, and no external watchdog have? It would just be a 1 hour deep sleep, wake up, allow up to 3 minutes for a successful connection, then go back to deep sleep for 1 hour no matter what (the 1-shot approach)? You could throw in a System.reset() every 24 hours or once a week.
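
The decision at the heart of that 1-shot approach can be sketched as a pure function. This is plain C++ for illustration, not firmware: `Action`, `timedOut` and `oneShotStep` are hypothetical names, `nowMs` stands in for `millis()`, and the actual connect, publish and sleep calls are left to the application. Note that it only helps as long as `loop()` keeps running; it cannot rescue a call that never returns.

```cpp
#include <cstdint>

// What the application should do on this pass through loop().
enum class Action { KeepWaiting, PublishThenSleep, SleepWithoutPublish };

// Elapsed-time check that stays correct when the millisecond counter wraps,
// the same idea as `millis() - stateTime >= timeout`.
constexpr bool timedOut(std::uint32_t nowMs, std::uint32_t startMs,
                        std::uint32_t timeoutMs) {
    return static_cast<std::uint32_t>(nowMs - startMs) >= timeoutMs;
}

// Connected: publish, then sleep. Timed out: sleep anyway. Otherwise keep waiting.
constexpr Action oneShotStep(bool connected, std::uint32_t nowMs,
                             std::uint32_t startMs, std::uint32_t timeoutMs) {
    return connected                            ? Action::PublishThenSleep
           : timedOut(nowMs, startMs, timeoutMs) ? Action::SleepWithoutPublish
                                                 : Action::KeepWaiting;
}
```

Either way the device ends up back in deep sleep within the connection budget (3 minutes in this thread), which is what keeps a failed attempt from draining the battery.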

But there will always be a tiny chance that the cell modem can get stuck in a funny state and continue to waste power. H/W watchdog would be the best thing that I can think of to mitigate that, but you will need to recognize that situation first (I’m not sure how to).

That would work great. The 1h Deep Sleep is exactly what my devices do right now, so if Manual can address the blocking issue essentially nothing else will need to change about my design.

I will definitely start switching my FW to Manual, and will likely implement the HW watchdog too, for the sake of being 100% bulletproof (plus it looks like a fun mini-project).

So I did some calculations, and want to go with a setup that resets my device every 20mins (unless the dog was petted). The type of reset doesn’t matter much, because my device should go into Deep Sleep mode right away anyway (unless the accelerometer actually detects movement).

These are the latest schematics from the thread you linked:

Schematic part 1: Circuit Protection - I already have this, using the TPS61099.
Schematic part 2: Power Control - I don’t need extra Power buttons.
Schematic part 3: Carrier Board - I don’t need a Temperature sensor or FRAM. Just the Watchdog.

Am I correct in thinking that the only part from this entire schematic I need would be the TPL5010 (watchdog) with its 3 resistors (values appropriate for a 20 min timer)?

Looks like all you need is this with the resistor combo that gets you the delay you’re looking for.

Thanks, confirmed what I was thinking :slight_smile:

Just make sure you read the data sheet since there are usually some pretty good tips in them that you may miss otherwise.

I was curious if you (or anyone else) could explain to me if there’s any difference between these two pieces of code? With my original code, I would still have the occasional blocking Electron that required a manual reset to be done.

Original code:

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);

void loop() {
....
    if (!connecting) {
        Particle.connect();
        connecting = true;
    }
    if (Particle.connected()) {
        Particle.publish(publish, data, PRIVATE);
        ...
    }
    else if (millis() - stateTime >= 180000) {
        Cellular.off();
        delay(2000);
        trueReset();
        break;
    }
...

Revised code:

SYSTEM_THREAD(ENABLED);
SYSTEM_MODE(MANUAL);

void loop() {
....
    if (!connecting) {
        Particle.connect();
        connecting = true;
    }
    if (waitFor(Particle.connected, 180000)) {
        Particle.publish(publish, data, PRIVATE);
        ...
    }
    else {
        Cellular.off();
        delay(2000);
        trueReset();
        break;
    }
...

The waitFor will time out and move to the else branch after 3 minutes.

So when Particle.connect() ends up blocking, it still wouldn’t time out after 3 minutes would it? Seems like both pieces of code effectively do the same thing. I’ve had devices block with my Original code (System Threading + Manual) so I’m guessing FW-side there isn’t much else to do here to reduce the issue?

I used to just have the Electron go back to sleep until the next publish event when the connection timed out.

I was sending data every 5 mins so the wake up would not take too long.

I’m probably missing something here, but I would think your “Revised” code should work.
If a publish was missed using the Revised code, maybe it was just that the Electron couldn’t connect to the cellular network during the 3 minutes (poor signal, etc.)?

I would guess you would rather go to sleep in that case than reset.
It was suggested to me in another thread to skip the “else” and go to sleep after the waitFor, since it either published or not (it didn’t really matter). Again, I’m not sure if this helps in your project.

When the connection attempt fails, I reset the device up to 3 times, before I put it back into Sleep mode. I did this, because my devices don’t publish much but when they do it’s pretty important they do so successfully.
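
A minimal sketch of that retry rule in plain C++. `FailureAction` and `onConnectFailure` are hypothetical names, and the limit of 3 is the figure from the post above. On real hardware the count of resets would have to be kept somewhere that survives a reset (retained memory or EEPROM, for instance), which is not shown here.

```cpp
#include <cstdint>

// What to do after a connection attempt has failed.
enum class FailureAction { ResetAndRetry, GiveUpAndSleep };

// resetsSoFar: resets already spent on this publish attempt.
// maxResets:   how many resets are allowed before giving up (3 in the post above).
constexpr FailureAction onConnectFailure(std::uint8_t resetsSoFar,
                                         std::uint8_t maxResets) {
    return resetsSoFar < maxResets ? FailureAction::ResetAndRetry
                                   : FailureAction::GiveUpAndSleep;
}
```

The counter would be cleared after a successful publish so that the next publish starts with a full set of retries.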

Anyway, missed publishes aren’t the real issue for me, as that’s simply related to overall product constraints. The real problem is that after I call Particle.connect() in my Original code, the device occasionally blinks green endlessly for hours, sometimes until the battery goes dead altogether. Usually the 3-minute timer described in my code kicks in correctly and prevents this, but not always.

As far as I understand, since Particle.connect() is blocking, timers aren’t failproof and there isn’t a good way (?) to mitigate the issue on the FW side. That’s why I’m already adding the HW watchdog in my next version.

Note: trueReset() in my code puts the device into Deep Sleep for 30 seconds before waking it up. Any connection failures are therefore always followed by Deep Sleep.

Check the battery and voltage converter voltages while this constant flashing-green connection issue is happening. I saw it happen when the battery voltage was low, but not when it was higher.

Plus it’s a cold time of year, and colder temperatures usually cause battery voltages to drop compared with warmer ones. Cellular RSSI will help determine whether the device has a weak cellular signal, as weather can affect signal strength too.