Electron + external watchdog (max6369) and safemode

tuxie · July 3, 2016, 3:42pm

I thought I’d ask here if you have any ideas on this tricky problem. I’m building an Electron-based system that will control serious amounts of water through magnetic valves connected to the water pipes inside houses, so I’d like to have an external watchdog installed just in case. The MAX6369 seems very nice, and I can set it to need patting only every 60 seconds or so, which means there shouldn’t be any problems during OTA software updates (other than during testing…).

However, if/when the device goes into safe mode, the external watchdog is a PITA, because I can’t safely try to flash the device via an OTA update while the watchdog keeps resetting it every now and then. Is there any good way to either detect that the Electron is in safe mode or have the watchdog patted in safe mode as well?

ScruffR · July 3, 2016, 3:55pm

You might need to find a way to enable your MAX6369 from your code, which won’t happen in Safe Mode.

tuxie · July 3, 2016, 4:02pm

Yeah, but that kind of defeats the purpose… I mean, if I can disable it by driving a pin high/low, and something craps up the CPU and happens to modify that pin, I’m all out of luck…

peekay123 · July 3, 2016, 4:05pm

@tuxie, I believe the MAX6369 has a settable startup delay, but the max time is 107 seconds. You may need to adapt the system firmware to “tickle” the watchdog while in safe mode (only).

ScruffR · July 3, 2016, 4:06pm

Usually, once pins are set, they don't just flip state for no good reason, even if your code gets stuck.

If that were the case, you'd be out of luck anyway!
Who'd be able to guarantee that your valve pins don't just flip for no reason while you keep tickling the watchdog, since you can't know or trust the CPU for anything?

tuxie · July 3, 2016, 4:14pm

@peekay123 correct. I’d like to avoid custom firmware as much as possible, so I’m trying to find some other way (if one exists).

@ScruffR I agree… However, I’m not a microprocessor professional and I’d like to be sure. For example, bigger EMI blasts or similar could both freeze the CPU and flip states on pins, as far as I know. It’s all about statistics, unfortunately. As each system can control a magnetic valve that is connected to the water mains to fill a tank, I’d like to be extra sure, just to sleep well at night… Also, as the devices are spread over several countries, it’s a bit tough to go and restart them if they somehow get stuck.

So is there any way to detect that the device is in safe mode by reading the state of the Electron’s pins? Or is the only way to write custom firmware?

ScruffR · July 3, 2016, 4:24pm

In cases like this redundancy might be required.
Two or more independent systems monitoring each other, and a fallback if hiccups are detected.

As said above, if you are going down the worst case scenarios you can't trust any setup.

What would you read that with? AFAIK there is no pin signalling this, only the RGB LED and I think Serial1 might be set for YModem (not sure about the latter tho')

tuxie · July 3, 2016, 4:39pm

We actually have redundancy by monitoring the system from the cloud and being able to see if it's down. Obviously not the best possible solution, but at least we can try to do something. Secondly, there could be building automation systems with integrated leak detectors to help.

As for how to read the "safe mode status", I'd think a circuit could be built that controls the watchdog whenever certain pins are set a certain way? Not being an electronics designer, I don't have the foggiest idea of how to actually do it.

peekay123 · July 3, 2016, 5:30pm

@tuxie, I’m on my phone so no time for a full response. I have some ideas and suggestions that I’ll detail when I get back home in an hour or so.

peekay123 · July 3, 2016, 9:56pm

@tuxie, I’m going to blurt out a bunch of stuff so bear with me.

Safety is achieved at different levels. Systems should ALWAYS fail safe. I had a nice Panasonic microwave oven that recently failed in a non-safe way. When the oven was off and you opened the door, it TURNED ON!!! This scenario should never have been possible: what should have been a hardware interlock on the door latch was obviously software controlled, which should never have been accepted by CSA/UL.

For your system, it may be necessary to have hardware-level mechanisms to turn off all valves when the processor fails or power is cycled. You may want to use D-type flip-flops with a master reset pin to drive the valves. You could drive their master reset pins with the watchdog output to turn off all valves for example. Some classic chips for this include the 74HC174 (six flip-flops) and the 74HC273 (eight flip-flops).
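To make the flip-flop idea concrete, here is a minimal behavioural sketch in plain C++ of an 8-bit D-type latch with a master reset, in the style of the 74HC273. The class and method names (`ValveLatch`, `setMasterReset`, `clock`) are made up for illustration, and it assumes the watchdog output can be wired so that it asserts the reset; the actual polarity and output type need to be checked against the datasheets.

```cpp
#include <cstdint>

// Behavioural model only: an 8-bit D-type latch whose outputs drive the valves.
// Bit n high = valve n energised (open). All names are illustrative.
class ValveLatch {
public:
    // Asserting master reset clears every output immediately (asynchronously)
    // and holds them cleared for as long as reset stays asserted.
    void setMasterReset(bool asserted) {
        resetAsserted_ = asserted;
        if (asserted) {
            q_ = 0;
        }
    }

    // A clock edge copies the data inputs to the outputs,
    // but only while master reset is not asserted.
    void clock(uint8_t d) {
        if (!resetAsserted_) {
            q_ = d;
        }
    }

    uint8_t q() const { return q_; }

private:
    uint8_t q_ = 0;            // all valves off at power-up in this model
    bool resetAsserted_ = false;
};
```

The point of the arrangement is that once the watchdog asserts the reset, nothing the processor writes can reopen a valve until the reset is released and the firmware deliberately clocks in a new value.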

For the OTA timing issue, we need to look at the OTA “stages”. With SYSTEM_THREAD(ENABLED), the user thread will run independently of the system thread. The first part of the OTA will download the actual firmware payload. Then the firmware is flashed and if all is good, the processor will reset.

To my knowledge (and to be tested), the OTA download will occur in the background, leaving the user thread running. When the OTA firmware flash begins, the firmware_update system event will be issued with the parameter firmware_update_begin. You could write a handler for that event to do one last “tickle” of the watchdog (and possibly put all your valves in the OFF position), giving the system 60 seconds to flash and reset. Since the user thread will start right after the bootloader passes control to the system firmware, you could start refreshing the watchdog right away. So, if the OTA download is done in the background AND the system event can be leveraged AND the flash/reset can take less than 60 seconds, then this should work.
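A rough sketch of that "last tickle" handler, written as plain C++ so the logic can be checked off-device. `UpdateStage`, `Hardware`, `onUpdateStage` and `fitsInWindow` are hypothetical stand-ins, not Particle APIs. On the Electron the handler would be registered for the firmware_update system event (with `System.on(...)`, as far as I know) and would drive real pins; none of this has been tested on hardware.

```cpp
#include <cstdint>

// Stand-in for the firmware_update event parameters mentioned above.
enum class UpdateStage { Begin, Complete, Failed };

// Hypothetical hardware hooks; on the device these would write GPIO pins.
struct Hardware {
    bool valvesOpen = true;
    uint32_t lastTickleMs = 0;

    void closeAllValves() { valvesOpen = false; }
    void tickleWatchdog(uint32_t nowMs) { lastTickleMs = nowMs; }
};

// Watchdog timeout assumed from the thread: roughly 60 seconds.
constexpr uint32_t kWatchdogTimeoutMs = 60000;

// When flashing is about to begin, fail safe first, then refresh the
// watchdog one last time so the full timeout is available for flash + reset.
void onUpdateStage(UpdateStage stage, Hardware& hw, uint32_t nowMs) {
    if (stage == UpdateStage::Begin) {
        hw.closeAllValves();
        hw.tickleWatchdog(nowMs);
    }
}

// Budget check: does a measured flash + reset time, plus a safety margin,
// fit inside the watchdog window?
bool fitsInWindow(uint32_t flashAndResetMs, uint32_t marginMs) {
    return flashAndResetMs + marginMs < kWatchdogTimeoutMs;
}
```

The flash and reset duration would have to be measured on a real device; `fitsInWindow` only expresses the "less than 60 seconds" condition as a check.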

Now, let’s assume the firmware update failed. In that case, the firmware_update system event will be issued with the parameter firmware_update_failed. You could write a handler that triggers a software timer that keeps refreshing the watchdog during Safe Mode operation (again, to be tested), since software timers run in their own thread.

Another way to manage the watchdog is to use a hardware timer based interrupt (with SparkIntervalTimer). The ISR would be responsible for keeping the watchdog refreshed. This ISR will run independently of any system or user code so the only way it can stop is if the STM32 freezes. You could add timing logic in the ISR such that if the user code does not refresh a value within a fixed interval, the ISR will NOT refresh the watchdog pin, eventually causing a reset. The ISR could even turn off all the valves just before the watchdog resets.
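The gating logic for that ISR could look roughly like the sketch below. It is plain C++ with a hypothetical `WatchdogGate` class. On the device, `checkIn()` would be called from the user loop and `shouldRefresh()` from the timer ISR, which would pulse the watchdog input only when it returns true. The millisecond timestamps are assumed to come from a free-running counter such as `millis()`.

```cpp
#include <cstdint>

// Decides whether the timer ISR is still allowed to refresh the external
// watchdog. If user code stops checking in, the ISR stops refreshing and
// the external watchdog eventually resets the processor.
class WatchdogGate {
public:
    explicit WatchdogGate(uint32_t maxSilenceMs) : maxSilenceMs_(maxSilenceMs) {}

    // Called from user code to prove it is still making progress.
    void checkIn(uint32_t nowMs) { lastCheckInMs_ = nowMs; }

    // Called from the timer ISR. Unsigned subtraction keeps the comparison
    // correct across counter wrap-around.
    bool shouldRefresh(uint32_t nowMs) const {
        return (nowMs - lastCheckInMs_) <= maxSilenceMs_;
    }

private:
    uint32_t maxSilenceMs_;
    volatile uint32_t lastCheckInMs_ = 0;  // shared between loop and ISR
};
```

For example, with `WatchdogGate gate(5000);` the ISR keeps refreshing for up to five seconds after the last `gate.checkIn(...)` and then goes quiet, which is also the moment it could switch the valves off.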

These ideas are more a stream of consciousness thing than proven advice. However, I believe there is a good foundation for building a solid solution.

tuxie · July 4, 2016, 6:12am

Whoa, thanks @peekay123 A lot of good ideas there.

I really like the idea of having the watchdog control the magnetic valves (which are, by the way, the normally-off type, so if they are not actively driven high they should be off). This might be the easiest method to use, albeit not the best for the whole system, because a lockup would still need to be handled manually. But at least we could be sure that there cannot be any major leaks this way, which is my major headache at this point.

I agree with you that OTA most likely is not a problem. I’ll test out the software and hardware timers. If they work while in safe mode, then I think we have a pretty much perfect solution right there. Though it would have to work from a cold start too, so if the timers need to be started before entering safe mode while having power it would work.

Anyways thanks a lot for great ideas! I’ll try to remember to report back once I’ve tested the timers!

tuxie · July 6, 2016, 9:15am

At least when I call System.enterSafeMode(), the hardware timers seem to be cleared and stop working. So I guess the only way to continue is to control the magnetic valves with the watchdog and hope that it’s good enough!
