@tuxie, I’m going to blurt out a bunch of stuff, so bear with me.
Safety is achieved at different levels, and systems should ALWAYS fail safe. I had a nice Panasonic microwave oven that recently failed in an unsafe way: when the oven was off and you opened the door, it TURNED ON!!! That failure mode should never have been possible. The door-latch interlock, which should have been done in hardware, was obviously software-controlled, and CSA/UL should never have accepted that.
For your system, you may need hardware-level mechanisms that turn off all valves when the processor fails or power is cycled. One option is to drive the valves from D-type flip-flops that have a master reset pin. You could then drive their master reset pins from the watchdog output, so that a watchdog timeout turns off all the valves. Some classic chips for this are the 74HC174 (six flip-flops) and the 74HC273 (eight flip-flops).
For the OTA timing issue, we need to look at the OTA “stages”. With `SYSTEM_THREAD(ENABLED)`, the user thread runs independently of the system thread. The first stage of the OTA downloads the actual firmware payload. Then the firmware is flashed and, if all is good, the processor resets.
To my knowledge (and this still needs testing), the OTA download happens in the background, leaving the user thread running. When the OTA firmware flash begins, the `firmware_update == firmware_update_begin` system event is issued. You could write a handler for that event that does one last “tickle” of the watchdog (and possibly puts all your valves in the OFF position), giving the system 60 seconds to flash and reset. Since the user thread starts right after the bootloader passes control to the system firmware, you could start refreshing the watchdog right away. So, if the OTA download is done in the background AND the system event can be leveraged AND the flash/reset takes less than 60 seconds, then this should work.
Now, let’s assume the firmware update fails. In that case, a `firmware_update == firmware_update_failed` system event is issued. You could write a handler that starts a software timer to keep refreshing the watchdog during Safe Mode operation (again, to be tested), since software timers run in their own thread.
Another way to manage the watchdog is to use a hardware-timer-based interrupt (for example, with the SparkIntervalTimer library). The ISR would be responsible for keeping the watchdog refreshed. Because it runs independently of system and user code, it should only stop if the STM32 itself freezes. You could add timing logic to the ISR so that if the user code does not update a value within a fixed interval, the ISR will NOT refresh the watchdog pin, eventually causing a reset. The ISR could even turn off all the valves just before the watchdog resets the processor.
These ideas are more stream of consciousness than proven advice. However, I believe they provide a good foundation for building a solid solution.