I’m running two Photons on 0.4.4. At random times throughout the day the Photons go offline and online. This happens anywhere from 3 to 10 times per day.
The weird part is that once the Photon comes back online, it does not call setup(). During the “reset” the Photon continues running as planned, and there is no SOS light or anything.
More than a reset, it seems like the Photon is losing its connection to the cloud, and once it reconnects it triggers the offline/online events. This is just my opinion, but I would love to hear what others think.
You are correct: the most common cause of this behaviour is a problem somewhere between the device and the cloud.
For brief interruptions, you may not even see an offline event, only an online one.
There can be numerous causes of this, the vast majority of which are beyond the control of either you or Particle (e.g., your ISP, their connection(s) to the internet, the internet as a whole, etc.).
Another really common cause is that your DHCP lease expires and you have to wait while it gets renewed. This happens to me every 24 hours, but if I wanted to I could configure my router to hand out longer address leases. It is not really a problem in a robust system that recovers gracefully.
Yes, this can be a problem, depending on how you are delaying. The delay(10000); function itself services the cloud while it waits, but any loop of your own that runs for longer than about 10 seconds (the cloud time-out) will cause problems. You can help by calling the cloud service routine whenever you have time for it, which is Spark.process(); on the Photon and slightly different right now on a Core.
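To make that concrete, here is a minimal sketch of yielding to the cloud from inside a long-running loop. The busy-wait and the 2-second service interval are only placeholders for your own slow code:

```cpp
// Minimal sketch: a slow loop that periodically services the cloud so it
// never comes close to the ~10 second time-out. The busy-wait below is
// only a stand-in for slow work such as copying and parsing strings.
unsigned long lastService = 0;

void setup() {
}

void loop() {
    lastService = millis();

    for (int i = 0; i < 200; i++) {
        // Stand-in for a blocking chunk of work
        for (volatile long j = 0; j < 100000; j++);

        // Give the cloud some time well inside the time-out window
        if (millis() - lastService > 2000) {
            Spark.process();
            lastService = millis();
        }
    }
}
```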
Thanks @bko. I don’t have many delays, but I do have a lot of copying and parsing of strings. I will add some Spark.process() calls in the slower pieces of code.
What about webhook handlers? Here is an extreme example: let’s say I have a handler that takes 20 seconds to run, but it calls Spark.process() every 4 seconds. Would that work well?
I am not sure, but I would urge you to think of your webhook handler more like an interrupt handler: do the minimum amount of work possible to copy the data or whatever, and set a flag that there is work to be done. Then in loop() you can check the flag and do the heavy work.
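As a rough illustration of that pattern, assuming the webhook response arrives through a Spark.subscribe() handler (the event name, buffer size and parsing step here are placeholders):

```cpp
// Minimal "copy and flag" sketch: the handler does almost nothing,
// and the slow work happens later in loop().
char responseBuf[256];
volatile bool responsePending = false;

void hookHandler(const char* event, const char* data) {
    // Bare minimum: copy the payload and raise a flag.
    strncpy(responseBuf, data ? data : "", sizeof(responseBuf) - 1);
    responseBuf[sizeof(responseBuf) - 1] = '\0';
    responsePending = true;
}

void parseResponse(const char* payload) {
    // Placeholder for the real (slow) parsing work.
}

void setup() {
    // Placeholder event name for the webhook response.
    Spark.subscribe("hook-response/my-hook", hookHandler);
}

void loop() {
    if (responsePending) {
        responsePending = false;
        // Do the slow parsing here, where it cannot starve the cloud loop.
        parseResponse(responseBuf);
    }
}
```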
Can I suggest renaming the thread to something that sounds less ominous?
I reiterate: this is expected/unavoidable behaviour. Besides, it sounds like the thread is swiftly pivoting toward recommended programming practices.
Hello @AndyW and @bko. Just wanted to report back that after heavily reducing the amount of code in the handler, the number of disconnects has been reduced by ~40%.