Cloud Webhooks were not triggering


My Boron devices publish updates every day, but starting Apr 15, Particle was not receiving the published events from one device, so the webhook to my server was not being triggered. After I restarted the device today, it started working again, with no change to the code on the Boron. The same thing then happened to another device on Apr 24, and I had to restart that one too. Why did this happen? Did some sort of session to the MQTT server expire for the devices? Do I need to restart my devices periodically?

Interestingly, the device was still receiving data all that time and performing actions, such as opening relays, based on data published to the device from the Particle API.

Does anyone have a clue, or has anyone encountered this issue?

That looks like the integration log. If you look at the event log while a device is in this state, does it publish? I suspect the answer is no, but it would still be good to know for sure. Since the event log does not have history, you won't be able to tell what happened in the past, only whether it's happening now.

When you trigger things from the cloud side, is it subscribing to events on device, or using functions or variables?

What is the purpose of MQTT? Are you also running an MQTT client in your code? (The Particle cloud uses CoAP, not MQTT.)

It would be very unusual for the device to be able to receive data from the cloud but not be able to publish, because they share the same CoAP data session. I can't say it's impossible, but it would be very unlikely.

You don't need to periodically reset your device, but I often do have mine automatically reset once a week. This is mainly to deal with accidental memory leaks or heap fragmentation, which could cause your firmware or Device OS to behave unexpectedly if the device ran out of memory.

One thing that can prevent publishing is a bug causing it to exceed the rate limit. If this occurs, all of the publishes will be discarded. One case where this can occur is if you miscalculate your publish interval by using millis() but not using the form millis() - lastTime >= interval. If you try to remember the next time instead of the last time, when the millis() counter rolls over after 49 days, the logic fails and the device will attempt to publish on every check, and typically get rate limited.
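As a rough illustration of why the millis() - lastTime >= interval form survives the rollover, here is a minimal sketch in plain C++. The helper name intervalElapsed is hypothetical and is not part of Device OS; it only assumes that millis() returns an unsigned 32-bit millisecond count.

```cpp
#include <cstdint>

// Hypothetical helper (not a Device OS API): returns true once at least
// `interval` ms have passed since `lastTime`.
// Unsigned subtraction wraps modulo 2^32, so the difference is still the
// true elapsed time even after the millisecond counter has rolled over.
bool intervalElapsed(uint32_t now, uint32_t lastTime, uint32_t interval) {
    return static_cast<uint32_t>(now - lastTime) >= interval;
}
```

For example, if lastTime was recorded 500 ms before the rollover (4294966796) and the counter now reads 500, the subtraction wraps around to 1000, so a 1000 ms interval is correctly reported as elapsed.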

You should also check the device vitals for devices in this state to see if there are unusually high error counts.

I was able to open and close relays, but the relay status, battery status, and every other kind of publish were not happening. Pinging the device worked, so I thought the issue might be on my end, but I could not find it. This is the first time I have encountered this in years.

For rate limit control I do this:
if (millis() > nextPublish) {
    nextPublish = millis() + 1000;
    publish
}

I guess we found the issue. Thanks, @rickkas7. This will fail on rollover.
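To see what the rollover does to the two forms, here is a small simulation sketch in plain C++. It is not device firmware: the names Counts and simulate are made up for illustration, and it assumes loop() runs exactly once per millisecond and that millis() is an unsigned 32-bit counter.

```cpp
#include <cstdint>

struct Counts {
    unsigned nextTimeForm;  // attempts using "millis() > nextPublish"
    unsigned lastTimeForm;  // attempts using "millis() - lastPublish >= interval"
};

// Simulates `steps` loop iterations, one per millisecond, starting at the
// simulated millis() value `start`, and counts publish attempts per form.
Counts simulate(uint32_t start, uint32_t steps, uint32_t interval) {
    Counts c{0, 0};
    uint32_t nextPublish = start;
    uint32_t lastPublish = start;
    for (uint32_t i = 0; i < steps; ++i) {
        uint32_t now = start + i;  // wraps around like millis()

        // "Next time" form: near the rollover, now + interval wraps to a
        // small value, so the comparison is true on every iteration.
        if (now > nextPublish) {
            nextPublish = now + interval;
            ++c.nextTimeForm;
        }

        // "Last time" form: the wrapped difference is the real elapsed time.
        if (static_cast<uint32_t>(now - lastPublish) >= interval) {
            lastPublish = now;
            ++c.lastTimeForm;
        }
    }
    return c;
}
```

Calling simulate(4294964296u, 6000u, 1000u), which starts 3000 ms before the rollover and runs for 6000 ms, gives 5 attempts for the last-time form. The next-time form attempts a publish on nearly every iteration during the final second before the rollover, which is the kind of burst that would run into the rate limit.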