Understanding why a webhook sleeps

One device pushing to my server was throwing a 500 error earlier, resulting in the entire webhook going to sleep. However, after a while, some events started to get through successfully. There haven’t been any errors for hours, yet most (~75%) of the webhooks are still listed as “sleep”. I have read through the firmware documentation section on webhooks several times and have come up empty-handed.

Why do webhooks go to sleep? How can I get them out of sleep? And is there any way to bypass this functionality?

Let me ping someone who might be able to help. @rickkas7 or @ParticleD, are you able to assist?

Webhook sleep happens when requests to a given webhook endpoint return errors.

Previously, there was a fixed maximum rate for a given server URL, and it needed to be manually increased if the load exceeded that rate.

Now, the rate can go as high as necessary, subject to throttling because of errors. Once a given host URL starts throwing errors, a cool-down period is applied, throttling the rate of requests to that webhook. When your webhook is throttled, that’s what’s marked as sleep. Those events are not retried later, as that would just exacerbate an overloading situation.

The problem should resolve itself in a few minutes as rolling averages are used to calculate throttling. And it’s not specifically for your webhook; it’s determined by host URL. Of course it’s also possible that things unrelated to load could cause errors, so the method is not perfect, but it appears to be more effective than the old fixed rate limit.
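To make the idea concrete, here is a minimal sketch of per-host throttling driven by a rolling error rate. It is only an illustration, not the actual cloud implementation: the class name `HostThrottle`, the 5-minute window, the 50% threshold, and the reading of "host URL" as the URL's hostname are all assumptions, and what counts as a failure is left to the caller.

```python
from collections import deque
from urllib.parse import urlsplit


class HostThrottle:
    """Toy per-host throttle based on a rolling error rate (hypothetical)."""

    def __init__(self, window_seconds=300.0, max_error_rate=0.5):
        # Both defaults are made-up values for illustration.
        self.window_seconds = window_seconds
        self.max_error_rate = max_error_rate
        self._outcomes = {}  # host -> deque of (timestamp, failed)

    @staticmethod
    def host_of(url):
        # Assumption: throttling is keyed by hostname, not by full path.
        return urlsplit(url).netloc

    def _recent(self, url, now):
        # Drop outcomes that have aged out of the rolling window.
        outcomes = self._outcomes.setdefault(self.host_of(url), deque())
        while outcomes and now - outcomes[0][0] > self.window_seconds:
            outcomes.popleft()
        return outcomes

    def record(self, url, failed, now):
        # The caller decides what counts as a failure.
        self._recent(url, now).append((now, failed))

    def error_rate(self, url, now):
        outcomes = self._recent(url, now)
        if not outcomes:
            return 0.0
        return sum(failed for _, failed in outcomes) / len(outcomes)

    def is_sleeping(self, url, now):
        return self.error_rate(url, now) > self.max_error_rate


throttle = HostThrottle()
for t in range(4):
    throttle.record("https://hooks.example.com/a", failed=True, now=float(t))

print(throttle.is_sleeping("https://hooks.example.com/b", now=10.0))   # True
print(throttle.is_sleeping("https://hooks.example.com/b", now=400.0))  # False
```

In this sketch, the old failures simply age out of the window, which is why the sleeping state clears on its own once errors stop.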


@rickkas7 Do you know if there is any difference in throttling when a server returns a 4xx error code vs a 5xx error code? Would throttling only occur when there are 5xx server errors?

We’re having issues with sleep lately, and looking at the logs it seems that we’re returning a lot of 500 error codes, when really the issues are due to 4xx type problems. I’m hoping that just switching to a 4xx code would solve our troubles with too many webhook sleeps?

Also, do you know if having issues at one API endpoint could affect other endpoints on the same server? For example, if on the same server I have www.example.com/api/task1 and www.example.com/api/task2, would errors at one endpoint cause sleep at the other as well?