While looking into an issue with my AWS Lambda webhook handler that caused massive response delays, I noticed some odd behavior. About 1-3% of all webhooks from my product are failing with timeout errors. I would expect that error if the response took more than 5 sec, but my responses normally have an average delay of around 1 sec and a max delay of < 4 sec. Has anyone else seen this behavior?
Here's an example from this morning. The response delay was around 1.5 seconds in my logs, well within the 5 sec limit. From a quick comparison of the logs, there are several other calls with larger delays that were successful.
Here are the matching AWS logs (9:30 ET is 13:30 UTC):
It’s not a big issue as the requests are still successful from my perspective, but the retries do create more traffic on both your servers and ours.
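
For anyone who wants to run the same log comparison, here is a minimal sketch of how it could be scripted. It assumes the standard Lambda `REPORT` log line format and takes the 5 sec limit from the figures above. The sample lines and request IDs are made up for illustration, and `total_ms` is a hypothetical helper name. It also adds `Init Duration` when that field is present, since Lambda logs it separately from `Duration` on cold starts. Whether that plays any role in these timeouts is only a guess.

```python
import re

# Timeout taken from the post: the webhook sender appears to give up after 5 seconds.
TIMEOUT_MS = 5000.0

# Matches Duration and, if present, Init Duration in a Lambda REPORT log line.
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<id>\S+)\s+Duration: (?P<dur>[\d.]+) ms"
    r"(?:.*?Init Duration: (?P<init>[\d.]+) ms)?"
)


def total_ms(line):
    """Return Duration plus Init Duration (if logged) in ms, or None for non-REPORT lines."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    return float(m.group("dur")) + float(m.group("init") or 0.0)


# Hypothetical sample lines; replace with lines exported from CloudWatch Logs.
sample = [
    "REPORT RequestId: aaa\tDuration: 1502.34 ms\tBilled Duration: 1503 ms"
    "\tMemory Size: 128 MB\tMax Memory Used: 70 MB",
    "REPORT RequestId: bbb\tDuration: 3900.00 ms\tBilled Duration: 3900 ms"
    "\tMemory Size: 128 MB\tMax Memory Used: 70 MB\tInit Duration: 1200.50 ms",
]

for line in sample:
    ms = total_ms(line)
    if ms is not None:
        flag = "OVER LIMIT" if ms > TIMEOUT_MS else "ok"
        print(f"{ms:.2f} ms {flag}")
```

If every flagged request stays under the limit even with init time included, the delay is presumably happening somewhere outside the function itself, for example in the network path or in whatever sits in front of the Lambda.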