While looking into an issue with my AWS Lambda webhook handler that caused massive response delays, I noticed some odd behavior. About 1-3% of all webhooks from my product are failing with timeout errors. I would expect that error if the response took more than 5 seconds, but my responses have an average delay of around 1 second and a maximum delay under 4 seconds. Has anyone else seen this behavior?
Here’s an example from this morning. The response delay in my logs was around 1.5 seconds, well within the 5-second limit. A quick comparison of the logs shows several other calls with larger delays that succeeded.
Here are the matching AWS logs (9:30 ET is 13:30 UTC):
Continuing the discussion from Any adjusting for timeout?:
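For anyone who wants to compare numbers, here is a minimal sketch of how handler time could be logged against the 5-second limit. It is an illustration under assumptions, not my production code: the names `timed`, `TIMEOUT_LIMIT_S`, and `lambda_handler` are hypothetical, and the handler body is a placeholder.

```python
import functools
import json
import time

# Assumed sender-side limit, taken from the 5-second figure above.
TIMEOUT_LIMIT_S = 5.0


def timed(handler):
    """Wrap a Lambda-style handler and log how long it ran."""

    @functools.wraps(handler)
    def wrapper(event, context):
        start = time.monotonic()
        try:
            return handler(event, context)
        finally:
            elapsed = time.monotonic() - start
            # One JSON line per invocation so it is easy to filter in the logs.
            print(json.dumps({
                "elapsed_s": round(elapsed, 3),
                "over_limit": elapsed > TIMEOUT_LIMIT_S,
            }))

    return wrapper


@timed
def lambda_handler(event, context):
    # Placeholder: the real webhook processing would go here.
    return {"statusCode": 200, "body": "ok"}
```

Time measured inside the handler does not include cold-start initialization or network transit, so the sender may observe a longer delay than this log line reports.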