What does ESOCKETTIMEDOUT mean?

mterrill · February 3, 2017, 3:07am

Hi,

{"name":"hook-error/pushtemps/0","data":"ESOCKETTIMEDOUT","ttl":"60","published_at":"2017-02-03T03:08:02.706Z","coreid":"particle-internal"}

Does it mean the upstream webhook destination timed out?

Does it mean it couldn’t send the response to a downstream photon?

Why aren’t there transaction ids to link hook-sent with hook-error or hook-response?

Mark

kennethlimcp · February 3, 2017, 7:57am

@bryce

mterrill · February 3, 2017, 7:59am

Hi, I sent an email to support before.

It’s really starting to drive me crazy. I don’t know if it’s on my end, or Particle trying to send the ‘yep, here’s the hook response’ to a photon that may or may not be online.

There’s a strong likelihood that the issue is on my end, as I’ve changed some Lambda things to massage status codes towards Particle so they don’t trip the retry. However, that particular error doesn’t make sense, and it’s filling up my particle subscribe mine trail.

mterrill · February 3, 2017, 8:01am

oh, and knowing what the particle webhook request (upstream to webhook destination) / response (to photon) timeouts are would help…

EDIT: it could be as simple as this: if my upstream AWS Lambda takes 5 seconds (which is what its timeout is set to), the Particle webhook gives up at, say, 3 seconds and throws an ESOCKETTIMEDOUT.

EDIT2: If I change my lambda timeout to 3 seconds then I get to see that return as a hook-response. Going to try 4 now…

EDIT3: 5 seconds seemed to be what was breaking the Particle webhook. I can now see it cleanly returning a timeout without triggering ESOCKETTIMEDOUT.

bryce · February 7, 2017, 4:40pm

5 seconds is indeed the timeout when issuing the HTTP request to your server. There would never be an error sending the response if the Photon was offline, because it is a PUB/SUB system. If the Photon is not online, it simply would not see the response.

mterrill · February 7, 2017, 9:13pm

It’d be good to a) expand the ESOCKETTIMEDOUT error message to indicate that the webhook destination did not complete in 5 seconds
b) detail in those docs we discussed via email all the error messages (which I presume is a neat list somewhere at the top of a code file)

Not to beat anyone up, but for context: It was quite frustrating and many hours consumed to progressively hunt down and solve this issue. Where is the error coming from? Who timed out? Particle photon, particle webhook system, AWS API gateway or AWS lambda? It took me quite a while to prove Particle wasn’t at fault, there was just a tight constraint and an ambiguous error message.

5 seconds isn’t a lot of time. I imagine a number of people are processing webhooks and sending push notifications, and that process can easily chew up 4 seconds. Knowing the constraints would inform a different (and more complicated) task creation design pattern on reception of a Particle webhook.

I’m really looking forward to being able to issue a publish to a non-particle destination using the TLS cert of my choosing!

jeiden · July 18, 2017, 5:12pm

Just to expand on the conversation here, I did some looking into this issue today. There are two different timeouts that can occur when delivering a webhook/integration: ETIMEDOUT and ESOCKETTIMEDOUT.

A connection timeout (ETIMEDOUT) occurs if the timeout is hit while Particle is attempting to establish a connection to the remote machine (the webhook destination). A read timeout (ESOCKETTIMEDOUT) occurs any time the server is too slow to send back a part of the response.

In both scenarios, the timeout value is set to 5 seconds. Note that in either case, the failed request will be retried. Our system will retry 3 times quickly, then sleep and retry 3 more times, up to 10 times in total.
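To make the difference between the two timeouts concrete, here is a minimal Python sketch using only the standard library. It is not Particle's implementation: the names ETIMEDOUT and ESOCKETTIMEDOUT come from the webhook service, not from Python, and the `classify_request` helper, the deliberately stalled local server and the 0.2 second timeout are all assumptions made for illustration.

```python
import socket
import threading


def classify_request(host, port, timeout):
    """Return 'connect-timeout', 'read-timeout', 'ok' or 'closed' for one request."""
    try:
        # Phase 1: establishing the TCP connection.
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.timeout:
        return "connect-timeout"  # analogous to ETIMEDOUT
    with sock:
        sock.settimeout(timeout)
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        try:
            # Phase 2: waiting for the server to send some of the response.
            data = sock.recv(1024)
        except socket.timeout:
            return "read-timeout"  # analogous to ESOCKETTIMEDOUT
    return "ok" if data else "closed"


def start_stalled_server():
    """Hypothetical slow destination: accepts one connection and never replies."""
    server = socket.socket()
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    release = threading.Event()

    def serve():
        conn, _ = server.accept()
        release.wait()  # hold the connection open without answering
        conn.close()
        server.close()

    threading.Thread(target=serve, daemon=True).start()
    return server.getsockname()[1], release


port, release = start_stalled_server()
result = classify_request("127.0.0.1", port, timeout=0.2)
release.set()  # let the stalled server shut down
print(result)  # prints "read-timeout"
```

The connection succeeds because the server is listening, so the only timeout that can fire here is the read timeout. That is the same situation as a webhook destination that accepts the request but takes longer than the limit to answer.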

mterrill · August 12, 2017, 1:53am

Thanks @jeiden, I hope the documentation has been updated.

You may also find a support email thread or community thread where I suggested that timing out after 5 seconds and retrying is not the best approach. When I last looked at it, anything but a 200 status code triggered the system to try again, which is obviously not standards-based and has in the past caused the Particle system to ignore that webhook for my fleet. The code was giving a valid error to a client; the Particle webhook system would then keep thrashing it, and because it kept getting the error code back, it would try again and then mark the webhook as invalid for the whole fleet.

That’s just silly. I heard the reason was for people who were developing and/or didn’t know how to keep a webservice up that it would help them. Well, it doesn’t help anyone else and flies in the face of logic and web standards.

So, two good options: publish an AWS-style best-practice architecture guide to show folk exactly how to set up a job queue system external to Particle that quickly accepts a job and always acknowledges it with a 200 (which is what I’ve done), or fix your system so it handles 500/400/300 status codes appropriately. Also, 5 seconds is pretty short if folk don’t quite yet understand that they need a job system middleman if they’re using the webhook to send push notifications, for instance (which easily takes a few seconds of processing/acknowledgement).
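For anyone wondering what the "accept quickly, always acknowledge with a 200" pattern looks like, here is a rough Python sketch using only the standard library. It is not my actual AWS setup: the in-memory `queue.Queue` stands in for a real external job queue, and `HookHandler`, `worker`, `start_server` and the sample payload are hypothetical names and data chosen for the example.

```python
import json
import queue
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

jobs = queue.Queue()  # in-memory stand-in for a real job queue
processed = []        # payloads the worker has finished with


def worker():
    """Do the slow work outside the request/response cycle."""
    while True:
        payload = jobs.get()
        # Slow work (for example sending a push notification) would go here.
        processed.append(payload)
        jobs.task_done()


class HookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        jobs.put(body.decode("utf-8"))  # enqueue only, no processing here
        self.send_response(200)         # acknowledge straight away
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(json.dumps({"queued": True}).encode("utf-8"))

    def log_message(self, *args):
        pass  # keep the example quiet


def start_server():
    """Start the HTTP endpoint and one background worker on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), HookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    threading.Thread(target=worker, daemon=True).start()
    return server


server = start_server()
url = f"http://127.0.0.1:{server.server_address[1]}/"
request = urllib.request.Request(url, data=b'{"temp": 21.5}', method="POST")
with urllib.request.urlopen(request, timeout=5) as response:
    print(response.status)  # prints 200
jobs.join()                 # wait until the worker has handled the job
server.shutdown()
```

The handler only enqueues the payload and returns, so the time the webhook waits for a response no longer depends on how long the real processing takes. Any failure in the slow work then has to be handled by the queue side, since the webhook has already been told 200.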
