System issue? Webhooks failing but status.particle.io says everything is green

Hi,

I’ve been using my device for months now, and tonight there were two webhook issues (one that pushes temps, and one that polls for config to be sent).

I resolved the poll by adding the device_id to the hook response template and to what the firmware was looking for, per the thread "Subscribe to a Webhook response triggered by a specific device". To be honest, this doesn’t make sense, as it had been happily working for months. It’s an improvement, so let’s move on. [Edit: per below, it now seems the delete/add process may have been the magic fix.]
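Particle.subscribe() matches event names by prefix, so putting the device ID into the response topic means each device only picks up its own responses. A minimal sketch of that matching logic, written in JavaScript for brevity rather than as firmware, and assuming the hook-response/<event>_<deviceID>/<chunk> naming that shows up in the hook-response log lines in this thread. The helper names are hypothetical; on the device, the same prefix string would be what gets passed to Particle.subscribe().

```javascript
// Hypothetical helpers. The naming scheme is assumed from the
// hook-response/gettargetsv2_<deviceID>/0 event seen in this thread.

function responsePrefix(eventName, deviceId) {
  return `hook-response/${eventName}_${deviceId}`;
}

// Particle subscriptions match event names by prefix; startsWith mirrors that.
function isMyResponse(incomingName, eventName, deviceId) {
  return incomingName.startsWith(responsePrefix(eventName, deviceId));
}

// The trailing /<n> appears to be the chunk index of the response.
function chunkIndex(incomingName) {
  const match = /\/(\d+)$/.exec(incomingName);
  return match ? Number(match[1]) : null;
}

const sample = "hook-response/gettargetsv2_3f0035001347343432313031/0";
console.log(isMyResponse(sample, "gettargetsv2", "3f0035001347343432313031")); // true
console.log(isMyResponse(sample, "gettargetsv2", "1b003c00024734335555555555555")); // false
console.log(chunkIndex(sample)); // 0
```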

Then I noticed that temps were not being pushed properly, i.e. they were not showing up in my user interface, the database, the API gateway, anywhere. I habitually leave a terminal window running particle subscribe mine, so I know no errors were being bounced back. An example post:

Code: Particle.publish("pushtemps", publishString, 60, PRIVATE);
On the CLI subscribe:
{"name":"pushtemps","data":"{\"f1\":32.00,\"f1t\":130,\"f2\":32.00,\"f2t\":180,\"f3\":32.00,\"f3t\":160,\"pit\":352.81,\"pt\":366,\"ph\":378,\"pl\":313,\"sp\":366.0,\"fan\":0,\"mode\":\"LIDOPEN\"}","ttl":"60","published_at":"2016-02-16T10:14:19.788Z","coreid":"1b003c00024734335555555555555"}
No error was observed bouncing back, and no hook-response!
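The data field in that subscribe output is itself a JSON string nested inside the event's JSON, so anything consuming it (the Lambda function, a test script) has to decode it twice. A minimal sketch, assuming the subscribe output is available one event per line; the function name is hypothetical and the field names come from the example payload above.

```javascript
// Decode one line of `particle subscribe` output whose data field is a JSON string.
function parsePushTemps(line) {
  const envelope = JSON.parse(line);          // outer event: name, data, ttl, published_at, coreid
  const readings = JSON.parse(envelope.data); // inner payload built by the firmware
  return {
    device: envelope.coreid,
    publishedAt: new Date(envelope.published_at),
    pit: readings.pit,
    setpoint: readings.sp,
    fan: readings.fan,
    mode: readings.mode,
  };
}

// String.raw keeps the \" escapes exactly as they appear in the CLI output.
const sampleLine = String.raw`{"name":"pushtemps","data":"{\"f1\":32.00,\"f1t\":130,\"f2\":32.00,\"f2t\":180,\"f3\":32.00,\"f3t\":160,\"pit\":352.81,\"pt\":366,\"ph\":378,\"pl\":313,\"sp\":366.0,\"fan\":0,\"mode\":\"LIDOPEN\"}","ttl":"60","published_at":"2016-02-16T10:14:19.788Z","coreid":"1b003c00024734335555555555555"}`;

console.log(parsePushTemps(sampleLine).pit); // 352.81
```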

This particular webhook posts through AWS API Gateway to do some magic in Lambda. The API Gateway / Lambda stats showed that the only request to hit them was my manual test request via Postman (successful). Combined with no hook-response on the CLI, this pretty clearly indicates to me that the webhook publish is not going where it’s meant to and is dying within the Particle cloud.

status.particle.io says everything is peachy, but I don’t think so!

Hi,
There has definitely been some issue. I deleted my webhook (56a59b5f5e244c7f1c18e573) and created a new one (56c2fa87c5eb65d21dd0dcf6). [PS: it may simply have been this delete/add webhook process that got the ‘get’ working in the OP.]

I would heartily encourage the Particle sysadmins to research this and let me know the outcome, as it’s clear you have an issue. I’m uncomfortable with the idea that Particle will have undiagnosed cloud issues when I start selling the units next month.

Let me know if you need further information, happy to help.

Hi @mterrill,

Hmm, looking into this.

Thanks,
David

Should this happen again, you can watch the dashboard logs for a hook-sent event, which tells you that the Particle server has sent the webhook request.

Hi All / @mterrill,

Thanks for the ping! Despite serving billions of hooks and operating for months without apparent issue, our webhooks system is still new, and bugs do happen. The nice thing is we wear pagers (virtual pagers), we’re friendly, and we’re pretty fast about fixing bugs. :slight_smile:

We have automated systems that monitor all our services 24/7, and most services that face the outside world like webhooks also have extra tests built in. Most hooks were being delivered normally, based on what these instruments show me.

I’m sorry that your hooks weren’t being delivered. Sometimes the service might mute or stop a hook entirely if enough error responses come back from the target server, or if rate limits are exceeded. Generally, an error event would accompany that. But if removing and re-adding the hook fixed things, that would seem to be evidence that it was muted or having issues for some reason.
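How that muting works internally isn’t described here, so the following is only a hypothetical illustration of the general idea: a per-hook counter that stops delivery after a run of consecutive error responses and resets on any success. The threshold, class name and hook IDs are invented for the example. It also shows why deleting and re-adding a hook would clear such a state: the new hook starts with a fresh counter.

```javascript
// Hypothetical illustration only, not Particle's implementation.
const MAX_CONSECUTIVE_ERRORS = 5; // invented threshold

class HookDeliveryState {
  constructor(hookId) {
    this.hookId = hookId;
    this.consecutiveErrors = 0;
    this.muted = false;
  }

  // Record the HTTP status returned by the target server for one delivery.
  recordResponse(httpStatus) {
    if (httpStatus >= 200 && httpStatus < 300) {
      this.consecutiveErrors = 0; // any success resets the run
    } else {
      this.consecutiveErrors += 1;
      if (this.consecutiveErrors >= MAX_CONSECUTIVE_ERRORS) {
        this.muted = true; // stays muted until the hook is replaced
      }
    }
  }

  shouldDeliver() {
    return !this.muted;
  }
}

let hook = new HookDeliveryState("old-hook-id");
for (let i = 0; i < 5; i++) hook.recordResponse(502);
console.log(hook.shouldDeliver()); // false
hook = new HookDeliveryState("new-hook-id"); // "delete and re-add"
console.log(hook.shouldDeliver()); // true
```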

I’m not exactly sure yet what went wrong with your hooks, but I’d like to dig in and learn more. Can you reach out to us at hello@particle.io?

In the meantime I’ve kicked the service that was hosting your hooks, which I think will help with the issue you were seeing earlier today.

Thanks!
David

Strangely enough, I think I’m getting the issue again.

{"name":"gettargetsv2","data":"null","ttl":"120","published_at":"2016-04-01T13:06:58.564Z","coreid":"3f0035001347343432313031"}

I tried on two Photons (the other one hasn’t had any firmware updates) and both fail. If I call the associated webhook manually, it returns the data as expected from the Firebase web service.

Guys, I deleted the webhook, created it again, and it instantly worked. You have a problem. I do love your support, but as someone who has done his time in sysadmin and programming, I’d ask you to please look into it, as you definitely have a problem. A cached thread to Firebase? Silently dropping the outbound connection? Silently dropping my webhook request?

It’s frustrating because it’s now after midnight and I’ve spent an hour chasing my tail (new PCB, big code changes to enable a SoftAP timeout) before finally remembering the issue I had the other month. I should have had my smoker fired up by now, cooking pulled pork :slight_smile:

Here’s the particle subscribe mine history as I deleted the webhook (56c2eb98c5eb65d21dd0dcf2) and added it back (56fe736a3eb1d4781cd3d156):

{"name":"gettargetsv2","data":"null","ttl":"120","published_at":"2016-04-01T13:10:43.818Z","coreid":"3f0035001347343432313031"}
{"name":"debug","data":"no config received","ttl":"60","published_at":"2016-04-01T13:10:58.320Z","coreid":"3f0035001347343432313031"}
{"name":"gettargetsv2","data":"null","ttl":"120","published_at":"2016-04-01T13:10:58.590Z","coreid":"3f0035001347343432313031"}
{"name":"debug","data":"no config received","ttl":"60","published_at":"2016-04-01T13:11:13.318Z","coreid":"3f0035001347343432313031"}
{"name":"gettargetsv2","data":"null","ttl":"120","published_at":"2016-04-01T13:11:13.587Z","coreid":"3f0035001347343432313031"}
{"name":"hook-sent/gettargetsv2","data":"undefined","ttl":"60","published_at":"2016-04-01T13:11:13.635Z","coreid":"particle-internal"}
{"name":"hook-response/gettargetsv2_3f0035001347343432313031/0","data":"true,358,223,261,200,200,160","ttl":"60","published_at":"2016-04-01T13:11:13.768Z","coreid":"particle-internal"}

Hi @mterrill,

Thanks for reporting! I’ve been investigating this, but it’s hard to reproduce on my end since it only happens to one in a few thousand hooks, every few weeks. This is my top priority, and I’ll be devoting next week to fixing it.

Thanks,
David

@Dave, I have a hook on a Photon that worked well for about a month and hasn’t worked even once since. I deleted and recreated the hook, but it still never works. It fires only once per week, at a specific time of day, and nothing has come through for about two months now. In the dashboard I see the publish coming from the Photon, but nothing from the hook. I’m about to give up on the hook and have the device access my web server directly as a client instead.

Thanks @Dave, appreciated!

Let me know if there is any further information I can provide on the webhook. I think you’ll be able to see from the webhook config what service it’s calling (i.e. a direct REST call to Firebase). I’m wondering if it may actually be a Firebase thing…

Have you kind folk contacted Firebase to ensure you’re not being rate limited or blocked? Or, vice versa, that you’re not rate limiting my requests or outbound traffic?

Mark

It may also be something firmware/cloud related: you’ll notice in the logs the difference between it simply logging the gettargetsv2 line (i.e. {"name":"gettargetsv2","data":"null","ttl":"120","published_at":"2016-04-01T13:10:43.818Z","coreid":"3f0035001347343432313031"}) and when it actually shows that a hook was sent, followed by the hook response.
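That difference can be checked mechanically. A minimal sketch, assuming the particle subscribe mine output is saved one JSON event per line (the function name is hypothetical): for each publish of the hooked event, it records whether a hook-sent/<event> arrived before the next publish of the same event. Run against the log above, trimmed to the two fields it uses, only the last of the three gettargetsv2 publishes gets a hook-sent.

```javascript
// For each publish of `eventName`, record whether a hook-sent/<eventName>
// event arrived before the next publish of the same event.
function checkHookDelivery(lines, eventName) {
  const results = [];
  let pending = null; // published_at of the publish still awaiting hook-sent
  for (const line of lines) {
    const evt = JSON.parse(line);
    if (evt.name === eventName) {
      if (pending !== null) results.push({ publishedAt: pending, hookSent: false });
      pending = evt.published_at;
    } else if (evt.name === `hook-sent/${eventName}` && pending !== null) {
      results.push({ publishedAt: pending, hookSent: true });
      pending = null;
    }
  }
  if (pending !== null) results.push({ publishedAt: pending, hookSent: false });
  return results;
}

// The log from this thread, trimmed to the fields the check uses.
const sessionLog = [
  '{"name":"gettargetsv2","published_at":"2016-04-01T13:10:43.818Z"}',
  '{"name":"debug","published_at":"2016-04-01T13:10:58.320Z"}',
  '{"name":"gettargetsv2","published_at":"2016-04-01T13:10:58.590Z"}',
  '{"name":"debug","published_at":"2016-04-01T13:11:13.318Z"}',
  '{"name":"gettargetsv2","published_at":"2016-04-01T13:11:13.587Z"}',
  '{"name":"hook-sent/gettargetsv2","published_at":"2016-04-01T13:11:13.635Z"}',
  '{"name":"hook-response/gettargetsv2_3f0035001347343432313031/0","published_at":"2016-04-01T13:11:13.768Z"}',
];

console.log(checkHookDelivery(sessionLog, "gettargetsv2").map((r) => r.hookSent));
// [ false, false, true ]
```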

I saw a tweet that someone got firebase working, was that you? Did you get it working?

Thanks!
David

I’ve had Firebase working for a year or so…
If you look at my webhook, you’ll see how to do it.

The issue is when the Particle system drops your webhook; my earlier comments were about potential causes.
A) I assume you’re not rate limiting outbound; across all users, this may manifest itself as a denied connection for me… Long shot.
B) I wondered if you’d set up persistent outbound connections. These would eventually age out.
C) It’s weird that the Particle side of the webhook isn’t acknowledged. The convincing thing is that after a delete and re-add, it’s fine.

Ah, my mistake, sorry! I thought this overlapped with a different question I was seeing. :slight_smile:

We do rate limit outbound ( https://docs.particle.io/guide/tools-and-features/webhooks/#limits ), but yeah otherwise I’m working on the issue you described. :slight_smile:

Thanks!
David

Hi,

I’d actually naively assumed Firebase (and any client subdomain of firebaseio.com, e.g. Davesapp.firebaseio.com) would be whitelisted. Same with anything going to one of the AWS API Gateway services.

That’s pretty important, are you able to confirm? IaaS / PaaS providers should simply be added to avoid heartache.

Either way, with my single unit I wouldn’t be hitting a rate limit, though I would only need to sell 25 to do so.

PS: what was their issue? I could probably help.

Limits should scale with device registration, but unique hostnames also get their own limit until they’re verified with us. We want to make sure the destination is okay with getting a high volume of traffic, but that limit starts at around 120 requests per minute by default.
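A per-hostname limit like this can be pictured as a sliding window of recent request timestamps for each destination host. This is a minimal sketch of that idea, not Particle’s implementation; the 120-per-minute default is the figure quoted above, and the class name is hypothetical.

```javascript
// Sliding-window limiter: at most `limit` requests per `windowMs` per hostname.
class HostRateLimiter {
  constructor(limit = 120, windowMs = 60000) {
    this.limit = limit;
    this.windowMs = windowMs;
    this.sent = new Map(); // hostname -> timestamps (ms) of allowed requests
  }

  tryAcquire(hostname, nowMs) {
    const cutoff = nowMs - this.windowMs;
    // Keep only requests that are still inside the window.
    const recent = (this.sent.get(hostname) || []).filter((t) => t > cutoff);
    if (recent.length >= this.limit) {
      this.sent.set(hostname, recent);
      return false; // over the limit: this request would be dropped
    }
    recent.push(nowMs);
    this.sent.set(hostname, recent);
    return true;
  }
}

// 31 devices publishing every 15 s to the same host during one minute.
const limiter = new HostRateLimiter();
let allowed = 0;
for (const t of [0, 15000, 30000, 45000]) {
  for (let device = 0; device < 31; device++) {
    if (limiter.tryAcquire("davesapp.firebaseio.com", t)) allowed++;
  }
}
console.log(allowed); // 120 (4 of the 124 requests are rejected)
```

With a sliding window, a quota that has been filled is only released as the oldest timestamps age out, so a fleet running slightly over the limit keeps losing a few requests every minute rather than just once.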

Not sure what their issue was, just saw a success tweet earlier today. :slight_smile:

Thanks!
David

If you don’t mind me saying, for PaaS services like AWS and firebase that approach doesn’t make sense. They should be whitelisted on a wildcard subdomain basis (or even string match for AWS).

They have default throttling on their end, and the scale to take anything thrown at them in their stride.

There is a separate subdomain per client app on firebaseio.com, and for AWS there are per-instance subdomains in each region. Validating the volume for every new unique hostname of either on your side would be insane.
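The wildcard whitelisting proposed here could be as simple as a suffix check for Firebase plus a hostname pattern for API Gateway, whose hosts take the form <api-id>.execute-api.<region>.amazonaws.com. A sketch under those assumptions; the lists and the function name are illustrative, not anything Particle offers.

```javascript
// Illustrative allowlist: hosts matching these would skip per-host verification.
const TRUSTED_SUFFIXES = [".firebaseio.com"];
// API Gateway hosts: <api-id>.execute-api.<region>.amazonaws.com
const TRUSTED_PATTERNS = [/^[a-z0-9]+\.execute-api\.[a-z0-9-]+\.amazonaws\.com$/];

function isTrustedHost(hostname) {
  const host = hostname.toLowerCase();
  return (
    TRUSTED_SUFFIXES.some((suffix) => host.endsWith(suffix)) ||
    TRUSTED_PATTERNS.some((re) => re.test(host))
  );
}

console.log(isTrustedHost("davesapp.firebaseio.com")); // true
console.log(isTrustedHost("abc123.execute-api.ap-southeast-2.amazonaws.com")); // true
console.log(isTrustedHost("firebaseio.com.attacker.example")); // false
```

The leading dot in the suffix matters: it keeps a host like evilfirebaseio.com from matching.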

The flip side is that you continue the current approach and client apps silently fail once they grow beyond a trivial number of end devices, e.g. 30 devices posting temps every 15 seconds is enough to cause rate limiting.
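The arithmetic behind that example, taking the roughly 120-requests-per-minute default mentioned above as the per-host limit:

$$
30\ \text{devices} \times \frac{60\ \text{s/min}}{15\ \text{s per post}} = 30 \times 4 = 120\ \text{requests/min}
$$

So 30 devices already sit right at that default, and every device added beyond that pushes the shared destination host over it.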

I dug out my current resource URL whitelist, which is a good start at showing some of the complexity. You may be interested in how it’s represented within Angular:

.config(['$sceDelegateProvider', function($sceDelegateProvider) {
    $sceDelegateProvider.resourceUrlWhitelist([
        'self',
        'https://*.execute-api.us-west-2.amazonaws.com/**',
        'https://*.execute-api.us-east-1.amazonaws.com/**'
    ]);
}])

Note: these are the two AWS API Gateway regions that I’m aware of. I think you can only deploy to those regions currently, but give AWS a few more months and no doubt the clever kids will quietly release it to Australia, Japan, the EU etc., and you’ll be chasing your tail to keep up.
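To show how those whitelist patterns behave, here is a sketch that converts an Angular-style pattern into an anchored regular expression. It assumes Angular’s documented $sce semantics: '*' matches anything except : / . ? & ; so it stays within one host label or path segment, while '**' matches anything. The function name is hypothetical. A wildcard in the region label, as in the usage lines, would cover new regions as AWS adds them.

```javascript
// Convert an Angular-$sce-style whitelist pattern into an anchored RegExp.
// Assumed semantics: "**" matches any characters; "*" matches any characters
// except : / . ? & ; (so it cannot cross a host label or path segment).
function patternToRegExp(pattern) {
  let source = "";
  for (let i = 0; i < pattern.length; i++) {
    if (pattern.startsWith("**", i)) {
      source += ".*";
      i += 1; // skip the second '*'
    } else if (pattern[i] === "*") {
      source += "[^:/.?&;]*";
    } else {
      // Escape regex metacharacters so literal text matches literally.
      source += pattern[i].replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return new RegExp(`^${source}$`);
}

const anyRegion = patternToRegExp("https://*.execute-api.*.amazonaws.com/**");
console.log(anyRegion.test("https://abc123.execute-api.ap-southeast-2.amazonaws.com/prod/temps")); // true
console.log(anyRegion.test("https://abc123.evil.example/execute-api.us-east-1.amazonaws.com/")); // false
```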

Anyways, if you don’t mind me saying, I really think you should compile a list of the common REST services (AWS API Gateway, S3, Azure’s and GCP’s equivalents, Firebase, etc.), because they’re designed to receive this workload without needing your rate limiting. No doubt that was designed as a responsible-netizen measure so you weren’t blacklisted after a device went crazy, but AWS and the like are smart enough to rate limit per client API on their side.

Mark