[Fixed] Webhook callbacks are not reliable!

Hey all!

So, there is an old bug in the Core firmware where the subscribe request isn’t re-sent after the session is dropped and reconnected. If your connection drops frequently, your subscriptions can stop receiving events. I believe this is fixed in the newer Photon firmware, which will be available for Cores soon as well. I suspect that’s what is causing the frequent subscribe failures you’re seeing above.

Thanks,
David

@Dave, OK, but this does not address the issue of webhooks not firing when they are triggered from the CLI while the dashboard is used to monitor the response. You did mention above that you may know what’s up.

Heya!

Totally, I suspect that’s a separate issue. My guess is that if the very first event / response doesn’t come through, but subsequent requests / responses do come through normally, that could be a side effect of the internal messaging that hooks use. The workaround would be to do a test run first, or to publish again if no response is heard within some timeout. I’m going to test that more fully and build a fix for it if that’s an issue, but it might take me a sprint or two to fix.

Thanks!
David

@Dave, it’s not just the first event, which often doesn’t fire; it’s mostly subsequent events, which often don’t fire for several publishes in a row. Mind you, I have not tested this on the latest firmware on my Core-based RGBPongClock.

@BulldogLowell,

any code for me to test? Webhooks is now my new toy :smiley:

I have some magic :eyeglasses: that allows me to see what might be the issue :smile:

My problem is that every external API request counts as a hit, so I’d rather not have to try multiple times to get a parsed response :wink: Alas, I do not have magic glasses. Let me know if you need any other info from my setup.

Would anyone know if it is possible that I am being rate limited (e.g. Weather Underground) because of Particle’s servers making calls to an API?

In other words, is it possible that the Weather Underground API is seeing the aggregated volume of all of our webhooks as if they came from a single IP address?

I’m not getting consistent returns and it is troublesome.

When I make the calls from my Chrome browser, they are returned consistently. Using particle publish, not so much.

Going by the docs, that might indeed be possible, as that API seems to be fairly popular. Perhaps @dave can elaborate?

Well, that may explain my problem, then.

Thanks Jordy, I missed that one… :weary:

Hi @BulldogLowell,

I checked the logs and I’m not seeing host-specific rate limiting for Weather Underground. We do have a default upper bound for any host, to make sure we’re not being too aggressive. I wonder if you’re hitting your maximum number of webhooks per minute (6 per minute per device), or maybe just publishing too fast (try to average no more than 1 per second)? Please feel free to PM me with any details and I’m happy to look into it more deeply for you. :slight_smile:

Thanks,
David
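
For anyone who wants to rule out the two limits David mentions (6 webhooks per minute per device, and roughly 1 publish per second), here is a minimal sketch of a device-side throttle. `PublishThrottle` is a hypothetical helper, not part of the firmware API; it takes timestamps in milliseconds (for example from `millis()`) and enforces a fixed 1-second gap, which is stricter than the "average" David describes.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of the Particle firmware API): decides whether
// a webhook-triggering publish is allowed right now, using the limits quoted
// in this thread: at most 6 per rolling minute and at least 1 second apart.
class PublishThrottle {
public:
    static constexpr std::size_t kMaxPerWindow = 6;      // webhooks per minute per device
    static constexpr uint32_t    kWindowMs     = 60000;  // rolling one-minute window
    static constexpr uint32_t    kMinGapMs     = 1000;   // fixed gap between publishes

    // Returns true and records the publish if it fits within both limits.
    // Unsigned subtraction keeps the comparison valid across millis() rollover.
    bool tryAcquire(uint32_t nowMs) {
        // Enforce the minimum gap since the most recent recorded publish.
        if (count_ > 0 && (nowMs - newest()) < kMinGapMs) return false;
        // If six publishes are tracked, the oldest must be a full minute old.
        if (count_ == kMaxPerWindow) {
            if ((nowMs - stamps_[head_]) < kWindowMs) return false;
            head_ = (head_ + 1) % kMaxPerWindow;  // forget the oldest entry
            --count_;
        }
        stamps_[(head_ + count_) % kMaxPerWindow] = nowMs;
        ++count_;
        return true;
    }

private:
    uint32_t newest() const {
        return stamps_[(head_ + count_ - 1) % kMaxPerWindow];
    }

    uint32_t stamps_[kMaxPerWindow] = {0};  // ring buffer of publish times
    std::size_t head_ = 0;                  // index of the oldest entry
    std::size_t count_ = 0;                 // number of entries tracked
};
```

On a device, the idea would be to call `Spark.publish(...)` only when `tryAcquire(millis())` returns true, and otherwise try again on a later pass through `loop()`.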

Hi Everybody!

This has been driving me mad, so I’m very happy to report I just fixed an issue we discovered that might have been making the webhook responses less reliable. Essentially there was a configuration issue on one of the connections in the flow, and it was causing problems that were hard to detect. Can you give it a try and let me know if it’s better / worse / etc for you?

Thanks again for everyone’s help troubleshooting this, and for your patience, I’m really excited that we might be putting this particular issue to rest. :slight_smile:

Thanks!
David

Hi David,

Thank you for your perseverance. I started testing again last night and will continue tonight to check that we have the reliability sorted out.

Fingers crossed,

Hi David,

I have persevered with my tests and have started seeing dropped responses. It does seem to be slightly better but it is far from reliable.

My setup is as follows:

void setup() {

    //  particle serial monitor
    Serial.begin(115200);

    //  subscribe to webhooks
    bool subscribed = Spark.subscribe("hook-response/io_", ioBridgeCommand, MY_DEVICES);
    if (!subscribed)
         Serial.println("subscription failed for iobridge");

    // and wait at least 10 seconds to allow time to connect
    delay(10000);

}

void loop() {

    Serial.println("Requesting salon temp");

    // publish the event that will trigger our first webhook
    Spark.publish("io_temp_int");

    // and wait at least 60 seconds before continuing
    delay(60000);

    // publish the following 4 events in the same manner
    Spark.publish("io_heat_on");
    delay(60000);
    Spark.publish("io_pump_on");
    delay(60000);
    Spark.publish("io_heat_off");
    delay(60000);
    Spark.publish("io_pump_off");
    delay(60000);


}

// simple response test
void ioBridgeCommand(const char *name, const char *data) {

    Serial.println("ioBridge command response");

    String strName = String(name);
    String strData = String(data);
    Serial.println(strName);
    Serial.println(strData);

}

And the responses received on each pass through loop() (five publishes per pass) are:
5 replies
5 replies
3 replies
4 replies
3 replies
1 reply
3 replies
4 replies

I believe that the servers being called are reliable; I never have a problem with curl. There is some improvement, though: receiving only 1 reply and then 3 or 4 replies afterwards is better than before, when my observation was that once a webhook died it stayed dead!

I’m sorry it’s not 100% reliable, but I must underline that I have found a satisfactory workaround for my setup, so I am no longer relying on this fix.

Hi @mayhew1955,

Thanks for posting your test results! This is very helpful, and I’ll continue to work on this. As a general note, though, checking for a response from your server in the subscription handler should allow your code / device to be confident the server was reached. I think ultimately in any reliable system there need to be positive checks / confirmations all the way through the stack. :slight_smile:

Thanks!
David
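
As an illustration of that kind of positive check: the test code above subscribes one handler to the `hook-response/io_` prefix for five different events, so the handler has to work out which publish a response belongs to. A minimal sketch follows, assuming response events are named `hook-response/<event name>`, optionally followed by `/<index>`; that naming scheme and the helper `isResponseFor` are assumptions, not something confirmed in this thread.

```cpp
#include <cstring>

// Hypothetical helper: returns true if responseName is the hook response for
// the published event. Assumes the name is "hook-response/<event>" with an
// optional "/<index>" suffix (an assumption about the naming scheme).
bool isResponseFor(const char *responseName, const char *event) {
    static const char kPrefix[] = "hook-response/";
    const std::size_t prefixLen = sizeof(kPrefix) - 1;

    // The name must start with the fixed prefix.
    if (std::strncmp(responseName, kPrefix, prefixLen) != 0) return false;

    // The event name must follow immediately after the prefix.
    const char *rest = responseName + prefixLen;
    const std::size_t eventLen = std::strlen(event);
    if (std::strncmp(rest, event, eventLen) != 0) return false;

    // Accept an exact match or a "/<index>" suffix. This rejects longer
    // event names that merely share a prefix with the one we published.
    return rest[eventLen] == '\0' || rest[eventLen] == '/';
}
```

Inside a handler such as `ioBridgeCommand(const char *name, const char *data)`, calling `isResponseFor(name, "io_heat_on")` would let the device mark that specific request as confirmed rather than treating any response as confirmation.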

I am still not getting responses. Did we have any movement on that?
What would you recommend? Waiting a few seconds for a response and if not, then re-issuing a call?
Is there a simple example for how you'd do this?
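
One way the "wait, then re-issue" approach could look is sketched below. `HookRetry` is a hypothetical helper, not part of the firmware API; the timeout and the attempt limit are parameters you would choose yourself. It only tracks state, so the actual `Spark.publish` and `Spark.subscribe` calls stay in your own code.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the Particle firmware API): tracks one
// outstanding webhook request and reports when it should be (re)published.
class HookRetry {
public:
    HookRetry(uint32_t timeoutMs, uint8_t maxAttempts)
        : timeoutMs_(timeoutMs), maxAttempts_(maxAttempts) {}

    // Start a new request cycle; the next shouldPublish() call returns true.
    void begin() { attempts_ = 0; waiting_ = false; done_ = false; }

    // Call from the subscription handler when the hook response arrives.
    void onResponse() { done_ = true; waiting_ = false; }

    // Call from loop() with millis(). Returns true when the caller should
    // publish now: on the first attempt, or after the timeout has elapsed
    // without a response while attempts remain.
    bool shouldPublish(uint32_t nowMs) {
        if (done_ || attempts_ >= maxAttempts_) return false;
        if (waiting_ && (nowMs - lastSentMs_) < timeoutMs_) return false;
        lastSentMs_ = nowMs;
        waiting_ = true;
        ++attempts_;
        return true;
    }

    bool succeeded() const { return done_; }

    // True once every attempt has been used and the last one has timed out.
    bool gaveUp(uint32_t nowMs) const {
        return !done_ && attempts_ >= maxAttempts_ &&
               (nowMs - lastSentMs_) >= timeoutMs_;
    }

private:
    uint32_t timeoutMs_;
    uint8_t  maxAttempts_;
    uint32_t lastSentMs_ = 0;
    uint8_t  attempts_ = 0;
    bool     waiting_ = false;
    bool     done_ = false;
};

// On the device this might be wired up roughly as follows (sketch only):
//   HookRetry retry(15000, 5);
//   void handler(const char *name, const char *data) { retry.onResponse(); }
//   void loop() {
//       if (retry.shouldPublish(millis())) Spark.publish("io_temp_int");
//   }
```

Because `shouldPublish` never blocks, `loop()` keeps running while the device waits, and `gaveUp` gives the code a place to log or back off instead of retrying forever.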

I've been struggling with this for some time too. I have a system that shuts down my Photon after processing a hook response, then wakes up 10 minutes later and attempts to get another hook response. I have had to resort to re-sending events every 15 seconds or so until I get a response. I don't have a clean dashboard screenshot of this, but I don't see responses there either most of the time. I'd say I get 1 in 3.

EDIT: After some more thought, I guess the real problem is that, with my Electron arriving in a few months, sending an extra hook request or two, or three, or FIVE is a near-showstopper for a system on a limited data budget. My entire architecture relies on this mechanism. I have no doubt that the team can find a better solution than we have now; I just hope that the issue gets the attention it needs.
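
On a limited data budget, one common way to cap the cost of retries is to lengthen the wait after each failed attempt instead of re-sending at a fixed interval. This is exponential backoff, a general technique rather than anything proposed in this thread, and it does not fix the missing responses; it only limits how many extra publishes are spent. A minimal sketch, where `retryDelayMs` and its parameters are hypothetical:

```cpp
#include <cstdint>

// Hypothetical helper: delay to wait before retry number `attempt` (0-based).
// The delay starts at baseMs, doubles per attempt, and is clamped to capMs.
uint32_t retryDelayMs(uint8_t attempt, uint32_t baseMs, uint32_t capMs) {
    uint32_t delayMs = baseMs;
    for (uint8_t i = 0; i < attempt; ++i) {
        // Doubling would exceed the cap (and could overflow), so stop here.
        if (delayMs > capMs / 2) return capMs;
        delayMs *= 2;
    }
    return delayMs < capMs ? delayMs : capMs;
}
```

With a 15-second base and a 2-minute cap, successive waits would be 15 s, 30 s, 60 s, and then 120 s for every later attempt.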

Hi @ruben and @trackdork,

Definitely, especially in a bandwidth constrained environment, messages must be as reliable as possible. I’ll continue to dig into this until it’s resolved.

Thanks!
David

Thanks @Dave, I appreciate the hard work. It’s a challenge no doubt when significant parts of the overall system are out of your control. Let the community know if there’s anything we can do to help!

Just a quick report that I finally put in a loop to keep calling the webhook until there is a response. It takes between 2 and 3 calls to get a response. I am fairly sure that wunderground is getting hit with requests even when I am not getting responses in a reasonable time (10-25 sec), but I’ll double check tomorrow.

Hello folks,

I’m having the same issue. My workaround, which has worked well so far, is to call the webhook every 15 seconds until I get a response.

It takes anywhere from 1 to 5 calls to the webhook before I get a response. Having said that, I always get a response within a 3-minute window.

@Dave, I have detailed logs with timestamps. I will PM them to you. Hope this helps in your investigation.