Webhook response subscription lost after reset on 2.0.1

Thank you for taking a look.

Yes. Just to rule that out: the response subscription should be in place before the publish that triggers it.

The integration log tells me:

  1. when the publish of “upload” triggering the web hook has fired
  2. if/when the device received a web hook response that triggered a publish of “received”
  3. before the reset, the device receives a response that triggers the publish of “received”
  4. after the reset, the device no longer receives a web hook response, multiple times in a row

Unfortunately, that is quite a bit less than what is needed for debugging.

Below is some SSE logging captured in parallel with the integration log above.

The SSE log tells me:

  1. To me, the subscription to the response looks correct
  2. To me, the web hooks contain the same data and act the same before and after the reset
  3. The only difference is that after the reset, the device no longer publishes “received”, most likely because it no longer receives the web hook response that triggers it

2021-02-12T14:49:22.199 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:49:22.493 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:49:22.493 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:49:22.689 +00:00	e00fce68ba235059f4a1588f log Received
2021-02-12T14:49:42.196 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:49:42.307 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:49:42.359 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:49:42.602 +00:00	e00fce68ba235059f4a1588f log Received
2021-02-12T14:50:02.274 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:50:02.363 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:50:02.421 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:50:02.661 +00:00	e00fce68ba235059f4a1588f log Received
2021-02-12T14:50:22.289 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:50:22.312 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:50:22.393 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:50:22.676 +00:00	e00fce68ba235059f4a1588f log Received
2021-02-12T14:50:31.799 +00:00	e00fce68ba235059f4a1588f spark/status offline
2021-02-12T14:50:31.834 +00:00	e00fce68ba235059f4a1588f spark/status online
2021-02-12T14:50:31.840 +00:00	e00fce68ba235059f4a1588f particle/device/updates/enabled true
2021-02-12T14:50:31.840 +00:00	e00fce68ba235059f4a1588f particle/device/updates/forced false
2021-02-12T14:50:38.693 +00:00	e00fce68ba235059f4a1588f particle/device/updates/pending false
2021-02-12T14:50:48.222 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:50:48.344 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:50:48.395 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:51:08.228 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:51:08.290 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:51:08.338 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
2021-02-12T14:51:28.222 +00:00	e00fce68ba235059f4a1588f upload "data":"{\"T\":\"2021-02-02T12:51:03Z\",\"R\":-72,\"I\":60,\"D\":\"{}\"}
2021-02-12T14:51:28.296 +00:00	particle-internal        hook-sent/upload 
2021-02-12T14:51:28.364 +00:00	particle-internal        e00fce68ba235059f4a1588f/hook-response/upload/0 V
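
To check a longer capture without reading it line by line, the comparison above can be automated. Below is a minimal sketch in standard C++ (it runs on a PC, not on the device). The function name `countUnconfirmedUploads` is hypothetical, and the assumed line layout (timestamp, UTC offset, source, event name, optional data) is inferred only from the excerpt above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Counts "upload" events that were not followed by a "log Received" event
// before the next "upload" (or before the end of the log).
// Assumed line layout, inferred from the excerpt above:
//   <timestamp> <utc-offset> <source> <event> [data...]
int countUnconfirmedUploads(const std::vector<std::string>& lines) {
    int unconfirmed = 0;
    bool pending = false;  // true while an upload waits for its "Received" log
    for (const std::string& line : lines) {
        std::istringstream in(line);
        std::string timestamp, offset, source, event, data;
        if (!(in >> timestamp >> offset >> source >> event)) {
            continue;  // skip blank or malformed lines
        }
        in >> data;  // first data token; stays empty when absent
        if (event == "upload") {
            if (pending) ++unconfirmed;  // previous upload was never confirmed
            pending = true;
        } else if (event == "log" && data == "Received") {
            pending = false;
        }
    }
    if (pending) ++unconfirmed;  // last upload in the log had no confirmation
    return unconfirmed;
}
```

Applied to the excerpt above, this should report 3: the three uploads after the reset, none of which is followed by `log Received`.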

Is it normal for the integration log to not show web hook responses? If so, how can you debug web hooks without an SSE log?

@marekparticle

Same behaviour on a B-SOM dev board (powered with 12V 2.1A and a 1200 mAh LiPo to be sure).

Flash the above firmware OTA and let it start up, remove power for a minute, and once it is back online let it send and receive 3 times. Then provoke RST; afterwards it no longer receives. A serial trace confirms this.

Repeating this a number of times, it does occasionally keep working, but too rarely.

The web hook response from our server confirms uploads, before deleting data on the device, to mitigate historic Particle cloud outages being a showstopper for the target customer base.
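
As an illustration of that confirm-before-delete idea, here is a minimal sketch in standard C++. `ConfirmedUploadQueue` is a hypothetical in-memory stand-in, not the firmware used on these devices and not a Particle API; a real device would back the queue with persistent storage and call `onSendResult` from its webhook response handler or a timeout.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical in-memory stand-in for a persistent upload queue.
class ConfirmedUploadQueue {
public:
    // Adds a record to the back of the queue.
    void enqueue(const std::string& payload) { pending_.push_back(payload); }

    // Returns the oldest record without removing it; nullptr when empty.
    const std::string* nextToSend() const {
        return pending_.empty() ? nullptr : &pending_.front();
    }

    // Call with the outcome of one send attempt. The record is dropped only
    // when the server confirmed it; otherwise it stays queued for a retry.
    void onSendResult(bool serverConfirmed) {
        if (serverConfirmed && !pending_.empty()) {
            pending_.pop_front();
        }
    }

    std::size_t size() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;
};
```

The point of the design is that a missing response is treated the same as a failed upload: nothing is deleted, so a lost webhook response stalls the queue instead of losing data.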

I have not found a work-around, so the deployed field test devices have been downgraded to OS1.5.2.

For support:

Expected behaviour: both before and after a reset induced by the Particle.function “RST”, the device is expected to publish log "Received" as proof of receiving a webhook response after publishing “upload” every 20s.

Actual behaviour: after a reset induced by the Particle.function “RST”, the device no longer publishes log "Received" as proof of receiving a webhook response after publishing “upload” every 20s.

delay(1) significance: both the expected and the actual behaviour occur with or without delay(1). It only serves as a code change, so that a SW download “hard resets” the device and restores the expected behaviour. Power cycling the device for a minute also restores the expected behaviour.

I just set up an Argon, with nothing connected but power, in a Console product with the above webhook and code on OS2.0.1, and it fails in the same way. [edit: changed D22 to D6 in the code]

Losing web hook reception was originally discovered in field test units without a reset. Remotely inducing a System.reset() did not restore it, but uploading slightly modified code did restore web hook reception.

Thanks for this - I’ll make sure that our team sees it!

After 10 days on 1.5.2, I have not yet seen any devices losing webhook response reception.

I finally managed to replicate this behaviour on 2.0.1.
It does NOT happen on 1.5.2.

Taking this to the DeviceOS team.

So far 1.5.2 is not a fix for us.

Our implementation of the excellent PublishQueueAsyncRK lib with i2c connected FRAM (MB85RC256V-FRAM-RK lib) is unreliable on 1.5.2 on our field test units.

Together with web hook response confirms from our server on uploads, we use this to mitigate historic Particle cloud outages being a showstopper for the target customer base.

Are there known errors on 1.5.2 related to i2c or known system memory leaks on 1.5.2?

As any code change restores the webhook response reception on OS 2.0.1, does anyone know if there is a way to induce a reset from user code that has the same effect as a code change (OTA SW download)?

Not sure if this will help in the given case, but we had similar experiences in the past and a temporary workaround was to considerably change the “signature” of exposed functions/variables and subscriptions by adding a dummy item with an ever mutating name.

This way any previously stored (and possibly invalid) connection context would be ignored and a new context would be created.
This would come at the expense of extra “negotiation data” needing to be exchanged between device and cloud - which may be one of the reasons for introducing some of these changes (with potential for being overly protective :wink: ).

Thanks, a good idea - that should fix it.

But the example below had no positive effect.

Added this before setup() :

static bool dummyer = true;

And before particle.function … I added this:

Particle.variable(String(random(10, 1000)), dummyer); 

Still learning. In console, the log under device view I used above is limited, whereas the top level event log includes webhook responses.

Engineering has identified a race condition that occurs in very rare circumstances, and it is currently being addressed. A fix should be pushed out pretty soon.

Great - it is needed. Losing the web hook response has also been seen on a device with 1.5.2 today. So we are out of options, as OS versions before 1.5.2 make little sense with B523.

The fix is live and if your device reconnects it should be pointed to the primary server.
(Power cycle is best)
If you don’t clear the session manually it will expire once the 72hr timer runs out.

Thanks :+1:

(I currently clear the session at every startup from device side).

Confirmed to be working for days now on B523, so we continue to get upload confirmation back from our servers before deleting uploads :+1:

Hmm … a new problem has been introduced instead. When the webhook cannot find the recipient internet address, it sends an OK response back to the device, which then dumps the upload, and the data is lost.

Previously, an unfound webhook address would produce a webhook error response, and the data was kept until a confirmation was received from our servers on a later attempt …