ISSUE: publish ACKnowledgments don't always work [was: Data lost between the electron and Particle Cloud OR not forwarded by webhooks?]

So, we have had a recurring issue where our devices occasionally believe data has been sent and all’s good, but we can’t find any trace of it anywhere. (SUGGESTION: show more than the last 10 successes/fails in the console for a given webhook.)

Here’s the part of the code that sends the data:

TransmitStatus Connector::particlePublish(char *event, uint8_t *data, int &dataLength) {
     TransmitStatus result = TRANSMIT_FAILED;
     _transmittedMessages++; // counts every attempt, including the ones that fail below
     if (!Particle.connected()) {
         Log.trace("Disconnected!");
         return TRANSMIT_FAILED;
     }
     // 30 is the event TTL in seconds; WITH_ACK asks publish() to report success only once the cloud has acknowledged the event
     result = Particle.publish(event, convertData(data, dataLength), 30, PRIVATE | WITH_ACK) != false ? TRANSMIT_OK : TRANSMIT_FAILED;
     Log.trace("publ. %dB: %u",dataLength,result);
     return result;
}

(In case it helps: convertData converts the binary data to base64.)
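For readers unfamiliar with the encoding step, here is a minimal sketch of what a base64 conversion of this kind might look like. The actual convertData is not shown in this thread, so the name base64Encode and its signature are hypothetical, and the standard RFC 4648 alphabet with '=' padding is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for convertData(): encodes raw bytes as standard
// base64 (RFC 4648 alphabet, '=' padding). Not the poster's implementation.
std::string base64Encode(const uint8_t *data, size_t length) {
    static const char kAlphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::string out;
    out.reserve(((length + 2) / 3) * 4);

    size_t i = 0;
    // Each full 3-byte group becomes 4 output characters.
    for (; i + 2 < length; i += 3) {
        uint32_t n = (uint32_t(data[i]) << 16) |
                     (uint32_t(data[i + 1]) << 8) |
                     uint32_t(data[i + 2]);
        out += kAlphabet[(n >> 18) & 0x3F];
        out += kAlphabet[(n >> 12) & 0x3F];
        out += kAlphabet[(n >> 6) & 0x3F];
        out += kAlphabet[n & 0x3F];
    }

    // A trailing group of 1 or 2 bytes is padded with '=' to 4 characters.
    if (i < length) {
        uint32_t n = uint32_t(data[i]) << 16;
        const bool hasSecond = (i + 1 < length);
        if (hasSecond) {
            n |= uint32_t(data[i + 1]) << 8;
        }
        out += kAlphabet[(n >> 18) & 0x3F];
        out += kAlphabet[(n >> 12) & 0x3F];
        out += hasSecond ? kAlphabet[(n >> 6) & 0x3F] : '=';
        out += '=';
    }
    return out;
}
```

The output is always 4 characters per started group of 3 input bytes, so a 55-byte binary payload would become 76 characters once encoded.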

I read other topics saying the Electron has no way to know whether it is really connected… I’m wondering whether the publish function can confirm sending even though the data went into a black hole (AKA “UDP datagram”…)?

Any help/suggestion welcome.

This problem has been bothering us for a long time (close to 2 years), although we always tended to blame the code (and the coder, me). Implementing a confirmation is a bit problematic, mostly because we want to stay connected as little as possible. A webhook callback could be a partial solution, confirming the data reached Particle, but would it confirm the data reached Google on the other side of the webhook?

Side note: we sleep in between connections (to save battery, since the devices are far from everything/everyone) and put everything to rest, including the connection, so we fully reconnect each time.

Thanks,
Phil.

PS: we can’t seem to reproduce the failure on demand, and we have no access to the Serial to monitor failing devices (let alone constantly).

Have you taken a look at this?


Been using it for a few months now on a life critical system and hasn’t missed a beat yet.

Thanks, I saw that thread/library. I was wondering, though, whether it works when the device is put in sleep mode after each connection…

We’re also very close to the max code size (we have had to clean up the code a few times already to fit things in).

OK, so a little add-on.

Firstly, we use SYSTEM_THREAD(ENABLED) and SYSTEM_MODE(MANUAL)
Secondly, I just had confirmation that even when we check that we’re connected and get an ACK, the data doesn’t trigger a webhook call at Particle.

So, EITHER the webhooks malfunction at Particle, OR the data never reaches Particle.

I suppose our only alternative/solution here is to implement the webhook response for each packet sent and check that we get that response before moving forward… which means wasting the round-trip time from the Electron to Particle to Google Cloud and back…
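To make the "check the webhook response before moving forward" idea concrete, here is a rough sketch of the bookkeeping it needs on the device. It is plain C++ with no Device OS calls; the class name, the sequence numbers, and the 8-slot table are all assumptions for illustration, and it presumes the webhook response echoes back the sequence number that was sent.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: tracks publishes that still await an end-to-end
// confirmation (for example, a webhook response echoing the sequence number).
class PendingConfirmations {
public:
    // Assumption: a session sends only a handful of messages.
    static constexpr size_t kMaxPending = 8;

    // Record a publish made at nowMs (e.g. millis()). Returns false if full.
    bool add(uint16_t seq, uint32_t nowMs) {
        for (size_t i = 0; i < kMaxPending; ++i) {
            if (!slots_[i].used) {
                slots_[i].seq = seq;
                slots_[i].sentMs = nowMs;
                slots_[i].used = true;
                return true;
            }
        }
        return false;
    }

    // Call when a confirmation arrives. Returns true if seq was pending.
    bool confirm(uint16_t seq) {
        for (size_t i = 0; i < kMaxPending; ++i) {
            if (slots_[i].used && slots_[i].seq == seq) {
                slots_[i].used = false;
                return true;
            }
        }
        return false;
    }

    // Number of unconfirmed entries, regardless of age.
    size_t pending() const {
        size_t n = 0;
        for (size_t i = 0; i < kMaxPending; ++i) {
            if (slots_[i].used) ++n;
        }
        return n;
    }

    // Number of entries unconfirmed for at least timeoutMs; candidates for
    // a re-send. Unsigned subtraction keeps this correct across timer wrap.
    size_t expired(uint32_t nowMs, uint32_t timeoutMs) const {
        size_t n = 0;
        for (size_t i = 0; i < kMaxPending; ++i) {
            if (slots_[i].used && (nowMs - slots_[i].sentMs) >= timeoutMs) ++n;
        }
        return n;
    }

private:
    struct Slot {
        uint16_t seq;
        uint32_t sentMs;
        bool used;
    };
    Slot slots_[kMaxPending] = {};
};
```

In firmware, add() would be called right after each publish, confirm() from the handler that receives the webhook response, and the device would only go back to sleep once pending() is 0 or the remaining entries have expired and been kept for the next session.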

Just the proof:

0000040944 [system] WARN: Resetting WLAN due to WLAN_WD_TO()
0000044233 [system] INFO: Sim Ready
0000044233 [system] INFO: ARM_WLAN_WD 1
0000046328 [system] INFO: ARM_WLAN_WD 2
0000046328 [system] INFO: CLR_WLAN_WD 1, DHCP success
0000046330 [system] INFO: Cloud: connecting
0000046336 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0000046338 [system] INFO: Loaded cloud server address and port from session data
0000046396 [system] INFO: Cloud socket connected
0000046398 [system] INFO: Starting handshake: presense_announce=0
0000046424 [comm.dtls] WARN: session has 0 uses
0000046432 [comm.dtls] WARN: skipping hello message
0000046592 [system] INFO: Send spark/device/last_reset event
0000047036 [system] INFO: Send subscriptions
0000047626 [system] INFO: Cloud connected
0000047829 [app] INFO: Connected 3G.
0000047832 [app] INFO: Saved Settings
0000047855 [app] INFO: Step #1 OK
0000047856 [app] INFO: Step #2 OK
0000048691 [app] TRACE: publ. 55B: 0
0000048691 [app] INFO: Step #3 OK
0000048713 [app] INFO: deviceId: H10013, 10013
0000049116 [app] TRACE: publ. 37B: 0
0000049116 [app] INFO: sensor #1 info sent
0000049117 [app] INFO: Step #4 OK
0000052120 [app] INFO: deviceId: H10013, 10013
0000052121 [app] TRACE: Troll data: pos=0, sent=81, sample=86
0000052121 [app] TRACE: position=81
0000052122 [app] INFO: 1/81 on SD
0000052128 [app] TRACE: position=82
0000052129 [app] INFO: 1/82 on SD
0000052135 [app] TRACE: position=83
0000052135 [app] INFO: 1/83 on SD
0000052141 [app] TRACE: position=84
0000052142 [app] INFO: 1/84 on SD
0000052148 [app] TRACE: position=85
0000052148 [app] INFO: 1/85 on SD
0000054269 [app] TRACE: publ. 82B: 0
0000054269 [app] INFO: sensor #1 data sent (old= 0, new=5)
0000054273 [app] INFO: Saved Settings
0000054313 [app] INFO: deviceId: H10013, 10013
0000054314 [app] TRACE: Troll data: pos=0, sent=86, sample=86
0000054314 [app] INFO: Step #5 OK
0000054316 [app] INFO: Transm. ended
0000054336 [app] INFO: Saved Settings
0000054339 [app] INFO: Saved Settings
0000054340 [app] INFO: Step #6 OK
0000054341 [app] INFO: End transmission
0000059342 [app] INFO: Disconnecting...
0000059405 [system] INFO: Cloud: disconnecting
0000059671 [system] INFO: Cloud: disconnected
0000059672 [app] INFO: Shutting off cellular
0000064305 [app] INFO: End transmission

This is from the log. I should have received a callback (response) from my webhook after the first 55B sent (possibly a few seconds later), as in this session (see the config parameters received):

0000041450 [system] WARN: Resetting WLAN due to WLAN_WD_TO()
0000044714 [system] INFO: Sim Ready
0000044714 [system] INFO: ARM_WLAN_WD 1
0000046830 [system] INFO: ARM_WLAN_WD 2
0000046830 [system] INFO: CLR_WLAN_WD 1, DHCP success
0000046832 [system] INFO: Cloud: connecting
0000046838 [system] INFO: Read Server Address = type:1,domain:$id.udp.particle.io
0000046840 [system] INFO: Loaded cloud server address and port from session data
0000046898 [system] INFO: Cloud socket connected
0000046900 [system] INFO: Starting handshake: presense_announce=0
0000046924 [comm.dtls] WARN: session has 0 uses
0000047433 [system] INFO: Send spark/device/last_reset event
0000047862 [system] INFO: Send subscriptions
0000048636 [system] INFO: Cloud connected
0000048659 [app] INFO: Connected 3G.
0000048662 [app] INFO: Saved Settings
0000048663 [app] INFO: Step #1 OK
0000048664 [app] INFO: Step #2 OK
0000049844 [app] TRACE: publ. 55B: 0
0000049844 [app] INFO: Step #3 OK
0000049877 [app] INFO: deviceId: H10013, 10013
0000050209 [app] INFO: - config data: [!330047000f51363034323832]
0000050210 [app] INFO: - config data: [i:H10013]
0000050210 [app] INFO: ID=H10013
0000050211 [app] INFO: - config data: [o=i4]
0000050211 [app] INFO: - config data: [gl=f43.45782399999999]
0000050212 [app] INFO: - config data: [gL=f-80.513496]
0000050213 [app] INFO: - config data: [ga=f1273]
0000050216 [app] INFO: Saved Settings
0000050717 [app] TRACE: publ. 37B: 0
0000050717 [app] INFO: sensor #1 info sent
0000050718 [app] INFO: Step #4 OK
0000051105 [app] INFO: - config data: [!387443]
0000051108 [app] INFO: Saved Settings
0000053725 [app] INFO: deviceId: H10013, 10013
0000053725 [app] TRACE: Troll data: pos=0, sent=80, sample=81
0000053726 [app] TRACE: position=80
0000053727 [app] INFO: 1/80 on SD
0000056245 [app] TRACE: publ. 26B: 0
0000056245 [app] INFO: sensor #1 data sent (old= 0, new=1)
0000056249 [app] INFO: Saved Settings
0000056285 [app] INFO: deviceId: H10013, 10013
0000056285 [app] TRACE: Troll data: pos=0, sent=81, sample=81
0000056286 [app] INFO: Step #5 OK
0000056287 [app] INFO: Transm. ended
0000056308 [app] INFO: Saved Settings
0000056311 [app] INFO: Saved Settings
0000056312 [app] INFO: Step #6 OK
0000056313 [app] INFO: End transmission

Side note, I tried the syntax Particle.publish(event,data, PRIVATE, WITH_ACK) to ascertain that there was not a mishandling of the flags, but it didn’t make any difference.

PS: I didn’t paste a screenshot of the last 10 webhooks, but I checked and the calls were NOT there.

Very early on in my Particle journey, I figured out that the publish element was neither for the faint-hearted nor for the novice (in my case). I used the PublishQueue Async library and have had no missing publish events (from Photons / Xenons / Argons), and I publish a lot of diagnostic information during start-up and operation. The library makes use of retained memory, so the events will queue even through a reset (I haven’t tried sleep yet).

You also don’t then need to check for Particle.connected() as the library sorts that part out quite well.

I’m not sure if this will help us, and I’m really worried about the space used (our code is just under 130K). Also, we only bring the Electron online once per hour, for a very short period of time, mostly just long enough to publish our data (3 to 6 messages in general), and then it goes to sleep, so I’m not sure how the async publish queue would work if we don’t give it time to send… We also need to get an immediate reply with possible config changes and other commands for the device, so it HAS to be synchronous.

But thanks for sharing, anyway.

Update: I implemented a way to validate a response from the webhook, plus de-duplication on arrival, since sometimes publish returns false but still sends the data… Whatever Particle returns, accurate or not, we now make sure we get a confirmation from our Google Cloud Functions that the data was added to our batch processing queue.
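For anyone wanting to do the same, the de-duplication part can be sketched roughly as below. This is only the logic, written in C++ for consistency with the firmware snippet above; the real receiver here runs in Google Cloud Functions and is not shown in this thread. The class name and the idea of keying on a device ID plus a per-message sequence number are assumptions.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <utility>

// Hypothetical receiver-side filter: a message is accepted only the first
// time its (deviceId, sequence number) pair is seen, so a re-send after a
// false "failed" result does not create a duplicate record.
class Deduplicator {
public:
    // Returns true for a new message, false for a repeat.
    bool accept(const std::string &deviceId, uint32_t seq) {
        return seen_.insert(std::make_pair(deviceId, seq)).second;
    }

private:
    // Unbounded for simplicity; a real service would expire old entries.
    std::set<std::pair<std::string, uint32_t>> seen_;
};
```

With this in place the device can safely re-send anything that was not confirmed: a message that had in fact arrived the first time is simply dropped on the second arrival.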

Still a lot of bandwidth wasted on bad connections…

Mmmm, it does seem problematic. I know you have a big code build, but have you considered adding MQTT as a transport? It is a very predictable mechanism, albeit one that introduces a different data path. I use this as a dual strategy, so I have the Particle cloud for device management and the MQTT stream for data. This way, if either goes down I don’t lose total functionality (there are backup Particle functions for data query, and the device can work without the Particle cloud for a while if needed) … but you do need the code space …

Shane,

The last time I checked (2 years ago more or less), MQTT was not an option on Particle devices because there’s no encryption library implemented, so you have to add the whole package, which probably uses much of the available memory.

Honestly, adding any new library is not an option anymore on our devices.

Thanks.

Does the PublishQueue Async library work for Argons? I thought they did not have retained memory that the library requires. Thank you.

@peergum - I’d like to take a crack at this. Would you mind PMing me a DeviceID/timestamp pairing or two? I’d like to see what I can glean internally from an instance of this misfire. Thanks!