We are using a Particle cloud webhook to exchange data between our devices (we use the P1) and our backend server. Through the Particle cloud console, I can check the data in the "Events" area. Yesterday I noticed a problem with the data being sent back from the backend. Looking closely in the "Events" window, I was surprised by what I saw.
The data we send from the backend is fairly long, over 2K bytes. I understand there is a maximum packet size of about 512 bytes, so the whole payload was split into 4 packets, which makes sense.
What I received on the device side was in the wrong order (although the device received all of the data). This is confirmed by the log messages.
I then checked the Particle cloud "Events" console and, surprisingly, the data was indeed sent in the wrong order. I am attaching 4 pictures here. You can see that each packet has a packet number from 0 to 3, but they are in the wrong order.
Have any of you noticed such a problem? How would you resolve this issue?
This is true. Well, it’s always been true, but it’s more likely to occur now.
Before, it only happened when lost packets were retransmitted.
Now, because the infrastructure is more distributed and redundant, there’s a chance that different chunks of a multi-part webhook response will go through different servers, which can cause them to arrive out-of-order.
The chunks (except maybe the last) will be 512 bytes, so you don’t have to worry about both variable-size and out-of-order, but you should still code for out-of-order.
This will be the behavior from now on, so it’s best to code defensively for it.
Thank you for the response and clarification. Now I understand the infrastructure better.
What's your suggestion for the best practice to handle this situation? I could buffer the necessary data on the device side and then reassemble it based on the packet number. Several questions come to mind:
How would I know I have received all the data? The last packet is typically less than 512 bytes, but I need to cover the case where the last packet is exactly 512 bytes.
I guess I need to build a more robust protocol on top of the Particle API calls. With my own protocol, I can specify the data length and maybe even a checksum. Do you think this is necessary?
BTW, I am curious: the OTA process handles many more data packets. Does the OTA process face a similar challenge? How does OTA handle it?
Again, I really appreciate your inside knowledge. Very helpful.
When your protocol allows for partial parsing of your data, it would be best to interpret as much of the data contained in one chunk as possible and then only keep the "fractional" entities from the beginning and the end of the chunk.
If you need the entire transmission in order to successfully parse the data, you may need to provide a buffer big enough to hold the maximum-length response, and place each received chunk at an x * 512 boundary inside that buffer, where x is the chunk index transmitted in the event name.
To cater for the rare case where the final chunk is exactly 512 bytes long, you may want to have an end marker in your response template and check the end of each chunk for it; if it's not there, it isn't the final chunk.
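As a rough illustration of that approach, here is a minimal sketch of a subscription handler, assuming the chunk index is the number after the last '/' in the event name, a maximum response of 4 chunks, and an "##END##" marker added to your response template (all of these are placeholders for your actual setup):

#include "Particle.h"

const size_t CHUNK_SIZE = 512;
const size_t MAX_RESPONSE_SIZE = 4 * CHUNK_SIZE;   // adjust to your largest response
const char END_MARKER[] = "##END##";               // assumed marker in the response template

char responseBuf[MAX_RESPONSE_SIZE + 1];
size_t responseLen = 0;
bool gotLastChunk = false;

void hookResponseHandler(const char *event, const char *data) {
    // Multi-part responses arrive as hook-response/<event>/<index>,
    // so take the chunk index from after the last '/'.
    const char *slash = strrchr(event, '/');
    int chunkIndex = (slash != NULL) ? atoi(slash + 1) : 0;

    // Place this chunk at its index * 512 offset inside the buffer.
    size_t offset = chunkIndex * CHUNK_SIZE;
    size_t len = strlen(data);
    if (offset + len <= MAX_RESPONSE_SIZE) {
        memcpy(&responseBuf[offset], data, len);
        if (offset + len > responseLen) {
            responseLen = offset + len;
        }
    }

    // The chunk containing the end marker is the final one. Real code should
    // also track which indexes have arrived before treating the data as complete.
    if (strstr(data, END_MARKER) != NULL) {
        gotLastChunk = true;
        responseBuf[responseLen] = 0;
    }
}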
I updated the JsonParserGeneratorRK library to add a new method that handles adding data from a subscription handler:
0.1.4 (2020-12-23)
Added addChunkedData() method to support subscribing to multi-part webhook response events.
This simplifies the subscription handler as well, which now looks like this:
void subscriptionHandler(const char *event, const char *data) {
    jsonParser.addChunkedData(event, data);

    if (jsonParser.parse()) {
        // Looks valid (we received all parts)
        // This printing thing is just for testing purposes, you should use the commands to
        // process data and extract the parts you need
        printJson(jsonParser);

        // After parsing be sure to clear the data so the next set of responses will start
        // fresh with no data saved.
        jsonParser.clear();
    }
}
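For context, the surrounding setup might look roughly like this; the event name "hook-response/myEvent" and the parser buffer/token sizes are placeholder values, not requirements of the library:

#include "Particle.h"
#include "JsonParserGeneratorRK.h"

// Parser sized to hold the entire multi-part response; buffer size and
// token count here are example values.
JsonParserStatic<2048, 100> jsonParser;

void subscriptionHandler(const char *event, const char *data); // handler shown above

void setup() {
    // "hook-response/myEvent" is a placeholder for your webhook's response event name.
    Particle.subscribe("hook-response/myEvent", subscriptionHandler, MY_DEVICES);
}

void loop() {
}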
I guess I need to build a more robust protocol on top of the Particle API calls. With my own protocol, I can specify the data length and maybe even a checksum. Do you think this is necessary?
I wouldn't invest too much effort in making your own solution on top of publish and subscribe, as we'll likely be implementing a general-purpose solution in 2021.
OTA is a completely different thing because it's implemented at a lower level, inside the cloud and on the CoAP layer, not on top of events.
If you want to see an example of doing a reliable transfer over events, see the TrackerCamera project. It's in the other direction (device-to-cloud), but the protocol would work similarly going cloud-to-device. It handles rate limiting and binary encoding of data. It also has a scheme where it uploads all of the chunks, one per second, using NO_ACK, and then the server requests any that it missed until the whole file is received.
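Just to illustrate the publish side of such a scheme (this is not the actual TrackerCamera code; the event name, Base64 encoding, and re-request mechanism are assumptions):

// Illustrative only: send one chunk per second with NO_ACK; the server keeps
// track of which indexes arrived and asks the device to resend the rest later.
void sendChunk(int index, const char *base64Chunk) {
    char eventName[32];
    snprintf(eventName, sizeof(eventName), "file-chunk/%d", index);
    Particle.publish(eventName, base64Chunk, PRIVATE | NO_ACK);
}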
In file included from ./inc/Arduino.h:27,
from <project>/lib/GxEPD2_PP/src/GxEPD2.h:15,
from <project>/lib/GxEPD2_PP/src/GxEPD2_PP.h:1,
c:\users<username>.particle\toolchains\gcc-arm\9.2.1\arm-none-eabi\include\c++\9.2.1\bits\stl_bvector.h: In member function 'std::vector<bool, _Alloc>::size_type std::vector<bool, _Alloc>::_M_check_len(std::vector<bool, _Alloc>::size_type, const char*) const':
../wiring/inc/spark_wiring_arduino_constants.h:111:18: error: expected unqualified-id before '(' token
111 | #define max(a,b) ((a)>(b)?(a):(b))
|
The easiest workaround is to add the following after the header file that includes Arduino.h:
#undef max
The problem is that Arduino defines max as a C/C++ preprocessor macro, which is a pain because once that macro exists you can't have a C++ function or method named "max" that takes different types.
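In practice that looks something like this, with GxEPD2_PP.h standing in for whichever header pulls in Arduino.h in your project:

#include "GxEPD2_PP.h"   // this header (indirectly) pulls in Arduino.h, which defines the max macro

// Remove the function-style macro so C++ code that uses max with different
// argument types (std::max, STL internals, your own methods) compiles again.
#undef max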
Some of the data passed from our server can be 5K. If I need to get all of the data into a P1 buffer and then reassemble it in the right order, it's going to eat up 5K of precious RAM. How would you guys handle this?
It seems to me this Particle cloud problem is getting worse. Before, I could get around a 50% success rate receiving a long response (without doing any reassembly on the device). In the last two weeks, I have not once received a long response successfully. This means none of our clients can load the user menus and icons.
During development, I noticed that some packets contain less than 512 bytes of data, and these packets (I am pretty sure) are not the last packet. I don't have a record from the Particle cloud to show; I noticed this in the device debug log.
Yes, you need to buffer all of the data for a multi-part webhook response somewhere. Typically you store it in RAM, but it could also be flash memory on Gen 3 or the P1, or some sort of external storage (FRAM, flash, etc.).
The packets are much more likely to arrive out of order now, and this will not change. The reason is that out-of-order packets previously occurred only when packets were retransmitted. Now, the data to a specific device no longer goes through a central point; it's distributed across multiple servers. This provides redundancy and eliminates a single point of failure, but it also means there is no guarantee the events will arrive in order. But there was no guarantee they would arrive in order before, either; they just by chance mostly did.
All of the parts of a multi-part response will be 512 bytes. Thus you don’t need to account for both out-of-order and random size, which is harder to deal with, especially with flash memory storage.
There is no end-of-transmission indicator. In some cases, you can use a smaller-than-512-byte chunk as the end marker, unless your data is actually a multiple of 512 bytes. Some formats like JSON can determine the end by checking whether the data is parseable, as only a complete transmission will pass the test.
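For a non-JSON payload, one way to combine these ideas is to pair an end marker with a record of which chunk indexes have arrived. A rough sketch, where the marker string and maximum chunk count are assumptions:

#include "Particle.h"

const int MAX_CHUNKS = 10;            // e.g. enough for ~5K of response data
const char END_MARKER[] = "##END##";  // assumed marker appended by the server

uint32_t receivedMask = 0;  // bit x is set once chunk x has arrived
int totalChunks = -1;       // unknown until the chunk carrying the end marker arrives

// Call this from the subscription handler for every chunk, in any order.
void noteChunk(int chunkIndex, const char *data) {
    if (chunkIndex >= 0 && chunkIndex < MAX_CHUNKS) {
        receivedMask |= (1UL << chunkIndex);
    }
    if (strstr(data, END_MARKER) != NULL) {
        totalChunks = chunkIndex + 1;   // the marker identifies the final chunk
    }
}

// True once every chunk up to and including the marked final one has arrived.
bool transferComplete() {
    if (totalChunks < 0) {
        return false;
    }
    uint32_t allBits = (1UL << totalChunks) - 1;
    return (receivedMask & allBits) == allBits;
}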
Thank you for the suggestion. We are sending raw icon data across, so we are not as lucky as with JSON, where the end packet can be detected. I guess we have to put an end marker there.
I will try to use P1 external flash to buffer the data.