Particle Cloud to Electron communication breaks down when network is slow

We have a few dozen Electrons operating on a 3rd party (Zantel) network off the coast of Tanzania.

They are all running firmware version 0.6.1.

The Zantel network (and every other network, for that matter) is heavily overburdened in these areas. This results in moderate to severe network latency.

It would seem that when the latency is on the more severe end of the spectrum, the Particle Cloud API stops working properly with our fleet.

I have posted about this issue in another thread, but thought that this phenomenon should exist in its own dedicated thread.

Particle.function() calls from cloud to Electron "failing" (but not actually failing)

_
One example of how latency breaks the Particle to Electron communication system is when calling a Particle.function() that has been exposed to the cloud. In my app, I have a cloud function which when called sets a flag that queues a Particle.publish(). This means that I know if a cloud function call is successful because I will see a subsequent Particle.publish() that gets triggered and sent to the Particle servers.
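A minimal sketch of the flag-and-publish pattern described above, written as plain C++ so it can run off-device. The `AckQueue` class, the `PublishFn` callback (a stand-in for `Particle.publish()`), and the `fn-ack` event name are all hypothetical names for illustration, not the poster's actual firmware. The handler only records the request and returns immediately; the publish happens later from the main loop and stays queued until it succeeds.

```cpp
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for Particle.publish(): returns true when the publish was sent.
using PublishFn = std::function<bool(const std::string& event, const std::string& data)>;

class AckQueue {
public:
    explicit AckQueue(PublishFn publish) : publish_(std::move(publish)) {}

    // Body of the cloud function handler: set the flag and return at once,
    // so the handler itself adds as little latency as possible.
    int onFunctionCall(const std::string& arg) {
        pendingArg_ = arg;
        pending_ = true;
        return 0;
    }

    // Called from the main loop: send the queued publish.
    // If the publish fails, the flag stays set and the next poll retries.
    void poll() {
        if (!pending_) return;
        if (publish_("fn-ack", pendingArg_)) {  // "fn-ack" is an assumed event name
            pending_ = false;
        }
    }

    bool pending() const { return pending_; }

private:
    PublishFn publish_;
    std::string pendingArg_;
    bool pending_ = false;
};
```

On a device, `onFunctionCall` would be the body of the handler registered with `Particle.function()`, and `poll()` would be called from `loop()`.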

Often (~10% of the time), when I call that function, the Particle console responds (after about 10 seconds or so) saying that the device could not be reached. Then (5-10 seconds later), my device will publish the triggered publish, which means that in reality the function call did go through.

My app needs to know whether a function call was actually executed, but the Particle API reports a failure even when the call went through, so I cannot rely on the API's response. I have therefore had to program an ACK publish protocol that acknowledges Particle.function() calls. That costs program space (a big deal, since my program has already reached its maximum size) and extra cellular data for every Particle.function() call.
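The caller-side half of such an ACK protocol can be sketched as a small decision rule: a call that the API reports as failed is still treated as confirmed if the acknowledging publish shows up within a grace window. This is only an illustration under assumed names (`resolveCall`, `CallOutcome`, `kNoAck` are hypothetical), and the grace window is whatever the application chooses.

```cpp
// Sentinel meaning "no ACK event was observed for this call" (hypothetical).
const long kNoAck = -1;

enum class CallOutcome { Confirmed, Unconfirmed };

// apiReportedSuccess: what the cloud API said about the function call.
// ackDelayMs: time from issuing the call until the ACK publish was seen, or kNoAck.
// graceMs: how long the caller is willing to wait for an ACK before giving up.
CallOutcome resolveCall(bool apiReportedSuccess, long ackDelayMs, long graceMs) {
    if (apiReportedSuccess) {
        return CallOutcome::Confirmed;
    }
    // The API reported a failure; trust a timely ACK over the API response.
    if (ackDelayMs != kNoAck && ackDelayMs <= graceMs) {
        return CallOutcome::Confirmed;
    }
    return CallOutcome::Unconfirmed;
}
```

With the timings reported in this thread (an error after about 10 seconds, the publish 5-10 seconds after that), a call such as `resolveCall(false, 18000, 30000)` would count as confirmed.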

I think the solution here is to make the Particle Cloud API wait for longer before giving up on a function call. I would recommend 20-30 seconds minimum based on the latency I have been observing.

Is this Particle.function() timeout something that can be changed for my product at the product level? If not, is this something that the Particle development team is going to address at a system-wide level?
_

Particle.variable get requests "failing"

_
Similar to the Particle.function() timeouts, many times (~50% of the time) when I try to query the value of a Particle.variable() on one of my devices, the result is an error message saying the device can't be reached. I believe this is the same problem: the Particle.variable() GET request times out. I have not yet been able to gather Serial debug logs proving that the variable GET request made it to the device, since I am not in Tanzania at the moment. I will try to do so.

Is this Particle.variable() timeout something that can be changed for my product at the product level? If not, is this something that the Particle development team is going to address at a system-wide level?
_

OTA updates failing due to network latency

(See link to other thread, above)

@zachary


Let me ping someone who might be able to help: @rickkas7 or @blave, are you able to assist?

I’m actually really happy you mentioned this. We often theorize about the possibility of slow connections to devices when trying to optimize for API responsiveness, but we have almost never had a real use case to point to. Thank you!

FYI @ctarwater per our recent conversation on long blocking API calls.

@jaza_tom we don’t currently have the ability to override these timeouts for particular products or devices, but that’s a great feature request. cc @jeiden @jberi

For now, your pub-sub workaround is the way to go. We’ll try to get this on the roadmap in 2018. As far as I can imagine, it will be a cloud-only feature, not dependent on a firmware version. It will likely be something you adjust in the console or, for a first MVP, just with an API call.


Awesome!

I’ll post back here if/when I catch my development device exhibiting the function call success/failure phenomenon and am able to invoke the test function. That should tell me if the timeout is due to my function handler or if it is purely a network latency phenomenon.