Particle.subscribe across Cell service interruptions

We are using Electrons in an application where they often go out of Cell phone coverage. In normal operation, the application publishes data every 2 minutes and receives an event in response to each publish.

Firmware version 0.6.0
Firmware uses SYSTEM_THREAD(ENABLED)
In setup(), there is a Particle.subscribe(….) call to catch the events.
In loop(), the publish code looks like this:

QueueData();                    // place data in the send queue
if (Particle.connected())       // connected() is a function call
{
    if (!Queue.empty())         // only publish when something is queued
    {
        Particle.publish(……….);
    }
}

As far as we can tell this all works but we have some questions:

A. Does the Particle.subscribe survive forever across Cell disconnects and connects?
B. Is there a timeout in the Cloud that destroys the Particle.subscribe object when there is no cell connection for some period of time?
C. Can the same Particle.subscribe be executed more than once without causing confusion, e.g. before every Particle.publish?
D. What are the costs of executing a Particle.subscribe? Is loop execution suspended until it completes?

Comments?

During one test, where one Electron was out of Cell range for 30+ minutes, we saw the following after Cell connection recovered.

  1. The Particle.publish worked as expected - the data was received by the host. The published data contains debug data about what is happening in the Electron, i.e. the state of the Electron's state machine.
  2. Events sent by the host were not received by the Electron for about 40 minutes. Then they just started showing up.
  3. After that time the events continued to be received for the remainder of the test, another 5 hours.

A second Electron sitting next to the first worked as expected – when cell connection recovered both Particle.publish and the corresponding events start to work at the same time.

Any ideas / suggestions?

A. Since the cloud doesn't actually know whether or not a particular Electron is currently online, yes. But since the Electron might realise it fell off the network and needs to reconnect, it would "refresh" the subscription on reconnect.

B. Not that I know.

C. That depends on your payload. Since the order of "delivery" of events and the resulting subscription callbacks is non-deterministic (e.g. due to packet routing), you need to have some means in the payload to correlate event and subscription.

D. The first part is not easily answered (e.g. due to possible retries).
For the second part, not really suspended but - IIRC - Particle.function() & Particle.subscribe() callbacks are serviced in a synchronous manner (between iterations of loop() or when Particle.process() is called) - one callback at a time.

But in regards to your observations, maybe @rickkas7 has some additional input.

Thank you. I assumed the Particle.subscribe automatically recovered after an interruption of Cell service.

The published packets all have a serial number and the corresponding event has the same serial number. This resolves any ordering issues in the packets arriving at the host. The serial numbers are also used to recover any packets that are "lost". It is rare that publish data does not reach the host. The reply events are more likely to get "lost" but the recovery process makes sure it all comes out in the wash and no packets are lost.
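To make the serial-number matching concrete, here is a minimal sketch in plain C++ of how outstanding publishes could be tracked until their reply event arrives. This is not our actual firmware; the class name PendingPublishes and its methods are hypothetical, and the real code would call acknowledge() from the Particle.subscribe handler after parsing the serial number out of the event data.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical tracker: remembers each published payload until the reply
// event carrying the same serial number has been received.
class PendingPublishes {
public:
    // Record a payload that is about to be published; returns its serial number.
    uint32_t add(const std::string& payload) {
        uint32_t serial = nextSerial_++;
        pending_[serial] = payload;
        return serial;
    }

    // Call with the serial number found in a reply event.
    // Returns true if it matched a publish that was still outstanding,
    // false for duplicates or unknown serial numbers.
    bool acknowledge(uint32_t serial) {
        return pending_.erase(serial) > 0;
    }

    // Number of publishes still waiting for their reply event.
    std::size_t outstanding() const { return pending_.size(); }

private:
    uint32_t nextSerial_ = 1;                  // assumed starting value
    std::map<uint32_t, std::string> pending_;  // serial -> payload
};
```

Because replies are looked up by serial number, it does not matter in which order the events arrive, and a reply that shows up twice is simply ignored the second time.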

I have experienced another case where the data from a "publish" is received by the host, and the corresponding events are sent by the host but not received by the assettracker. Each publish contains a serial number and the corresponding event contains the same serial number so they can be matched up.
The test ran successfully for some 15 hours - publish / event pairs happening as expected.
Then for 11 hours - "publish" data was received by the host but the corresponding events were not received by the assettracker.
If the event is not received within 15 seconds after the publish, the data is re-published up to 5 times.
The host throws the duplicate data away - no data is lost but cell bandwidth is used up.
For the next 34 hours (until the test was stopped) - publish / event pairs worked as expected.
For the duration of the test the device was sitting on a desk beside a glass door and never moved, with 5V always applied to VIN.
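
For reference, the re-publish and duplicate handling described above works roughly like the sketch below. It is a simplified plain C++ illustration, not the real firmware or host code: RetryState, nextAction and DuplicateFilter are hypothetical names, and the time is passed in as a parameter (on the device it would come from millis()). Only the 15 second timeout and the limit of 5 re-publishes are taken from the description.

```cpp
#include <cstdint>
#include <set>

// Values from the description above: wait 15 s for the reply event,
// then re-publish, at most 5 times.
const uint32_t ACK_TIMEOUT_MS = 15000;
const int MAX_RETRIES = 5;

// Hypothetical per-packet retry state kept on the device.
struct RetryState {
    uint32_t lastSentMs = 0;  // time of the most recent publish
    int retries = 0;          // re-publishes done so far
    bool acked = false;       // set when the matching event arrives
};

enum class Action { Wait, Republish, GiveUp };

// Decide what to do with one packet at time nowMs.
// Unsigned subtraction keeps the comparison valid across counter rollover.
Action nextAction(RetryState& s, uint32_t nowMs) {
    if (s.acked) return Action::Wait;
    if (nowMs - s.lastSentMs < ACK_TIMEOUT_MS) return Action::Wait;
    if (s.retries >= MAX_RETRIES) return Action::GiveUp;
    s.retries++;
    s.lastSentMs = nowMs;
    return Action::Republish;
}

// Hypothetical host-side filter: accepts each serial number only once,
// so re-published duplicates are thrown away.
class DuplicateFilter {
public:
    bool accept(uint32_t serial) { return seen_.insert(serial).second; }
private:
    std::set<uint32_t> seen_;
};
```

With this scheme, each publish whose reply never arrives costs up to six transmissions in total, which is where the extra cell bandwidth goes during the periods when events are not received.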

This is the second time that I have observed this behavior - on two different asset trackers - during 100’s of hours of testing in all types of conditions - all using assettrackers and firmware 0.6.0.

Questions:
Any ideas what is causing this behavior?
Any ideas how to force a recovery? Would a reset / restart solve it?

Thank you