Publish & subscribe semantics & documentation

I want to write a publish-subscribe app involving several cores. There isn’t enough documentation (yet?) to know how much work the application programmer must do.

Is publish-subscribe reliable? If the publish call does not return an error am I sure the publish has happened? If I subscribe am I guaranteed to see all the events or am I only likely to see all events? Will I see all the events in the correct order? Will I see duplicates? Or do I have to write my own ACK-NAK did-I-miss-anything protocol on top?

Must I already be subscribed to see a published event? What is the TTL value used for? If I set TTL to 10mins can I subscribe 9 minutes later and see the event?

Bump. Reliable or not? Are the failure semantics at-most-once or at-least-once?

Although I don’t think we’ve put the pub/sub through any rigorous testing in the field, I believe it should be reliable at this time. @Dave would probably be able to provide a more official answer, but the design of the system should guarantee that you see all events, assuming that both the publish and subscribe are successful, and that the communications with the Cloud are not interrupted.

Events might not be delivered in the correct order (i.e. order is not intentionally preserved), but the system processes messages very quickly, so under real-world conditions you can probably assume order is preserved; this hasn’t been tested under any extreme use cases, though.
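If ordering (or the “did I miss anything” check from the question above) matters to your application, one workaround is to number events yourself. Here is a minimal publisher sketch in Wiring code for the Core, assuming the documented two-argument `Spark.publish()` call; the event name, pin, and payload format are made up for illustration:

```cpp
#include "application.h"   // Core firmware header (added automatically in the web IDE)

unsigned long seq = 0;           // increments once per published event
unsigned long lastPublishMs = 0; // millis() at the time of the last publish

void setup() {
}

void loop() {
    if (millis() - lastPublishMs >= 1000) {   // stay well under the rate limit
        char payload[64];
        // Prefix the reading with a sequence number so the subscriber can
        // spot gaps or out-of-order delivery on its own.
        snprintf(payload, sizeof(payload), "%lu:%d", seq++, analogRead(A0));
        Spark.publish("myapp/reading", payload);
        lastPublishMs = millis();
    }
}
```

This is only an application-level workaround, not part of the pub/sub API itself.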

You will not see duplicates.

You must already be subscribed to see a published event. In the future, the TTL will be used to let you retrieve data that was recently published; this will happen through a different mechanism than subscribe.
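So on the receiving core, the subscription has to be registered before the other core publishes anything you care about. A minimal subscriber sketch, assuming the documented `Spark.subscribe()` handler signature and the `MY_DEVICES` scope; the event name is again just an example:

```cpp
#include "application.h"   // Core firmware header (added automatically in the web IDE)

// Handler is called once per matching event with the event name and payload.
void readingHandler(const char *event, const char *data) {
    Serial.print(event);
    Serial.print(": ");
    Serial.println(data ? data : "(no data)");
}

void setup() {
    Serial.begin(9600);
    // Subscribe first; MY_DEVICES restricts matching to events from your own cores.
    Spark.subscribe("myapp/reading", readingHandler, MY_DEVICES);
}

void loop() {
}
```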

The current messaging system is quite reliable: since we use TCP and messages are timestamped as they arrive, you should have reasonable assurance of order and fidelity. As a developer you should set TTL appropriately, although it is currently reserved for future use. Published events are throttled as they are sent from the core, limited to a burst of 4 per second and an average of 60 per minute; events beyond that limit will be dropped. Currently we don’t rate limit subscribers, but we will probably cap that at something sane, like ~1000 a minute, to avoid huge floods. Generally, if you’re subscribing to your own event feed you should get everything, but you may not get all events if you’re watching the firehose.
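If you want to stay safely inside that budget, the simplest approach is to space publishes out on the core itself. A rough sketch, assuming the four-argument `Spark.publish(name, data, ttl, PRIVATE)` form described in the docs; `publishThrottled()` is a made-up helper, not part of the firmware API:

```cpp
#include "application.h"   // Core firmware header (added automatically in the web IDE)

unsigned long lastThrottledMs = 0;   // millis() at the time of the last accepted publish

// Publish at most once per second so a single core stays under the stated
// budget (burst of 4, average 60/min) and the cloud never has to drop events.
void publishThrottled(const char *name, const char *data) {
    if (millis() - lastThrottledMs < 1000) {
        return;   // too soon; skip rather than risk being dropped by the cloud
    }
    lastThrottledMs = millis();
    Spark.publish(name, data, 60, PRIVATE);   // 60 s TTL, currently reserved for future use
}

void setup() {
}

void loop() {
    publishThrottled("myapp/heartbeat", "alive");   // extra calls within 1 s are skipped
}
```

A token-bucket version could allow the burst of 4, but simply spacing publishes one second apart keeps you under both limits.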

Our initial use of TTL for events will probably be a “what was the last unexpired thing published under this event name” query. So you could subscribe 9 minutes after a publish with a 10-minute TTL and get just that most recent message.

Down the road we may support stream resuming (send me messages since TIMESTAMP / ID), and caching, but we haven’t explored that yet, and the best way to receive an event will probably be a webhook, since that is internal / always on. There are some resource implications there, so it might take some time to find a solution that works for most of the community. 🙂

Edit: In general, we’ve seen very good message fidelity within the cloud and between cores with good connections, and we’re adding services to fill in the gaps when your core is offline / sleeping.

I hope that helps!

Thanks,
David

I like what I read, and in a day or two I’ll try to fold the non-speculative stuff into the docs, if someone hasn’t done so before me.