Pub/sub losing events between 2 cores

I have two cores: one called A, sub’d to ping_1, and the other called B, sub’d to ping_2. Core A publishes ping_2 to send triggers to Core B, and B publishes ping_1 to trigger Core A.

What I see is that when the cores start, both connect to WiFi successfully and then connect to the Spark cloud successfully. At that moment they can send triggers to each other and everything works. If I wait 30 minutes and try to send a trigger again from either core, it never arrives. I have verified that the publishing core IS connected to the Spark cloud when the call to publish is made, but the other core doesn’t seem to acknowledge it.

I wanted to ask if there are any known timeouts, DoS protection, or any other reason a subscription would be dropped by the Cloud service? Also, is there a way to see a log, from the Cloud’s perspective, of what events were triggered to and from a device?

Not entirely sure why it’s doing that. Perhaps @Dave could help out.
In the meantime, you can see all your published events by opening this URL in your browser:
https://api.spark.io/v1/devices/events/?access_token=ACCESSTOKEN_HERE

Hey All,

I suspect you're running into this issue:

There are some workarounds available, but I don't think the firmware team has had a chance to look into this yet. Hopefully soon!

Thanks!
David

Thank you both for the speedy replies.

@Moors7 that’s great thanks!

@Dave that may be what’s happening indeed. I’m not quite ready to implement the brute force method described, but I’ll keep messing around with implementations and see if I can’t get the same effect another way.

I’ll try to remember to update this thread with any findings.

Robert

I was messing around with this last night under the notion that I’m being disconnected from the Cloud (my guess was due to inactivity). So I introduced some heartbeats using Spark.process(), and even going as low as a 100 ms heartbeat I was still seeing the issue.
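
For reference, the heartbeat looked roughly like this (a sketch only; the 100 ms interval is the lowest I tried):

    unsigned long last_beat = 0;

    void loop() {
        // ... animation / application work ...

        // Service the cloud connection at least every 100 ms.
        if (millis() - last_beat >= 100) {
            Spark.process();
            last_beat = millis();
        }
    }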

I am doing some LED flair that animates a NeoPixel for a number of seconds, but I am using delay() in there, so I assume Spark.process() is being maintained. Is that true even if my delay() calls are only 1-5 ms?

All this led me to a couple more questions about the cloud service. Does anyone know the actual timeout value? If I’m disconnected from the cloud service and call Spark.process(), will that reconnect me, or do I need an explicit Spark.connect() at that point? (I noted that Spark.connected() was true even when I wasn’t seeing subscribed events.)

Hello @RobertM,

I have been having the same issue, and @Dave has indicated that it is on the list of things to fix. In my case I have a remote core that deep sleeps for 5 minutes and then publishes some sensor data. I have two local cores that subscribe to these publications, and all usually works well for a time (could be a few minutes, could be an hour). However, there does come a time when both subscribers stop getting publications. I have noticed that, at least in some cases, this follows a reconnect to the cloud (flashing green, flashing cyan, etc.), and then the next publication is missed. I seem to have ISP problems maintaining my Internet connection, and that may be why the local cores lose it.

I implemented the nuclear option and made my local cores detect when no updates have arrived for 14 minutes, at which point they do a system reset. This then allows the next publication to reach both of them. It is far from ideal but works for me in the interim. Let’s hope the firmware folks eventually find a more permanent fix.
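
In rough outline the watchdog looks like this (a sketch; the handler and event name are illustrative, not my exact code):

    unsigned long last_update = 0;
    const unsigned long WATCHDOG_MS = 14UL * 60UL * 1000UL; // 14 minutes

    void sensorHandler(const char *event, const char *data) {
        last_update = millis(); // any publication resets the watchdog
        // ... use the sensor data ...
    }

    void setup() {
        Spark.subscribe("sensorData", sensorHandler); // event name illustrative
        last_update = millis();
    }

    void loop() {
        if (millis() - last_update > WATCHDOG_MS) {
            System.reset(); // no publications for 14 minutes: start fresh
        }
    }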

Regards,
Muskie

Thanks for the reply @Muskie, your issue sounds similar to mine indeed.

I did find a solution for my issue thanks to all the insights from the people in this post. I found that I can simply call Spark.disconnect(), then immediately Spark.connect() again, polling with a delay until Spark.connected() is true. Once the connection is made I immediately re-subscribe to my event, and this seems to work; I do all of this once a minute. There is a window during which I obviously can’t send or receive, but that’s handled in my other code by testing against Spark.connected(), which is easy enough. I’m sure there is still a short period where I’m connected but the subscription isn’t fully registered yet, but I’m willing to accept that.
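
In outline it looks something like this (a sketch; the handler and timings are illustrative):

    void pingHandler(const char *event, const char *data) {
        // ... react to the trigger from the other core ...
    }

    void recycleConnection() {
        Spark.disconnect();
        Spark.connect();
        while (!Spark.connected()) {
            delay(100); // poll until the cloud connection is back
        }
        Spark.subscribe("ping_1", pingHandler); // re-register the subscription
    }

    unsigned long next_recycle = 0;

    void loop() {
        if (millis() > next_recycle) {
            recycleConnection();
            next_recycle = millis() + 60000; // once a minute
        }
        // other code publishes only while Spark.connected() is true
    }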

Obviously when a firmware update is made that will be a better solution.

Just as an update: I’m guessing that there’s some cloud code that denies too many connection attempts in a day or something. My cores aren’t able to connect to the cloud anymore; hope I’m not blacklisted :wink:

I’ve totally revamped and shrunk my code to test the pub/sub functionality, and I’m curious if any part of it works. I’ve got a setup where one core subscribes and the other publishes when a digital pin is pulled low. I’ve made these events public and void of data. I get a similar outcome to my original post: messages for about 10-15 minutes, then silence until a hard reset. This seems to imply an extremely basic function of the Spark Cloud is broken. When I read @Dave’s response I thought it was specific to the more complex private events using MY_DEVICES. Is this more general, and if so, what are people doing with their Spark Cores; what does work? I’d like to believe my code is wrong, so if anyone has a pub/sub code example I’d greatly appreciate a link to it.
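
For reference, the stripped-down test is essentially this pair of sketches (pin and event name are illustrative). On the publishing core:

    void setup() {
        pinMode(D0, INPUT_PULLUP);
    }

    void loop() {
        if (digitalRead(D0) == LOW) {
            Spark.publish("ping"); // public event, no data
            delay(1000); // crude debounce, stays under the publish rate limit
        }
    }

And on the subscribing core:

    volatile int hits = 0;

    void pingHandler(const char *event, const char *data) {
        hits++; // trigger received
    }

    void setup() {
        Spark.subscribe("ping", pingHandler);
    }

    void loop() {
    }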

Thanks all.

I’ve had the same thing happen as Robert for months. After an interval of about 10-15 minutes, my server is no longer able to receive published events. I have 1-5 cores connected at any one time, and as soon as the server recognizes that it is not receiving the published events every 15 seconds as it should, I re-call spark.onEvent and things are back to normal. 10-15 minutes later… repeat.

This is likely to be my final update to this thread, but I have conceded and implemented the System.reset() option discussed in @Dave’s post. The only minor modification I made was an attempt to get the cores to reset at the same (real-world) time.

So rather than a countdown to millis() + interval, I’m doing something like this in setup:

reset_time = millis() + reset_interval - (millis() % reset_interval);

Then in the loop

if (millis() > reset_time) { System.reset(); }
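
Put together, with reset_interval as an illustrative 14 minutes, the whole thing is just:

    const unsigned long reset_interval = 14UL * 60UL * 1000UL; // illustrative
    unsigned long reset_time;

    void setup() {
        // Align the first reset to the next multiple of reset_interval
        // since boot, so cores powered up together reset together.
        reset_time = millis() + reset_interval - (millis() % reset_interval);
    }

    void loop() {
        if (millis() > reset_time) {
            System.reset();
        }
    }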

@Dave Is there a place/thread to watch for firmware updates?

I use PubNub for Arduino, Spark, Android, .NET and JavaScript. Register and get 1,000,000 messages/month free.


C/C++ https://github.com/pubnub/c

Hi @RobertM,

Thanks for posting! I’m setting aside some time this sprint to look more into pub/sub performance, and we’ve scheduled some maintenance to upgrade some dependencies in the next week or so. I’ll update this thread with progress as we go.

Thanks,
David