API to Core response time [patched]

Hey Guys,

The issue I’m seeing is that when a core is reset sometimes it takes a minute before api requests go through to the core – meaning function calls and other requests aren’t going through during that window.

edit: corrected to match reports, and I’m working on a fix now.

Thanks,
David

I think the core status and listing of functions/variables are messed up during each reset.

It isn’t a “more than once in a minute or two” for me. Every time I reset, I have to wait until the CLI “spark list” lists it as offline, then reset and see if it is connected. Usually after waiting that long it will connect in 1 or 2 tries.

From that point it seems stable, until I reset (usually due to flashing it).

To be clear, I am never getting a good connection immediately after a reset even if the previous reset was many minutes ago.
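The manual routine described above (check, wait, check again) could be scripted as a simple poll. This is only a sketch: `is_connected` is a hypothetical callable you would supply yourself, for example one that looks up the core in the cloud's device listing, and the timeout and interval values are arbitrary assumptions.

```python
import time


def wait_until_connected(is_connected, timeout=60.0, interval=2.0,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll is_connected() until it returns True or `timeout` seconds pass.

    is_connected is a hypothetical, caller-supplied check (not part of any
    Spark tool). Returns the seconds waited, or None if the timeout expired.
    """
    start = clock()
    while True:
        if is_connected():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)  # wait before asking again
```

The clock and sleep functions are parameters only so the loop can be exercised without real waiting; in normal use the defaults are enough.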

I have two cores breathing cyan (for hours) that report as not connected by the cloud and my published events are not coming through.

[EDIT] After resetting them, they are now connected true and publishing.

Update

Built a fix, and testing has gone well; rolling out to production soon.

Bad Gateway 502 and I have been reconnected solid for a good 2 hours now. Maybe it has passed or maybe I just have a better cloud connection now.

Thanks Spark Team for responding so fast. Hope the fix cleans up the connections.

Update

Still working on a patch for this, sorry about the delay
Oct 9, 00:06 CDT

Update

The patch looks good, but we'll wait to roll it out so the ops team can monitor the change more closely tomorrow morning. Based on testing and reports, this issue is not impacting most deployed cores, only temporarily impacting cores that are being frequently reset. It's about 3AM CST now; I expect we should have it deployed around noon CST today. Thanks again to everyone who reported issues!

Hey All,

I’ve been working on a patch for this since I posted earlier (about 5 hours ago). I have a patch that I feel pretty good about and that tested well on our staging environment, but it’s 3am for me here, and the airbnb I’m at has a really terrible connection. I’ll test this again when I’ve slept and deploy it first thing tomorrow morning. Thanks again for reporting issues, I appreciate it!

Thanks,
David

I am still having problems with my core: flashing it has been very hit and miss since last night (10/8) and again this morning (10/9).

The fix/patch has been made but has yet to be rolled out. See Dave’s comments above.

Hey Guys,

I’ve deployed the patch, but I will keep improving it. Right now your core might take about 5-10 seconds after it connects before the cloud is really confident about where your core is connected; after that it will heal and let messages through. I’m going to get that back down to as instantaneous as possible again. So if your api hit doesn’t get through right away, it should in about 5-10 seconds currently. I’m going to resolve the statuspage issue, but I’ll keep working on it.
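While the 5-10 second window exists, a client could simply retry a timed-out request a few times before giving up. The following is a rough sketch only, not anything the cloud or CLI provides: `request` is a hypothetical callable that performs one API call and raises `TimeoutError` when no answer arrives, and the attempt count and delay are assumed values chosen to cover that window.

```python
import time


def call_with_retry(request, attempts=5, delay=3.0, sleep=time.sleep):
    """Call request() until it succeeds or `attempts` calls have timed out.

    request is a hypothetical, caller-supplied function that makes one API
    call and raises TimeoutError if the cloud does not answer in time.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except TimeoutError as error:
            last_error = error
            if attempt < attempts:
                sleep(delay)  # give the cloud time to locate the core
    raise last_error
```

With the defaults, a call that keeps timing out is retried over roughly 12 seconds of waiting before the last error is re-raised.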

Thanks!
David

Thanks Dave, Kenneth, for jumping on this. Really shows your professionalism.

Hey Guys,

I rolled out another patch that attempts to shorten the time it takes for a core to start getting api commands after it comes online. All my testing has shown it to be an improvement, but please let me know if you see anything weird or see the problems / issues you were encountering yesterday.

Thanks,
David

(cc: @jgoggins)

Connecting to the cloud really fast, and it appears to be very stable. All the curl commands are executing fast too. Really nice job @Dave.

Hi again,
Unfortunately, today I still have the same problems… Is anyone else seeing this?
For example, if I try this script http://docs.spark.io/shields/#relay-shield-setting-up-the-relay-shield nothing happens when I execute the command to switch on the relay. I receive a Timeout error. Sometimes the relay switches on after a while, but I still receive the timeout error.
Any suggestions?
Thx

Hey All,

I’m still investigating an issue where sometimes the cloud has trouble deciding where a core is connected. This is my top priority right now, and I’m hoping to get this resolved soon, but it might take me a few days. I deployed another patch about 20-30 minutes ago that I think will help, but I’ll keep working on it. Please ping me if you’re stuck and you need urgent help. I probably can’t be available after 5pm CST today, but I’ll try to be around before then, and again tomorrow.

Thanks,
David

Hey Guys,

I pushed another small fix just now that I think should put this bug to bed for now. Thanks again for your help, and please do let us know if something seems off / slower than normal. :slight_smile:

Thanks,
David

Hi Dave,
thx for your reply. I tested the sparky yesterday evening. It's working fine now.
Great work :wink: