API to Core response time [patched]

Dave · October 9, 2014, 2:54am

Hey Guys,

The issue I’m seeing is that when a core is reset sometimes it takes a minute before api requests go through to the core – meaning function calls and other requests aren’t going through during that window.

edit: corrected to match reports, and I’m working on a fix now.

Thanks,
David

kennethlimcp · October 9, 2014, 2:58am

I think the core status and listing of functions/variables are messed up during each reset.

davidh · October 9, 2014, 2:58am

It isn’t a “more than once in a minute or two” for me. Every time I reset I have to wait until the CLI “spark list” lists it as offline, then reset and see if it is connected. Usually after waiting that long it will connect in 1 or 2 tries.

From that point it seems stable, until I reset (usually due to flashing it).

To be clear, I am never getting a good connection immediately after a reset even if the previous reset was many minutes ago.

bko · October 9, 2014, 3:21am

I have two cores breathing cyan (for hours) that report as not connected by the cloud and my published events are not coming through.

[EDIT] After resetting them, they are now connected true and publishing.

kennethlimcp · October 9, 2014, 3:53am

Update

Built a fix, and testing has gone well, rolling out to production soon

SomeFixItDude · October 9, 2014, 3:54am

Bad Gateway 502 and I have been reconnected solid for a good 2 hours now. Maybe it has passed or maybe I just have a better cloud connection now.

Thanks Spark Team for responding so fast. Hope the fix cleans up the connections.

kennethlimcp · October 9, 2014, 5:08am

Update

Still working on a patch for this, sorry about the delay
Oct 9, 00:06 CDT

kennethlimcp · October 9, 2014, 7:59am

Update

The patch looks good, but we'll wait to roll it out so the ops team can monitor the change more closely tomorrow morning. Based on testing and reports this issue is not impacting most deployed cores, only temporarily impacting cores being frequently reset. It's about 3AM CST now, I expect we should have it deployed around noon CST today. Thanks again to everyone who reported issues!

Dave · October 9, 2014, 7:59am

Hey All,

I’ve been working on a patch for this since I posted earlier (about 5 hours ago). I have a patch I feel pretty good about that tested well on our staging environment, but it’s 3am for me here, and the airbnb I’m at has a really terrible connection. I’ll test this again when I’ve slept and deploy it first thing tomorrow morning. Thanks again for reporting issues, I appreciate it!

Thanks,
David

dougleppard · October 9, 2014, 1:54pm

I am still having problems with my core. Flashing it has been very hit and miss since last night (10/8) and again this morning (10/9).

kennethlimcp · October 9, 2014, 1:56pm

The fix/patch has been made but has yet to be rolled out. See Dave’s comments above.

Dave · October 9, 2014, 5:43pm

Hey Guys,

I’ve deployed the patch, but I will keep improving it. Right now your core might take about 5-10 seconds after it connects before the cloud is really confident about where your core is connected; after that it will heal and let messages through. I’m going to get that back down to as instantaneous as possible again. So if your api hit doesn’t get through right away, it should in about 5-10 seconds currently. I’m going to resolve the statuspage issue, but I’ll keep working on it.
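
If you are scripting against the API in the meantime, here is a minimal sketch of riding out that 5-10 second window by retrying a failed request after a short pause. Everything in it is illustrative: `call_with_retry` and `call_core_function` are hypothetical helper names, and the endpoint layout and form fields (`access_token`, `args`) are assumptions about the cloud API, not something confirmed in this thread.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def call_with_retry(request_fn, attempts=4, delay_s=5.0, sleep=time.sleep):
    """Call request_fn until it succeeds or the attempts run out.

    Retries on OSError, which covers urllib's URLError/HTTPError and
    socket timeouts. The last error is re-raised if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return request_fn()
        except OSError as err:
            last_error = err
            if attempt < attempts:
                # Give the cloud time to work out where the core is connected.
                sleep(delay_s)
    raise last_error


def call_core_function(device_id, function_name, argument, access_token,
                       base_url="https://api.spark.io/v1"):
    """POST a function call to a core (assumed endpoint layout and fields)."""
    url = f"{base_url}/devices/{device_id}/{function_name}"
    body = urllib.parse.urlencode(
        {"access_token": access_token, "args": argument}
    ).encode()
    with urllib.request.urlopen(url, data=body, timeout=10) as response:
        return response.read().decode()
```

Usage would look something like `call_with_retry(lambda: call_core_function(my_id, "relay", "r1,HIGH", my_token))`, where the ID, function name, argument, and token are placeholders for your own values. A fixed 5 second delay is chosen only because it matches the window described above; it can be shortened once routing is faster.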

Thanks!
David

jonathanmastin · October 9, 2014, 6:03pm

Thanks Dave, Kenneth, for jumping on this. Really shows your professionalism.

Dave · October 9, 2014, 10:13pm

Hey Guys,

I rolled out another patch that attempts to shorten the time it takes for a core to start getting api commands after it comes online. All my testing has shown it to be an improvement, but please let me know if you see anything weird or see the problems / issues you were encountering yesterday.

Thanks,
David

(cc: @jgoggins)

SomeFixItDude · October 10, 2014, 12:01am

Connecting to the cloud really fast and appears to be very stable. All the curl commands executing fast too. Really nice job @Dave.

triplea · October 11, 2014, 2:13pm

Hi again,
Unfortunately, today I still have the same problems… Is there someone else with these problems?
For example… If I try this script http://docs.spark.io/shields/#relay-shield-setting-up-the-relay-shield nothing happens when I execute the command to switch on the relay. I receive a Timeout error. Sometimes the relay switches on after a while, but I still receive the timeout error.
Any suggestions?
Thx

Dave · October 11, 2014, 6:36pm

Hey All,

I’m still investigating an issue where sometimes the cloud has trouble deciding where a core is connected. This is my top priority right now, and I’m hoping to get this resolved soon, but it might take me a few days. I deployed another patch about 20-30 minutes ago that I think will help, but I’ll keep working on it. Please ping me if you’re stuck and you need urgent help. I probably can’t be available after 5pm CST today, but I’ll try to be around before then, and again tomorrow.

Thanks,
David

Dave · October 13, 2014, 1:30am

Hey Guys,

I pushed another small fix just now that I think should put this bug to bed for now. Thanks again for your help, and please do let us know if something seems off / slower than normal.

Thanks,
David

triplea · October 13, 2014, 8:40am

Hi Dave,
thx for your reply. I tested the sparky yesterday evening. It’s working fine now.
Great work
