Database Systems Upgrade

Greetings Spark Users! Announcement!

In the never-ending journey to massive scale for :spark: systems, another milestone has arrived! On Tuesday night we’ll be upgrading our database to a clustered deployment with clean failover. The impact will be minimal, but I wanted to give everyone a heads-up.

Beginning at approximately midnight in Minneapolis (10pm Tuesday June 4 in San Francisco, 1am Wednesday June 5 in New York, 5am UTC, 6am in London, 1pm Wednesday in China) we will take the web IDE down and put up a maintenance page. If everything goes smoothly, we expect the web IDE to be down for about 10 minutes.

The Device Service and api.spark.io will continue to function normally throughout the transition; however, a small number of database writes made during the switchover may not transfer to the new system. If you happen to claim or rename a Core via the API (including the mobile app or the CLI) during the maintenance window, you may have to claim/rename it again once everything’s up and running on the new system. :wink:

When we restart the Device Service and point it at the new database, any Cores you have online will drop and immediately reestablish their connection to the Spark Cloud. If you blink, you’ll miss it.

Don’t hesitate to let us know if you have any questions or concerns. @jgoggins and I will be on point during the transition. I’ll post here when everything’s back to normal.

Cheers, and may the :spark: be with you!


You just jinxed it! :frowning:


Heads-up — maintenance in half an hour. :wrench:


Maintenance in progress.


Just want to say that the monitoring notification is working properly.

So excited, it’s like I’m part of this (remotely) :smiley:


All finished. Thanks for playing our game—you win! :sparkling_heart:


@zachary, it would be so awesome if you and @jgoggins could do a quick run-through of what happened, for learning purposes, when you guys are free.

It’s late already! :slight_smile:

Hey @kennethlimcp, no problem. I’ll see if I can provide a little more technical detail for learning purposes.

In a nutshell, as @zachary explained, we leveled up our database to be faster and more fault-tolerant.

To prepare, we did a lot of testing to see how different aspects of the cloud behave with a clustered setup. Among other things, we measured how long it takes for data to replicate across cluster nodes and confirmed that a config change coupled with a restart does what we expect. We primed the more complex commands and wrote a little play-by-play checklist of when to execute what (where full programmatic automation was impractical). We had the rollback commands primed in case things went south. We also ran through the migration several times as a dry run to confirm things behaved as expected and to get a realistic feel for how long it would take.
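For anyone curious what “measured how long it takes for data to replicate” can look like, here’s a minimal sketch. The thread doesn’t say which database or client library was involved, so `primary`, `replicas`, `insert_marker()`, and `has_marker()` below are hypothetical stand-ins for whatever driver you’d actually use; the idea is simply to write a unique marker through the primary, then poll each replica until it shows up and record how long that took.

```python
import time
import uuid

def measure_replication_lag(primary, replicas, timeout_s=30.0, poll_s=0.05):
    """Write a unique marker via the primary and time how long each replica
    takes to see it. `primary` and `replicas` are hypothetical client objects
    exposing insert_marker(marker) and has_marker(marker)."""
    marker = str(uuid.uuid4())
    start = time.monotonic()
    primary.insert_marker(marker)              # write lands on the primary only

    lags = {}
    for name, replica in replicas.items():
        while True:
            if replica.has_marker(marker):     # marker has replicated to this node
                lags[name] = time.monotonic() - start
                break
            if time.monotonic() - start > timeout_s:
                lags[name] = None              # replica never caught up in time
                break
            time.sleep(poll_s)
    return lags
```

In a real dry run you’d repeat a measurement like this many times and pay attention to the worst case, since a single sample under light load can be misleading about how the cluster behaves during an actual migration.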

Upgrades can be kind of fun when you front-load the work so you know what’s gonna happen and you’re prepared to respond if things don’t work the way you expect (or maybe I’m a weirdo :smile:).

I’m glad to be on a beefier setup! Glad to see you were following along!


So much planning, smooth upgrade, endless fun!

My kind of enjoyment :smiley:

What do you mean by that? Going south is way better than going north. North is where things freeze and get bitter. :stuck_out_tongue:


@wgbartley, haha, it’s true. Dang, especially true after last winter. That sentence should have read:

“We had the rollback commands primed in case things became a Minnesota winter actively trying to eat your face off and kill you with -50 °F wind chills and 5 feet of snow.”
