Database Systems Upgrade

Greetings Spark Users! Announcement!

In the never-ending journey to massive scale for :spark: systems, another milestone has arrived! On Tuesday night we’ll be upgrading our database to a clustered deployment with clean failover. The impact will be minimal, but I wanted to give everyone a heads-up.

Beginning at approximately midnight in Minneapolis (10pm Tuesday June 4 in San Francisco, 1am Wednesday June 5 in New York, 5am UTC, 6am in London, 1pm Wednesday in China) we will take the web IDE down and put up a maintenance page. If everything goes smoothly, we expect the web IDE to be down for about 10 minutes.

The Device Service and api.spark.io will continue to function normally throughout the transition; however, a small number of database writes made during the switchover may not transfer to the new system. If you happen to claim or rename a Core via the API (including the mobile app or the CLI) during the maintenance window, you may have to claim/rename it again once everything’s up and running on the new system. :wink:

When we restart the Device Service and point it at the new database, any Cores you have online will drop and immediately reestablish their connection to the Spark Cloud. If you blink, you’ll miss it.

Don’t hesitate to let us know if you have any questions or concerns. @jgoggins and I will be on point during the transition. I’ll post here when everything’s back to normal.

Cheers, and may the :spark: be with you!


You just jinxed it! :frowning:


Heads-up — maintenance in half an hour. :wrench:


Maintenance in progress.


Just want to say that the monitoring notification is working properly.

So excited, it’s like I’m part of this (remotely) :smiley:


All finished. Thanks for playing our game—you win! :sparkling_heart:


@zachary, it would be so awesome if you and @jgoggins could do a quick run-through of what happened, for learning purposes, when you guys are free.

It’s late already! :slight_smile:

Hey @kennethlimcp, no problem. I’ll see if I can provide a little more technical detail for learning purposes.

In a nutshell, as @zachary explained, we leveled up our database to be faster and more fault-tolerant.

To prepare, we did a lot of testing to see how different aspects of the cloud behave with a clustered setup. Among other things, we measured how long it takes for data to replicate across cluster nodes and confirmed that a config change coupled with a restart does what we expect. We primed the more complex commands and wrote a little play-by-play checklist of when to execute what (where full programmatic automation was impractical). We had the rollback commands primed in case things went south. We also ran through the migration several times as a dry run to confirm things behaved as expected and to get a realistic feel for how long it would take.
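For anyone curious what “measured how long it takes for data to replicate” can look like, here’s a minimal sketch. The thread doesn’t say which database or client library was involved, so `primary`, `replicas`, `insert_marker()`, and `has_marker()` below are hypothetical stand-ins for whatever driver you’d actually use; the idea is simply to write a unique marker through the primary, then poll each replica until it shows up and record how long that took.

```python
import time
import uuid

def measure_replication_lag(primary, replicas, timeout_s=30.0, poll_s=0.05):
    """Write a unique marker via the primary and time how long each replica
    takes to see it. `primary` and `replicas` are hypothetical client objects
    exposing insert_marker(marker) and has_marker(marker)."""
    marker = str(uuid.uuid4())
    start = time.monotonic()
    primary.insert_marker(marker)              # write lands on the primary only

    lags = {}
    for name, replica in replicas.items():
        while True:
            if replica.has_marker(marker):     # marker has replicated to this node
                lags[name] = time.monotonic() - start
                break
            if time.monotonic() - start > timeout_s:
                lags[name] = None              # replica never caught up in time
                break
            time.sleep(poll_s)
    return lags
```

In a real dry run you’d repeat a measurement like this many times and pay attention to the worst case, since a single sample under light load can be misleading about how the cluster behaves during an actual migration.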

Upgrades can be kind of fun when you front-load the work so you know what’s gonna happen and you’re prepared to respond if things don’t work the way you expect (or maybe I’m a weirdo :smile:).

I’m glad to be on a beefier setup! Glad to see you were following along!


So much planning, smooth upgrade, endless fun!

My kind of enjoyment :smiley:

What do you mean by that? Going south is way better than going north. North is where things freeze and get bitter. :stuck_out_tongue:


@wgbartley, haha, it’s true. Dang, especially true after last winter. That sentence should have read:

“We had the rollback commands primed in case things became a Minnesota winter actively trying to eat your face off and kill you with -50 °F wind chills and 5 feet of snow.”
