Today's cloud outage

Will there be a detailed postmortem of the outage that happened today?

Yes, we will post it as soon as it is available. Please keep an eye on status.particle.io.
We are still working through the database migration.

In short, our database provider had entire clusters fall offline, which included our secondary.
Our team implemented a disaster recovery plan and we are slowly building up the database again.

The 2G/3G Boron with 1.5.2 on my desk has been rapidly blinking cyan for hours (it has not reconnected to the cloud despite good signal strength).

According to https://status.particle.io, the only service that is not “operational” is build.particle.io, which has the status “major outage”.

Is the fact that devices cannot reconnect to the cloud after an outage usually indicated this way on the status page? (I'm trying to learn from this.)

Hi @thrmttnw - we have updated our status page to reflect a partial outage with our device service and degraded webhook performance. Please continue to monitor the status page for updates!

In the Console vitals graphs, there was a wide gap before the device reconnected. Now that the Boron on my desk is back online :+1:, the gap is no longer shown in the graph.

Is it expected behaviour that the vitals do not reflect multiple hours of outage?

I have devices that are passing traffic and can be pinged through the dashboard but their Last Handshake and Last Heard times are not updating. Is that due to the outage?

After yesterday's collapse, my Electron does not seem to receive any webhook responses. I see the request and I get the response, but it is not reaching the Electron. The Electron is connected to the cloud and reacts to API calls without any problems; the only issue is the webhook response. Is anybody else experiencing the same behavior?

Hi @gkujawsk - we’d really like to take a closer look, can you please open a support ticket (support.particle.io) with maximum detail about the webhook itself, its expected behavior, and any related deviceIDs? Thank you!

With pleasure, but I cannot sign in at support.particle.io. I tried the password reset procedure without success. I even tried to sign up again, but I got a message that the email is already taken, so I assume the problem is related to the authentication process itself. However, I was able to log in to console.particle.io and community.particle.io with the same credentials, and the Particle CLI logged me in properly as well.

In the meantime I've tried deleting the webhook and recreating it, but my app still does not receive the webhook response.

You shouldn’t need to log in to support.particle.io - are you being prompted to?


Sorry! You’re right. The first thing I saw was the Sign in button, so I tried to log in!
I will submit a ticket in a minute.


Hi @marekparticle ,
After yesterday's issue, some of my Electrons are still working, but they are not going to sleep, and their data usage shows 60 MB, which cannot be right, as I normally use 5 to 7 MB per month. I also have two Electrons in sleep mode (sleeping for 4 hours) that have not woken up since yesterday; I haven't heard from them for 11 hours now. Any idea? Should I give them a bit more time, or do I have to go and reset them physically?

Many thanks

Hi @majj_11 - this is worth a Support Ticket (support.particle.io) as well. Please include the specific Device IDs in your request!


I hope the device vitals graphs will someday be changed to be useful.

What do you mean @thrmttnw? Happy to learn more about how to make vitals as useful as possible for you.

The vitals graphs are useful, I just intuitively expected device vitals to reflect a multi hour outage. To me that seems to be the most vital information for users of the platform.

The nature of UDP makes it challenging, but the vitals history download contained information about unsuccessful connection attempts that could be used for a visual graph.

Hey folks – wanted to circle back on this thread to reiterate our intention to publish a full post-mortem on the Cloud downtime last week. Once it is available our team will continue to be available to engage in discussion and answer additional questions/concerns.

Thank you for your patience!


Hi everyone,

Closing the loop here. Particle has published a post-mortem for this incident. You can find it at https://blog.particle.io/platform-cloud-incident-postmortem/

Hi @mstanley, thanks to the team for the write up and the honesty.

One question of personal interest: how did you generate the timeline graphic towards the bottom of the page? Was it done by hand, or did you use a tool?

I often have to write similar major incident post-mortems and I really like the look of that.

Hey @DaveH – this was created by one of our talented designers in Adobe Illustrator. I will pass on the compliments :smile:
