DDO unavailable: Saturday March 30th

Kessaran

Well-known member
I have to laugh at all of you who are all in on the "a good data center is foolproof! They must have cheaped out!" argument. Two real-world examples, both Microsoft Azure: One time they were doing cleanup and had a script to delete old SQL VM instance backups. Unfortunately the script was written wrong, and instead of deleting "old backups" it deleted the live customer VMs. Every SQL instance running on the whole East Coast node went poof. Oopsie. That was a three-hour outage for my company.

Another real case: they somehow had a whole rack of management servers that was totally outside all the fault tolerance mechanisms. No backups, no secondary power supply, not even scripted into the VM management systems. And then they had a power failure, and all the actual customer machines were fine, they failed over fine, but this rack of network routers, VM managers, etc. went down hard. When it got back on power and rebooted, there was nothing to tell the master VM manager that these servers were supposed to be core infrastructure, so it just saw a bunch of high-power servers and started spinning up high-price-tier customer VMs on them, overwriting all the critical VMs that weren't backed up. That one didn't hit my company, but it was a 24+ hour outage for a lot of Azure customers.
By that logic, Microsoft Azure was operational ~3 hours after the incident. Even the 2nd outage you described was resolved ~24 hours later. It's been nearly 36 hours and a video game data center still isn't operational. This is part of their job and they aren't doing it. When your company's entire livelihood depends on the games being operational and making money by keeping customers happy, a critical failure like this should be addressed immediately with the full workforce.
 

Sylvado

Well-known member
People that pay a subscription for this game should really reconsider it. Downtime of ~2 days is mind-boggling. No rollback, no contingency plan, no mitigation plan, no escalation path for issues: how is their issue management process this bad? Why are you paying a sub fee if they can't intelligently manage a game?
I am VIP and will continue VIP. It is not always just roll back and restart. I support applications for one of the largest and most heavily regulated companies in the world. I have been on command center calls that went on for 48 hours with dozens of people working the problem. No one on this forum knows the scope of the issue so stop playing IT hero.
 

rohmer

Well-known member
I have to laugh at all of you who are all in on the "a good data center is foolproof! They must have cheaped out!" argument. Two real-world examples, both Microsoft Azure: One time they were doing cleanup and had a script to delete old SQL VM instance backups. Unfortunately the script was written wrong, and instead of deleting "old backups" it deleted the live customer VMs. Every SQL instance running on the whole East Coast node went poof. Oopsie. That was a three-hour outage for my company.

Another real case: they somehow had a whole rack of management servers that was totally outside all the fault tolerance mechanisms. No backups, no secondary power supply, not even scripted into the VM management systems. And then they had a power failure, and all the actual customer machines were fine, they failed over fine, but this rack of network routers, VM managers, etc. went down hard. When it got back on power and rebooted, there was nothing to tell the master VM manager that these servers were supposed to be core infrastructure, so it just saw a bunch of high-power servers and started spinning up high-price-tier customer VMs on them, overwriting all the critical VMs that weren't backed up. That one didn't hit my company, but it was a 24+ hour outage for a lot of Azure customers.
Ridiculous.

The number of outages at cloud services can be counted on one hand.
 

New friend

New member
I am VIP and will continue VIP. It is not always just roll back and restart. I support applications for one of the largest and most heavily regulated companies in the world. I have been on command center calls that went on for 48 hours with dozens of people working the problem. No one on this forum knows the scope of the issue so stop playing IT hero.
oh cool
 

Jack Jarvis Esquire

Well-known member
I am VIP and will continue VIP. It is not always just roll back and restart. I support applications for one of the largest and most heavily regulated companies in the world. I have been on command center calls that went on for 48 hours with dozens of people working the problem. No one on this forum knows the scope of the issue so stop playing IT hero.
You forgot to shout "FORE!" 🙄👍
 

Dandonk

This is not the title you're looking for
O that we now had here but one ten thousand of those men in England that do no work today!
 

Kalsang

New member
I want to hear the spilling of blood, bile, and the lamentation of the women. I want to see blood, gore and guts, see veins in my teeth, eat dead burnt bodies... you know, play DDO!
That's the answer! We should all go to Alice's Restaurant and chill out for a while. So, how many of you old-fogey nerds get that reference?
 

Kessaran

Well-known member
Full workforce? How would you expect the marketing team to help with an issue like this?
Full workforce as in everyone in the relevant department working on it. Sadly, I just remembered that it's SSG we're talking about, and they probably have 2 people with under 5 years of experience staffing their entire networking department.
 

Col Kurtz

Well-known member
Can I just point out, I don't have any of these problems with golf.

Those are MUCH worse! 🙄👍
Just like skiing... we can get weather delays and cancellations. I should prob be skiing right now, but visibility looks pretty bad up in the local mountains.

On a side note: I may have to start yelling "FORE!" when I drop off a cornice in your honor ;)
 

Toede

Well-known member
By that logic, Microsoft Azure was operational ~3 hours after the incident. Even the 2nd outage you described was resolved ~24 hours later. It's been nearly 36 hours and a video game data center still isn't operational. This is part of their job and they aren't doing it. When your company's entire livelihood depends on the games being operational and making money by keeping customers happy, a critical failure like this should be addressed immediately with the full workforce.
Sorry, fiscally irresponsible.
 

Episkopos

Lawful Good Never Looked This Evil
This image was (allegedly) smuggled from the datacenter.

[attached image]


Things look bleak.
 