DDO unavailable: Saturday March 30th

TavernTails

Tuesday Trivia Host on DDOstream
Power surge issues in "proper" datacenters should be rare, if not unheard of, but there's always the 0.0001 percent scenario. If surge suppression failed and multiple pieces of hardware got fried, physical hardware replacement takes time.
This is what has me curious. I have run data centers for a large outsourcer and currently host servers in two major data center providers. There is SO much redundancy before the power gets to the servers, I cannot fathom how this occurred. I feel for the engineers and developers, for I have been there trying to restore service after a major disaster. It is NOT a good place to be :(
I do know that I will be reviewing my company's incident response plan, offsite backups, and disaster recovery program on Monday. Now I am paranoid...
 

pame12

Well-known member
This is what has me curious. I have run data centers for a large outsourcer and currently host servers in two major data center providers. There is SO much redundancy before the power gets to the servers, I cannot fathom how this occurred. I feel for the engineers and developers, for I have been there trying to restore service after a major disaster. It is NOT a good place to be :(
I do know that I will be reviewing my company's incident response plan, offsite backups, and disaster recovery program on Monday. Now I am paranoid...
I think in SSG's case, they're just using an old/bad service provider. How much redundancy is there in some no name server rack renting place? Probably none.
 

Sylla

Well-known member
We are continuing to work to reopen the game worlds. As soon as I know more I will let you know.

Just say it won't be up until late Tuesday. Without any context or details provided, I'm just assuming that "the fix" is impossible atm.

a) responsible SSG staff is on holidays
b) some other party required to assist is on holidays
c) both a and b are correct
 

Hutoth

The Hatchery
We are continuing to work to reopen the game worlds. As soon as I know more I will let you know.
giphy.gif
 

Frantik

Well-known member
When does speculation become reality? The idiots don't know, they just react to the sound of another fool banging a drum.
 

wreck

Member
hopefully the extended offline time is being used to put old mechanics back in the game like when you get a negative level it unequips your at-level items

back to the glory
 

Smelt

Well-known member
I think a lot of you just don't get it... SSG are losing money big time this weekend with additional expenses. They will be trying to solve the issue afap.
Hmm.... Maybe it's you that just doesn't get it. Having worked in senior positions in procurement for years, I can confidently predict that, assuming this is a third-party (outsourced data centre) issue, SSG are losing precisely zero dollars by not working on the problem themselves over a bank holiday weekend. Why, you may ask? Because they will have written compensation for loss of service into the contract with the data centre; this will not include wages for their own employees to come in over said bank holiday weekend. The compensation figure is set at the top estimates or historical figures for a specified time period. This is usually set at an hourly rate, with any part of an hour counting as a whole hour once things are up and running again.
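
To make that "any part of an hour counts as a whole hour" rule concrete, here is a minimal Python sketch. The function name, the hourly rate, and the outage length are all made-up illustrations, not anything from SSG's actual contract.

```python
import math


def outage_compensation(outage_minutes: float, hourly_rate: float) -> float:
    """Compensation owed when any part of an hour is billed as a whole hour.

    Both arguments are hypothetical inputs for illustration only.
    """
    if outage_minutes <= 0:
        return 0.0
    # Round the outage up to the next whole hour before applying the rate.
    billable_hours = math.ceil(outage_minutes / 60)
    return billable_hours * hourly_rate


# Hypothetical figures: an outage of 49 hours 10 minutes at 1,000 per hour
# is billed as 50 whole hours.
print(outage_compensation(49 * 60 + 10, 1000.0))
```

The point is only that the last ten minutes cost the provider a full extra hour, so the clock keeps running in the customer's favour until service is fully restored.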

If this is an in-house issue, they will have insurance policies to claim against, which are likely to include the extra wages and other expenditures, so SSG may well be working on it. Either way, it is highly unlikely that they are losing out during this unplanned downtime.

The only real risk to them is losing paying customers because of extended downtime. As most of us here have suffered this sort of thing on multiple occasions, SSG probably feel this risk is low. This is backed up by the lack of feedback from Cordovan, who as community liaison officer is unlikely to be involved in actually fixing the problem even if they are working on it.

All of this is, of course, the best-case scenario. If they are actively and feverishly working on the problem because their procurement team failed to add a compensation clause to third-party operations contracts, or forgot to get insurance for their own infrastructure, and they still don't know what the cause is, that is really bad. If they do know, but feel that just repeating "we are working on the problem" is better than telling us what is really going on..... that is really bad.

Of course, the real problem is probably that one or more of the hamsters died and all the pet shops are shut over the Easter weekend so they can't get a replacement until Tuesday.
 

TavernTails

Tuesday Trivia Host on DDOstream
I think in SSG's case, they're just using an old/bad service provider. How much redundancy is there in some no name server rack renting place? Probably none.
You may be correct. While I am glad to hypothesize, I don't want to speculate and have someone say, "Hey, THE TUESDAY DDOSTREAM GUY SAID THEIR SERVERS WERE HOSTED IN SOME DUDE'S BASEMENT!!1!!" For the record, I did not (lol).

I just hope they've been able to get some sleep during all this. I spent four days straight in a network operations center once, sleeping only briefly in my office, and it SUCKED.
 

Kathwynn

Well-known member
We are continuing to work to reopen the game worlds. As soon as I know more I will let you know.
Alrighty then. No eta on the eta.. I really do hope you can figure this out and soon. Two days and my hands are starting to shake.. sweating uncontrollably.. I need my fix.. You do not understand here.. I mean it has gotten bad. I had an actual conversation with my wife. Cleaned up the yard. This has got to end. My gods man, if this continues I might have to start taking walks outside. OUTSIDE! Wearing clothes.
I do not care what you have to do, but for gods sweet sake I might even have a conversation with my neighbor. Does anyone really want to see that happen? Please fix it soon.. This is going to lead to bad things like being social and actually interacting with people in the outside world. Please fix it soon..
 

Jummby

Well-known member
I really hope that the people actually working on fixing the problem aren't stopping to post updates here and on DDO's social media pages.

This kind of ties back into the point I was making in a previous post about a proper DR plan starting at the executive level and flowing down to all levels of a company.

DR 101 says that you have personnel (usually management or executive level) dedicated solely to communicating with your customers during an incident.

The way my team handles it is thus:
1) We decide which of the system admins/network engineers is best suited to actually fix the problem, whether that be doing the work themselves or working with the appropriate vendor support, etc. Fixing the problem then becomes their sole focus.
2) We then pick a different admin/engineer to unobtrusively monitor the "fixer's" progress. Their primary task is to be the intermediary communicator between the "fixer" and the personnel tasked with communicating updates to the rest of the company, customers, media, etc. They handle translating the technical jargon into layman's terms for the public communications team.
3) The public communications personnel take it from there.

This method makes it much, much easier to provide regular updates on an outage to management and our customers without slowing down the actual fixing of the problem.

From that perspective, I'd honestly expect Cordovan, Tolero, and maybe Severlin to be handling communications to their customers. While Cordovan's provided a few updates, the frequency is a bit lacking in my opinion. Looks like he could use some help from the executive team.
It's a holiday, wait till tomorrow it seems.
 