DDO unavailable: Saturday March 30th

Sweyn · Mar 30, 2024

Kessaran said:
It's fair in the eyes of the players who have to deal with it constantly from every update or hotfix or weekly reset.

It doesn't matter how you perceive it or what happened in the past; you're wrong. Unless you have evidence that it's SSG's fault, it's unfair to blame them for a problem that you know nothing about.

droid327 · Mar 30, 2024

Sweyn said:
How do you know the issue isn't outside of the data center's control? Such as a power outage? Or a construction company cutting a cord? Etc? What basis are you going off of to make the assumption that this is SSG's or their data center's fault?

Well that would either be a 'one time issue' or an issue that the data center would need to work out with its service providers so they can guarantee reliable service to its customers

Sweyn said:
It doesn't matter how you perceive it or what happened in the past; you're wrong. Unless you have evidence that it's SSG's fault, it's unfair to blame them for a problem that you know nothing about.

Its not blame, its merely accountability. When it comes to SSG's customers, the buck stops with SSG. Businesses dont get to make excuses, they only get to find solutions.

Volarr · Mar 30, 2024

Kessaran said:
To be fair, 99% of the time it's SSG's fault. When you make consistent mistakes for 10+ years you are forever guilty until proven innocent.

Why does there have to be someone at fault. i think thats faulty thinking

wreck · Mar 30, 2024

the night is dark and full of terrors

Dude · Mar 30, 2024

Sweyn said:
Assuming it's their fault without any basis or evidence is not fair. It's quite literally the opposite of fairness.

Haven't you seen their post count?!?!?!?!?! They know!

CymTyr · Mar 30, 2024

I would be horrified if there weren't redundant backups of the information saved so that at worst, we'd lose 4-6 hours or even up to a day's worth of progress. It's 99.99% certain the rumor on X about getting rolled back to January is just that - a rumor.

cjlopes · Mar 30, 2024

I suppose at this time.... We won't get any hope since it's late.... Unless someone say they ll b working until problem is solved

Sweyn · Mar 30, 2024

droid327 said:
Well that would either be a 'one time issue' or an issue that the data center would need to work out with its service providers so they can guarantee reliable service to its customers

Please explain how a Data Center would be able to guarantee (your words) that a storm or a construction company hitting an underground cable miles away is a "one time issue". While I 100% agree with your premise of accountability and finding solutions, it would be naive to make assumptions on an event with no factual information to back them up.

droid327 · Mar 30, 2024

Sweyn said:
Please explain how a Data Center would be able to guarantee (your words) that a storm or a construction company hitting an underground cable miles away is a "one time issue". While I 100% agree with your premise of accountability and finding solutions, it would be naive to make assumptions on an event with no factual information to back them up.

Because those things dont tend to happen twice the same way

Hence the term 'one time issue'

Frantik · Mar 30, 2024

Sweyn said:
Please explain how a Data Center would be able to guarantee (your words) that a storm or a construction company hitting an underground cable miles away is a "one time issue". While I 100% agree with your premise of accountability and finding solutions, it would be naive to make assumptions on an event with no factual information to back them up.

Agreed about accidental cable damage but lightning (storms) are mitigated by competant design features. It's when you cheap out, cut corners and let the statistical-model accountants in that servers get fried.

Margona · Mar 30, 2024

Almghandi said:
easter monday is a holiday somewhere?

Sweden.

nobodynobody1426 · Mar 30, 2024

So cool story while we wait, and it's definitely relatable as something similar happened to us a few weekends ago.

Few Saturdays back, I started getting flooded with Pagerduty alerts right after my Gym workout. Got back to the house and got on the work laptop and checked and we have a complete service outage of all production systems. Got on the horn and it's all hands on deck. Primary datacenter was frozen, secondary was active but in a very weird state.

Our storage guys started digging and it turns out their super expensive cross DC synchronous storage devices had gotten into an argument over who was in charge of the LUN's and the primary datacenter system decided to force all storage pools offline. This is functionally the same as ripping your hard drive out while your computer is still, only worse since the hypervisor thought the world was ending. Immediately got Dell (they owned it) on the call as a P1 and they were able to remote in and see the arrays frozen. Dug deeper and it's the result of some bug that they were supposed to get fixed but people had kept kicking it down the road. Took Dell + our storage folks over six hours to unfreeze the storage devices, apply the update, declare the primary DC as master forcing the second to discard all storage changes since the problem began. Then we had to bring everything back online in sequence, which took several more hours. Finally smoke test and validate everything good then declare incident over and go to bed. Storage guys had to write a very length report that Monday and our CIO + Ops Director chewed out some Dell operations manager who had been the one to say the bug wouldn't' effect us and we could wait to apply the fix.

Daemos · Mar 30, 2024

OMG, can we please get back to talking about Hamsters and ChatGPT stories?!?

Frantik · Mar 30, 2024

Margona said:
Sweden.

I think most European countries.

Bunker · Mar 30, 2024

There is nothing wrong with outsourcing to run a business. The end game is to profit (most companies), and if you require outsourcing, let's say a data center, that can be a good choice. It comes down to the T/C with everything in regard to working with other companies to make successful product.

If indeed SSG is not up because of a third-party provider, then based on thier contract, SSG will be compensated accordingly. It's just business.

We the experts (internet geniuses like me or anyone else) can always debate a better way. And there is always a better way in the aftermath of an incident. But in the moment, hopefully there is enough coverage for SSG to keep the lights on, so our addictive urges are protected for a little bit longer.

Dude · Mar 30, 2024

nobodynobody1426 said:
So cool story while we wait, and it's definitely relatable as something similar happened to us a few weekends ago.

Few Saturdays back, I started getting flooded with Pagerduty alerts right after my Gym workout. Got back to the house and got on the work laptop and checked and we have a complete service outage of all production systems. Got on the horn and it's all hands on deck. Primary datacenter was frozen, secondary was active but in a very weird state.

Our storage guys started digging and it turns out their super expensive cross DC synchronous storage devices had gotten into an argument over who was in charge of the LUN's and the primary datacenter system decided to force all storage pools offline. This is functionally the same as ripping your hard drive out while your computer is still, only worse since the hypervisor thought the world was ending. Immediately got Dell (they owned it) on the call as a P1 and they were able to remote in and see the arrays frozen. Dug deeper and it's the result of some bug that they were supposed to get fixed but people had kept kicking it down the road. Took Dell + our storage folks over six hours to unfreeze the storage devices, apply the update, declare the primary DC as master forcing the second to discard all storage changes since the problem began. Then we had to bring everything back online in sequence, which took several more hours. Finally smoke test and validate everything good then declare incident over and go to bed. Storage guys had to write a very length report that Monday and our CIO + Ops Director chewed out some Dell operations manager who had been the one to say the bug wouldn't' effect us and we could wait to apply the fix.

So you're saying they failed their DC check?

GimpyPaw · Mar 30, 2024

Sweyn said:
Please explain how a Data Center would be able to guarantee (your words) that a storm or a construction company hitting an underground cable miles away is a "one time issue". While I 100% agree with your premise of accountability and finding solutions, it would be naive to make assumptions on an event with no factual information to back them up.

You are correct that assigning blame would be premature at this point, but based on past events and patterns one can certainly be justified in having suspicions concerning the cause.

CaseStringer · Mar 30, 2024

They rolled snake-eyes!

Sweyn · Mar 30, 2024

droid327 said:
Because those things dont tend to happen twice the same way

Hence the term 'one time issue'

"Don't tend to" and "Don't" are not the same. So again, is a company supposed to make a guarantee (often with financial implications) to another company based on how something "tends" to be?

Just how SSG's terms of service that you agreed to says "Daybreak does not ensure continuous or error-free access or availability of any Daybreak Game(s)."

Daybreak (and by extension SSG) does not guarantee continuous or error-free access to DDO, just as likely as a data center would never agree to continuous or error-free access to their facilities.

nobodynobody1426 · Mar 30, 2024

Sweyn said:
Please explain how a Data Center would be able to guarantee (your words) that a storm or a construction company hitting an underground cable miles away is a "one time issue". While I 100% agree with your premise of accountability and finding solutions, it would be naive to make assumptions on an event with no factual information to back them up.

Hmm DC's are put in places, so it really depends on the location and who's running it. Well known brands like Equinix always have ridiculous levels of redundancy. There isn't a single of anything, much less running the whole thing on one fiber connection. Each is equipped with it's own onsite 3MW generator and enough fuel for 72hours of continuous operation at maximum output, refueling allows it to run indefinitely. On the other hand, some backroom with a big AC that someone calls a "Datacenter" and only has one fiber to the an ISP, yeah that's not gonna survive anything.

DDO unavailable: Saturday March 30th

Well-known member

Well-known member

Well-known member

Member

Well-known member

Member

Member

Well-known member

Well-known member

Well-known member

Member

Well-known member

Active member

Well-known member

Well-known member

Well-known member

Member

Member

Well-known member

Well-known member