Our data center, SVColo (http://svcolo.com/), lost power (and apparently all generators as well) this afternoon, causing our site to be completely unavailable for a couple of hours. We apologize for the extended outage.
Did you miss us?
- Paul Buchheit
This outage impacted our site as well as a number of other web sites hosted at SVColo. We are obviously fairly frustrated by the incident, and we are working hard to get everything else back online now.
- Bret Taylor
I was worried it was the End of Days!!!
- Rochelle
That's ok not your fault, great work and yes we did miss you
- Kim Landwehr
World productivity just went back down.
- Amit Patel
Just don't let it happen again. ;)
- Derrick
Did we miss you?! You just wanted to make sure we appreciated you, right? *whew*
- Jandy
yeah!
- Nathan Chase
I thought you got infected by Twitter-Flu
- John Spyers
Holy crap, I never thought I'd miss FF as much as I did. It was like the sun vanished from the sky. Thanks for returning, phew :)
- LANjackal
Hmmm are you guys thinking about DR? Dual homing? Failovers?
- EricaJoy
so much for SVCOLO's "All power is conditioned by facility-wide redundant UPS systems to assure 100% reliability.", eh?
- Nathan Chase
OMG that is all
- BEX
yay! you are back!1!
- Lindsay
Welcome back Bret. Heard there was a run on anti-compulsive meds worldwide. Whew
- Charlie Anzman
It sure seemed like a LONG time. I'd be wanting some money back from my provider for not meeting the contract!
- CAJ was here
Good to have you back!
- Sam Grover
It's amazing how lost I am without Friendfeed. Nice that you're back!
- Rahul Das
Glad you're back...and most importantly still in one piece. Do you anticipate any issues with feed importing? I just manually refreshed Twitter and all seems well.
- Mark Krynsky
Great to have you guys back. Lots of weblinks/posts from surfing the past 3 hrs.
- Mitchell Tsai
#fffail was no fun. Glad you're back!
- Bill Sodeman
funnily enough this is not the first time i have heard of a datacenter with supposedly highly available N+1, A-side and B-side, UPSed & diesel-generated power just going totally offline because someone tripped over a plug.
- Karim
Yes, I missed FriendFeed! And what happened to the SVColo emergency generator plan? Jeez.
- AJ Kohn
Glad they and you are back online!
- Andrew
The FriendFeed Outage caused me to get hooked on the #annoyatrekkie topic of Twitter. Glad to see you back!
- manielse (Mark Nielsen)
YES we missed you! ;-) Glad you are back up, and that as soon as I was back I could look here and see the explanation. Things happen, but I love it that there never seems to be a question that you'll give us the low-down. Thanks! :-)
- guruvan (Rob Nelson)
And double thanks for the Tweet to announce the outage ;) (a tweet announcing restoral would be uber-cool)
- guruvan (Rob Nelson)
Thanks, Madhav, we are fixing right now. Sorry for the trouble.
- Bret Taylor
nah, they knew we'd just be F5ing until they came back up ;)
- metalerik
Search still down?
- Jesper Lind
The IM bot doesn't seem to be up either.
- Rahul Das
I guess this will convince investors of the need for a second datacenter site
- guruvan (Rob Nelson)
Karim: the generators almost never work for anything short of telcos carrying 911 traffic..
- guruvan (Rob Nelson)
Cool, some people were saying it was a different power system, thanks for the update.
- Dan owns Comicsforge.com
Rob, that jibes with my anecdotal experience :-) but why is that? what is the point of telling customers you don't have single points of failure in your datacenter when you do...? mishegoss.
- Karim
Karim, there are ALWAYS single points of failure. If nothing else, Earth is a single point of failure :). As for the generator, supposedly that's what's powering things right now!! (which has me somewhat frightened)
- Paul Buchheit
Paul: very true....And Karim, it's probably because the generator backups are rarely tested, certainly not regularly, and partially for this reason...the failovers don't work and they don't want to take customers down because a test went wrong
- guruvan (Rob Nelson)
I hope, Paul, that this will cause you guys to be able to go to the investors and get things happening on two or three sites, so we only see performance degradations in future calamities (since disaster will always strike)
- guruvan (Rob Nelson)
yay!
- MikeAmundsen
We need double-likey's for posts like this
- Charlie Anzman
A likely story, Bret.
- Steven Perez
Yes, we missed you!
- Anne Bouey
We podcasted about you in the dark!
- Josh Haley
This is what you get for not subscribing to me. Don't trifle again, Mr. 'Taylor'.
- Akiva
Fess up Bret ... this was just a test to see how starved we'd get without our hourly toke ;)
- AJ Kohn
If the colo facility isn't testing their generators on a weekly basis, someone needs to lose their job.
- Scoble, Alex Scoble
and I was gonna suggest that it went down because I was AFK... I guess I don't qualify for the 'I survived the great FF outage of '09 and all I got was this lousy t-shirt' t-shirt.
- grant fox
During the Twitter planned outage I thought, "well, at least I still have FriendFeed". Ouch! Glad you're back. I've lived through several outages on both sides of the equation - don't wish it on ANYONE.
- Robert J Taylor
Very sad test of how addicted I've become. I was jones-ing pretty hard. Glad you're back.
- Ken Gidley
Paul, yeah, we can't eliminate SPOFs completely, but i've seen these places brag about how they have N+1 power systems, UPS, diesel generators that can run for days at peak load, priority contracts to have more diesel fuel delivered if necessary, etc. etc., and something happens and the whole data center just dies. you realize that you might be screwed in the event of Global Thermonuclear War :-) but you don't expect to be down for hours because somebody pushed the wrong button...
- Karim
Nice to have you back.
- Ignace Rodriguez de R,
strangely enough, a transformer blew up outside earlier tonight and now my lights just flickered. going to make sure my UPS is charged :-D
- Karim
Alex: Where have you ever been that they actually tested gens on a weekly (or regular even) basis. I've been around colos and CLECs for many years, and this happens all the time because there's never a test
- guruvan (Rob Nelson)
and Alex: That's where I'm going to put my next project ;-)
- guruvan (Rob Nelson)
glad to have ff back. What will happen to updates which were made during that time? will they appear on my ff site?
- Okeane
Supposedly the generators worked ok, but there was a ground-fault elsewhere in the system that blew out all the breakers. Murphy wins again.
- Paul Buchheit
I am guessing they may have to update this page: http://www.svcolo.com/power... ;-)
- Brian Sullivan
It is a fact of computing...sometimes the power just quits. IP is great - we can always rely on RFC2549 for a no-electricity transport protocol.
- guruvan (Rob Nelson)
Were you gone....I hardly noticed....OKAY! YES! Yes, I missed you! Satisfied?!
- WoH: Professor MOTHRA
glad FF is back!
- (jeff)isageek
My browser reload button is exhausted. But all systems are go now. Thanks for calming all the passengers by simply telling us what's really going on, awesome. *climbs out of search.twitter dinghy and back onto the mothership*
- Micah
guruvan: Most hospitals I've been to test their backup systems on a regular monthly schedule. Ironically, that makes the emergency power system in general less reliable than utility power.
- Gabe
So is that not a fail whale but a school of fail whales??
- Amani
Gabe: ok..never worked in a hospital, but most of them do in fact have proper working backup power to my understanding. why can't the telcos and datacenters? this is a very common problem for them.
- guruvan (Rob Nelson)
@Rob, not everything is that simple. did someone drive their car into the generators? :P
- mjc
Does sitting for 2 hours watching all mentions of FriendFeed on Twitter search qualify as missing you?
- Sharon McPherson
Have a fail-over version of Friendfeed on Google App Engine. I know it will take time and effort, but then it will mean built-in redundancy. And failing of both together will be a highly improbable event
- Varun Mahajan
Michael, I know it's not all that simple, but you would be amazed at how often this happens in professional datacenters and at CLECs and without something as understandably disastrous as cars through buildings or bombs.
- guruvan (Rob Nelson)
Rob: when I first met Rackspace's chairman, Graham Weston, we talked about a failure they had when a truck knocked out power there. Turned out the chillers and generators had a flaw in their design that kept the generators from kicking on for a few seconds. This sounds exactly like what happened here. Rackspace worked with the chiller and generator companies to design a fix, but he told me that most data centers haven't upgraded (Rackspace's have). I wonder if this is the flaw that hit friendfeed yesterday?
- Robert Scoble
I read somewhere that it had been mathematically shown that a computer program can never be 100% error-free because the error routines will have errors... Maybe it's the same type of thing here.
- Bob Morris (polizeros)
Robert: That's very interesting. Don't know if that's what happened, but I do know that most of the time when I've seen this happen in a CLEC CO it's been much more simple than that. Usually attributable to human error (and lack of testing). I am curious as to how, and how often, Rackspace performs tests. Finding the design flaw you mention speaks highly of the company IMO.
- guruvan (Rob Nelson)
Rob: they run tests on the power system very often. Unfortunately nothing tests a system like real life.
- Robert Scoble
No, there are factors that just don't come into play in a "controlled" test scenario, no matter how thorough you try to make the test.
- guruvan (Rob Nelson)
It was the severity and length of the outage that was the surprise - but we have had electricity outages over a vast part of the continent that lasted days and were caused by small flaws, so this isn't out of that realm. But this is or should have been a much smaller, more controlled situation. I am guessing there may be some adjustments in contracts and some heads rolling. Who else was affected by the situation? I get the impression that svcolo is a fairly large operation.
- Brian Sullivan
Brian: if a data center goes down for two seconds it often takes an hour or longer to turn everything back on and get it all working properly again.
- Robert Scoble
In this case it was definitely "longer"
- Brian Sullivan
Very true...and often longer than that to get data systems working again, as they're often overloaded as soon as they come online (causing great difficulty in getting them to run correctly). I was very impressed with how quickly FriendFeed was able to have everything back to fully functional
- guruvan (Rob Nelson)
Brian: yup. I bet that the power came on pretty quickly, but bringing up a rack of equipment and getting everything talking to each other again can take quite a while.
- Robert Scoble
I guess they will have to adjust their power uptime claims to read 100% (except for strange circumstances) ;-)
- Brian Sullivan
no company anywhere should ever try to claim 100% uptime of anything. it's just not realistic. there's always some scenario that causes that to be off - if only by 0.0001%
- guruvan (Rob Nelson)
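A rough sketch of the uptime arithmetic being discussed in this thread, for anyone who wants to put numbers on it. The function names are made up for illustration, and the two-site calculation assumes the sites fail independently, which real data centers (shared utility grid, shared software bugs) may not satisfy.

```python
# Illustrative only: names and the independence assumption are hypothetical,
# not anything FriendFeed or SVColo has published.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_seconds_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability (0.0 to 1.0)."""
    return (1.0 - availability) * SECONDS_PER_YEAR


def availability_from_downtime(downtime_seconds: float) -> float:
    """Availability over one year given the total downtime in seconds."""
    return 1.0 - downtime_seconds / SECONDS_PER_YEAR


def combined_availability(*sites: float) -> float:
    """Availability of a service that is up if at least one site is up.

    Assumes site failures are statistically independent.
    """
    all_down = 1.0
    for site_availability in sites:
        all_down *= 1.0 - site_availability
    return 1.0 - all_down


if __name__ == "__main__":
    # Being "off by 0.0001%" still means about half a minute of downtime a year.
    print(downtime_seconds_per_year(0.999999))
    # A single two-hour outage in a year, as a yearly availability figure.
    print(availability_from_downtime(2 * 60 * 60))
    # Two independent sites, each up 99% of the time.
    print(combined_availability(0.99, 0.99))
```

Even under the optimistic independence assumption, the combined figure never reaches exactly 100%, which is the point being made above.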
maybe time for a pv solar third-tier backup
- Matt Weeks
I've been with Hurricane Electric 5 years and never experienced significant downtime. Luck? Maybe. Maybe not.
- Jason Nunnelley
The good news is downtime seems an essential element of a social site's success.
- Jason Nunnelley
When this happens I always wonder if someone hit the EPO: http://www.datacenterknowledge.com/archive...
- Phil Glockner
Brian, the SVcolo page you linked to is funny. love the part where they "assure 100% reliability" for the electrical power, with a "100% facilities uptime SLA." guaranteed to never fail or you get your next pizza free? guessing their hull is also 100% impervious to icebergs. +1 Phil. though day of week is also suspect. am superstitious about planned maintenance windows on Friday afternoons -- like standing in open field with a lightning rod during a thunderstorm. ;-)
- Karim
matthew, i thought the Tier 1 data centers sucked? :-D with diversity, it sounds like you are in a Tier 4...
- Karim
I think it is bound to happen with any website, especially when it is growing faster than expected.
- Ashish
ashish: it's bound to happen to just about any website. until a site is the size of google, yahoo etc, it's not only possible, but probable that it will happen. and a definitely nice link there, Phil :-)
- guruvan (Rob Nelson)
I think some people got frustrated as if they paid to use Friendfeed. I wonder what would happen if websites like this eventually started to charge the users, or would they be ad supported? I think the newness of social networking will eventually fade, as that of email did.
- Ashish
In a sense, a lot of us do pay to use friendfeed. We pay with our time, we pay with our words and we pay with our content.
- Scoble, Alex Scoble
No customers = no job, Alex?
- CAJ was here
The data center that I audited for 3 years switches to generator power on a weekly basis to test that it's working.
- Scoble, Alex Scoble
Unfortunately, we only host online banking sites.
- Scoble, Alex Scoble
lol Alex, I was going to mention, the centers that house banks come as close to never going down as possible. it's not unheard of, but it sure is the end of someone's world if they do go down
- guruvan (Rob Nelson)
you're gradually moving down the Livejournal path - they once had the same kind of big outage, and it even became an annual tradition... yes, power losses became a regular once-a-year thing, no matter what datacenter they used. And then they sold themselves to the Russian SUP :D
- piikummitus
welcome back :p
- healingkid