This tale starts on a bright Sunday morning, a day the wife and I had planned to spend visiting our local cardroom and enjoying the great game of poker.
My first indication that our plans were soon to turn to crap occurred as soon as I began checking emails in the morning (usually the first thing I do before showering). There it was staring at me, the unthinkable: Bill’s Poker Blog was down.
I’ve been in the tech game since honeys were wearing Sassoons . . . Okay, maybe not that long but I’ve been running websites for about 20 years now. I’ve seen my fair share of technical glitches so I wasn’t ready to beat up the dog just yet (thankfully for my dog).
I tried logging into my server but it wouldn’t respond. So I logged into 1&1’s backend and rebooted the server. I fired up SSH again and voila I’m now king of my hardware again.
Bill’s Poker Blog was purring along just fine so I started to relax a bit thinking that some random glitch must have knocked some service offline and rebooting it corrected the problem. If it happened again I, of course, would do some more sleuthing but a one-time incident isn’t something to worry too much about. Freakish stuff happens sometimes.
But something, maybe even a higher power, compelled me to run one last command line on the server before I went to shower.
Hmmmm . . . well, a Seg Fault and I/O Error tend to point to problems with the hard drives. I know because I’ve lost several servers over the years and the Seg Fault error is a big red flag that you have anywhere from a few hours to a few days before the hard drive fails completely, forever destroying your digital bits and pieces.
I sent an email support ticket to 1&1 telling them what happened along with some diagnostic information and my conclusion that the drive was about to head off to the great big server farm in the sky.
I showered, ate, played around with the puppy for a while and still no response from 1&1. Hmmmm.
No problem. I called 1&1’s tech support and spoke with a friendly chap who agreed that Seg Faults are almost 100% an indication of a failing drive and he would have his technicians throw another hard drive in the machine asap.
Like I said earlier, I’ve been doing this for so long that I know how this works. I’m not the harassing or threatening customer. I know they have tons of other customers and everyone thinks their problem is the most important and/or that bullying and bluffing will get them what they want. It’s Sunday, I’m not too worried if the site is down a few hours.
But I did inquire as to how long he thought the turnaround time would be. He said about an hour and that he would call or email me when the new disk was ready. Fair enough.
I nearly dislocated my shoulder patting myself on the back for ordering a server with RAID mirroring. For those of you unfamiliar with RAID, basically the server has two hard drives and one hard drive acts as an exact mirror of the other. If one of the drives fails you can throw in a new drive and all of the data from the good drive will copy over to the new drive, so a single drive failure doesn’t cost you any data.
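For the code-minded, here’s a toy sketch of the mirroring idea in Python. It’s only an illustration, not how my server or any real RAID controller works: the `Mirror` class and its method names are made up for this example, and each "disk" is just a dictionary.

```python
class Mirror:
    """Toy model of a two-disk mirror (hypothetical, for illustration only)."""

    def __init__(self):
        # Each "disk" maps block number -> data. None means the disk is dead.
        self.disks = [{}, {}]

    def write(self, block, data):
        # Every write goes to every disk that is still alive.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def read(self, block):
        # Serve the read from the first live disk that holds the block.
        for disk in self.disks:
            if disk is not None and block in disk:
                return disk[block]
        raise IOError(f"block {block} is unreadable on every disk")

    def fail(self, index):
        # Simulate a drive dying.
        self.disks[index] = None

    def replace(self, index):
        # Swap in a blank drive and rebuild it from the surviving drive.
        survivor = self.disks[1 - index]
        if survivor is None:
            raise IOError("no surviving drive to rebuild from")
        self.disks[index] = dict(survivor)


mirror = Mirror()
mirror.write(0, "poker blog")
mirror.fail(0)       # one drive dies
mirror.replace(0)    # new drive is rebuilt from the good one
recovered = mirror.read(0)
```

The catch is in `replace`: the rebuild only works if the surviving drive is the healthy one. Pull and wipe the wrong drive and there is nothing good left to copy from.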
The wife and I scrapped the casino plans in light of the day’s events because I wanted to stay close . . . just in case.
We went and did some shopping (she shopped, I groaned) and went to go see 21 Jump Street.
Roughly 7 hours later I returned home somewhat puzzled at the lack of an email from 1&1 telling me my server was back up. I checked the site itself and it was still down. Hmmmm.
So I called 1&1 again and a different rep looked up my previous ticket and told me that they had replaced the drive 6 hours ago. Why no email or phone call? He had no insight into that.
So, maybe all she needs is another reboot, right?
Rebooting basically replayed my experience earlier that morning. It rebooted. The site came up with no errors or problems. And about five minutes later I started getting Seg Faults and I/O Errors again.
I call 1&1 support again and explain that this doesn’t seem to have fixed the problem and question whether or not they’re positive that the drive was swapped out. I was assured it was and he suggested I email him the errors in my log files so they could diagnose the problem further.
So, I grabbed a relevant chunk of error messages out of my log files and emailed them to the special 1&1 email address they gave me.
And then I waited. And waited. And waited.
Eventually I went to bed crossing my fingers that when I awoke everything would be right with the world.
Instead when I woke up, nothing had changed.
So I called 1&1 support again and asked what the hell was going on. I was told they never received the log files I sent. I can’t say whether they did or didn’t receive the email, but I copied and pasted the email address from the original email and resent the error logs, which they did receive. So I did send it to the correct address, of that I’m sure. Whether they ignored it, deleted it, or their own internal systems are so buggy that it ate my email, I know that it was sent correctly.
Of course, now it’s Monday so I can’t just sit around waiting so I went to work and basically tried to put it out of my mind until I could get home.
When I got home I called 1&1 again and this time they told me that based on the error logs they were now thinking it might be the RAID controller and not the hard drive. Fine. Just fix it I told them and went to bed.
When I woke up on Tuesday the site was still down. I still had not received an email telling me that they had resolved the issue. However they did send me an email telling me that they had received my original support request email from Sunday and that the faulty drive had been replaced and my incident was being closed.
So I called 1&1 yet again. My conversation with the support rep and his supervisor basically sums up why you should not use 1&1 hosting, EVAR!!!
Basically, the rep told me that when they pulled the original hard drive they had wiped it clean in order to protect my privacy. Then he told me that that wasn’t the bad drive. The hard drive still in the machine was the bad drive and they had wiped my good backup.
Now I’m pissed. Really pissed. Because now I have to restore the server from backups (which I fortunately also have). That’s hours and hours of tedious configuration, installing libraries, installing software components, etc, etc. It’s a major pain in the ass because all of those changes were made over the course of two years. A little here and a little there. In aggregate it’s a lot of work to duplicate just to get your site up and running again.
It’s just like when you get a new PC and even if you have all of your documents and spreadsheets and photos and music ready to deploy on your brand new PC, there’s still the process of installing and configuring all of your software. You have to install MS Office, Photoshop, TweetDeck, Dropbox, blah, blah, blah. And each one has to be configured like you liked it configured on your old PC.
So I bluntly asked the rep, “So why should I stay with 1&1? If I have to start from complete scratch there’s zero advantage for me to stay with you guys. I might as well go shop my business around with someone else.”
He fumbled through a response but one key phrase said everything, “I’m not going to try to convince you to stay with us because, you’re right, we have not handled this situation very well.”
He then went on to explain about how f*cked up things are in support and that this is just how things are there.
In the end he offered to give me a month of hosting for free but threw it out there sort of like, “If you do decide to stay I can offer you a free month of hosting.”
I didn’t ask him to but he offered to connect me with his supervisor. I spoke with the supervisor and told him that while I feel that every rep I had dealt with acted in a professional and helpful manner, the fact that my site was offline for more than 48 hours and I was now farther away from having my site back up than I was on Sunday when it first went down (because now I have to recover from backups) was simply unacceptable.
To my surprise, he basically mimicked the rep I had just spoken to and told me that 1&1’s processes and procedures were fundamentally broken and that absent spending a lot of money and making a major overhaul, there wasn’t a very good chance of the situation changing anytime soon.
He eventually offered to comp me 3 months of free hosting and said I could use it if I stay with them or use the 3 months to help me in migrating to a new hosting provider.
By the time we were hanging up, I felt sorry for the guy. Actually, I felt sorry for all of the tech support reps at 1&1. They’re trying to fight a three-alarm fire with a squirt gun.
So if you’ve been wondering where Bill’s Poker Blog or any of my other sites have been over the last few days, now you know.
BTW, a good buddy of mine suggested Amazon’s web hosting service. Currently that’s where I’m hosting the site. It’ll be a while before everything is back 100% but just seeing the homepage on my site brought a smile to my face.
I should also add that I’m writing this on Wednesday. On Sunday 1&1 claimed that they had replaced both hard drives and the RAID controller in the server. According to 1&1 all I needed to do was re-image one of the new hard drives (why they don’t do it themselves after f*cking it up is still beyond me – I mean, when I leased the server the operating system was already installed so that should be the minimum requirement for giving it back to me) and I’m good as new.
Not exactly. I tried re-imaging the hard drive via their control panel and the thing has been stuck on perpetual up and down mode ever since. Something is wrong with the server preventing the imaging from happening (I’m going to guess it’s the same problem that was causing my problems, i.e. they didn’t fix the problem). What makes it even worse is that because it’s not in a stable state 1&1’s recovery tools refuse to interrupt the re-imaging process.
I’ve kept a window up pinging that server for the last few days and I don’t think it’s seen more than an hour of uptime. It’s dead. If I opted to stay with 1&1 I would most likely still be trying to get a working server.
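If you’d rather have a number than a window to stare at, something like the following Python sketch could do the bookkeeping. To be clear, this is not what I actually ran (I just used ping). The function names are made up, and it checks a TCP port rather than sending real pings, on the assumption that what you care about is whether the web server answers.

```python
import socket
import time


def is_up(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def collect(host, count, interval=60.0):
    """Check the host `count` times, `interval` seconds apart."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), is_up(host)))
        time.sleep(interval)
    return samples


def longest_uptime(samples):
    """Longest unbroken run of 'up' checks, in seconds.

    `samples` is a list of (timestamp, up) pairs sorted by time.
    """
    best = 0.0
    run_start = None
    for timestamp, up in samples:
        if up:
            if run_start is None:
                run_start = timestamp
            best = max(best, timestamp - run_start)
        else:
            run_start = None
    return best
```

Feed the output of `collect` into `longest_uptime` and you get the longest stretch the box stayed reachable, accurate to within one check interval.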
Of course, this raises the question of why they didn’t/don’t offer me a new server once they wiped the existing one clean (assuming I was staying with them). I don’t own this server. I don’t care if they haul it out back and shoot it with a shotgun. I have no emotional, financial, or any other sort of interest in the hardware. Why are they so intent on fixing that server instead of resolving my problem (i.e. getting the site back up)?
I also have to question whether or not I’m owed a refund for the last two years I’ve been hosted with them. I paid for RAID in order to prevent exactly this situation. If they completely negate the benefit of having backup copies by deleting the hard drive contents before they properly diagnosed the problem then having the RAID was a completely pointless cost.
Bottom line is that if you were considering hosting at 1&1, don’t. My experience with them was great for the two years that I had this server and the several years I had another server with them but that’s when everything is working perfectly. The true measure of a hosting provider is how well they deal with problems. I don’t care if my server goes down. I care about how quickly they get it back up and how helpful they are in the process. Not only is it obvious from what happened that 1&1 is organizationally not up to the task but their own support staff sound as if they’ve given up.
It’s unfortunate because every rep I dealt with seemed knowledgeable, professional, and really did try to help. It’s 1&1’s own internal policies and procedures that resulted in the failures and delays. I mean, if you’re a hosting provider and it takes you 48+ hours to respond to an email there’s something really, really wrong.