ISP Monitor Server Crash

Author: WhiteDog - Posted on: 2008-06-14 19:57

We suffered a fatal crash on our server a few days ago. It resulted in some data loss that I would like to inform you about. Please read on for more information.

Some background info
Our site is hosted on a dedicated server we share with a few other people. We have what is called a "Reseller" account, which gives us nearly full control over the server, except for root access.

Timeline of events
As we believe in open communication with our customers and users, here is the timeline as it unfolded.

12/06/2008 11:00 - Our website goes blank.
12/06/2008 14:00 - Our host informs us that there is a problem with the RAID array on our server and that it needs a rebuild. The site should be back up around 15:30.
12/06/2008 15:30 - Site still down, no news from our host.
13/06/2008 09:00 - Site still down as we enter Friday the 13th. We contact our host who informs us the server is being reinstalled and a backup will be restored.
13/06/2008 15:00 - Sites are slowly coming back online. We quickly notice that the files being restored are over one month old.
13/06/2008 15:30 - We contact our host, who is "surprised" and says they will investigate the issue.
14/06/2008 08:30 - We finally receive some information from our host, stating what we already knew and some more bullshit.

The Good
We have daily backups of our database, so we only lost about 6 hours of speed test results. I am very happy that at least our own failsafes turned out to work.

The Bad
As we mainly work on the project online, we only generate periodic backups of the files. We did not have a recent offline copy, but not much changed in the last month, so everything should be back as it was within a few days. There was a recent backup on the server itself, but it had not been downloaded before the crash.
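
A rough sketch of how the "downloaded offline" step could be automated is shown below. It assumes the server's backup folder is already reachable as a local path (for example through a mounted share or a prior FTP sync); the function names and directories are hypothetical and not part of our actual setup. It copies the newest backup file to an offline folder and verifies the copy with a checksum.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pull_latest_backup(source_dir: Path, offline_dir: Path) -> Optional[Path]:
    """Copy the most recently modified file in source_dir to offline_dir.

    Returns the path of the offline copy, or None if source_dir has no files.
    Both directories are assumptions made for this illustration.
    """
    candidates = [p for p in source_dir.iterdir() if p.is_file()]
    if not candidates:
        return None

    # The newest backup is the file with the latest modification time.
    newest = max(candidates, key=lambda p: p.stat().st_mtime)

    offline_dir.mkdir(parents=True, exist_ok=True)
    target = offline_dir / newest.name
    shutil.copy2(newest, target)  # copy2 also preserves timestamps

    # Verify the copy so a truncated download is not mistaken for a backup.
    if sha256_of(newest) != sha256_of(target):
        target.unlink()
        raise OSError(f"checksum mismatch while copying {newest.name}")
    return target
```

Run daily from a scheduler on a machine outside the hosting provider, something like this would have left us with a file backup at most a day old instead of a month old.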

The Ugly
I started a personal blog about a month ago that has now been completely wiped out. Luckily Google Cache still holds a copy, and I will reconstruct the blog in the coming days. We also lost about a month of forum posts (and everyone who registered during that time). Some other websites on the server were less fortunate, however.

Mistakes the host made
- We pay for 99.9% uptime and daily backups. These criteria have clearly not been met.
- It is their task to generate backups. Blaming it on the server software (cPanel in our case) is cheap.
- When your disks crash, you don't restore backups onto the same disks. You replace them so you can at least try to recover data from the old disks afterwards.
- We received little to no information, and as I'm writing this there are still open tickets 10 hours after I submitted them. This is unacceptable.
- The server was reinstalled with a different Apache configuration and MySQL version. As a result, parts of the site stopped working. We did manage to fix all issues encountered so far.
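
To put the 99.9% uptime figure in perspective, here is a small back-of-the-envelope calculation. It assumes the guarantee is measured per calendar month (our assumption, not a quote from the contract) and takes the outage as running from 12/06 11:00 until 13/06 15:00, when sites started coming back. On those assumptions the allowance is about 43 minutes per month, against an outage of 28 hours.

```python
from datetime import datetime

SLA = 0.999                      # the 99.9% uptime we pay for
HOURS_IN_JUNE = 30 * 24          # assumed measurement period: one calendar month

# Times taken from the timeline above; the end time is when sites
# began to return, so the real outage may have been somewhat longer.
outage_start = datetime(2008, 6, 12, 11, 0)
outage_end = datetime(2008, 6, 13, 15, 0)


def allowed_downtime_hours(sla: float, period_hours: float) -> float:
    """Downtime budget, in hours, that an uptime guarantee leaves per period."""
    return (1 - sla) * period_hours


def availability(downtime_hours: float, period_hours: float) -> float:
    """Fraction of the period during which the service was up."""
    return 1 - downtime_hours / period_hours


outage_hours = (outage_end - outage_start).total_seconds() / 3600
budget_hours = allowed_downtime_hours(SLA, HOURS_IN_JUNE)
achieved = availability(outage_hours, HOURS_IN_JUNE)

print(f"Allowed downtime: {budget_hours * 60:.1f} minutes")
print(f"Actual outage:    {outage_hours:.1f} hours")
print(f"Best-case availability for June: {achieved:.2%}")
```

Even if nothing else goes wrong for the rest of the month, the availability for June cannot exceed roughly 96%.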

Needless to say, we are considering our options and putting our claims together. If you are reading this and would like to offer legal advice, feel free to contact us.

The Future
We are weighing our options and will make sure this never happens again. However, we do not want to make hasty decisions. More news will follow once we have decided.


