Do not use alwayswebhosting

harrylands

To set the scene: a major server hosting more than 1000 domains
went down on July 23, 2005, and the customers did not learn that nearly
all of their data was lost until August 6.

I'm posting this so that anyone looking for a domain host will have
some idea of the types of things that can go wrong, and can take steps
to minimize their exposure to evasive and deceitful web hosts in the
future. I also hope anyone involved in hosting will read this to get
an idea of how not to treat your customers.

Following is a reproduction of a public thread created by the founder
and owner of alwayswebhosting. It speaks for itself.

http://forums.alwayswebhosting.com/showthread.php?p=8127#post8127

07-25-2005, 11:44 AM
#1 Administrator
Hummer "Server farm" crash
..
THE NEWEST UPDATES CAN NOW BE FOUND AT THE VERY END OF THIS THREAD

-- ORIGINAL ANNOUNCEMENT --

Dear Clients,

At approximately 10:00 PM CST on July 23, 2005 the Hummer server
("server farm") experienced a serious failure with one of the disk
drives in the RAID array. We immediately began troubleshooting
mechanisms and procedures to rectify the problem and replace the drive,
when it became apparent that in fact 3 of the 9 drives in the RAID
array had simultaneously become bad and were no longer accessible. With
the RAID level 5 system we have in place, the system can operate
successfully with up to 2 failed drives in place (using the hot spares,
etc.) but when 3 drives go bad the RAID array goes offline.

In our nearly 10 years' experience with servers and high-availability
clusters, we have NEVER seen multiple drives in an array fail so
abruptly and at the same time. Without sufficient time and
warning/notice from the system, we were unable to swap out
replacements. Our on-site technicians even attempted swapping out
several device components, including the RAID card (twice, actually);
the additional diagnostics performed with the alternate backup RAID
cards in place do confirm that we have multiple bad drives in the system.

At this time, our drives are in transit to an industry-renowned
data-recovery firm just minutes away from the data center in order to
receive their assistance recovering the data and integrity of the array
by accessing the uncooperative drives and 'parity' data used by the
RAID system off the drives so we can bring the array back online as
soon as possible.

We sincerely apologize for the downtime our valued clients have
experienced, and we want you all to know we are calling all favors and
resources available to repair the system and get the web sites of our
clientele back to normal working order. We expect a drive recovery
report by the end of the business day today with details as to what
recovery options are available to us and what turnaround we can expect
on our project. We will priority-upgrade the repair to as soon as
possible at ANY cost necessary to expedite the recovery and repair and
our restoration back to normal service. As we anxiously await followup
on the drive repairs, we will notify you as soon as we have additional
news.

We also want our clients to know that we are deeply and sincerely sorry
for this disruption in service, we know your web sites are infinitely
important to you. We know that you do not expect and will not tolerate
anything less than the superior uptime that we have a long and
consistent track record of providing to you. This service failure is by
far an anomaly and we will make every necessary step to ensure our
clients are compensated and fairly credited for the downtime, and that
they can trust us with their sites in the future.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#2
!Update!
(as of 2:30 PM Pacific Standard Time, 07/26/05)

The initial analysis stages of the mission critical array in question
have gone very well, and the vendor that has been provisioned for the
recovery processes & extraction is quite confident that the data from
the RAID array will be able to be thoroughly imaged and duplicated to a
stable set of drives that will be used for our final recovery steps and
re-introduction into the server cluster, when ready.

Now, these initial stages are within a few short hours of being
completed, and at that point we will have precise percentages and a
thorough analysis of the salvageable data at our fingertips. Again,
considering what the technicians have experienced thus far, a near (or
complete) recovery of user content should be very likely; they're
calling it 70 to 80% as far as a probability of success at this time.

Obviously that raises endless questions about the interpretation of
that particular language, and we will be able to report more precise
information about what *IS* recoverable once we have the final analysis
report from the technicians in question. Overall, the situation is
apparently quite favorable for a "thorough recovery" -- and as we press
the recovery vendor for additional detail, hopefully we'll be able to
restate that as a "complete (99.982737%) recovery." We will keep you
posted!

More precise timeframes on the projected "final" stages of the
recovery, and 'go-live' of the previous content, should be possible
later on this evening once we have the final initial analysis report on
the array.

Thank you again for your patience, we apologize again for this rough
stretch! We are doing absolutely everything we can to push these
required phases forward as fast as possible. More to come when the
information is available to us.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#3
!Update!
(as of 8:00 PM Pacific Standard Time, 07/26/05)

First bumps in the road, but no show-stoppers: the recovery techs
report that the final imaging/filesystem mapping procedures have slowed
down considerably on two members of the array, but have nearly
completed each image, regardless of the apparent damage within. That's
a good thing, as it means that it is very likely the damaged members
are able to be fully probed and are functional to some extent, though
it's going to push the final analysis results well into the
morning/midday hours tomorrow. At that point we'll be able to set out a
firm timeline for you and describe precisely what will be salvageable.

And at this point we have been given nothing but reassurances that a
complete (or very near) recovery should be considered likely. We'll let
you know as soon as we know, thank you once again for your tremendous
patience during these delicate, mission critical recovery procedures.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#4
!Update!
(as of 5:30 PM Pacific Standard Time, 07/27/05)

OK, we were hoping to have the analysis and 'going-forward' steps to
final recovery to you a few hours ago, but the recovery techs had some
issues with filesystem mapping on the final drive in the bunch.
Fortunately, the good news there is that with that drive being
discarded from the 'usable' recovery core members, that leaves us with
9 of 10 RAID members 'intact' and apparently quite viable for the
upcoming drive cloning proceedings. That's very good news, and we're
told that a particular member of their recovery staff will be staying
late to make sure we get the final analysis data in our hands before
the evening is over.

So, we'll be sharing that with you, along with our updated timelines to
a complete recovery, sometime in the next few hours. Hopefully we'll
talk again very soon! See you then.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#5
!Update!
(as of 9:30 PM Pacific Standard Time, 07/27/05)

*Recovery* procedures are now underway as the final analysis proved
quite successful- considering the complexity involved with the
filesystem restructuring, the technicians in question have projected a
72 to 96 hour window for data 'assembly' and 'cloning' based on the
filesystem mapping procedures that have been running non-stop since the
week began.

We'll elaborate further on this after discussions continue in the
morning, when additional engineers will be available at the recovery
facility for brainstorming- and we'll be sure to deliver that
information to you as quickly as we can.

Your patience is once again truly appreciated! Hopefully we'll have the
data in our hands so we can finish off piecing this cluster solution
back together again during the weekend or first thing next week. We
will be in touch as often and as soon as possible.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#6
!Update!
(as of 7:00 PM Pacific Standard Time, 07/28/05)

The recovery team is in the early-to-intermediate stages of the
restoration now, attempting to find the proper d-striping sequence
(which is essentially a rebuild of the parity portion of the RAID area,
and this sequence is a finite pattern, so the end is within sight, to
some extent.) We may have the first (though preliminary) list of
precise recoverable structure by late tomorrow (Friday), though more
than likely early on Monday, as they begin cloning procedures to disks
that we'll be using to inject the recovered data back into the HUMMER
clustered environment.

At this rate it is optimistic to set Monday evening as the earliest
point where we'll have the recovered data in our hands, while
midday-to-evening on Tuesday would be the more reasonable (instead of
optimistic) way of looking at it.

We're nearing the midway point! We should know much more about the
final recovery percentages (in terms of 'how much' and 'what's
missing', if anything), potentially within 24 to 36 hours.

We'll keep you posted. What a nasty, nasty week. We are doing our best
to provide information whenever it is available, and once again, truly
appreciate your patience in these matters.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#7
!Update!
(as of 9:00 AM Pacific Standard Time, 08/01/05)

As of now, the data recovery techs have Tuesday 08/02/05 6:00 PM as
their target restore date... IF they keep to their word and can hit
that time -- please be assured I will do everything I can to hold their
feet to the fire on this one -- we should have the data back online
tomorrow night. Our on-site technicians are standing by and fully
deployed to make sure we can have everything back up as soon as possible
(hopefully within an hour or two) after receiving the drive of
recovered data.

We know this downtime has been excessive, and we feel terrible. We will
make whatever concessions and compensation necessary (i.e. refunds or
free months of service, or a percentage discount on your hosting rate
for 12 months, etc.) to show our clients that we value their business
and for sticking by us through this issue.

As well, for clients who would like us to set up your site temporarily
on another server (server5) so you can be back online by reuploading
their files and setting up their email boxes etc. while we wait for the
drive recovery, just let me know and we can surely do so asap! Just
send this request to: (e-mail address removed) and we'll get right
on it.

We're standing by to assist in any way possible, and will post here
with any news as soon as we have it!
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#8
!Update!
(as of 4:35 PM Pacific Standard Time, 08/02/05)

As we close in on the initial 'target recovery date' provided by the
data recovery firm we're using, we've been told that their lab systems
are still generating a full list of all the files on the system.
Apparently, it's impossible to know exactly how long this process will
take as there is no percentage completion along the way, just the file
list output scrolling until it's done... (They say it's like running a
marathon without mile markers...) They stated that this process has
been underway for half a day already, but with over half a terabyte (we
have nearly 500 GB of data) on the RAID array, it is unknown how much
longer it'll exactly be. They hope it'll be within the next few hours
for the master list generation to complete, and their techs will
continue to monitor the output and *immediately* forward to us the list
of all files on the array for our review.

Furthermore, due to some bad sectors/internal failure on a specific
drive (drive 0), they will also be providing a list of 'damaged files'
hopefully early tomorrow morning. This list will detail the files that
are damaged or fall on bad sectors or the most damaged of the disks, or
are otherwise not initially recoverable. Of course, if the files on the
'damaged list' are non-critical (i.e. system library files, or system
logs) we will continue with the restoration of the master list minus
the damaged list. So, "if the lists look good it will take the 'better
part of a day' to copy the data across" -- which they further estimate
to be in the neighborhood of 7 or 8 hours to copy the data across,
regardless of how many files we are recovering. And with that, we
should have a brand new 400 GB hard drive with the "restored data ready
for pickup by the end of the day tomorrow."

If there are any critical files (email, customer files, etc.) on the
damaged list, we will still pursue the 'easier' recovery data of the
master list of recoverable files to get it back as soon as possible,
but will also have the lab try other additional options available to
them, that I'm told include "attempting to splice old data back in
place, and other hard disk operations to attempt to recover files
further from the damaged list, which will take more time, in the 12-24
hour time window" but *ONLY* if we deem the files on the damaged list
to be critical.

In summary, we want our clients to know that we are doing everything
possible to get this issue behind us as soon as possible and get
everyone back up to normal operations. We're waiting for the master
list (hopefully tonight or early in the morning) and the damaged list
as well. If all goes as planned with no hiccups, we should have the
recovery drive tomorrow night.

Thank you for your continued patience, we all know it has been an
extremely difficult time for all affected.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#9
!Update!
(as of 10:25 AM Pacific Standard Time, 08/03/05)

It seems like hummer's recovery 'will be done when it's done'; the data
recovery lab is doing everything they can, but simply are forced to
wait as their systems continue to process and output the master list of
files before the final step of pulling the data off the drives can be
completed.

I cannot be unfair and continue to give our clients deadlines (that are
given to us by the recovery lab) and then have them slip by. So,
instead, we just wish to reassure customers that everything that can be
done is being done to finish processing the damaged drives. Not even
the lab can fully estimate how long the currently running process of
generating the master file list (and the subsequent data copy to the
new drives) will take. So, everyone sit tight as much as possible.

Again, if you would like your site re-setup brand new on server5, just
email (e-mail address removed) requesting it. We'll re-setup your
site on the server5 and send you the login info, and then you can
re-setup your email boxes, reupload any files that you currently have
handy, and so forth. Of course, sites that are database (MySQL) powered
and are requesting this move, just let us know that you have databases
to bring over when you submit the request (and give their database
names if you have them handy) and we'll move that over as well for you.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#10
!Update!
(as of 3:45 PM Pacific Standard Time, 08/04/05)

GREAT NEWS! The imaging, rebuilding, and repairing processes have been
fully completed on the RAID array at approximately 11:00 AM PST. The
lab technicians then began preparing and verifying the data for
extraction, which has also been completed.

The copy process is running, and the data is successfully being copied
to a new hard drive for us!

It looks like the time frame for drive pickup will be sometime
tomorrow, as it's fluctuating speed-wise during the process. As soon
as the hard drive copy of the customer data to the recovery drive is
completed, we'll immediately have our on-site technicians (that are
standing by) pick up the drive (it's only a 5 min drive from the data
center to the data recovery lab) and bring it back for installation. If
everything goes properly, it should take us a few hours to get all the
data copied back to the new system and get all the sites affected by
the crash back online!

We should all be back online tomorrow!
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#11
Update!
(as of 7:05 PM Pacific Standard Time, 08/05/05)

Hi Everyone, I've been on the phone all day pushing the data recovery
lab for a time window for the completion of the copy process and the
availability of our recovery drive for pickup and reinstallation to get
the server back on its feet again. I've been told there have been a
large amount of speed fluctuations experienced during the copying
process -- apparently from the 'destriping' of the data across the
multiple drives as it rebuilds the files. Anyways, the copy process
(initially quoted at 7-8 hours by the lab *yesterday* !) is now looking
like it'll take 8-10 more hours (even after it's had over 24 hours to
copy already) and will be completed around 8:00 AM PST. The lab
technicians are continuing to monitor the copy process and ensure it is
making progress the entire time. Once the copy completes, we're told
they need approximately 30 minutes for a data verification, and then it
will be driven to our data center and dropped off for reinstall.

So, realistically, with the copy finishing in the early A.M. hours and
the verification and drop off going as planned, we should have the
drive back in our hot little hands by around 11:00 AM PST, and we'll
need about 120 minutes to restore the files to the sites and get
everything tweaked and up and running again. This *should* tentatively
bring our back online date to around 1:00 PM PST on 8/6/2005.

Of course, I can only continue to thank you all for your patience and
understanding during this process. No one at our company, at the data
center, OR the data recovery lab had any idea the recovery steps would
take anywhere as long as they did, so we were really at the mercy of
the recovery system. We will absolutely, positively make this up to our
clients, with their choice of a refund, credit for future hosting
periods, or price breaks. We're still trying to hammer out the exact
details, which we will discuss with you all after the system is back
online (so we have one less thing to worry about! )

Thank you all for everything. Your loyalty and support and especially
the kind little notes of encouragement and support really mean the
world to us, thank you so much!

We will continue to post updates here as they develop.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#12
Update!
(as of 12:30 PM Pacific Standard Time, 08/06/05)

The backup drive has arrived (and looks GREAT!) We will be spending the
next few hours copying the backups back into the client sites and
tweaking the cpanel install, and basically doing what we need to in
order to get all the sites (webmail, email, cpanel, etc.) all back up
again! Thank you for your patience, we'll be fully online shortly!!
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH

#13
Update!
(as of 4:15 PM Pacific Standard Time, 08/06/05)

We have completed reintegrating the recovery drive back into the Hummer
system, and the sites should be online again with the files that we
have available after the recovery. As of now, it looks like the data
recovery was not as fully successful as quoted by the lab. While we
were able to recover over 50 GB of files, severe damage to one of the
drives in the RAID system left a good number of files severely damaged
and unrecoverable.

What is available in each user's /home/username/ is what we have at
this time. (If your cpanel password does not work, please let us know
and we'll reset it so you can login)

At this time, we will continue to move sites to server5 (just email
(e-mail address removed) requesting a re-creation on server5 and
we'll create the site there and re-send you your welcome email). Of
course, we have ALL of the MySQL databases for all sites on Hummer, as
that was on a separate, standalone database server. We are glad to help
move their databases over, etc, to help clients get resetup.
__________________
Always Yours,

Ronnie T. Moore
Founder/Owner, AWH
 
Steve Easton

1. That is why you "always" have a backup copy of your web.

2. That should be a little warning on the proper set up of RAID arrays.

If set up improperly, when drive one becomes corrupt it can automatically corrupt its backup drive,
and so on, and then the server becomes a house of cards just waiting to crash.
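
To put point 2 in concrete terms, here is a rough Python sketch of how single-parity (RAID 5 style) striping recovers data. It is only an illustration, not how the Hummer controller actually worked: the function names (xor_blocks, make_stripe, rebuild) and the tiny 4-byte blocks are made up for the example. The parity block is the XOR of the data blocks, so any one missing block in a stripe can be recomputed from the survivors, but once a second block in the same stripe is gone there is nothing left to recompute from.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks plus one parity block (hypothetical layout)."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild(stripe):
    """Rebuild a stripe in which lost blocks are marked as None.

    Single parity can restore at most one lost block per stripe.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("more than one block lost; single parity cannot rebuild")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data + parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    stripe = make_stripe([b"abcd", b"efgh", b"ijkl"])

    one_lost = list(stripe)
    one_lost[1] = None
    print(rebuild(one_lost) == stripe)

    two_lost = list(stripe)
    two_lost[0] = None
    two_lost[2] = None
    try:
        rebuild(two_lost)
    except ValueError as err:
        print("rebuild failed:", err)
```

Either way, parity only protects against a drive dying. It is not a second copy of your files, which is the reason for point 1.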


--
Steve Easton
Microsoft MVP FrontPage
95isalive
This site is best viewed............
........................with a computer
 
