RAID5 Volume recovery

Doskei

Hi all ~

I'm relatively new to RAID, and I'm hoping to find some advice here
that could pull me out of a sticky spot.
Several months ago, I finally achieved a three-year dream and
completed my RAID server project. I built an 8-drive (500GB each)
3TB array: RAID5 across 7 SATA 3.0Gb/s HDDs, with a single designated
hot-swap spare. The whole thing is tied together with an Areca
ARC-1220 PCIe controller.
Also, quick note: the OS is not on the array. It is on a separate,
small HDD.
The file server had been working fabulously for several months.
Then two days ago, I had my first hint of trouble: I was getting error
messages when trying to write to a folder that I was pretty sure I had
given myself write access to. Later that day, when I decided to
start troubleshooting the issue, I went to log onto the server
itself and saw the dreaded "delayed write failed" error message
stack. The machine had stopped responding to network commands at this
point, and I could see no way to safely reboot the machine (I didn't
seem to be able to get past the errors to log in locally). So I hard-
rebooted the server.
When it came back up, the controller BIOS threw up a huge red
warning, telling me that one of my RAID volumes was in a non-normal
state. I went into the BIOS and checked, and the volume was showing
as degraded in some menus and failed in others (I can post clarifying
images later tonight). Almost all of the menu options throughout the
BIOS were non-functional, reporting that "(whatever I was trying to
do) can only be performed on an (Array / Volume / RAIDSet) in normal
state." This was true for running consistency checks, expanding the
volume, etc.
What I was finally able to glean from the few working menu options
was that six of my former seven drives were still part of the RAIDSet
/ Volume / Array, but that drives 1 and 8 simply were not. Drive 1
should have been part of the array but wasn't, and drive 8 should
have been designated as the hot spare and wasn't.
So my question is this: does anybody know how I can rebuild this
array? We're talking about a 3TB array, two-thirds full of family
treasures - photos, home movies, projects, etc. The entire reason I
built this server was so that I could be sure I would never lose any
of this data - it had all been living on a 1.5TB software RAID up
until Christmas. Yet that had worked for years (albeit making me
constantly nervous), and now my supposedly redundant data storage
system has apparently committed suicide.
In any case, here's what I know: I have 6 drives out of 7 still
showing as being within the array. That should mean I have a full
copy of the data - and although it doesn't appear that a drive has
actually died (rather, something occurred internally within the
controller that borked the array, quite probably due to my ignorance
during setup), it seems like it SHOULD be equivalent to losing a
single drive. Yet none of the options within the BIOS are of any use
to me - it wants me to get the array into a normal state before it'll
let me add a drive to it, but I can't, because it's a 7-drive array
living on 6 drives in a "degraded" state!
Anyway, I'm lost, and somewhat panicked. Any advice? Thanks
worlds in advance,

- Jesse
 
Cydrome Leader

In comp.arch.storage Doskei said:
[snip]

Two suggestions.

1) Don't mess with the array anymore. Nothing will magically fix the
state it's in. The more you fiddle, the more you may break.

2) RAID Reconstructor from Runtime Software may be your best bet to
recover the data without getting scammed by a data recovery company.
 
Michael Hawes

Cydrome Leader said:
[snip]

If those were the only errors, the array would be fully functional
but degraded. When the first drive failed, the hot spare should have
been pulled into the array and a rebuild started. The hot spare may
have failed during that rebuild! You may need professional help.

Mike
 
Paul

Doskei said:
[snip]

Isn't there a 2TB limit to array size on some OSes ?

The fact it broke at roughly 2/3rds of 3TB - right around the 2TB
mark - suggests you've hit that limit.

See page 50 of the manual here. Choices are "No", "64 bit LBA"
for certain OSes, or "4KB sector" trick to work with 32 bit
Windows. If you didn't set this up properly, that may account
for your "accident".

http://www.areca.us/support/downloa...Manual_Spec/ARC-1xx0_1xx0ML_12x1ML_Manual.zip

"For more details please download PDF file from
ftp://ftp.areca.com.tw/RaidCards/Documents/Manual_Spec/Over2TB_050721.zip "
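
(For what it's worth, the 2TB wall is just 32-bit sector addressing:
2^32 sectors x 512 bytes/sector = 2,199,023,255,552 bytes, i.e. 2 TiB.
The "4KB sector" trick works because 2^32 x 4,096 bytes = 16 TiB,
which is how a 32-bit Windows OS can be made to see more than 2TB.)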

Cleanup is obviously going to be a mess. You need somewhere for the
RAID5 recovery software you purchase to put the recovered data. I
expect the data is intact, and all that is needed is for some
recovery software to understand the interleave pattern.

Normally, "metadata" A.K.A. "reserved sector" on each disk, marks
the identify of the disk within the array. It is intended to support
moving the drives, to different cables on the same controller, and
still allow the controller to figure out the interleave and read
data correctly. Your recovery software is going to have to attempt
to do this, somehow. Maybe if the metadata is still visible for
each disk, in the Areca BIOS screen, you can figure out which drive
is which.
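
If it comes down to doing the reconstruction by hand from disk
images, the core of the job is just un-interleaving the stripes. A
minimal Python sketch of the idea - assuming a left-symmetric
rotating-parity layout, a 64KB stripe unit, and images named
disk1.img through disk7.img in controller order, none of which you
should trust without checking against the metadata:

# destripe.py - reassemble a linear volume from RAID5 member images.
# ASSUMPTIONS to verify first: left-symmetric parity rotation, 64KB
# stripe unit, and disk1.img..disk7.img in the controller's order.
CHUNK = 64 * 1024
members = [open("disk%d.img" % i, "rb") for i in range(1, 8)]
n = len(members)

with open("volume.img", "wb") as out:
    stripe = 0
    while True:
        # left-symmetric: the parity disk walks backwards each stripe
        parity = (n - 1) - (stripe % n)
        eof = False
        # data chunks start on the disk after parity and wrap around
        for k in range(n - 1):
            disk = (parity + 1 + k) % n
            members[disk].seek(stripe * CHUNK)
            data = members[disk].read(CHUNK)
            if len(data) < CHUNK:
                eof = True
                break
            out.write(data)
        if eof:
            break
        stripe += 1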

If you purchase another controller and connect some number of
drives to it, follow the instructions for >2TB support, so that
your recovery volume is big enough for the job. This is one reason
why you may end up purchasing another Areca: the >2TB feature
is not available on all controllers. For example, the RAID controller
on a motherboard Southbridge probably has no provision for supporting
arrays larger than 2TB on some 32-bit Windows OS.

Contact Areca technical support, and see if they have any suggestions.
I don't think talking to them will make the job any easier, but they
may have some suggestions for software that has corrected this
problem before.

Just a guess,
Paul
 
Doskei

Thanks to everyone for the suggestions. I'm going to proceed slowly
and carefully.
I suspect Paul is right, and this may be related to the 2TB limit. I
am running Windows Server 2003 Standard (R2), and it has an option
within the disk manager for converting a volume to LBA. I did so
before putting any data on the array, but I don't remember the menu
that page 50 shows (and thus probably didn't use it). I think I
remember NOT using the Quick Setup, and that may have been my
downfall, although why they wouldn't prompt you for >2TB when you
assemble the volume manually is beyond me.

Unfortunately, if I do end up having to find a place for 2TB of data
while I fix my array, I'm going to be pretty much boned - getting this
thing together was a project that took six months of saving, and I'm
not sure even this data is worth buying it all again. Hopefully Areca
will have some advice that doesn't involve a second server, although
to be honest one of the only downsides to purchasing this card (which
I knew going into it) was that their support is pretty abysmal. So
we'll see.

In any case, thanks very much to all of you for your advice. I am
going to a) contact Areca, then b) try installing RAID Reconstructor
and see what it has to say. I may be able to find a (very temporary)
place for 2TB of data if that's my only hope ... I'll be back with
results eventually, but really, thanks again for the help.

- Jesse
 
Paul

Doskei said:
[snip]

The cheapest solution I can think of is to purchase a couple of 1TB
SATA drives and connect them to the motherboard connectors, and cook
up a solution that way. Depending on how many motherboard connectors
are available, you could use slightly smaller drives, and use more
of them. But first, you're going to need to determine what tool can
be used to get the data off the RAID5.

I guess my concern is whether the recovery software will create
less than 2TB of output (i.e. lost a few files), or create more
than 2TB of output. I've used data recovery tools before that
create a zillion garbage fragments, which means there is a risk the
volume you put the output on could be overwhelmed. As long as
the drives are basically physically sound, though, you should have
plenty of time to try things. You're in better shape than someone
who has physical damage to one or more drives. As long as the
software doesn't try to modify the source array, you should be OK.

I bet one result of this exercise, is a rethink about RAID. At
work, we have had two RAID5 failures, in the middle of the day.
A hardware RAID5 controller had a firmware bug, which wrote
zeros down near the origin of all the disks in the array. The RAID5
needed to be restored from tape, while hundreds of designers sat on
their hands and could do nothing. (The server contained software
and the database the software used.) Needless to say, they
had a rethink, and the result was a triplicated structure,
with failover. No more (visible) problems after that.

There is a second scenario you should think about. What would
happen if your ATX power supply were to suddenly put out +15V
on the +12V rail? All the motors on the hard drives would be
burned. All the drives would be lost simultaneously. Even
without considering the possibility of a hardware RAID
firmware bug, there are other plausible failure modes.
At least one other poster commented that he did indeed
lose everything due to a power supply failure. So it
can happen.

Two arrays (one a backup array) and two computers reduce the risk a
bit. Disconnect the backup array from AC when not in use. Or use
trays, and store the backup drives in a safe place. Be creative.

Paul
 
Cydrome Leader

In comp.arch.storage Doskei said:
[snip]

My suggestion is to not connect those drives back to that controller.

The last thing you want is a RAID card that may or may not have
flaked out reconnected to your drives.

It could do something stupid, like decide to build a new RAID group.
You really don't know. It's not clear why your setup even failed in
the first place.

To play it safe you'd image those disks to new media and work from there.
You won't be able to back out of any changes if you don't back the disks
up first.

Again, if your files are important, the less you mess around before
you have a solid recovery plan, the better off you are. The data
won't self-destruct if you wait, versus grasping at straws and trying
all sorts of things at once.

Like you said, it sounds like the data should be OK, as you didn't
have a true double disk failure. The other plus is that even with
some NTFS corruption, it's very easy to recover NTFS with the right
tools.

Finding a way to image those drives should be your first step.
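
To make that first step concrete: on Linux the usual tools are dd or,
better, ddrescue, which retries and logs bad sectors. Purely to show
the shape of the operation, a minimal Python sketch that copies a raw
device to an image file and pads unreadable spots with zeros instead
of aborting (the device path is an example, not a prescription):

# image_disk.py - crude raw image of a drive; ddrescue does this
# properly, this only illustrates the idea.
SRC = "/dev/sdb"       # example source device - adjust for your system
DST = "sdb.img"
BLOCK = 1024 * 1024    # 1 MiB per read

with open(SRC, "rb") as src, open(DST, "wb") as dst:
    offset = 0
    while True:
        src.seek(offset)
        try:
            buf = src.read(BLOCK)
        except OSError:
            # unreadable region: pad with zeros and keep going
            buf = b"\x00" * BLOCK
            print("read error at offset", offset)
        if not buf:
            break
        dst.write(buf)
        offset += len(buf)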
 
Maxim S. Shatskih

Very sad story.

For me, this is a typical horror story of "never trust onboard RAIDs
or separate RAID boxes unless by VERY serious vendors".

For instance, for me, Microsoft/Veritas and the Linux LVM authors are
more trustworthy than Areca, HighPoint or such. Compare not only the
company reputation, but the amount of public knowledge of the on-disk
data structures, which can help recovery a lot.

That's why I have always considered software RAIDs like Windows
Dynamic Disk to be better than the inexpensive (additional chip on
the motherboard) hardware ones.

And, surely, RAID 5 cannot replace backups, just by definition. I
doubt RAID 5 has much value in a SOHO environment, given that it
cannot replace backups and is slow on writes. RAID 5 is, I think,
good only for high-availability 24/7 installations, where it can
provide some protection window between backups.

Unfortunately, this is probably too late for you.
 
Cydrome Leader

In comp.arch.storage Maxim S. Shatskih said:
Very sad story.

For me, this is a typical horror story of "never trust onboard RAIDs
or separate RAID boxes unless by VERY serious vendors".

There are only a few I'll trust my data to.

Adaptec for anything homebrew, HP, and, for standalone controllers
(this is a bit dated), Raidtec, Infortrend and Chaparral.
For instance, for me, Microsoft/Veritas and the Linux LVM authors are
more trustworthy than Areca, HighPoint or such. Compare not only the
company reputation, but the amount of public knowledge of the on-disk
data structures, which can help recovery a lot.

The Windows mirroring is pretty good. Veritas seems nearly
indestructible, even when it comes to operator error.
[snip]
Unfortunately, this is probably too late for you.

I'm pretty sure he can still get his data off those disks.
 
Maxim S. Shatskih

The Windows mirroring is pretty good. Veritas seems nearly
indestructible, even when it comes to operator error.

Windows Dynamic Disk is a licensed (simplified) version of Veritas's
VxVM, a.k.a. Storage Foundation.
I'm pretty sure he can still get his data off those disks.

I hope so. After all, somebody has surely reverse-engineered Areca's
on-disk layout, after which the data will be restorable with "dd"
plus a tiny Perl script, or a tiny C app.
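
The "tiny script" really can be tiny once the geometry is known.
Regenerating a single lost member from the survivors is pure XOR,
because RAID5 parity makes every byte position across all members
XOR to zero. A sketch in Python rather than Perl (file names are
made up; disk1 is taken as the missing member):

# rebuild_disk1.py - regenerate a missing RAID5 member by XOR.
# For every byte offset: disk1 ^ disk2 ^ ... ^ disk7 == 0, so the
# missing member equals the XOR of the six surviving images.
CHUNK = 1024 * 1024
survivors = [open("disk%d.img" % i, "rb") for i in range(2, 8)]

with open("disk1_rebuilt.img", "wb") as out:
    while True:
        bufs = [f.read(CHUNK) for f in survivors]
        if not bufs[0]:
            break
        acc = bytearray(bufs[0])
        for b in bufs[1:]:
            for j in range(len(b)):
                acc[j] ^= b[j]
        out.write(acc)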
 
