Recover data from a failed RAID5 array

  • Thread starter Martin Goldmann

Martin Goldmann

Hi!

I have an Adaptec 2410SA controller with four 160GB drives configured in a
RAID5 array for a total capacity of 480GB.
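
The 480GB figure follows from RAID5 reserving one drive's worth of space for parity. A minimal sketch of that arithmetic (the function name is made up for illustration, and formatting overhead is ignored):

```python
def raid5_usable_gb(n_drives: int, drive_gb: int) -> int:
    """Usable capacity of a RAID5 set: one drive's worth goes to parity."""
    if n_drives < 3:
        raise ValueError("RAID5 needs at least three drives")
    return (n_drives - 1) * drive_gb


# Four 160GB drives, as in the post above.
usable = raid5_usable_gb(4, 160)
```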

The array contains a lot of very important data, which I can't bear to
lose. A lot of homemade Cubase music, pictures of my kids growing up, home
video...

The other day, one of the drives began making an unhealthy clicking noise.
Since I didn't have a spare disk at hand, I ordered a new one online.
Unfortunately, before the new drive arrived, something strange happened.
For some reason, one of the remaining drives went offline, and the
controller will not recognize the array any longer.
I have since found out that the first drive failed because of a loose
connection in the SATA power connector. Since the drive also has a legacy
4-pin molex connector, I now have the drive up and running again. However,
the Adaptec controller still lists the drive as failed (SMART error), and
says that the array has two missing members.

I am pretty sure that at least three of the four drives are working
properly, but that the controller will not 'give them a try'.
I can't find any options in the controller BIOS or the Adaptec Storage
Manager software which will retest the drives or try and remount the array.

I have (finally!) gained access to the Command Line Interface, but being a
novice, I'm afraid I will do something wrong and permanently erase all array
data.

I'm hoping that someone here is more familiar with the Adaptec CLI, or knows
of some other way of rescuing the data. I have plenty of storage space on
another computer to back up the recovered data to.
Product documentation on the controller can be found here:
http://www.adaptec.com/en-US/support/raid/sata/AAR-2410SA/

Also, please no comments on the importance of doing regular backups. I'm in
tears already!

Martin Goldmann,
Denmark
 
Michael Hawes

Chances are not good. When the first drive went offline it was no longer in
the array. Any data written to the other drives after this event was not
written to the failed drive and the other 3 drives became a stripe array. If
you lose a drive from a stripe array (RAID0) then you lose all the data. If
you revive the second faulty software you may find a recovery program that can
rebuild the RAID0. If your DATA was THAT important it should be backed up to
an external storage device, or you should have a spare drive onsite for
immediate replacement.

Mike.
 
Martin Goldmann

Thank you for your reply, Michael!


Michael Hawes said:
Chances are not good. When the first drive went offline it was no longer
in the array. Any data written to the other drives after this event was
not written to the failed drive and the other 3 drives became a stripe
array. If you lose a drive from a stripe array (RAID0) then you lose all
the data.

I don't think that much (if anything) was written to the drives after the
first drive died. Also, I suspect the second drive's failure was caused by
some error in the controller. The drive seems to be OK, and will complete a
verify command.

If you revive the second faulty software you may find a recovery program
that can rebuild the RAID0.

I'm not sure what you're saying here. Faulty software?

If your DATA was THAT important it should be backed up to an external
storage device, or you should have a spare drive onsite for immediate
replacement.

Yes, I'm painfully aware of it. I should have made backups, and should have
had a spare drive, and a spare controller too, I guess. :(

Best regards,
Martin Goldmann
 
Arno Wagner

Previously Martin Goldmann said:
I have an Adaptec 2410SA controller with four 160GB drives configured in a

Aha, that piece of trash. I have one of these controllers that serves
as a paperweight, since it was extremely unreliable and the management
tools were a pain. It had a tendency to kick disks from the array
without good reason and without useful diagnostics as to why it
had kicked the disk.
RAID5 array for a total capacity of 480GB.
The array contains a lot of very important data, which I can't bear to
lose. A lot of homemade Cubase music, pictures of my kids growing up, home
video...
The other day, one of the drives began making an unhealthy clicking noise.
Since I didn't have a spare disk at hand, I ordered a new one online.
Unfortunately, before the new drive arrived, something strange happened.
For some reason, one of the remaining drives went offline, and the
controller will not recognize the array any longer.

That is the reason you should either have a spare on hand or take
the array down until you have one. Sorry.
I have since found out that the first drive failed because of a loose
connection in the SATA power connector. Since the drive also has a legacy
4-pin molex connector, I now have the drive up and running again. However,
the Adaptec controller still lists the drive as failed (SMART error), and
says that the array has two missing members.

The first drive will be out of sync, unless you mounted the whole
array as read-only during the time it was degraded.
I am pretty sure that at least three of the four drives are working
properly, but that the controller will not 'give them a try'.
I can't find any options in the controller BIOS or the Adaptec Storage
Manager software which will retest the drives or try and remount the array.
I have (finally!) gained access to the Command Line Interface, but being a
novice, I'm afraid I will do something wrong and permanently erase all array
data.

The right way to do this is to image all the drives before
continuing. And the CLI of this controller is barely usable.
I'm hoping that someone here is more familiar with the Adaptec CLI, or knows
of some other way of rescuing the data. I have plenty of storage space on
another computer to back up the recovered data to.
Product documentation on the controller can be found here:
http://www.adaptec.com/en-US/support/raid/sata/AAR-2410SA/
Also, please no comments on the importance of doing regular
backups. I'm in tears already!

Make image copies (sector wise) of all the 4 drives. Then you can
experiment with low risk. What you likely need to do is to get the
second drive that failed back, unless you stopped writing to the
array after the first drive failed. In that case forcing it to use
the first drive again (if possible) may solve the issue with only
minor damage to the data.
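
As a rough illustration of what a sector-wise image copy involves, here is a minimal Python sketch. It assumes a Linux-style system where each drive appears as a readable device file (a path such as `/dev/sdX` is a placeholder, not taken from this thread) and that the drive is attached to a plain SATA port rather than the RAID controller. Unreadable blocks are filled with zeros so that every offset in the image matches the offset on the disk. A dedicated tool such as GNU ddrescue is the usual choice for failing drives; this only shows the idea.

```python
import os


def image_device(src_path: str, dst_path: str, block_size: int = 64 * 1024) -> int:
    """Copy src to dst block by block; unreadable blocks become zeros.

    Returns the number of blocks that raised a read error.
    """
    bad_blocks = 0
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        size = src.seek(0, os.SEEK_END)  # total size of the file or device
        offset = 0
        while offset < size:
            want = min(block_size, size - offset)
            try:
                src.seek(offset)
                data = src.read(want) or b""
            except OSError:
                # Read error: keep going, but remember that this block is bad.
                data = b""
                bad_blocks += 1
            # Pad short or failed reads so image offsets stay aligned with the disk.
            dst.write(data.ljust(want, b"\0"))
            offset += want
    return bad_blocks
```

The image should be written to a different disk than the one being read, and the source should never be opened for writing.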

Professional data recovery might be a good option. After all,
your data is completely available on the two working drives and
the second one that failed. Repairing that one should solve the issue.
You may also be able to force the controller to accept the second
failed drive again, but from what I remember of the CLI, I
am not sure that is possible.

If you want to risk working on this yourself, the next step
would be to diagnose the second failed drive and after that the
sector-wise backups of all drives.

I would also advise throwing the controller away after this
and getting a better one. 3ware, for example, has a good reputation.

Arno
 
Mike Tomlinson

Michael Hawes said:
Chances are not good. When the first drive went offline it was no longer in
the array. Any data written to the other drives after this event was not
written to the failed drive and the other 3 drives became a stripe array.

Are you sure about that? It should have continued as a RAID5, albeit
running in degraded mode.

If the OP can persuade the array to accept the first failed drive (the
one with the loose power connector) again, he may stand a chance of
recovering his data. It seems to me that is what he is asking.
 
Arno Wagner

Previously Mike Tomlinson said:
Are you sure about that? It should have continued as a RAID5, albeit
running in degraded mode.
If the OP can persuade the array to accept the first failed drive (the
one with the loose power connector) again, he may stand a chance of
recovering his data. It seems to me that is what he is asking.

The problem is that if the OP wrote anything at all to the array
after the first drive failed, adding the first drive will corrupt
the areas written to. What the OP really needs is to get the
second failed drive working again, as there will have been no
writes to the then broken array.
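
One way to see why a stale first drive is a problem: in a healthy RAID5, the blocks of each stripe row (data plus parity) XOR to zero. The sketch below is a hedged illustration on made-up in-memory data, not on real disk images; the chunk size, the function names, and the toy layout with parity always in the last position are all assumptions (parity placement does not affect this particular check). It simulates one write made while the first drive was offline and then reports which rows no longer agree once that drive is put back.

```python
from functools import reduce


def xor_all(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))


def inconsistent_rows(members, chunk):
    """Return indexes of stripe rows whose members do not XOR to zero."""
    rows = len(members[0]) // chunk
    bad = []
    for r in range(rows):
        blocks = [m[r * chunk:(r + 1) * chunk] for m in members]
        if any(xor_all(blocks)):
            bad.append(r)
    return bad


chunk = 4
# Three rows of three data blocks each; a parity block is appended per row.
data = [[bytes([10 * r + d] * chunk) for d in range(3)] for r in range(3)]
members = [bytearray() for _ in range(4)]
for row in data:
    for d, block in enumerate(row + [xor_all(row)]):
        members[d] += block

stale_drive1 = bytes(members[0])  # drive 1 drops out; its content is frozen

# Degraded write to the row-1 block that logically lives on drive 1:
# the drive is offline, so only the parity block can absorb the change.
new_block = b"\xff" * chunk
lo, hi = chunk, 2 * chunk
members[3][lo:hi] = xor_all(
    [new_block, bytes(members[1][lo:hi]), bytes(members[2][lo:hi])]
)

# Put the frozen drive 1 back next to the three drives that kept running.
report = inconsistent_rows(
    [stale_drive1, members[1], members[2], members[3]], chunk
)
```

Here `report` lists only the row that was written while degraded. The check shows that a set of members disagrees, but not which member is the stale one.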

And no, a degraded RAID5 is not a striping array. Since RAID5
distributes the parity, a degraded RAID5 is a degraded RAID5
and nothing else. With RAID4 you could get something very close
to a striping array, if the parity disk failed.

Arno
 
Floyd

Mike Tomlinson wrote
Are you sure about that?

'Course he is. How dare you question him.
This is Mike Hawes we are talking about.
Expert extraordinaire, *with bells on*.
It should have continued as a RAID5,

And how is RAID5 not a striped array?
albeit running in degraded mode.
If the OP can persuade the array to accept the first failed drive (the
one with the loose power connector) again, he may stand a chance of
recovering his data. It seems to me that is what he is asking.

Right, so how come you aren't telling him how to do that.
 
Michael Hawes

Arno Wagner said:
Previously Mike Tomlinson said:
The problem is that if the OP wrote anything at all to the array
after the first drive failed, adding the first drive will corrupt
the areas written to. What the OP really needs is to get the
second failed drive working again, as there will have been no
writes to the then broken array.

And no, a degraded RAID5 is not a striping array. Since RAID5
distributes the parity, a degraded RAID5 is a degraded RAID5
and nothing else. With RAID4 you could get something very close
to a striping array, if the parity disk failed.

Arno
A degraded RAID5 distributes the data across the remaining drives
without any parity information. Please list the differences between that and
a RAID0 stripe set. When the second drive died all the data was lost.

Mike.
 
Arno Wagner

Previously Michael Hawes said:
Arno Wagner said:
A degraded RAID5 distributes the data across the remaining drives
without any parity information.

This is wrong. There is no data redistribution or movement when
a disk in a RAID5 fails. And that is a good thing, since such data movement
is a) not needed and b) would potentially cause additional
problems. Putting high load on a degraded array is a bad idea.
In fact, without a hot or cold spare added, you should typically
take it down until you have a spare.
Please list the differences between that and
a RAID0 stripe set. When the second drive died all the data was lost.

It is quite true that the loss of a second drive takes down a RAID5,
similar to the loss of any drive in a RAID0. But the data is organized
differently before that happens and even afterwards.

Let me show you an example with 3 disks. Data is in layers from top
to bottom. Parity is as follows: '+' stands for xor and s<n>
for stripe n.

RAID5:

disk1   disk2   disk3
s0      s1      s0+s1
s2      s2+s3   s3
s4+s5   s4      s5

RAID5 degraded (here disk 3 has failed):

disk1   disk2
s0      s1
s2      s2+s3     since (s2+s3)+s2 = s3, s3 can be reconstructed
s4+s5   s4        since (s4+s5)+s4 = s5, s5 can be reconstructed

RAID0 with 2 disks:

disk1   disk2
s0      s1
s2      s3
s4      s5


This should make the difference clear.
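
The same 3-disk example can be written out as a minimal Python sketch. The stripe contents are made up, and '+' from the example becomes a byte-wise XOR; the point is only that whatever disk3 held in each row can be recomputed from the two surviving disks.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks (the '+' in the example)."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Six made-up stripes of 4 bytes each (illustrative data only).
s = [bytes([n] * 4) for n in range(1, 7)]  # s[0] .. s[5]

# Rows of the 3-disk RAID5 from the example: (disk1, disk2, disk3).
raid5 = [
    (s[0], s[1], xor_blocks(s[0], s[1])),
    (s[2], xor_blocks(s[2], s[3]), s[3]),
    (xor_blocks(s[4], s[5]), s[4], s[5]),
]

# Degraded: disk3 is gone, only the first two columns survive.
degraded = [(d1, d2) for d1, d2, _ in raid5]

# Whatever disk3 held in a row is the XOR of the two surviving blocks.
rebuilt_disk3 = [xor_blocks(d1, d2) for d1, d2 in degraded]
```

In the first row the rebuilt block is parity, in the other two it is data (s3 and s5). A 2-disk RAID0 has no such redundant block to recompute from.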

Arno
 
Mike Tomlinson

Michael Hawes said:
A degraded RAID5 distributes the data across the remaining drives
without any parity information.

*If* one drive is used for parity, and *if* it is the parity drive that
has died. My understanding is that it's far more usual for parity to be
distributed among the members of the array.
 
Mike Tomlinson

Floyd said:
Right, so how come you aren't telling him how to do that.

Because I'm not familiar with the array controller he is using and thus
cannot offer specific advice. I see you've run out of your pills again,
Folkert.
 
Arno Wagner

Previously Mike Tomlinson said:
*If* one drive is used for parity, and *if* it is the parity drive that
has died. My understanding is that it's far more usual for parity to be
distributed among the members of the array.

RAID4: One drive parity, rest data
RAID5: Parity is distributed over all drives.
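
A small sketch of that difference, showing which disk holds parity in each stripe row. The RAID5 rotation used here (parity moving from the last disk towards the first) is only one of several schemes in use and is an assumption, as is placing the RAID4 parity on the last disk; the layout the Adaptec controller actually uses is not known from this thread.

```python
def parity_disk(row: int, n_disks: int, level: int) -> int:
    """0-based index of the disk holding parity for a stripe row."""
    if level == 4:
        return n_disks - 1  # dedicated parity disk (assumed to be the last)
    if level == 5:
        return (n_disks - 1) - (row % n_disks)  # parity rotates right to left
    raise ValueError("only RAID4 and RAID5 are modelled")


def layout(rows: int, n_disks: int, level: int) -> list:
    """One string per stripe row: 'P' marks parity, 'D' marks data."""
    return [
        " ".join(
            "P" if d == parity_disk(r, n_disks, level) else "D"
            for d in range(n_disks)
        )
        for r in range(rows)
    ]


raid4_rows = layout(4, 4, level=4)
raid5_rows = layout(4, 4, level=5)
```

With four disks, every RAID4 row reads `D D D P`, while the RAID5 rows have the `P` in a different column each time, so losing any one disk costs a mix of data and parity.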

Arno
 
[AWE] Mia Liau / bjam

I have the exact same problem. Of 4 drives, 1 drive actually failed
degrading the array, and 1 other momentarily went offline causing the
array to fail. All I want is to tell the controller to consider that
the one that went offline momentarily is actually fine!

This is the second time in two separate systems that this has happened
to me with an Adaptec SATA RAID5 array, so I am doubting the wisdom
of RAID5 on SATA. It will certainly be the last one I configure.

Does anyone have an actual process for recovery?

Can the OP update us on his progress?

Thanks
Ben
 
Arno Wagner

Previously [AWE] Mia Liau / bjam said:
I have the exact same problem. Of 4 drives, 1 drive actually failed
degrading the array, and 1 other momentarily went offline causing the
array to fail. All I want is to tell the controller to consider that
the one that went offline momentarily is actually fine!
This is the second time in two separate systems that this has happened
to me with an Adaptec SATA RAID5 array, so I am doubting the wisdom
of RAID5 on SATA. It will certainly be the last one I configure.
Does anyone have an actual process for recovery?
Can the OP update us on his progress?

The best way is to force some compatible controller to
accept the 4-disk array. (Incidentally, Linux software
RAID has an option for just this: Do a forced array
assembly in read-only mode.) If you cannot convince the
original controller to do this, a software controller
may help. This is available in the form of RAID recovery
software and, potentially, as a Linux software driver for
arrays created by FakeRAID controllers. (The name is dmraid,
if I remember correctly.)
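
To give an idea of what RAID recovery software does when it reassembles an array in software, here is a minimal Python sketch that works on image copies held in memory. Everything about the layout is an assumption made for illustration: the chunk size, the disk order, parity rotating from the last disk towards the first, data filling the non-parity disks left to right, and each image starting directly at the first stripe row (no controller metadata in front). A real array would need these parameters determined first.

```python
from functools import reduce
from typing import Optional, Sequence


def xor_all(blocks):
    """XOR a list of equal-length byte blocks together."""
    return bytes(reduce(lambda x, y: x ^ y, col) for col in zip(*blocks))


def reassemble(members: Sequence[Optional[bytes]], chunk: int) -> bytes:
    """Rebuild the logical volume from member images; at most one may be None."""
    missing = [i for i, m in enumerate(members) if m is None]
    if len(missing) > 1:
        raise ValueError("RAID5 cannot be read with more than one member missing")
    n = len(members)
    length = len(next(m for m in members if m is not None))
    out = bytearray()
    for row in range(length // chunk):
        lo, hi = row * chunk, (row + 1) * chunk
        blocks = [m[lo:hi] if m is not None else None for m in members]
        if missing:
            # The absent block is the XOR of all the blocks that are present.
            i = missing[0]
            blocks[i] = xor_all([b for b in blocks if b is not None])
        parity = (n - 1) - (row % n)  # assumed rotation, see the note above
        for d in range(n):
            if d != parity:
                out += blocks[d]  # parity blocks are skipped, data is kept
    return bytes(out)
```

Called with four images, it simply strips the parity; called with one entry set to `None`, it recomputes that member's blocks from the other three, which is why three good drives out of four are enough. It only reads its inputs, in the spirit of the read-only forced assembly mentioned above.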

BTW, you can still make backups of the 4 disks, they
just need to be sector-wise image backups.

Arno
 
Odie Ferrous

Martin,


I'm just over the water in the UK. I could help you - or you could
likely find someone local.

If your data is that important, get it done professionally. It's very
easy to make things worse by experimentation.


Duncan
 
Arno Wagner

Previously Odie Ferrous said:
I'm just over the water in the UK. I could help you - or you could
likely find someone local.
If your data is that important, get it done professionally. It's very
easy to make things worse by experimentation.

I second that. Don't experiment unless you have a complete
backup of the originals and you are sure that the backup is
good. This does require specific experience. It may indeed be
easier to have this done professionally, since professionals
will have the equipment, the tools and the experience.

Arno
 
