ServeRAID array problem

rich

Hi,
I have a Netfinity 5500 with a built-in ServeRAID II.
It was flashed to BIOS 7.0 a month ago and has been working fine.
The system has four 18GB drives divided into five logical drives.
I have the configuration logs and details.
Yesterday we brought the system (Win2KS) down normally.
During power-up, we noticed that one of the drives did not appear to be
seated properly. Since the system was still in POST and Windows had
not started booting yet, the drive was pulled and reseated. To make a
long story short, we ended up with three of the drives marked as
defunct. At no time did Windows even begin to start up, so I
believe the information on the logical drives of the array should be
clean. If I reset the controller and try to read the configuration
from the drives, I end up with the same level of deFUNCTness. We did
not try to delete, redefine, or re-mark any of the drives or the array.
It seems strange that a RAID-5 array can be so fragile. I would think
there should be some easy way to unDEFUNCTify all the drives,
just to try some form of recovery. Since parity is involved,
revalidating the logical drives shouldn't be a big deal (rather than
forcing the whole array to be inaccessible).
At this point a backup system is running fine, so I have time to
attempt/practice a disaster recovery scenario on this ole-dawg.
I've meandered through a lot of the online ServeRAID docs with no
clear solution. I feel like I'm missing something easy. Does anyone have
any experience or ideas that could help bring this array back from
the dead?

I also have a new ServeRAID-4Mx card to play with, if it can help.
Thanks!
rich
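
For anyone wondering why three defunct drives take the whole array offline even though parity is involved, here is a minimal sketch of RAID-5 style XOR parity on a single stripe. It is an illustration only, not ServeRAID's actual on-disk layout; the block contents and the `xor_blocks` helper are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe across four drives: three data blocks plus one parity block.
# (Illustrative contents only.)
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Lose any ONE block: XOR of the three survivors gives it back.
lost = 1
survivors = [blk for i, blk in enumerate(stripe) if i != lost]
rebuilt = xor_blocks(survivors)

# Lose TWO blocks: the survivors only yield the XOR of the two missing
# blocks, which is not enough to recover either one on its own.
two_survivors = [stripe[0], stripe[3]]
mixed = xor_blocks(two_survivors)
```

Parity covers exactly one missing member per stripe, so with three members marked defunct the controller has nothing to reconstruct from until some of them are brought back online. That says nothing about whether the data on the platters is damaged; it only explains why the array is inaccessible in this state.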
 
Johnson_dk

Never reseat ServeRAID drives while the system is powered on, even if no
OS is running. The RAID controller detects that the drives have been removed
and fails them. To recover from this, use the ServeRAID CD; in some
cases the only thing that works is the DOS config v3.50c.
Boot the system and start the config utility, then set one drive at a time to
"Online". Remember that the last disk has to be rebuilt, AND THIS DISK HAS TO
BE THE ONE THAT FAILED FIRST!

I have done this lots of times, and it works in most cases, unless you don't do
the first failed disk last or you have hardware failures on multiple disks.

PS. Remember to upgrade: disk firmware, ServeRAID driver, server
administration program, BIOS, system management processor firmware, diags,
Director, and so on. Do not upgrade just the ServeRAID firmware.
If you choose to upgrade, do it all or leave it alone.


Best Regards


Johnny
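
The ordering rule above (bring the later failures online one at a time, then rebuild the drive that failed first) can be sketched as a small planning helper. This is only a way to write the order down before touching the config utility; `plan_recovery` and the bay labels are hypothetical and are not part of any ServeRAID tool. The post does not say whether the order among the later failures matters, so the sketch simply keeps them in failure order.

```python
def plan_recovery(defunct_in_failure_order):
    """Return (action, drive) steps for drives listed in the order they went defunct.

    The first-failed drive is handled last and is rebuilt; every other
    defunct drive is set online, one at a time, before that.
    """
    if not defunct_in_failure_order:
        return []
    first_failed, *later = defunct_in_failure_order
    steps = [("set online", drive) for drive in later]
    steps.append(("rebuild", first_failed))
    return steps

# Hypothetical example: bay 2 was reseated (failed) first, then bays 0 and 3.
for action, drive in plan_recovery(["bay 2", "bay 0", "bay 3"]):
    print(f"{action}: {drive}")
```

Writing the failure order down first helps, because getting the first-failed drive wrong is exactly the case where this procedure is said not to work.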
 
rich

Thanks for the info, Johnny.
The ServeRAID CD doesn't give me the option to bring the drives back
online. I suspect that is because there is more than one DDD beast in
the array. There are three DDD drives and one Online drive. I think I know
which drive needs to be rebuilt (the one that was reseated first). So
I assume that I need to make an HSP drive in the array and then rebuild
that first-failed DDD drive onto it? Will (should?) the other two DDD
drives then go online at that point, or will they need to be rebuilt
as well?
Where can I get a copy of the DOS config v3.50c? The earliest copy of
the ServeRAID stuff I have is around 4.6.

Johnson_dk said:
PS. Remember to upgrade: disk firmware, ServeRAID driver, server
administration program, BIOS, system management processor firmware, diags,
Director, and so on. Do not upgrade just the ServeRAID firmware.
If you choose to upgrade, do it all or leave it alone.

When I did it, I was confused about upgrading everything at once. I
understand the requirement, but is there a proper sequence of steps
(since in reality it can't all be done at the same instant)? Do I
need to uninstall and reinstall Director? I seem to remember that I had
some problem with something called twintail (which we don't use) that
was back-leveled and wasn't part of the upgrade download. We finally got
it working, but lost Director functionality. Since this server was
going to be upgraded to SBS2k3 soon, we decided to leave well enough
alone.
Thanks for all your advice. This all proves that understanding the theory
doesn't help much when testing a real disaster recovery plan.

Rich
 
