RAID-1 insight needed

Ralph Innes

Hi all!

Do you know of a site on the internet where a guy could find a good
treatment of RAID? Or, have you had the misfortune, yourself, to
troubleshoot RAID problems?

Here's what I have:

An ECS P4VXAD mobo w/Promise Technology "Lite" RAID onboard.

A pair of 30GB HDD in removable drive trays, in RAID-1 configuration.

The system has been running with only minor glitches for about 5 years.
But now, we're seeing a reluctance to boot up into Windows (98SE), with
disparate messages about DLLs, etc.

After repeated power-downs and restarts, the system will eventually start,
and then runs O.K. for the remainder of the day.

This is my guess (diagnosis):

That we have a failing HDD - inability to read a particular part of the
drive surface reliably.

This is my understanding of how RAID-1 is supposed to work:

Data is written to both mirrored drives during a write operation, but is
not read back to check it.

During a "read" operation, the RAID controller reads from either drive, a
sector or 2 from one drive, a sector or 2 from the other, and "stitches"
the file together.
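
For anyone following along, the description above can be sketched as a toy
model. This is not how the Promise controller (or any real controller) is
implemented; the Disk and Mirror classes, the strictly alternating read
policy, and the fallback switch are all assumptions made for illustration.

```python
class Disk:
    """One mirror member: a list of sectors plus a set of unreadable ones."""

    def __init__(self, name, sectors):
        self.name = name
        self.data = [b""] * sectors
        self.bad = set()  # sector numbers that fail when read

    def write(self, lba, payload):
        self.data[lba] = payload  # stored without any read-back check

    def read(self, lba):
        if lba in self.bad:
            raise OSError(f"{self.name}: cannot read sector {lba}")
        return self.data[lba]


class Mirror:
    """Two-member mirror with alternating reads (an assumed policy)."""

    def __init__(self, first, second, fallback=True):
        self.members = [first, second]
        self.fallback = fallback  # retry on the other member after an error
        self.turn = 0

    def write(self, lba, payload):
        for disk in self.members:  # every write goes to both members
            disk.write(lba, payload)

    def read(self, lba):
        primary = self.members[self.turn]
        other = self.members[1 - self.turn]
        self.turn = 1 - self.turn  # next read starts on the other member
        try:
            return primary.read(lba)
        except OSError:
            if not self.fallback:
                raise
            return other.read(lba)
```

In this model, marking one sector bad on the second member and turning the
fallback off makes reads of that sector fail only every other time, which
is one way a single marginal drive could look like an intermittent fault.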

My problem is, that I don't have a good understanding of how this is
accomplished, and understanding stuff is the basis for troubleshooting.

Our next step is to insert only one tray (HDD) in the system, and see if it
boots into Windows without incident. And, if it fails, to try the other
drive.

If either scenario is successful, I guess we could say that one drive has
become defective.

If neither is successful, I don't know where that leaves us.

Any insight would be much appreciated.

TIA,

- Ralph
 
Ralph Innes

If you have not yet, install the raid management software
from Promise. See if that indicates any problems.

Connect one drive to the motherboard integral controller and
check it with the HDD manufacturer diagnostics. Next
disconnect, connect the other and check it too.

If either drive fails, you have found your problem...
maybe. Removable caddies are themselves another possible
point of failure, since intermittent contact can occur, so
check these drives connected directly to the motherboard via
cable, with no caddy interface present. If the caddy is the
problem, then merely unplugging the drives from the caddy
and plugging them in again could resolve the problem,
temporarily or longer-term.

If none of these seems to be the problem, it is possible you
have a case of Windows just being Windows. Windows will
succumb to problems where it ends up missing files it needs,
as a result of some update, the installation or removal of
an application, or a virus running amok.

Since it would seem that, regardless of the problem, Windows
now does not have the files it needs to operate, the first
step should be to consider whether there is any valuable
data on this array and to copy it off if possible. Next, do
the above testing and, if it checks out OK, install Windows
over top of itself (as an attempted time saver). If that
doesn't work, do a clean installation, which may lose all
files and settings if you need to format the partition.

There is another possibility: that general system
instability has caused the problem. Motherboard or PSU
faults can produce random error messages. A minor crash
while the OS is loading can surface as a default message
that assumes a file problem. The key to telling whether this
is the case is whether the same files are named every time
or whether it is more random in nature. You can press F8
early in the boot process and choose the option to create a
boot log (BOOTLOG.TXT), which is written to the root of the
OS partition. Then retrieve that boot log and see what it
reports.
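
One way to do the "same files or random" comparison is to save the boot
log from several bad boots and tally which items failed in each. The
sketch below assumes that failure entries are lines containing a word
with "Fail" in it followed by "= name" (for example "LoadFailed = ...");
the function names are made up for illustration.

```python
import re
from collections import Counter

# Assumed shape of a failure entry: "<something>Fail<something> = <item>"
FAIL_LINE = re.compile(r"(\w*fail\w*)\s*=\s*(.+)", re.IGNORECASE)


def failed_items(log_text):
    """Return the set of item names reported as failed in one boot log."""
    items = set()
    for line in log_text.splitlines():
        match = FAIL_LINE.search(line)
        if match:
            items.add(match.group(2).strip().lower())
    return items


def classify(logs):
    """Split failed items into those seen in every log and those seen in some."""
    counts = Counter()
    for text in logs:
        counts.update(failed_items(text))
    consistent = sorted(n for n, c in counts.items() if c == len(logs))
    sporadic = sorted(n for n, c in counts.items() if c < len(logs))
    return consistent, sporadic
```

Each log can be read with something like open(path, errors="replace").read()
and passed in as a list. Items that fail in every log point toward a
genuine file problem, while a different set on each boot fits the
instability theory better. Some failure entries may be normal on a healthy
system, so it helps to compare against a log from a good boot as well.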

If all else fails, the remaining attempt would be pulling the
drives and scanning them separately in another system, or
acquiring a 3rd hard drive, connecting it to that system and
seeing if it can install and run Windows. That could be a
clean new Windows install, or a duplicate of the old drives
onto the new one if possible. If your choice of dupe utility
can't see the drives on the RAID controller, then connect
one or the other directly to the motherboard integral
controller. Also keep in mind that a system of that age
might not be able to use hard drives over 128GB in size,
though I suspect it is new enough and only the OS is older.
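
The 128GB figure presumably refers to the 28-bit LBA addressing limit of
older ATA controllers and BIOSes; that reading is an assumption, since the
post does not say. Under that assumption the arithmetic works out as:

```python
SECTOR_BYTES = 512  # traditional ATA sector size
LBA_BITS = 28       # address width of the older ATA addressing scheme

limit_bytes = (2 ** LBA_BITS) * SECTOR_BYTES
print(limit_bytes)                      # 137438953472
print(limit_bytes / 2 ** 30)            # 128.0 (binary gigabytes)
print(round(limit_bytes / 10 ** 9, 1))  # 137.4 (decimal gigabytes)
```

That is why the same limit is quoted as either 128GB or 137GB depending on
which kind of gigabyte is meant.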

Kony, thanks for taking the time and interest to give such a detailed
response - much appreciated.

In preparation for running drive diagnostics, I'd emailed Don to have him
take the lid off that HDD tray, to determine which manufacturer's
diagnostics to download.

He phoned me and said, "I... think... we... may... have... diagnosed...
the... problem."

He found a scrap of paper, about an inch long, mashed into the female
Centronics connector on the rear of the removable drive-tray. Doh!

I've been using your well-reasoned approach to diagnose yet another RAID
problem, which will be the subject of another post.

Thanks again. It's refreshing to know that there are selfless folk like
yourself, who'll take the trouble to assist those in need.

- Ralph
 
