Possible imminent drive failure....or something else? (semi-long)

J

jab3

Hello -

I've been having strange things going on with one of my machines lately and
thought I would probe the minds here. A few weeks ago this machine
basically shut down everything but the PSU. I came in, thinking the
screensaver was just on (my screensavers are blank screens), and hit a key.
Nothing. Moved the mouse. Nothing. The power light was on and I could
hear it, so I turned it off, waited, and turned it back on. Nothing. That
is, the PSU turned on and blew air, but there was no BIOS beep, no hard
drive spin-up, no nothing except the sound of the PSU. (which is not as
loud as when everything is working) So I went out of town for a week and
left it off, thinking it maybe was the processor (when I put the processor
in this new Abit mb (IS7 I think), there was a bit of resistance with the
heatsink/fan/socket so I, um, pushed a bit harder and it snapped/popped into
place). I came back from vacation and decided to give it a try, but first
took out 1 of the memory modules and unplugged the CD-ROM and the floppy.
Left the 2 hard drives and everything else in. I also checked the
capacitors; they looked okay. (reason for new mb is old mb had capacitor
problem) And it worked. So I put the memory module back in; it worked
again.

It ran for about a week before quitting again with the same symptoms. PSU turns
on but nothing else. I let it sit a week. Turned it back on again and it
worked again. BTW, I run Linux primarily on this box. (1st drive has Win98
and WinXP, with 12gig FAT32 free-for-all space. 2nd drive is only Linux)
So I decided to check the boot log from Linux (dmesg and /var/log/boot.msg).
The relevant parts (well, what I considered relevant :)) are:

hda: Host Protected Area detected
current capacity is 66055248 sectors (33820 MB)
native capacity is 78177792 sectors (40027 MB)
hda: Host Protected Area disabled.
hda: 78177792 sectors (40027 MB) w/1819KiBCache, CHS=65535/16/63, UDMA(100)
hda: cache flushes supported
hda:<4>hda: dma_timer_expiry: dma status == 0x60
hda: DMA timeout retry
hda: drive not ready for command
ide0: reset: success
hda1 hda2 hda3 hda4 <<4>>hda: dma_timer_expiry: dma status == 0x20
hda: timeout waiting for DMA
hda: status timeout: status=0xd0 { Busy }

ide: failed opcode was: unknown
hda: drive not ready for command
ide0: reset: success
hda5 hda6 >

(this is reported a couple of times)

[more stuff...hdb didn't have these issues ]
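
For anyone reading the log above, here is a minimal Python sketch of how the numbers can be decoded. It is not taken from the kernel source. It assumes 512-byte sectors and the conventional ATA status register bit layout; the function names and the bit labels are illustrative choices. Because the other status bits are not meaningful while the Busy bit is set, the sketch reports only "Busy" for 0xd0, which matches the log line.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte sectors


def sectors_to_mb(sectors):
    """Convert a sector count to decimal megabytes, truncated."""
    return sectors * SECTOR_BYTES // 1_000_000


# Conventional ATA status register bits, highest bit first.
# The labels are illustrative names, not copied from any driver.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of the flags set in an ATA status byte."""
    if status & 0x80:
        # While Busy is set, the remaining bits are not valid.
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


if __name__ == "__main__":
    current = 66055248  # "current capacity" from the log
    native = 78177792   # "native capacity" from the log
    print("current MB:", sectors_to_mb(current))
    print("native MB:", sectors_to_mb(native))
    print("hidden by HPA, MB:", sectors_to_mb(native - current))
    print("status 0xd0:", decode_ata_status(0xD0))
```

The sector counts reproduce the 33820 MB and 40027 MB figures in the log, so the Host Protected Area was hiding roughly 6 GB of the drive before the kernel disabled it.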

But later on during fsck/journal-checking I get (presumably only hdb, Linux
drv):

*****************************************
* Warning: The dma on your hard drive is turned off. *
* This may really slow down the fsck process. *
*****************************************

The dma has not always been off. There are also some messages about probing
failure (e.g. - ide2: Wait for ready failed before probe !; same for
ide[3-5]), but I just assumed that was because there is no ide[2-5].

So, does this mean that hda is on the way out? Or could the 'shutdown'
problems be something else? (Or am I suffering from 2 separate problems?)


Thanks for any help,
jab3


p.s. - yes, I'm backing up as we speak
 
P

philo

jab3 said:
Hello -

I've been having strange things going on with one of my machines lately and
thought I would probe the minds here. A few weeks ago this machine
basically shut down everything but the PSU. I came in, thinking the


<snip>

why guess?
run the mfg's diagnostic on each harddrive...
if one is bad...backup at once and replace the drive!
 
J

jab3

philo said:
<snip>

why guess?
run the mfg's diagnostic on each harddrive...
if one is bad...backup at once and replace the drive!

Well, I did run some diagnostics and some SMART tests. They seem to have
come back basically okay, though there is a little something weird going on
with the 2nd drive, hdb (the drive I didn't think had any issues). There
were some potential problems with seek response or something.

But my question also has to do with why the computer is quitting after a
week of running, and whether a bad hard drive can cause that.


-jab3
 
P

philo

Well, I did run some diagnostics and some SMART tests. They seem to have
come back basically okay, though there is a little something weird going on
with the 2nd drive, hdb (the drive I didn't think had any issues). There
were some potential problems with seek response or something.

But my question also has to do with why the computer is quitting after a
week of running, and whether a bad hard drive can cause that.


If there is a problem with seek time...
I'd suspect a bad drive.
Also check your machine for overheating.
You may also want to open it up and check all the connections...
i.e. unplug and replug the cables, cards, the RAM, etc.
 
K

kony

When the system fails to power on or POST, unplug the drives and
retry it. Even if it can't then boot the OS, it will still
POST if all else is working properly. I'd suspect the power
supply first, motherboard 2nd... bad drives don't "usually"
prevent a system from POSTing at all.

If your power supply is failing, it may damage hardware,
including drives. Get the rest of the system working, THEN focus
on the drives.
 
J

jab3

[ snip ]
When the system fails to power on or POST, unplug the drives and
retry it. Even if it can't then boot the OS, it will still
POST if all else is working properly. I'd suspect the power
supply first, motherboard 2nd... bad drives don't "usually"
prevent a system from POSTing at all.

If your power supply is failing, it may damage hardware,
including drives. Get the rest of the system working, THEN focus
on the drives.


I had a feeling the power supply could be a culprit (though I
didn't mention that in the message). It's just that it's a
semi-new (< 1 year) Thermaltake 450W; the computer is an Abit 478
mb with P4, 2 hard drives, 1 CD-ROM, floppy, and FX-5500 (I think)
nVidia graphics card. Perhaps it's the case; this is the same
case I had another MB in (ECS maybe) in which the capacitors
overheated (you may remember that thread, probably not). I may
just need to get a new case, but first.....

How can I test the PSU? Just use a regular voltmeter? Better way?
I'll check the voltages from the BIOS screen (the temps were ok each time
I checked).



Thanks for any help,
jab3
 
K

kony

I had a feeling the power supply could be a culprit (though I
didn't mention that in the message). It's just that it's a
semi-new (< 1 year) Thermaltake 450W; the computer is an Abit 478
mb with P4, 2 hard drives, 1 CD-ROM, floppy, and FX-5500 (I think)
nVidia graphics card. Perhaps it's the case; this is the same
case I had another MB in (ECS maybe) in which the capacitors
overheated (you may remember that thread, probably not). I may
just need to get a new case, but first.....

Well if it's running hot, that heat will tend to wear out
capacitors on motherboard, power, video card, and less
likely (but still possible) other devices.

How can I test the PSU? Just use a regular voltmeter? Better way?
I'll check the voltages from the BIOS screen (the temps were ok each time
I checked).

Insert multimeter probes into the back of the connector
while still plugged into the load... which in this case
would be the motherboard ATX connector and the 4-pin CPU 12V
connector. Voltage might fluctuate for a brief moment when the
system is turned on but should be stable and very near the
spec'd rail voltage... ±5% is the typical target as it is
what Intel spec'd.
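
The ±5% check can be sketched in a few lines of Python. This is only an illustration: the nominal values are the usual positive ATX rails, the function names are made up for this example, and the readings are hypothetical numbers, not measurements from this machine.

```python
# Nominal positive ATX rail voltages, in volts.
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05  # the 5% target described above


def rail_in_spec(nominal, measured, tolerance=TOLERANCE):
    """True if the measured voltage is within tolerance of nominal."""
    return abs(measured - nominal) <= nominal * tolerance


def check_rails(readings):
    """Map each rail name to True (in spec) or False (out of spec)."""
    return {
        rail: rail_in_spec(NOMINAL_RAILS[rail], volts)
        for rail, volts in readings.items()
    }


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    example_readings = {"+3.3V": 3.31, "+5V": 5.08, "+12V": 11.28}
    for rail, ok in check_rails(example_readings).items():
        print(rail, "OK" if ok else "OUT OF SPEC")
```

With a 5% tolerance the +12V rail should read between 11.4 V and 12.6 V, so the hypothetical 11.28 V reading would be flagged. Take the readings under load, as described above, since an unloaded supply can look fine.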

Also examine the motherboard for failed caps, and if you're
ambitious, open the power supply and inspect it as well
(with AC power disconnected of course). If all else fails
then strip the system down to a minimal configuration and
retry it. If still no luck (and you've cleared CMOS again)
swap in different video, memory and CPU (in that order) and
retry it. You might also pull the board from the case and
try it on a non-conductive surface (not on an anti-static
mat, which conducts electricity) again in minimal
configuration of only video, 1 memory module, CPU and
heatsink/fan.
 
J

jab3

[ snipped some stuff ]
Insert multimeter probes into the back of the connector
while still plugged into the load... which in this case
would be the motherboard ATX connector and the 4-pin CPU 12V
connector. Voltage might fluctuate for a brief moment when the
system is turned on but should be stable and very near the
spec'd rail voltage... ±5% is the typical target as it is
what Intel spec'd.

Also examine the motherboard for failed caps, and if you're
ambitious, open the power supply and inspect it as well
(with AC power disconnected of course). If all else fails
then strip the system down to a minimal configuration and
retry it. If still no luck (and you've cleared CMOS again)
swap in different video, memory and CPU (in that order) and
retry it. You might also pull the board from the case and
try it on a non-conductive surface (not on an anti-static
mat, which conducts electricity) again in minimal
configuration of only video, 1 memory module, CPU and
heatsink/fan.

OK. I'll try this when I'm able to borrow a multimeter. :)
Thanks for the help. I may post results, depending on what
they are of course.


-jab3
 