Is my HDD dying or something else from these errors and symptoms?

Ant

Hello.

About two weeks ago, I had two of these incidents (13 days apart):

hdb: dma_timer_expiry: dma status == 0x61
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: DMA disabled
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success

After this, my old Linux/Debian system became slow and unresponsive due
to high CPU usage (e.g., 7.xx in top). I had to reboot Linux/Debian
(shutdown -r now), and then things were back to normal speed. I doubt
it is temperature related because the room is in the 60s and 70s
(degrees F) and the computer wasn't working intensely (e.g., just
surfing the Web).

Also, I recall that before these problems started, my motherboard (CMOS and
BIOS) didn't see either of the drives on my primary channel (both HDDs: hda
and hdb), but could see my secondary master (DVD-ROM drive = hdc). I opened
the case, but didn't see anything wrong. I wiggled the cable ends
for the HDDs. I booted my machine up and it seemed fine for a few days to a
week, and then these errors came up (not disconnections).

I ran the smartctl utility on both of my HDDs; the output is here:
http://pastebin.ca/930776 ...

My full system specifications can be found here:
http://alpha.zimage.com/~ant/antfarm/about/computers.txt
(secondary/backup machine). Does that mean my decade-old Quantum 6.4 GB
HDD (I already made a backup, just in case) is finally dying? Or is it
something else?

Thank you in advance. :)
--
"All the best work is done the way that ants do things -- by tiny but
untiring and regular additions." --Lafcadio Hearn
/\___/\
/ /\ /\ \ Ant @ http://antfarm.home.dhs.org (Personal Web Site)
| |o o| | Ant's Quality Foraged Links (AQFL): http://aqfl.net
\ _ / Please remove ANT if replying by e-mail.
( )
 
Ignoramus24341

Probably. What does

smartctl --all /dev/hda

say? (and hdb)

Does it report any errors?

i
 
Ant

Please see the link I provided earlier (http://pastebin.ca/930776). It
has all the information.


In comp.sys.ibm.pc.hardware.storage Ignoramus24341 said:
Probably. What does
smartctl --all /dev/hda
say? (and hdb)
Does it report any errors?

 
Ignoramus24341

Your hdb is VERY old (over 6 years of actual running hours). It does
not seem to support SMART error logging, so you do not know what is
wrong with it.

I think that it is dying, based on the error messages from your console.

I would make sure to run a backup ASAP, like right now.

i
 
Ignoramus24341

Your hdb is VERY old (over 6 years of actual running hours). It does
not seem to support SMART error logging, so you do not know what is
wrong with it.

I think that it is dying, based on the error messages from your console.

I would make sure to run a backup ASAP, like right now.

By the way: ALL HARD DRIVES DIE.

There are NO EXCEPTIONS to this.

Some die earlier and some die later. But they all die.

So not having a backup verges on insanity. (I am not saying that you
are not backing up, just making a statement.)

i
 
Ant

In alt.comp.periphs.hdd Ignoramus24341 said:
Your hdb is VERY old (over 6 years of actual running hours). It does
not seem to support SMART error logging, so you do not know what is
wrong with it.
I think that it is dying, based on the error messages from your console.
I would make sure to run a backup ASAP, like right now.

Already did, onto hda (since that's the only HDD I have to back up to).
Now, how come before these errors came up my motherboard couldn't see
BOTH HDDs? Is that related or just a coincidence?

 
Rod Speed

Ant said:
About two weeks ago, I had two of these incidents (13 days apart):
hdb: dma_timer_expiry: dma status == 0x61
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: DMA disabled
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success

That's normally just a bad cable. And since the problem is seen with
more than one hard drive, it's almost certainly just a bad cable.

It can be a bad hard drive controller on the motherboard etc., but that's much less likely.

From this, my old Linux/Debian system became slow and
unresponsive due to high CPU usage (e.g., 7.xx in top).

Because its turned the DMA off, as it says.
I had to shutdown (shutdown -r now) Linux/Debian,
reboot, and things are back to normal speed.

Because its turned the DMA on again.
I doubt it is temperature related because the room is in the 60s and 70s
degrees(F) and computer wasn't working intensely (e.g., surfing the Web).

Yeah, most likely just a bad cable.
Also, I recalled before these problems started, my motherboard (CMOS
and BIOS) didn't see both of my primary master drives (both HDDs: hda
and hdb), but can see my secondary master (DVD-ROM drive = hdc).

More evidence of a bad cable to the hard drives.
I had to open the case, but didn't see anything wrong.
I wiggled the cable ends for the HDDs.

And that likely got it going again. Those cable-piercing
(insulation-displacement) connectors can end up with one of the contacts
that bite into the ribbon bent when the cable is made, and the contacts
can work loose if you reef the cable off the drive or motherboard
end by pulling on the ribbon etc.

If its a round cable, its ****ed by design.
I booted my machine up and it seemed fine for a few days/a
week and then these errors came up (not disconnections).
I ran smartctl utility on both of my HDDs for information and results:
http://pastebin.ca/930776 ...
My full system specifications can be found here:
http://alpha.zimage.com/~ant/antfarm/about/computers.txt
(secondary/backup machine). Does that mean my decade old Quantum
6.4 GB HDD (already made a backup just in case) is finally dying?

Nope, just the cable.
Or is it something else?

Yep, the cable.
 
Ignoramus24341

Already did, onto hda (since that's the only HDD I have to back up to).
Now, how come before these errors came up my motherboard couldn't see
BOTH HDDs? Is that related or just a coincidence?

I would buy a USB drive (I recommend Western Digital from Newegg).

I am not sure about your second question.

i
 
Ant

Thats normally just a bad cable. And since the problem is seen with
more than one hard drive, its almost certainly just a bad cable.
Can be a bad hard drive controller on the motherboard etc but thats much less likely.
Because its turned the DMA off, as it says.

I wonder if I can re-enable DMA without rebooting. I think hdparm
controls that?

Because its turned the DMA on again.

So why did my DMA go off? As a precaution?


Yeah, most likely just a bad cable.
More evidence of a bad cable to the hard drives.
And that likely got it going again. Those cable piercing connectors
can bend one of the things that bite the cable when the cable is made
and can get loose if you reef the cable off the drive or motherboard
end by pulling on the ribbon etc.

Hmm, I have those old-fashioned flat cables. I guess I will go replace it.
I assume replacing the whole ribbon cable is enough? I couldn't see how
many there were in my mini-tower case (hard to see and it's crowded). I
assume there are two in total, one each for the primary and secondary channels.

If its a round cable, its ****ed by design.

Yeah, I don't have those. My other PC has SATA cables, which are round.
;)

Nope, just the cable.

Hmm, OK! I will go try the cable first then!

Yep, the cable.

;)
 
Walter Mautner

Ant said:
Already did, onto hda (since that's the only HDD I have to back up to).
Now, how come before these errors came up my motherboard couldn't see
BOTH HDDs? Is that related or just a coincidence?

You say hda and hdb, so they are on the same controller/cable. You know that
one blocking/stalled drive can block the whole IDE bus?
You should consider moving the 2nd drive to the 2nd IDE cable; often enough
there were incompatibilities between different hard drive brands.
 
Arno Wagner

In comp.sys.ibm.pc.hardware.storage Ant said:
About two weeks ago, I had two of these incidents (13 days apart):
hdb: dma_timer_expiry: dma status == 0x61
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: DMA disabled
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success
hdb: dma_timer_expiry: dma status == 0x41
hdb: DMA timeout error
hdb: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hdb: DMA disabled
ide0: reset: success

These are problems the system has with getting data from/to the
HDD. It can be the cabling, the HDD, or the controller on the mainboard.
Unlikely, but possible, is also a failing PSU.

From this, my old Linux/Debian system became slow and unresponsive due
to high CPU usage (e.g., 7.xx in top). I had to shutdown (shutdown -r
now) Linux/Debian, reboot, and things are back to normal speed. I doubt
it is temperature related because the room is in the 60s and 70s
degrees(F) and computer wasn't working intensely (e.g., surfing the
Web).
Also, I recalled before these problems started, my motherboard (CMOS and
BIOS) didn't see both of my primary master drives (both HDDs: hda and
hdb), but can see my secondary master (DVD-ROM drive = hdc). I had to
open the case, but didn't see anything wrong. I wiggled the cable ends
for the HDDs. I booted my machine up and it seemed fine for a few days/a
week and then these errors came up (not disconnections).
I ran smartctl utility on both of my HDDs for information and results:
http://pastebin.ca/930776 ...

hdb looks fine and the absence of seek errors may indicate your
PSU is fine too.

For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered look bad. If it was not for the seek error
rate, I would have said the read circuitry is going bad. This way it
looks like the power may be bad, although I would expect a stronger
impact on the Spin_Up_Time. Your data cable is fine, as a problem
with that would have caused the UDMA_CRC_Error_Count to show
something.

As to the tests, you run them and when they are finished, you look
at the smart attributes and the test-log.
My full system specifications can be found here:
http://alpha.zimage.com/~ant/antfarm/about/computers.txt
(secondary/backup machine). Does that mean my decade old Quantum 6.4 GB
HDD (already made a backup just in case) is finally dying? Or is it
something else?
Thank you in advance. :)

The Quantum looks fine. If something is dying, it is the Seagate.

I think you should do the following:
1. Remove and reseat the power connector on the Seagate, and
if you use a splitter cable, connect the drive directly to the PSU.
2. Run a long SMART self-test on the disk to completion and
then post the SMART attributes again.
If they are unchanged, then I would say that your hda is likely
dying.
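
A minimal sketch of step 2, assuming smartmontools is installed, the commands are run as root, and the Seagate is /dev/hda as in this thread. The function names are made up for illustration.

```shell
# Start the extended (long) self-test; the drive runs it in the background.
start_long_test() {
    smartctl -t long "$1"
}

# After the test has finished: print the self-test log, then the attribute table.
show_test_results() {
    smartctl -l selftest "$1"
    smartctl -A "$1"
}

# Usage (as root):
#   start_long_test /dev/hda
#   show_test_results /dev/hda    # once the estimated run time has passed
```

The attribute table from the second function is what would be posted for comparison with the earlier pastebin output.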

Arno
 
Ant

You say hda and hdb, so they are on the same controller/cable. You know, one
blocking/stalled drive can block the whole ide bus?
You should consider moving the 2nd drive to the 2nd ide cable, often enough
there were incompatibilities between different harddrive brands.

I wasn't aware of that. I am not familiar with the hardware in PCs. :)
So if one drive (even a CD/DVD-ROM drive) has problems, it affects the
other drive on the same controller/cable? I never had problems before
two weeks ago, even on previous motherboards.
 
Ant

These are problems the system has with getting data from/to the
HDD. It can be the cabling, the HDD, or the controller on the mainboard.
Unlikely, but possible, is also a failing PSU.

I had to replace the PSU on 5/14/2007 according to my log:
http://alpha.zimage.com/~ant/antfarm/about/toys.html ... It's almost ten
months old.

hdb looks fine and the absence of seek errors may indicate your
PSU is fine too.
For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered look bad. If it was not for the seek error
rate, I would have said the read circuitry is going bad. This way it
looks like the power may be bad, although I would expect a stronger
impact on the Spin_Up_Time. Your data cable is fine, as a problem
with that would have caused the UDMA_CRC_Error_Count to show
something.
As to the tests, you run them and when they are finished, you look
at the smart attributes and the test-log.

Where are these logs at?

The Quantum looks fine. If something is dying, it is the Seagate.
I think you should do the following:
1. Remove and reseat the power connector on the seagate, and
if you use a splitter-cable, connect it directly to the PSU.
2. Run a long SMART selftest on the disk to its completion and
then post the SMART attributes again.
If they are unchanged, then I would say that your HDA is likely
dying.

Odd that dmesg and /var/log/messages don't mention hda. Same for SMART.
 
Stretch

Arno Wagner wrote:
These are problems the system has with getting data from/to the
HDD. It can be the cabling, the HDD, or the controller on the mainboard.
Unlikely, but possible, is also a failing PSU.




hdb looks fine and the absence of seek errors may indicate your
PSU is fine too.

For hda, Raw_Read_Error_Rate, Seek_Error_Rate and
Hardware_ECC_Recovered look bad. If it was not for the seek error
rate, I would have said the read circuitry is going bad. This way it
looks like the power may be bad, although I would expect a stronger
impact on the Spin_Up_Time.
Your data cable is fine, as a problem with that would have caused the
UDMA_CRC_Error_Count to show something.

Not if the commands that initiate the UDMA data transfers never make it.
Which apparently they didn't, as a bus reset was needed to get it going again.
Conversely, UDMA CRC errors say nothing about cable quality.

As to the tests, you run them and when they are finished, you look
at the smart attributes and the test-log.



The Quantum looks fine.
If something is dying, it is the Seagate.

The Seagate is fine.
 
Arno Wagner

Where are these logs at?

It seems your hdb cannot log self-test results. For hda, the log
starts at line 139 in your SMART attribute list.
Odd that dmesg and /var/log/messages don't mention hda. Same for SMART.

Indeed for the message log. However, it is possible that hda does
things to the bus, like keeping it occupied too long, that cause the
error messages for hdb. The error messages in the log are basically
just a timeout, and do not indicate that there is necessarily anything
wrong with the disk it happened on.
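
As an illustration of why the log shows only a timeout, here is a small sketch that decodes the status=0x58 byte from the messages into the flag names the old Linux IDE driver prints. The bit layout is the standard ATA status register; the function name is made up for this example.

```shell
# Decode an ATA status byte into the flag names used in the kernel messages.
# Accepts hex (0x58) or decimal (88).
decode_ata_status() {
    status=$(( $1 ))
    out=""
    for pair in 128:Busy 64:DriveReady 32:DeviceFault 16:SeekComplete \
                8:DataRequest 4:CorrectedError 2:Index 1:Error; do
        mask=${pair%%:*}    # numeric bit mask before the colon
        name=${pair#*:}     # flag name after the colon
        if [ $(( status & mask )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
    done
    printf '%s\n' "$out"
}

decode_ata_status 0x58    # prints: DriveReady SeekComplete DataRequest
```

Neither the Error bit (0x01) nor the DeviceFault bit (0x20) is set in 0x58: the drive reports itself ready with data pending, which fits a transfer that never completed rather than a drive-reported fault.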

As for SMART, it is possible that hdb is actually breaking down, but
there seems to be something wrong with hda, so my first take would be
that hda causes interference to hdb in some way. They are on the same
cable, after all. I especially do not like the values of attributes
1, 7, and 195 on hda, as mentioned before. They indicate there is some
problem with the seeking mechanism. This can be due to power issues.
It can also be a problem with the drive hardware itself.
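
To watch just those attributes between runs, a filter along these lines may help. It is a sketch that assumes the usual smartctl -A table layout with the attribute ID in the first column; 199 (UDMA_CRC_Error_Count) is added here because it came up earlier in the thread, and the function name is hypothetical.

```shell
# Keep only the attribute rows for IDs 1, 7, 195 and 199 from smartctl -A output.
# Header and footer lines are dropped because their first field is not one of these IDs.
pick_attrs() {
    awk '$1 == 1 || $1 == 7 || $1 == 195 || $1 == 199'
}

# Usage (as root):
#   smartctl -A /dev/hda | pick_attrs
```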

Admittedly my comments are highly speculative.

I have had similar errors in my logs in the past. One turned out to be
a shot drive controller (possibly inadequate cooling for this particular
chip). Another one was a software issue and went away after a kernel
update.

Arno
 
Arno Wagner

In comp.sys.ibm.pc.hardware.storage Walter Mautner said:
Ant wrote:
You say hda and hdb, so they are on the same controller/cable. You know, one
blocking/stalled drive can block the whole ide bus?
You should consider moving the 2nd drive to the 2nd ide cable, often enough
there were incompatibilities between different harddrive brands.

Good idea.

The problem on one drive could even have triggered an incompatibility
that normally would not show, and that now causes errors to be logged
against the other drive.
--
vista policy violation: Microsoft optical mouse found penguin patterns
on mousepad. Partition scan in progress to remove offending
incompatible products. Reactivate MS software.
Linux 2.6.24. [LinuxCounter#295241,ICQ#4918962]

Cool .Sig!

Arno
 
Arno Wagner

I wasn't aware of that. I am not familiar with the hardware in PCs. :)
So if one drive (even a CD/DVD-ROM drive) has problems, it affects the
other drive on the same controller/cable? I never had problems before
two weeks ago, even on previous motherboards.

It can happen. The error you see is a timeout. If the other drive
starts to mess with the bus when it has some internal error, the
driver can see a timeout on the drive that is actually fine.

Arno
 
Rod Speed

I wonder if I can re-enable DMA without rebooting.

Yes, but its much better to fix whatever is causing the problem instead.
I think hdparm controls that?

Dunno, I dont bother with Linux at that level myself.
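
For reference, hdparm does control this on the old IDE driver. A minimal sketch, assuming hdparm is installed and the commands are run as root; the function names are made up for illustration, and the setting may be dropped again if the timeouts recur.

```shell
# Report whether DMA is currently enabled for a drive (the using_dma flag).
dma_status() {
    hdparm -d "$1"
}

# Ask the kernel to turn DMA back on for that drive.
dma_on() {
    hdparm -d1 "$1"
}

# Usage (as root):
#   dma_status /dev/hdb
#   dma_on /dev/hdb
```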
So why did my DMA go off? As a precaution?

Yes, when it decided that there is a problem with the DMA, because of the timeouts it can see.

Win does the same thing.
Hmm, I have those old fashion flat cables. I guess I will go replace it.

Yep, best thing to try first, they are so cheap.
I assume replacing the whole ribbon cable is enough?
Yes.

I didn't see how many there were in my mini-tower case (hard to see and it's crowded).

It will have two.
I assume two are in total for primary and secondary drives.
Yep.
Yeah, I don't have those. My other PC has a SATA cable that are round.
;)

Thats why this PC is playing up, its jealous.
Hmm, OK! I will go try the cable first then!

Yep, thats the best thing to do first.
 
ray

Suggest you boot a Live CD and run badblocks on the hard drive.
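
A sketch of that scan, assuming badblocks (from e2fsprogs) is on the Live CD and that the drive still shows up as /dev/hdb there, which depends on the CD's kernel. The function name is hypothetical.

```shell
# Read-only surface scan: -s shows progress, -v gives verbose output.
# Without -n or -w, badblocks only reads, so existing data is left alone.
scan_readonly() {
    badblocks -sv "$1"
}

# Usage (as root, from the Live CD, with the drive unmounted):
#   scan_readonly /dev/hdb
```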
 
Walter Mautner

Ant wrote:

....
Already did, onto hda (since that's the only HDD I have to back up to).
Now, how come before these errors came up my motherboard couldn't see
BOTH HDDs? Is that related or just a coincidence?

Well, you have a Quantum and a Seagate, both on the same cable.
Some models from these competing manufacturers (oh, that was a while ago)
could not play nice with each other. That problem need not show up right
away, though ....
And one stalling/blocking hard drive (its electronics) can block the whole bus.
 
