EIDE Bad Sector Processing?

Ron Reaugh

I'm seeing a set of behaviors over a period of time across several different
mfgs' EIDE HDs(mfged in the last 3 years) that leads me to a new
hypothesis.

It has always been my understanding that the bad sector processing procedure
was one whereby if the drive began to find a sector that was hard to
read(retries required) then it would copy the data to one of its spare
sectors and move that bad sector to the 'bad pool' onboard the HD. This
process would happen on the fly and be essentially transparent to the user
and OS.

Also it was my understanding that if the HD suddenly found a sector that it
could not read and get a completely proper data validation check then the
drive would return an error to the OS declaring a read failure.

When's the last time anyone ever saw a HD data read failure during normal
processing on an EIDE HD in 98SE, ME or XP off a standard mobo EIDE
controller for just a single sector for a small few bit data flaw? When's the
last time anyone ever saw a "small read error" in defrag? I've always
ascribed my 'no' answers to the above two questions to a belief that flaw
processing was working and one didn't ever see sectors go full bad anymore.

The theory that I'm developing is that the drive's internal flawing process
is in fact doing what I describe above but it is also doing it when it can't
completely with validation read a sector before copying it to the spare OR
it's not doing the flawing and leaving the 'mostly readable' sector with the
error and the OS(something) is letting it 'slide by' unreported.

This only becomes apparent during critical processes like Windows
initialization where one simply sees "Windows Protection Error". In more
than one case I've traced such "Windows Protection Error"s to code (DLL or
EXE) that's sat on the HD untouched with NO intervening defrag for a long
time prior. A sector just suddenly has bad data in it and Scandisk thorough
isn't finding it. I believe that I'm seeing a slow accumulation of
unreported bad sectors on drives that Scandisk thorough does not find nor
fix. Those accumulating few bad bits would likely go unnoticed anywhere
else or just be fixed by reinstalling an app with a shrug as to what the
problem had been.

I've seen this behavior on quite a number of drives in the several months
prior to a drive obviously going bad or entering a period where the flaw
processing is using up the spares. I see this especially on HDs where a
Scandisk thorough keeps restarting back to checking the folders when only
Explorer and Scandisk are in task manager. I'm guessing that lots of
flawing is going on. Somehow the drive is signalling the OS to restart
Scandisk(something changed) without ever declaring any errors.

One recent case I studied was on a Quantum 20GB LCT15 which has ME and just
suddenly quit booting on a Windows Protection Error. No defrag in >6
months. A module in the Nvidia driver was bad(I assume reloading the Nvidia
driver fixed it) and that driver was untouched for many months before.
Scandisk thorough took 15 hours to complete restarting every 5 minutes or
so. The Scandisk finished with NO bad sectors being reported nor any other
error message(normal completion).

I then ran the latest Maxtor Diags and it passed the advanced test. I then
ran the 'low level format' and it went to completion and reported "PASSED"
and didn't take unusually long(1-2 hours). However I noticed that only
39102208 blocks were available rather than the target value for the drive
that had been displayed all along during the low level format 39102336. All
the spares must be used up.
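
For scale, the gap between the two block counts can be worked out directly. A quick sketch, assuming 512-byte sectors (the sector size is an assumption here; it is not stated in the diagnostic output quoted above):

```python
SECTOR_BYTES = 512  # assumption: standard 512-byte sectors

expected_blocks = 39102336  # target value shown during the low level format
reported_blocks = 39102208  # value reported at completion

missing_blocks = expected_blocks - reported_blocks
missing_bytes = missing_blocks * SECTOR_BYTES

print(missing_blocks)         # 128
print(missing_bytes // 1024)  # 64 (KiB)
```

So the shortfall is 128 blocks, or 64 KiB under that assumption. The arithmetic alone does not say why the count shrank.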

I can find no other reasonable explanation. HD data is going bad and is
not being reported or not being detected.

My theory is that EIDE HD mfgs have determined that they can decrease their
support and warranty costs if they subtly change the error processing to let
'little errors' go by. A bit here and a bit there and nobody will notice,
which is true in the vast majority of cases. Read no evil....report no evil
for a few bits here and there saves them big bucks.

Anyone got any ideas/information?
 
Joep

Ron Reaugh said:
I'm seeing a set of behaviors over a period of time across several different
mfgs' EIDE HDs(mfged in the last 3 years) that leads me to a new
hypothesis.

It has always been my understanding that the bad sector processing procedure
was one whereby if the drive began to find a sector that was hard to
read(retries required) then it would copy the data to one of its spare
sectors and move that bad sector to the 'bad pool' onboard the HD. This
process would happen on the fly and be essentially transparent to the user
and OS.

It's my understanding that no remapping takes place during reads. However
sectors can become candidates for remapping.
Also it was my understanding that if the HD suddenly found a sector that it
could not read and get a completely proper data validation check then the
drive would return a error to the OS declaring a read failure.

That's what I'd expect as well.
When's the last time anyone ever saw a HD data read failure during normal
processing on an EIDE HD in 98SE, ME or XP off a standard mobo EIDE
controller for just a single sector for a small few bit data flaw?

We see reports of that on a regular basis.
When's the
last time anyone ever saw a "small read error" in defrag? I've always
ascribed my 'no' answers to the above two questions to a belief that flaw
processing was working and one didn't ever see sectors go full bad anymore.

Same as above.
The theory that I'm developing is that the drive's internal flawing process
is in fact doing what I describe above
No.

but it is also doing it when it can't
completely with validation read a sector before copying it to the spare OR
it's not doing the flawing and leaving the 'mostly readable' sector with the
error and the OS(something) is letting it 'slide by' unreported.

Never seen proof for that.
This only becomes apparent during critical processes like Windows
initialization where one simply sees "Windows Protection Error". In more
than one case I've traced such "Windows Protection Error"s to code (DLL or
EXE) that's sat on the HD untouched with NO intervening defrag for a long
time prior. A sector just suddenly has bad data in it and Scandisk thorough
isn't finding it. I believe that I'm seeing a slow accumulation of
unreported bad sectors on drives that Scandisk thorough does not find nor
fix.

I have always seen those, in the past on Win98 systems, where Scandisk did
NOT find 'bad sectors' using standard settings, however after modifying the
scandisk.ini file and increasing the number of passes 'bad sectors' WERE
detected. Probably flaky sectors, one read they're okay, the next they
aren't.
Those accumulating few bad bits would likely go unnoticed anywhere
else or just be fixed by reinstalling an app with a shrug as to what the
problem had been.

I've seen this behavior on quite a number of drives in the several months
prior to a drive obviously going bad or entering a period where the flaw
processing is using up the spares. I see this especially on HDs where a
Scandisk thorough keeps restarting back to checking the folders when only
Explorer and Scandisk are in task manager. I'm guessing that lots of
flawing is going on. Somehow the drive is signalling the OS to restart
Scandisk(something changed) without ever declaring any errors.

I think you're delirious.
One recent case I studied was on a Quantum 20GB LCT15 which has ME and just
suddenly quit booting on a Windows Protection Error.

For which hundreds of causes could be accountable.
No defrag in >6
months. A module in the Nvidia driver was bad(I assume reloading the Nvidia
driver fixed it) and that driver was untouched for many months before.
Scandisk thorough took 15 hours to complete restarting every 5 minutes or
so. The Scandisk finished with NO bad sectors being reported nor any other
error message(normal completion).

A file being bad does not mean the sector is bad.
I then ran the latest Maxtor Diags and it passed the advanced test. I then
ran the 'low level format' and it went to completion and reported "PASSED"
and didn't take unusually long(1-2 hours). However I noticed that only
39102208 blocks were available rather than the target value for the drive
that had been displayed all along during the low level format 39102336. All
the spares must be used up.

How do you mean 'available'? Available where?
I can find no other reasonable explanation.

That doesn't mean a thing.
HD data is going bad and is
not being reported or not being detected.

What do the SMART attributes say, what are raw values for reallocated
sectors?
My theory is that EIDE HD mfgs have determined that they can decrease their
support and warranty costs if they subtly change the error processing to let
'little errors' go by. A bit here and a bit there and nobody will notice,
which is true in the vast majority of cases. Read no evil....report no evil
for a few bits here and there saves them big bucks.

Anyone got any ideas/

You need medication of some sort. I hate the paranoia conspiracy theories
especially when they're built upon nothing.
 
Ron Reaugh

Joep said:
It's my understanding that no remapping takes place during reads. However
sectors can become candidates for remapping.

That's not my understanding. If the read was marginal but recoverable then
the remap must take place now as later the read may not be marginal but bad.
If the read is bad(unrecoverable) now then that's when I've heard something
about later and after a write. But if the read is bad(unrecoverable) now
then the drive should continue to return an error as long as the read has an
unrecoverable error. Operationally with respect to coherent OS and data
integrity how else could it be?
That's what I'd expect as well.


We see reports of that on a regular basis.

I haven't in the above OSs and I don't recall seeing a mention of many such
anywhere of the >type< I described. When's the last time anyone saw a
single sector used by a file go bad/found bad during a defrag. That case
should be a very high probability of finding such and I always run a defrag
on a disk with suspect behavior and have yet to ever find a 'simple sector
error'. It's taken a long time for this to percolate to my attention.
Something is wrong; something doesn't fit.
Same as above.


No.

Based on what? I believe otherwise and have seen too many cases now.
Never seen proof for that.

I just experienced such a case. System suddenly won't boot into Windows ME.
Some file in the Nvidia drivers has gone bad. The system is used by a
novice and hadn't been on in a month. There's been no defrags, updates nor
maintenance of any kind done. The drive passes Scandisk thorough in
Safemode that will boot. Nothing reports any kind of bad read yet this
driver has some bad code causing "Windows protection error". I've seen
this a number of times before in various forms but never recognized it.
What other explanation is there?

I have always seen those, in the past on Win98 systems, where Scandisk did
NOT find 'bad sectors' using standard settings, however after modifying the
scandisk.ini file and increasing the number of passes 'bad sectors' WERE
detected. Probably flaky sectors, one read they're okay, the next they
aren't.

Five WinME boots in a row failed. Three Scandisk thoroughs found nothing so
your assertion is suspect. Further such a marginal sector would be flawed.
The drive is designed to remap marginal sectors before they go full bad. If
a drive is not so designed then there might as well not be any flaw
processing at all.
I think you're delirious.

Garner greater HD experience first.
For which hundreds of causes could be accountable.
Huh?


A file being bad does not mean the sector is bad.

Huh? A file with an unrecoverable read error sector is a bad file. I
suppose you're invoking evil demons doing random rogue writes changing a few
bits here and there(hmm I guess that's a read write.).
How do you mean 'available'? Available where?

At the end of the Maxtor "low level format" routine that's what was
reported...the max LBA(39102208). The max LBA expected was displayed from
the beginning of the "low level" run as 39102336. It got smaller yet
"PASSED".
That doesn't mean a thing.

What seems to be meaning little are your responses.
What do the SMART attributes say, what are raw values for reallocated
sectors?

All the SMART results were "PASSED". I didn't look at any low level info
for the drive and generally it's not relevant to my assertions.
You need medication of some sort. I hate the paranoia conspiracy theories
especially when they're built upon nothing.

Gather more experience in the field before you spout.

If someone had the time it would be rather easy to test my hypothesis.
Write a HD inventory program that snapshots all the files on a HD and CRCs
them along with a FAT(allocation) list for each. See if a file's CRC ever
changes without an allocation change etc.
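
A minimal sketch of such an inventory program, in Python. It is not a full implementation of the idea above: reading the FAT allocation chain needs raw volume access, so this version assumes that an unchanged size and modification time is an acceptable stand-in for "no allocation change". The function names `crc32_of`, `snapshot` and `silent_changes` are made up for illustration.

```python
import os
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """CRC32 of a file's contents, read in chunks."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def snapshot(root):
    """Map each file path under root to (size, mtime_ns, crc32)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
                result[path] = (st.st_size, st.st_mtime_ns, crc32_of(path))
            except OSError:
                # A file that cannot be read is recorded as None.
                result[path] = None
    return result


def silent_changes(old, new):
    """Paths whose CRC changed while size and mtime stayed the same."""
    suspects = []
    for path, before in old.items():
        after = new.get(path)
        if before is None or after is None:
            continue
        # Size and mtime proxy for "no allocation change" (an assumption).
        if before[:2] == after[:2] and before[2] != after[2]:
            suspects.append(path)
    return sorted(suspects)
```

Take one `snapshot` now, save it, take another some weeks later and pass both to `silent_changes`. Two caveats: reads may be served from the OS cache rather than the platters, and a changed CRC shows only that the file's contents differ, not where the change came from.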

I've just seen too many cases of a "small error" that seemed to come from
nowhere and wasn't being caught nor reported. If such were happening then
it would be very difficult to spot short of specifically looking in a
fashion such as I've described above.
 
Joep

Ron Reaugh said:
That's not my understanding. If the read was marginal but recoverable then
the remap must take place now as later the read may not be marginal but
bad.

Recovery takes place all the time.
I haven't in the above OSs and I don't recall seeing a mention of many such
anywhere of the >type< I described. When's the last time anyone saw a
single sector used by a file go bad/found bad during a defrag. That case
should be a very high probability of finding such and I always run a defrag
on a disk with suspect behavior and have yet to ever find a 'simple sector
error'. It's taken a long time for this to percolate to my attention.
Something is wrong; something doesn't fit.

Just because you think a disk is suspect it doesn't mean it is. This may
explain why your defrag hypothesis doesn't fly.

Based on what? I believe otherwise and have seen too many cases now.

Belief has nothing to do with it.

I just experienced such a case. System suddenly won't boot into Windows ME.
Some file in the Nvidia drivers has gone bad. The system is used by a
novice and hadn't been on in a month. There's been no defrags, updates nor
maintenance of any kind done. The drive passes Scandisk thorough in
Safemode that will boot. Nothing reports any kind of bad read yet this
driver has some bad code causing "Windows protection error". I've seen
this a number of times before in various forms but never recognized it.
What other explanation is there?

A corrupt file does not mean by definition a bad sector at all.

No, the FILE has bad data.

Five WinME boots in a row failed.

Yes, and probably 20 more if you try 20 more times, so what?
Three Scandisk thoroughs found nothing so
your assertion is suspect.

No, yours is: there are no 'bad sectors' on that disk, however there are
corrupt files.
Further such a marginal sector would be flawed.
The drive is designed to remap marginal sectors before they go full bad.

ECC corrections take place all the time.
If
a drive is not so designed then there might as well not be any flaw
processing at all.



Huh?

Huh? What huh? A windows protection error can be caused by a number of
things, in essence it's a memory error.
Huh? A file with an unrecoverable read error sector is a bad file. I
suppose you're invoking evil demons doing random rogue writes changing a few
bits here and there(hmm I guess that's a read write.).

Again, a corrupt file is not the same as a bad sector. PERIOD. No evil
demons required, corrupt memory would do.
At the end of the Maxtor "low level format" routine that's what was
reported...the max LBA(39102208). The max LBA expected was displayed from
the beginning of the "low level" run as 39102336. It got smaller yet
"PASSED".

Spares are transparent, meaning you never get to see them. They're not
reported when querying int13h params.
What seems to be meaning little are your responses.

Dig deeper.

All the SMART results were "PASSED". I didn't look at any low level info
for the drive and generally it's not relevant to my assertions.

Well I'd say they are relevant as it gives an insight into the actual number of
remapped sectors.
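
An overall "PASSED" only means no attribute has crossed its failure threshold; the raw reallocated count can still be nonzero. As a sketch of how to get at that number: where smartmontools is installed, `smartctl -A` prints an attribute table in which attribute 5 (Reallocated_Sector_Ct) carries the raw count in the last column. The helper name `reallocated_raw` is hypothetical, and the sample table is invented for illustration; it is not output from the drive discussed here.

```python
def reallocated_raw(smartctl_text):
    """Return the raw value of attribute 5 from `smartctl -A` style text, or None."""
    for line in smartctl_text.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; the raw value is the 10th column.
        if len(fields) >= 10 and fields[0] == "5":
            return int(fields[9])
    return None


# Hypothetical sample in the usual column layout (all values invented).
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   062    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   098   098   005    Pre-fail  Always       -       37
"""

print(reallocated_raw(sample))  # 37
```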
If someone had the time it would be rather easy to test my hypothesis.
Write a HD inventory program that snapshots all the files on a HD and CRCs
them along with a FAT(allocation) list for each. See if a file's CRC ever
changes without an allocation change etc.

To prove what? Corrupt files?
I've just seen too many cases of a "small error" that seemed to come from
nowhere and wasn't being caught nor reported.

No, files can be corrupted without the disk going bad.
If such were happening then
it would be very difficult to spot short of specifically looking in a
fashion such as I've described above.

No, it wouldn't tell anything other than some files are corrupt.
 
Joep

Ron Reaugh said:
As was your whole post.

Nope: Your paranoid ideas are based upon the assumption that a corrupt file
is by definition caused by a bad sector, which ain't true.
 
Folkert Rienstra

Ron Reaugh said:
As was your whole post.

I agree, it should never have been posted.
People like yourself that consistently fail or, worse, even refuse to
set up their newsreader properly should be ignored in the first place.
 
