On Mon, 25 Jun 2007 20:11:53 -0400, "Richard Urban" wrote:
> Ignoring bad sectors is playing with fire.
Agreed. I implied this a little tersely, and felt a bit guilty about
"not showing my workings", i.e. explaining my assertions.
> The hard drive manufacturers include a generous amount of spare sectors when
> the drive is new. As sectors in a drive go bad (and they do) they are
> silently remapped to a good sector. The hard drive diagnostic electronics
> does this. Any information that can be saved - is saved.
Corollary: Any info that cannot be saved is lost.
If a write operation fails, it's easy; the firmware just writes it to
a different sector and changes the internal addressing to use the new
sector from then on. The old one is mapped out.
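As a toy sketch of that write-path remapping (plain Python, purely
illustrative; real firmware does this in microcode and the names here
are all made up):

```python
# Toy model of firmware-level sector remapping on a failed write.
# Illustrative only - not how any real drive's firmware is written.

class Disk:
    def __init__(self, physical_sectors, spare_count):
        self.data = {}                      # physical sector -> payload
        self.bad = set()                    # physical sectors known bad
        self.remap = {}                     # logical -> physical overrides
        self.spares = list(range(physical_sectors,
                                 physical_sectors + spare_count))

    def _physical(self, logical):
        return self.remap.get(logical, logical)

    def write(self, logical, payload):
        phys = self._physical(logical)
        if phys in self.bad:                # simulated media failure
            if not self.spares:
                raise IOError("write failed: no spare sectors left")
            phys = self.spares.pop(0)       # grow the defect list silently
            self.remap[logical] = phys
        self.data[phys] = payload           # the OS never sees any of this

    def read(self, logical):
        return self.data[self._physical(logical)]
```

The key point is that the OS keeps addressing the same logical sector
throughout; only the drive's internal mapping changed, which is why
none of this is visible "above" the firmware.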
But what if a read operation fails? Sure, the bad sector will be
mapped out and a good one substituted, but what becomes the contents
of the new replacement sector? Does the firmware allow the failure to
be visible to the OS, or hide it?
> By the time you physically see chkdsk tell you that you have bad sectors,
> all of the spares have already been used. You could have hundreds of bad
> sectors before you see the first one with chkdsk. Chkdsk doesn't remap them
> (because there are no spares left), so chkdsk disables the sector from being
> accessed.
The above misses a key point that requires (even more) detail.
IDE and later drives have "intelligence", i.e. the actual mechanics of
platters and bytes are managed by program logic contained within the
HD's firmware. SMART was added as a window into this logic, perhaps
after concerns were raised about what was being covered up. See...
http://cquirke.blogspot.com/2005/09/if-government-got-smart.html
...and while you're up...
http://cquirke.mvps.org/9x/baddata.htm
It is this firmware that manages bad sectors on the fly, using the
HD's pool of spare sectors, "below" any OS that may be running. Unless
the OS is SMART-aware and can figure out a common subset of SMART
attributes (as these vary between HD vendors), the OS has no insight
or awareness as to what the HD's firmware is up to.
Specifically, it has no access to the HD's hidden spare sectors. If it
gets to see a bad cluster, it is only because the HD's firmware has
failed to hide it via in-house management. Perhaps it was a failed
read, and the firmware had the "ethics" to report a failure back to
the system and thus the OS, or it ran out of spare sectors, or the
sector failed too suddenly to fix, or the OS timed the HD out while
the firmware was on its millionth retry to "save" the data.
But between HD firmware and ChkDsk /R, there is another player; the
NTFS driver code itself. Just as the firmware does, it will attempt
to "fix" bad sectors on the fly, but it does so at a more visible
level, swapping clusters and leaving the bad cluster marked as such.
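A rough sketch of that file-system-level fixup (hypothetical Python,
not the actual NTFS driver code; NTFS actually tracks bad clusters in
its $BadClus metadata file, and the function and parameter names here
are invented):

```python
# Toy model of file-system-level bad-cluster handling, loosely in the
# spirit of what the NTFS driver does. Illustrative only.

def relocate_cluster(file_map, bad_clusters, free_clusters, index):
    """Move a file's cluster off a bad spot and mark the old one bad.

    file_map:      logical cluster index in the file -> volume cluster
    bad_clusters:  set of volume clusters the file system refuses to use
    free_clusters: pool of unused volume clusters
    """
    old_cluster = file_map[index]
    new_cluster = free_clusters.pop()   # claim a good, free cluster
    bad_clusters.add(old_cluster)       # old cluster is marked bad, visibly
    file_map[index] = new_cluster       # file now points at the new spot
    return new_cluster                  # caller rewrites data if recoverable
```

Unlike the firmware's remap, this swap is visible: the bad-cluster mark
lives in the file system itself, which is why ChkDsk can report it.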
To see what sectors have already been marked bad by HD firmware, you
need to use a SMART tool that reports attribute detail. Look at the
raw data values for Reallocated, Uncorrectable and Pending.
To see what clusters have already been marked as bad by NTFS code,
AutoChk and ChkDsk, do a ChkDsk and look at the reported bad cluster
count, which should be zero. One exception; if the partition was
imaged from a failing HD to a good one, it may "inherit" bad cluster
marks with the file system - but SMART won't lie.
> So you can see, your drive is likely quite sick.
Yep.
The usual advice is to "download the HD vendor's test utility", but
these things often just show you "OK" or "Fail" as a summary.
When you look at SMART detail, you will see several columns "Current,
Worst, Threshold, Value, Status".
The way these operate varies, but typically there will be a "raw value
data" column that increases in count, sometimes resetting and starting
over. Think of this as like the second hand on a clock. When this
"clicks over", the value of Current decreases by one. The Worst field
keeps a copy of the lowest Current value ever seen, which matters
because Current may be reset from time to time. The Threshold is the
point at or below
which the Current or Worst value will trigger a Status change to Fail.
So you can accumulate a LOT of bad sectors before a climbing count in
Pending, Uncorrectable, etc. triggers a notch down in Current and
Worst, and many multiples of that before those values reach the
Threshold and trigger SMART to return Fail for that attribute.
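Those mechanics can be modeled in a few lines. This is a sketch of the
general scheme only; the scaling constant and formula here are invented,
and real vendors each use their own:

```python
# Toy model of one SMART attribute's Current/Worst/Threshold mechanics.
# RAW_PER_NOTCH is a made-up scaling factor: here, every 50 raw events
# cost one point of the normalized Current value.

RAW_PER_NOTCH = 50

class SmartAttribute:
    def __init__(self, threshold=36, start=100):
        self.raw = 0                 # raw event count (e.g. pending sectors)
        self.current = start         # normalized "health" value
        self.worst = start           # lowest Current ever recorded
        self.threshold = threshold   # at or below this -> Fail

    def record_events(self, n):
        self.raw += n
        self.current = max(0, 100 - self.raw // RAW_PER_NOTCH)
        self.worst = min(self.worst, self.current)

    def status(self):
        return "Fail" if self.worst <= self.threshold else "OK"
```

With these (invented) numbers, the drive eats 3200 raw events before the
summary status flips from "OK" to "Fail" - which is the whole point: the
raw column screams long before the pass/fail verdict does.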
So when you use the vendor's tool that just says "OK", all you know is
that no attribute has gone all the way down. It's like a doctor who
won't pronounce anyone dead until their skin has rotted off to reveal
the skull beneath, or an automatic parachute that opens on impact.
That's why I advocate a more detailed tool such as HD Tune, even
though you need to understand SMART attributes to interpret the
results. For example, several scary-looking attributes like "Seek
Errors" will routinely have raw values in the thousands ;-)
--------------- ----- ---- --- -- - - -
Error Messages Are Your Friends