On Fri, 12 Aug 2005 04:58:03 -0700, "Robert Reader" wrote:
> I recently ran a chkdsk with the repair option on an external hard drive, and
> it reported the following on one of the files:
> Windows replaced bad clusters in file 99407
> of name \Archive\SOFTWA~1\TopoMaps\CONSOL~1\TopoMaps.zip.
That means the hard drive is dying. Evacuate data and replace it.
> I have several questions:
> 1) Does this mean the file it refers to is now corrupt? The file was still
> present on the disk after chkdsk finished. If bad clusters were found, I
> assume the data in the clusters is now trash.
Yep, most likely it will be.
It's important not to confuse bad clusters with lost clusters.
Lost clusters are marked as in use (removed from the free space), but
have no directory entry that claims them as part of a file or
subdirectory. This is a file system logic error that typically follows
bad exits (crashes or power loss while files are being written).
Bad clusters contain physically bad sectors, and indicate the hard
drive is failing at the hardware level. Listen up!
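To make the lost-versus-bad distinction concrete, here is a minimal sketch over a toy FAT-style allocation table. The table layout, the `FREE`/`END`/`BAD` markers and the `chain`/`classify` helpers are hypothetical, for illustration only. A lost cluster is allocated but unreachable from any directory entry; a bad cluster is simply flagged as physically unusable.

```python
# Hypothetical toy model of a FAT-style allocation table (illustration only).
# fat[c] holds the next cluster number in a chain, or one of these markers:
FREE, END, BAD = "free", "end", "bad"

def chain(fat, start):
    """Follow a cluster chain from its first cluster until a non-number marker."""
    clusters = []
    current = start
    # Stop at END (or any marker), and guard against cross-linked loops.
    while isinstance(current, int) and current not in clusters:
        clusters.append(current)
        current = fat[current]
    return clusters

def classify(fat, first_clusters):
    """Return (lost, bad) cluster lists.

    first_clusters: the starting cluster of every file or subdirectory,
    as recorded in directory entries.
    """
    reachable = set()
    for start in first_clusters:
        reachable.update(chain(fat, start))
    # Lost: allocated (not free, not bad) but no directory entry leads to it.
    lost = [c for c, v in enumerate(fat) if v not in (FREE, BAD) and c not in reachable]
    # Bad: flagged as physically unusable, whatever the directory says.
    bad = [c for c, v in enumerate(fat) if v == BAD]
    return lost, bad

fat = [1, END, END, 4, END, BAD, FREE]
print(classify(fat, [0, 3]))  # → ([2], [5])
```

Cluster 2 is a logic error that ChkDsk can tidy up; cluster 5 is a hardware symptom that no amount of tidying will cure.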
For a cluster to show up as bad, it has to escape the hard drive's
built-in defect "management". When the hard drive's firmware detects
that excessive retries are required to read a sector without error, it
will try to copy the contents to another "spare" sector. If it succeeds,
then the "spare" is assigned the address of the sick sector, and the
sick sector is never used again. All of this happens below the
awareness of the rest of the PC; the OS has no clue, and nothing shows
up in ChkDsk logs, etc. It's successfully swept under the carpet.
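The firmware's sleight of hand can be modelled roughly as follows. `ToyDrive`, its retry threshold and its spare pool are hypothetical; real firmware policies are vendor-specific and undocumented. The point is that the host gets its data back with no error, and the only trace is an internal counter of the kind S.M.A.R.T. exposes.

```python
class ToyDrive:
    """Hypothetical model of a hard drive's internal defect management."""

    RETRY_LIMIT = 3  # assumed threshold; real firmware policies are vendor-specific

    def __init__(self, n_sectors, n_spares):
        # Logical block address (what the OS asks for) -> physical location.
        self.map = {lba: ("main", lba) for lba in range(n_sectors)}
        self.spares = [("spare", i) for i in range(n_spares)]
        self.data = {("main", lba): f"sector {lba}".encode() for lba in range(n_sectors)}
        # Internal record-keeping; only S.M.A.R.T. would expose this count.
        self.reallocated = 0

    def read(self, lba, retries_needed):
        loc = self.map[lba]
        if retries_needed > self.RETRY_LIMIT:
            if not self.spares:
                # Defect management is exhausted: only now does the OS see an error.
                raise OSError(f"unrecoverable read error at LBA {lba}")
            spare = self.spares.pop(0)
            self.data[spare] = self.data[loc]  # copy contents to the spare
            self.map[lba] = spare              # spare inherits the sick sector's address
            self.reallocated += 1
        # The caller just gets its data back; nothing reports the remap.
        return self.data[self.map[lba]]

drive = ToyDrive(n_sectors=8, n_spares=1)
print(drive.read(5, retries_needed=10))  # → b'sector 5'
print(drive.reallocated)                 # → 1
```

With the only spare used up, the next sick sector is the one that finally escapes to the OS as a read error.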
There's only one window into this process, and that is S.M.A.R.T.,
which can read the hard drive's internal record-keeping, and thus
hopefully see to what extent the HD has been covering up defects.
Windows has zero built-in S.M.A.R.T. awareness, even though this
facility has been around since the 2 GB drives of the Win9x era.
BIOS can report S.M.A.R.T. status as part of the startup process, but
by duhfault, this reporting is disabled.
Hard drive vendors' diagnostics typically just give you an "OK" or
"fail" summary, with no detail at all. An alarming amount of
impending carnage is rubber-stamped as "OK".
When you finally find a 3rd-party tool that reports raw S.M.A.R.T.
detail, you find it's pretty hard to understand; the values just don't
make sense, unless you know how to interpret them. Then you find that
even though the summary says "OK", there have been x sectors that had
to be "fixed" on the fly, etc. Not nearly so "OK" after all.
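As an example of reading past the summary, here is a sketch that scans the attribute table printed by smartmontools' `smartctl -A` and flags attributes that still "pass" their normalized threshold while their raw counts show remapped or pending sectors. Watching IDs 5, 196, 197 and 198 is an assumption based on common conventions (meanings vary by vendor), and the sample output is made up.

```python
# Attribute IDs whose raw counts commonly track remapped or unreadable sectors.
# Exact meanings vary by vendor; this set is an assumption.
WATCHED = {5, 196, 197, 198}

def parse_attributes(text):
    """Parse data rows of the attribute table printed by `smartctl -A`."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least 10 columns;
        # RAW_VALUE (column 10 onward) may itself contain spaces.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({"id": int(fields[0]), "name": fields[1],
                     "value": fields[3], "thresh": fields[5],
                     "raw": " ".join(fields[9:])})
    return rows

def hidden_trouble(rows):
    """Watched attributes that still pass (VALUE > THRESH) yet have a nonzero raw count."""
    found = []
    for r in rows:
        if r["id"] not in WATCHED:
            continue
        raw_first = r["raw"].split()[0]
        if not (r["value"].isdigit() and r["thresh"].isdigit() and raw_first.isdigit()):
            continue  # skip formats this sketch does not understand
        passes = int(r["value"]) > int(r["thresh"])
        if passes and int(raw_first) > 0:
            found.append((r["name"], int(raw_first)))
    return found

# Made-up sample output for illustration; not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   097   097   036    Pre-fail  Always       -       142
194 Temperature_Celsius     0x0022   035   045   000    Old_age   Always       -       35 (Min/Max 20/45)
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       3
"""

print(hidden_trouble(parse_attributes(SAMPLE)))
# → [('Reallocated_Sector_Ct', 142), ('Current_Pending_Sector', 3)]
```

Every row in that sample would sit under an overall "PASSED" verdict, which is exactly the gap described above.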
So by the time a cluster shows as "bad", it has had to be so bad that
the hard drive's attempts to paper over it and deny there's a problem
have failed. Even at this stage, the problem can be covered up - this
time by the OS code that operates the "better" NTFS file system.
This code will do exactly the same thing that the hard drive firmware
tried to do: read the data out of the failing allocation unit (cluster
rather than sector, this time) and copy it somewhere else, marking the
original cluster as "bad". This time there would be visible signs
within the OS's record-keeping of clusters - if you could ever get a
clear view of that information, that is.
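The file-system-level version of the same trick, sketched on a toy volume. The function name and data layout are hypothetical; the zero-fill for an unreadable cluster is an assumption standing in for "the data is lost", which is also how a file can survive ChkDsk with trashed contents.

```python
def remap_bad_cluster(volume, bad, file_clusters, index, readable=True, cluster_size=4096):
    """Hypothetical file-system-level remap of one cluster of a file.

    volume: list where volume[c] is the bytes in cluster c, or None if free.
    bad: set of clusters marked bad (NTFS records these in its $BadClus metafile).
    file_clusters: the file's clusters in order; updated in place.
    """
    old = file_clusters[index]
    free = [c for c, d in enumerate(volume) if d is None and c not in bad]
    if not free:
        raise OSError("no free cluster available for remapping")
    new = free[0]
    # Copy the data if it could still be read; otherwise the file keeps its
    # shape but this cluster's contents are lost (zero-filled in this toy model).
    volume[new] = volume[old] if readable else bytes(cluster_size)
    volume[old] = None
    bad.add(old)  # never allocate this cluster again
    file_clusters[index] = new
    return new

volume = [b"A", b"B", None, None]
bad, file_clusters = set(), [0, 1]
remap_bad_cluster(volume, bad, file_clusters, 1)
print(file_clusters, sorted(bad))  # → [0, 2] [1]
```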
Once a cluster's marked out of use as "bad", further ChkDsk /R or
AutoChk tests will not test it again. So an elective test may report
"no (new) bad clusters" even when there have been 20 bad clusters
successfully "fixed" by NTFS's on-the-fly fiddling, and another 30 bad
sectors successfully "fixed" by the HD's internal defect management.
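Why a rescan reports nothing new can be shown with a hypothetical scan loop that, like the behaviour described, skips clusters already marked bad. The `surface_scan` name and the `is_readable` callback are assumptions standing in for a real read test. Sectors the drive has already remapped internally read back fine, so they never fail `is_readable` in the first place.

```python
def surface_scan(n_clusters, known_bad, is_readable):
    """Hypothetical ChkDsk /R-style pass that reports only *new* bad clusters.

    known_bad: clusters already marked bad; they are skipped, never re-tested.
    is_readable: callable(cluster) -> bool, standing in for a real read test.
    """
    new_bad = []
    for c in range(n_clusters):
        if c in known_bad:
            continue
        if not is_readable(c):
            new_bad.append(c)
    return new_bad

# 20 clusters already "fixed" and marked bad by earlier on-the-fly remapping.
already_marked = set(range(20))
result = surface_scan(1000, already_marked, lambda c: c not in already_marked)
print(f"{len(result)} new bad clusters")  # → 0 new bad clusters
```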
I think you can see what all this means - that despite any claims to
the contrary, the game is rigged to hide impending HD failure from
you, hopefully until the HD's warranty period has expired. The large
print may claim your vendors really care about your data, but the
small print confirms they just want to duck support calls.
> 2) Are bad clusters the same as bad sectors, i.e., if a bad cluster exists it
> means it contains one or more bad sectors?
A sector is a hardware-level unit of storage, typically containing 512
bytes. A cluster is a file-system-level unit of data storage,
containing a power-of-2 number of sectors: typically 8, for 4 KB clusters.
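The sector/cluster arithmetic, as a minimal sketch assuming 512-byte sectors, 4 KB clusters, and cluster 0 starting at the first sector of the volume (as with NTFS cluster numbering). One bad sector condemns its whole cluster, and several bad sectors may fall in the same cluster.

```python
SECTOR_SIZE = 512    # bytes, the typical hardware sector size
CLUSTER_SIZE = 4096  # bytes, a typical 4 KB cluster
SECTORS_PER_CLUSTER = CLUSTER_SIZE // SECTOR_SIZE  # 8

def cluster_of(sector):
    """Cluster holding a volume-relative sector (assumes cluster 0 starts at sector 0)."""
    return sector // SECTORS_PER_CLUSTER

def sectors_of(cluster):
    """Volume-relative sectors making up one cluster."""
    first = cluster * SECTORS_PER_CLUSTER
    return range(first, first + SECTORS_PER_CLUSTER)

def clusters_hit(bad_sectors):
    """Each bad sector condemns its whole cluster; several may share one."""
    return sorted({cluster_of(s) for s in bad_sectors})

print(clusters_hit([803, 805, 1600]))  # → [100, 200]
print(list(sectors_of(100)))           # → [800, 801, 802, 803, 804, 805, 806, 807]
```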
Yes, a newly-discovered bad cluster means one or more bad sectors,
unless something has faked the marking process for some reason.
Viruses used to do that long ago; I don't think today's OSs lend
themselves to that particular way of hiding malicious code anymore.
There's one circumstance in which existing bad clusters do not mean a
failing hard drive, and that is where the contents of a failing hard
drive are imaged (copied exactly) to a good replacement hard drive.
The raw imaging process will preserve the existing bad cluster marks,
even though they no longer refer to actual bad clusters.
> 3) Further down in the same chkdsk report, it reported "0 KB in bad
> sectors". Why does it report this when it had just found some bad clusters?
> Does it report this because it had "fixed" them, i.e., replaced them with
> spares, and bad sectors no longer exist?
Good question. As one who does data recovery, and who has seen too
many "too late" dead drives that ate data which might have been saved
if an earlier alarm had been raised, I'm not inclined to trust vendor
reporting, especially ChkDsk. "Everything's fine" may be a lie, but
"hey, something might possibly be wrong" is certain to mean trouble.
-- Risk Management is the clue that asks:
"Why do I keep open buckets of petrol next to all the
ashtrays in the lounge, when I don't even have a car?"