Bad Clusters

Guest

I recently ran a chkdsk with the repair option on an external hard drive, and
it reported the following on one of the files:

Windows replaced bad clusters in file 99407
of name \Archive\SOFTWA~1\TopoMaps\CONSOL~1\TopoMaps.zip.

I have several questions:
1) Does this mean the file it refers to is now corrupt? The file was still
present on the disk after chkdsk finished. If bad clusters were found, I
assume the data in the clusters is now trash.

2) Are bad clusters the same as bad sectors, ie, if a bad cluster exists it
means it contains one or more bad sectors?

3) Further down in the same chkdsk report, it reported "0 KB in bad
sectors". Why does it report this when it had just found some bad clusters?
Does it report this because it had "fixed" them, ie, replaced them with
spares, and bad sectors no longer exist?

I could find no documentation that clearly explains all this. Does anyone
know of a good source of documentation for these kinds of questions?

Regards,
Robert Reader
 
Rock

You might want to check out the health of the drive. Download a drive
diagnostic utility from the drive manufacturer's web site. That will
create a bootable floppy or CD. Boot from that and run the diagnostics.
 
Steve N.

This is the most in-depth article I've found so far.

http://support.microsoft.com/kb/314835/EN-US/

Steve
 
Lil' Dave

Robert Reader said:
I recently ran a chkdsk with the repair option on an external hard drive, and
it reported the following on one of the files:

Windows replaced bad clusters in file 99407
of name \Archive\SOFTWA~1\TopoMaps\CONSOL~1\TopoMaps.zip.

I have several questions:
1) Does this mean the file it refers to is now corrupt? The file was still
present on the disk after chkdsk finished. If bad clusters were found, I
assume the data in the clusters is now trash.

Yep.

2) Are bad clusters the same as bad sectors, ie, if a bad cluster exists it
means it contains one or more bad sectors?

http://www.pcguide.com/ref/hdd/file/clustClusters-c.html
http://www.pcguide.com/ref/hdd/file/clustChaining-c.html
http://www.pcguide.com/ref/hdd/geom/tracks.htm
http://www.pcguide.com/ref/hdd/geom/tracksSector-c.html
http://www.pcguide.com/ref/hdd/geom/formatDefect-c.html


3) Further down in the same chkdsk report, it reported "0 KB in bad
sectors". Why does it report this when it had just found some bad clusters?
Does it report this because it had "fixed" them, ie, replaced them with
spares, and bad sectors no longer exist?

NTFS or FAT32, which?

I could find no documentation that clearly explains all this. Does anyone
know of a good source of documentation for these kinds of questions?

Not sure if XP's version of NTFS has a sector correlation with clusters.
FAT32 does.
 
cquirke (MVP Windows shell/user)

On Fri, 12 Aug 2005 04:58:03 -0700, "Robert Reader" wrote:
I recently ran a chkdsk with the repair option on an external hard drive, and
it reported the following on one of the files:
Windows replaced bad clusters in file 99407
of name \Archive\SOFTWA~1\TopoMaps\CONSOL~1\TopoMaps.zip.

That means the hard drive is dying. Evacuate data and replace it.

I have several questions:
1) Does this mean the file it refers to is now corrupt? The file was still
present on the disk after chkdsk finished. If bad clusters were found, I
assume the data in the clusters is now trash.

Yep, most likely it will be.

It's important not to confuse bad clusters with lost clusters.

Lost clusters are chained out of the free space, but have no directory
entry that defines them as a file or subdirectory. This is a file
system logic error that follows bad exits in particular.
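To make the distinction concrete, here is a toy sketch of how a checker could find lost clusters on a FAT-style volume: follow every chain that starts at a directory entry, then treat any cluster marked in use but never reached as lost. The table layout, the `END_OF_CHAIN` marker and the function names are hypothetical simplifications, not ChkDsk's actual logic.

```python
# Toy model only: a FAT-like table where each allocated cluster points to
# the next cluster of its chain, or to END_OF_CHAIN. Names are hypothetical.
END_OF_CHAIN = -1


def reachable_clusters(fat: dict[int, int], start_clusters: list[int]) -> set[int]:
    """Follow each directory entry's chain and collect every cluster it uses."""
    seen: set[int] = set()
    for cluster in start_clusters:
        # The "not in seen" check stops a corrupted, looping chain.
        while cluster != END_OF_CHAIN and cluster not in seen:
            seen.add(cluster)
            cluster = fat[cluster]
    return seen


def lost_clusters(fat: dict[int, int], start_clusters: list[int]) -> set[int]:
    """Clusters chained out of free space but owned by no file or directory."""
    return set(fat) - reachable_clusters(fat, start_clusters)


# Clusters 2-4 form one file; 7-8 were chained before a bad exit
# but never got a directory entry.
fat = {2: 3, 3: 4, 4: END_OF_CHAIN, 7: 8, 8: END_OF_CHAIN}
print(sorted(lost_clusters(fat, start_clusters=[2])))  # [7, 8]
```

Note that nothing here involves reading the disk surface: lost clusters are purely a bookkeeping inconsistency, which is why they say nothing about the hardware.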

Bad clusters contain physically bad sectors, and indicate the hard
drive is failing at the hardware level. Listen up!


For a cluster to show up as bad, it has to escape the hard drive's
built-in defect "management". When the hard drive's firmware detects
excessive retries are required to read a sector without error, it will
try to copy the contents to another "spare" sector. If it succeeds,
then the "spare" is assigned the address of the sick sector, and the
sick sector is never used again. All of this happens below the
awareness of the rest of the PC; the OS has no clue, and nothing shows
up in ChkDsk logs, etc. It's successfully swept under the carpet.
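A rough sketch of the sparing behaviour described above, assuming a drive with a fixed pool of spare sectors. The class and its fields are hypothetical; real firmware logic is vendor-specific and not public, but the point survives: the OS keeps using the same sector address, and only an internal counter records that anything happened.

```python
# Toy model of the firmware sparing described above; every name is hypothetical.
class ToyDrive:
    def __init__(self, spare_sectors: int) -> None:
        self.remap: dict[int, int] = {}            # logical sector -> spare slot
        self.free_spares = list(range(spare_sectors))
        self.reallocated_count = 0                  # the kind of counter S.M.A.R.T. exposes

    def handle_marginal_sector(self, lba: int, data_recovered: bool) -> bool:
        """Return True if the defect was hidden from the OS, False if it escapes."""
        if data_recovered and self.free_spares:
            # Same address from the OS's point of view, different physical sector.
            self.remap[lba] = self.free_spares.pop(0)
            self.reallocated_count += 1
            return True
        # Unrecoverable data or no spares left: the OS may now see a bad cluster.
        return False


drive = ToyDrive(spare_sectors=2)
print(drive.handle_marginal_sector(1000, data_recovered=True))   # True
print(drive.handle_marginal_sector(2000, data_recovered=False))  # False
print(drive.handle_marginal_sector(3000, data_recovered=True))   # True
print(drive.handle_marginal_sector(4000, data_recovered=True))   # False, spares exhausted
print(drive.reallocated_count)                                   # 2
```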

There's only one window into this process, and that is S.M.A.R.T.,
which can read the hard drive's internal record-keeping, and thus
hopefully see to what extent the HD has been covering up defects.

Windows has zero built-in S.M.A.R.T. awareness, even though this
facility has been around since the 2GB drives of the Win9x era.

BIOS can report S.M.A.R.T. status as part of the startup process, but
by duhfault, this reporting is disabled.

Hard drive vendors' diagnostics typically just give you an "OK" or
"fail" summary, with no detail at all. An alarming amount of
impending carnage is rubber-stamped as "OK".

When you finally find a 3rd-party tool that reports raw S.M.A.R.T.
detail, you find it's pretty hard to understand; the values just don't
make sense, unless you know how to interpret them. Then you find that
even though the summary says "OK", there have been x sectors that had
to be "fixed" on the fly, etc. Not nearly so "OK" after all.
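Here is a sketch of why the one-word verdict and the raw detail can disagree. It assumes the usual S.M.A.R.T. convention that an attribute counts as failing only when its normalized value drops to or below its threshold, and that IDs 5 and 197 are the commonly used reallocated and pending sector counts. Vendors interpret raw values differently, and the readings below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    attr_id: int
    normalized: int  # vendor-scaled value; lower means worse
    threshold: int   # the attribute counts as failing at or below this
    raw: int         # raw counter; its meaning is vendor-specific


# Commonly used attribute IDs (interpretation varies between vendors).
WATCHED = {5: "reallocated sectors", 197: "pending sectors"}


def summary_says_ok(attrs: list[SmartAttribute]) -> bool:
    """The one-word verdict: OK unless some normalized value hits its threshold."""
    return all(a.normalized > a.threshold for a in attrs)


def raw_warnings(attrs: list[SmartAttribute]) -> list[str]:
    """What the summary hides: any non-zero raw count on a watched attribute."""
    return [f"{WATCHED[a.attr_id]}: {a.raw}"
            for a in attrs if a.attr_id in WATCHED and a.raw > 0]


# Hypothetical readings, not taken from any real drive.
readings = [SmartAttribute(5, normalized=100, threshold=36, raw=30),
            SmartAttribute(197, normalized=100, threshold=0, raw=0)]
print(summary_says_ok(readings))  # True
print(raw_warnings(readings))     # ['reallocated sectors: 30']
```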


So by the time a cluster shows as "bad", it's had to be so bad that
the hard drive's attempts to paper over it and deny there's a problem
have failed. Even at this stage, the problem can be covered up - this
time by the OS code that operates the "better" NTFS file system.

This code will do exactly the same thing that the hard drive firmware
tried to do: read the data out of the failing allocation unit (cluster
rather than sector, this time) and copy it somewhere else, marking the
original cluster as "bad". This time there would be visible signs
within the OS's record-keeping of clusters - if you could ever get a
clear view of that information, that is.

Once a cluster's marked out of use as "bad", further ChkDsk /R or
AutoChk tests will not test it again. So an elective test may report
"no (new) bad clusters" even when there have been 20 bad clusters
successfully "fixed" by NTFS's on-the-fly fiddling, and another 30 bad
sectors successfully "fixed" by the HD's internal defect management.
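The skipping behaviour can be sketched as a scan that only reads clusters not already marked bad. This is a hypothetical model of the behaviour described above, not ChkDsk's code; `is_readable` stands in for an actual read attempt.

```python
from typing import Callable


def elective_scan(cluster_count: int, marked_bad: set[int],
                  is_readable: Callable[[int], bool]) -> set[int]:
    """Return clusters *newly* found bad; clusters already marked are never re-read."""
    newly_bad = set()
    for cluster in range(cluster_count):
        if cluster in marked_bad:
            continue  # already out of use, so the scan skips it
        if not is_readable(cluster):
            newly_bad.add(cluster)
    return newly_bad


# Hypothetical volume: clusters 0-19 are physically bad but were already
# marked "bad" earlier, so a fresh scan finds nothing new to report.
already_marked = set(range(20))
print(elective_scan(100, already_marked, lambda c: c >= 20))            # set()
print(elective_scan(100, set(), lambda c: c >= 20) == already_marked)   # True
```

The same 20 bad clusters produce either a clean report or 20 findings, depending only on whether they were marked before the scan ran.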


I think you can see what all this means - that despite any claims to
the contrary, the game is rigged to hide impending HD failure from
you, hopefully until the HD's warranty period has expired. The large
print may claim your vendors really care about your data, but the
small print confirms they just want to duck support calls.

2) Are bad clusters the same as bad sectors, ie, if a bad cluster exists it
means it contains one or more bad sectors?

A sector is a hardware-level unit of storage, typically containing 512
bytes. A cluster is a file system level unit of data storage,
containing a power of 2 sectors - typically 8, for 4k clusters.
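The arithmetic is simple enough to sketch, assuming 512-byte sectors and 8 sectors per cluster, and counting sectors from the start of the cluster area (real file systems add their own offsets). It shows why a single bad sector condemns all 8 sectors of its cluster.

```python
SECTOR_BYTES = 512        # typical sector size (assumption)
SECTORS_PER_CLUSTER = 8   # 8 x 512 bytes = 4 KB clusters


def cluster_to_sectors(cluster: int) -> range:
    """Sectors covered by a cluster, counted from the start of the cluster area."""
    first = cluster * SECTORS_PER_CLUSTER
    return range(first, first + SECTORS_PER_CLUSTER)


def sector_to_cluster(sector: int) -> int:
    """The cluster that a given sector belongs to."""
    return sector // SECTORS_PER_CLUSTER


print(SECTOR_BYTES * SECTORS_PER_CLUSTER)  # 4096
bad_sector = 8195
cluster = sector_to_cluster(bad_sector)
print(cluster)                             # 1024
print(cluster_to_sectors(cluster))         # range(8192, 8200)
```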

Yes, a newly-discovered bad cluster means one or more bad sectors,
unless something has faked the marking process for some reason.
Viruses used to do that long ago; I don't think today's OSs lend
themselves to that particular way of hiding malicious code anymore.

There's one circumstance in which existing bad clusters do not mean a
failing hard drive, and that is where the contents of failing hard
drive are imaged (copied exactly) to a good replacement hard drive.
The raw imaging process will preserve the existing bad cluster marks,
even though they no longer refer to actual bad clusters.

3) Further down in the same chkdsk report, it reported "0 KB in bad
sectors". Why does it report this when it had just found some bad clusters?
Does it report this because it had "fixed" them, ie, replaced them with
spares, and bad sectors no longer exist?

Good question. As one who does data recovery, and who has seen too
many "too late" dead drives that ate data which might have been saved
if an earlier alarm had been raised, I'm not inclined to trust vendor
reporting, especially ChkDsk. "Everything's fine" may be a lie, but
"hey, something might possibly be wrong" is certain to mean trouble.

-- Risk Management is the clue that asks:
"Why do I keep open buckets of petrol next to all the
ashtrays in the lounge, when I don't even have a car?"
 
R. McCarty

Good, insightful posting. Explained very well.

 
Guest

Thanks for the detailed answer. I have two external Maxtor One touch hard
drives I use for backup. I bought them 17 months ago and I ran the chkdsk
just to make sure they were holding out OK. BOTH drives issued a "bad
clusters" notice on one (not the same) file. I hope this doesn't mean they
have both decided to self-destruct simultaneously.

I sure wish someone would create a utility that would give you a clear
picture of what's going on with the drive - whether it's starting to cascade
into failure or whether a bad sector was just a solitary, lone occurrence.
Like you said, I don't know if dozens of other sectors have recently been
marked out as bad, which would suggest the catastrophic scenario.

Thanks again,
Robert Reader
 
cquirke (MVP Windows shell/user)

On Sun, 14 Aug 2005 10:09:01 -0700, "Robert Reader" wrote:
Thanks for the detailed answer. I have two external Maxtor One touch hard
drives I use for backup. I bought them 17 months ago and I ran the chkdsk
just to make sure they were holding out OK. BOTH drives issued a "bad
clusters" notice on one (not the same) file. I hope this doesn't mean they
have both decided to self-destruct simultaneously.

I'd replace them, if they're still under warranty. The only exception
would be if you'd imaged from a failing drive to this one, and thus
possibly carried over now-irrelevant bad cluster markers.

I sure wish someone would create a utility that would give you a clear
picture of what's going on with the drive - whether it's starting to cascade
into failure or whether a bad sector was just a solitary, lone occurrence.

This starts to look like the Halting Problem ;-)

If you knew enough to reach the answer, you'd have reached the answer.

Like you said, I don't know if dozens of other sectors have recently been
marked out as bad, which would suggest the catastrophic scenario.

Hard drives don't always die in a linear fashion, a few bad sectors at
a time. So catastrophic failure could always happen - e.g. if a logic
board fries, or a motor burns out.

However, progressive surface failure, either actual or apparent (e.g.
head positioning wobble that reduces accuracy of pickup, or a head
that doesn't work as well as it used to and thus can only read the
strongest of signals) is a common failure pattern.

Why would a previously good HD develop a bad sector?

Either because the ability to read the surface has generally
deteriorated, so that the weakest readable sector is now the strongest
non-readable sector (i.e. the first dead swallow of winter) or
something more focal has gone wrong with that particular part of the
drive. Which of these scenarios is the most comforting?

Well, if it's the first, then that's very un-comforting. Whatever
progressed to make this weakest sector no longer readable, will likely
progress to make other sectors unreadable too.

If the second, then what could go wrong in one part of a disk?

The only thing that really comes to mind is what is called a "head
crash", i.e. where the head touches the disk and thus damages it.

Why would the head touch the disk - or perhaps, aren't the heads
always touching the disk anyway?

Unlike diskette drives, the heads of a hard drive are not supposed to
actually touch the data-bearing parts of the disk. They are supposed
to fly very close to the disk, riding on the cushion of air that
"sticks" to the surface of the disk as it spins - something that
actually works, when the disk is spinning at proper speed.

I can think of three things that can go wrong here.

Firstly, the disk could get bumped while it's active, causing the
heads to strike the disk in much the same way that a record would skip
tracks when you kick the turntable (as DJs and pensioners will know).

Secondly, the gap between disk and head is so small, that even a
particle of cigarette smoke may be big enough to have the same effect
as hitting a boulder on the highway. Head hits the object, and
bounces DING Ding ding on the disk; that could be today's bad cluster
and tomorrow's extension of this initial dead zone.

Thirdly, the disk might slow down due to a power drop, causing the
heads to pinch through the air cushion and touch the disk - and as the
disk is still likely to be moving apace, that's gonna hurt.

So we have scenarios that either cause physical damage, or are caused
by impurities within the supposedly-clean sealed airspace. In fact,
the one is likely to cause the other; a physical head strike may kick
up debris that from then on may swirl around in this now-polluted
airspace, and possibly cause further collisions and abrasion.

If the airspace is not polluted from head strike debris, why would it
be polluted? The most likely cause would be a failure of the air
filtering, such that the airspace is not so "sealed" after all.

When the heads hit the disk, is it only that part of the disk that is
damaged? Or are the heads also damaged, and thus left working
slightly less well than before?

Considering all this, how would you postulate a scenario of "just one
bad sector" without an increased likelihood of more to follow?

The only one I can think of would be an absence of airspace pollution
from a failing air filter etc., plus an isolated head strike that
damaged the disk enough to cause a bad sector, but did not damage the
head enough to weaken its ability to function, plus the strike did not
kick up any loose debris, or leave the damaged surface unsmooth enough
to snag the head again. That's asking for a LOT of faith, IMO.


---------- ----- ---- --- -- - - - -
Failure is not an option.
It's built into the software
 
Guest

I'm getting the feeling that these bad cluster errors are spurious and don't
represent a real problem.

I tried an experiment. On one of the external drives, the bad cluster error
was reported in a 4 GB file which is a zip file of map images from a mapping
application I have. First, I ran "chkdsk /f /r" a couple times in a row, and
the same "Windows replaced bad clusters in file ..." appeared each time,
indicating the clusters were not really fixed by Windows, and they keep
appearing in the same file. In addition, I ran an integrity check on the zip
file where it does CRC checking of the internal zip structure and no errors
were found in the file (if bad clusters were either "bad" or subsequently
"fixed" one might expect corruption.)

Then I deleted the file, and ran chkdsk again. Chkdsk reported no errors at
all. (If there was really a problem, it should have been detected during the
"checking free space" stage of the process.) I then re-zipped the images
from my main hard drive in a fresh zip file and copied the file to the
external drive again. I ran chkdsk and got the "bad clusters" error again in
that file.

This makes me suspect that chkdsk is reacting in some strange way to that
file and is not really detecting bad clusters at all.

This does not give me much confidence in chkdsk.
 
cquirke (MVP Windows shell/user)

On Fri, 19 Aug 2005 06:21:02 -0700, "Robert Reader" wrote:
I'm getting the feeling that these bad cluster errors are spurious and don't
represent a real problem.

I tried an experiment. On one of the external drives, the bad cluster error
was reported in a 4 GB file which is a zip file of map images from a mapping
application I have. First, I ran "chkdsk /f /r" a couple times in a row, and
the same "Windows replaced bad clusters in file ..." appeared each time,
indicating the clusters were not really fixed by Windows, and they keep
appearing in the same file.

Not that true bad clusters can be "fixed", but IKWYM; it's not even
succeeding in pretending they're "fixed".

In addition, I ran an integrity check on the zip file where it does CRC
checking of the internal zip structure and no errors were found in the file
(if bad clusters were either "bad" or subsequently "fixed" one might
expect corruption.)

Yup.

Then I deleted the file, and ran chkdsk again. Chkdsk reported no errors at
all. (If there was really a problem, it should have been detected during the
"checking free space" stage of the process.)

Yes... could be that it isn't doing that check.

I then re-zipped the images from my main hard drive in a fresh zip file
and copied the file to the external drive again. I ran chkdsk and got the
"bad clusters" error again in that file.

This makes me suspect that chkdsk is reacting in some strange way to that
file and is not really detecting bad clusters at all.

Yes, it looks like that, and that's strange. Is there any way to tell
whether it's throwing errors at the same place?

This does not give me much confidence in chkdsk.

Me neither. I'd get a "second opinion" from something like HD-Tune's
surface check, and if that also shows errors, the problem may be in
the process of writing large, sustained material to the HD, such as
local power levels etc. Even if that were the case, I'd expect
the error to persist in free space after the file was deleted, given that
deleting a file doesn't change the contents of the clusters holding
the file's data (it just marks them as "free for use" within disk
structure info that's held elsewhere).
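A toy sketch of that point, assuming a volume modelled as a list of cluster contents plus a separate in-use map (NTFS keeps its equivalent in metadata, but the class and method names here are hypothetical). Deleting only flips entries in the map, so the same physical clusters, and any real defect in them, are still there for a free-space scan to hit.

```python
class ToyVolume:
    """Clusters hold data; a separate in-use map records which are allocated."""

    def __init__(self, cluster_count: int) -> None:
        self.data = [b""] * cluster_count
        self.in_use = [False] * cluster_count  # allocation info, "held elsewhere"
        self.files: dict[str, list[int]] = {}

    def write_file(self, name: str, chunks: list[bytes]) -> None:
        free = [i for i, used in enumerate(self.in_use) if not used]
        if len(free) < len(chunks):
            raise OSError("volume full")
        clusters = free[:len(chunks)]
        for cluster, chunk in zip(clusters, chunks):
            self.data[cluster] = chunk
            self.in_use[cluster] = True
        self.files[name] = clusters

    def delete_file(self, name: str) -> None:
        for cluster in self.files.pop(name):
            self.in_use[cluster] = False  # only the map changes; data stays put


vol = ToyVolume(4)
vol.write_file("TopoMaps.zip", [b"chunk-a", b"chunk-b"])
vol.delete_file("TopoMaps.zip")
print(vol.in_use)  # [False, False, False, False]
print(vol.data)    # [b'chunk-a', b'chunk-b', b'', b'']
```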

Strange... let me know what www.hdtune.com shows up.


------------ ----- ---- --- -- - - - -
The most accurate diagnostic instrument
in medicine is the Retrospectoscope
 
Plato

Robert Reader said:
I'm getting the feeling that these bad cluster errors are spurious and don't
represent a real problem.

It's never been a perfect science.
 
