More life from hard disk with bad sectors

Eric Gisin

Many of us have reported seeing bad blocks caused by power problems,
which cease when that is fixed.

The explanation "running out of spare sectors" is absolutely idiotic. It is
not possible to remap that many without getting lots of prior warning.
 
Folkert Rienstra

Wizard said:
My experience is that Blake is correct.

And another one from the school of hard knocks.

No, that is not what he said, it has nothing to do with the spare
pool being exhausted. Only how a "bad" sector manifests itself.
Bad blocks, once seen, grow like weeds.

Yup, until you stop what's causing them.
Unrecoverable read error bad sectors are usually
caused by bad power supply or drives overheating.

["Clueless gibberish" snipped]
 
Folkert Rienstra

Another clueless parrot troll.

Roger Blake said:
Modern hard drives do bad block forwarding, invisible to the operating
system and thus the end user. If you start actually seeing bad blocks
it means that all the spares have been used and the drive is in the
process of dying a grievous death.
 
Folkert Rienstra

Next time, proofread your post before sending please.

Taed Wynnell said:
I don't think that's quite right based on my understanding.
If you're WRITING to the block when the error is first detected
by the drive, then the reallocation is invisible to the OS.

Only on 'sector not found', which isn't a possibility on recent drives without
sector IDs. Faulty writes will only be noticed on the next read to the sector.
However, if a READ is the first time the error is detected, then
an error is returned to the OS (and it will "detect" a bad block),

Only when it is an unrecoverable read error bad sector. Recoverable read
error bad sectors are the ones that are replaced directly (if need be).
but the disk then places that block number in a pending list and the reallocation
isn't done

is done
the next time it is written to.
Thus, the OS could see an error before the spares have been exhausted.

Correct for unrecoverable read error bad sectors.
This process is described in most detailed data sheets for the drives,
and the 5 or so that I've read seem to follow this general process.

Correct, except that it is not like you explained here.
 
Folkert Rienstra

CJT said:
Are you implying drives do a read-after-write check?

Maybe, maybe not, judging by the other typo and
his claim to have read the drive documentation.
If not (and I don't think they do such a check routinely),

They don't. Only for sectors in the bad sector candidate (pending) list.
it seems to me the only way a drive would ever detect such an error
(i.e. upon a write) is if things are so hosed the drive can't even sync
to the write location.

Right, and that doesn't happen anymore on current hard drives unless the
servo marks have been damaged and more than one sector will be lost.

Bad sectors are reallocated on recoverable reads or writes to bad sector
candidates.
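
The rule above (unrecoverable reads create candidates, writes to candidates trigger the check and the reallocation) can be sketched as a toy model. This is only an illustration of the behaviour described in this thread, not real firmware: the class ToyDrive and the sets damaged and corrupt are made-up names, and the split between "media really bad" and "data garbled but media fine" is an assumption used to mimic the power-problem case.

```python
class ToyDrive:
    """Toy model of a pending (bad sector candidate) list. Not real firmware."""

    def __init__(self, spares):
        self.spares = spares
        self.damaged = set()   # hypothetical: sectors with a real media defect
        self.corrupt = set()   # hypothetical: data garbled, media fine (e.g. power dip)
        self.pending = set()   # bad sector candidates
        self.remapped = set()  # sectors already moved to a spare

    def read(self, lba):
        """Return True if the host gets data, False if it gets a read error."""
        if lba in self.remapped:
            return True
        if lba in self.damaged or lba in self.corrupt:
            self.pending.add(lba)  # unrecoverable read: report it, remember the sector
            return False
        return True

    def write(self, lba):
        """Return True if the write is accepted, False if it fails."""
        self.corrupt.discard(lba)  # fresh data replaces garbled data
        if lba in self.pending:
            # Only candidates are verified after the write.
            if lba in self.damaged:
                if self.spares == 0:
                    return False   # nothing left to remap to, stays a candidate
                self.spares -= 1
                self.remapped.add(lba)
            self.pending.discard(lba)
        return True


drive = ToyDrive(spares=1)
drive.corrupt.add(10)   # garbled by a bad write, media is fine
drive.damaged.add(20)   # genuine media defect
print(drive.read(10), drive.write(10), drive.read(10), drive.spares)
```

In this model a sector that was only garbled comes back after a rewrite without using a spare, while a write to a damaged sector that is not yet a candidate goes unnoticed until the next read.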
 
Ardent

1) only works for drives smaller than 8GB

Could be. Unfortunately :) all my large disks have no bad sectors to
try with!
Btw, why not finish the experiment and zero-write the whole
drive and use it as one normally would, and see what happens.

I have tried this with other drives earlier and bad sectors remained
bad :-(
If your theory is correct it should soon die. Somehow, I doubt that.

As I said the particular drive progressively developed the bad sectors
during a period of three months - after doing what I did the disk is
continuing to work for me for a little more than two years without
developing any new bad sector.

I posted my experiment as I am sure there are a lot of DOS aficionados
using smaller disks and trying all sorts of things with hard disks, as
frequently seen in posts here and elsewhere.

Thanks for your comments

Sandy Archer
 
Al Dykes

Are you implying drives do a read-after-write check? If not (and I
don't think they do such a check routinely), it seems to me the only
way a drive would ever detect such an error (i.e. upon a write) is
if things are so hosed the drive can't even sync to the write location.

However, if a READ is the first time


Modern Disks have a huge percentage of the raw capacity dedicated to
ECC (error correction codes) which is enough redundant data so that if
a block is unreadable the ECC codes will allow the data to be
recalculated and nothing is lost. When ECC recovery happens the block
is redirected to a spare, the data is written there and the
application has no idea anything happened. SMART data will tell you
how many times this has happened.
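
For anyone who wants to look at those SMART counters, here is a minimal sketch that pulls the sector-related raw values out of an attribute table in the layout printed by smartctl -A. The sample text is made up for illustration, and attribute names and their meaning vary between drive vendors, so treat the three names below as an assumption about what your drive reports.

```python
# Made-up sample in the column layout of `smartctl -A`; the numbers are not from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

# Attributes of interest for this thread (names as commonly shown, may differ per vendor).
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")


def sector_counters(report):
    """Return {attribute name: raw value} for the watched attributes."""
    counters = {}
    for line in report.splitlines():
        fields = line.split()
        # Data rows start with a numeric ID and have at least ten columns.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            counters[fields[1]] = int(fields[9])
    return counters


print(sector_counters(SAMPLE))
```

A non-zero pending count alongside a low reallocated count would fit the "candidates waiting for a write" explanation given elsewhere in this thread.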
 
Al Dykes

And another one from the school of hard knocks.

No, that is not what he said, it has nothing to do with the spare
pool being exhausted. Only how a "bad" sector manifests itself.


Yup, until you stop what's causing them.
Unrecoverable read error bad sectors are usually
caused by bad power supply or drives overheating.

IMHO once I see bad blocks the disk gets yanked and replaced on any PC
that's being used for business purposes. Life's too short to have
unexpected total failures.

["Clueless gibberish" snipped]
 
Folkert Rienstra

Ardent said:
Could be. Unfortunately :) all my large disks have no bad sectors to
try with!


I have tried this with other drives earlier

But you didn't do it with *this* drive, the one that you did the
experiment on.
and bad sectors remained bad :-(

Then either the drives didn't have bad sector management or they
had exhausted their spare sector pool already.
As I said the particular drive progressively developed the bad sectors
during a period of three months - after doing what I did the disk is
continuing to work for me for a little more than two years without
developing any new bad sector.

Yes, I read it the first time.
I posted my experiment as I am sure there are a lot of DOS aficionados
using smaller disks and trying all sorts of things with hard disks, as
frequently seen in posts here and elsewhere.

Yes, but your experiment doesn't count for anything if you don't finish it
and no hard conclusions can be drawn from it.

The 'fact' that you think that you never touched the areas that you took
out of use is no fact at all, as drive heads patrol the surfaces when idle
in order not to heat up the heads and platter tracks.
 
Folkert Rienstra

Complete and utterly clueless drivel.
The trolls are out in full force this week.

Al Dykes said:
Modern Disks have a huge percentage of the raw capacity dedicated to
ECC (error correction codes) which is enough redundant data so that if
a block is unreadable the ECC codes will allow the data to be
recalculated and nothing is lost.

If a block is unreadable it means that the ECC codes failed
to allow the data to be recalculated and data *is* lost.
When ECC recovery happens the block is redirected to a spare

Only when a set number of retries was needed to accomplish a corrected read.
Sectors that are read with ECC corrections without several retries will not be
reallocated at all.
 
Folkert Rienstra

Al Dykes said:
Dykes, you are a clueless troll, that can't even set up his news
client properly or worse, refuses to do so, enjoying submitting
people's reply addresses to the spammers just to annoy them.

And replacing a blown tyre is of little use when the road is
full of spikes, Dykes, no matter how humble your opinion.

And another one from the school of hard knocks.

No, that is not what he said, it has nothing to do with the spare
pool being exhausted. Only how a "bad" sector manifests itself.


Yup, until you stop what's causing them.
Unrecoverable read error bad sectors are usually
caused by bad power supply or drives overheating.

IMHO once I see bad blocks the disk gets yanked and replaced on any PC
that's being used for business purposes. Life's too short to have
unexpected total failures.

Eric Gisin wrote:

Clueless gibberish. Read my and Folkert's reply to J Clarke.

["Clueless gibberish" snipped]
 
Sugdash the Mankball-launderer

Folkert said:
Dykes, you are a clueless troll, that can't even set up his news
client properly or worse, refuses to do so, enjoying submitting
people's reply addresses to the spammers just to annoy them.

(e-mail address removed)
 
chrisv

Modern Disks have a huge percentage of the raw capacity dedicated to
ECC (error correction codes) which is enough redundant data so that if
a block is unreadable the ECC codes will allow the data to be
recalculated and nothing is lost.

Huh? Maybe you're thinking of CDROM's, but not HD's...
 
Eric Gisin

chrisv said:
Huh? Maybe you're thinking of CDROM's, but not HD's...
A hard drive uses about 400 bits/sector ECC - 10%, a CD-ROM uses 20% for
top-level ECC.
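
As a quick check of the 10% figure, here is the arithmetic written out. It assumes a 512-byte data sector, which the post does not state; the 400 bits is the number quoted above.

```python
DATA_BITS = 512 * 8   # assumption: 512-byte sector, i.e. 4096 data bits
ECC_BITS = 400        # figure quoted in the post


def ecc_overhead(data_bits, ecc_bits):
    """Return (ECC relative to data bits, ECC as share of data plus ECC)."""
    return ecc_bits / data_bits, ecc_bits / (data_bits + ecc_bits)


relative_to_data, share_of_total = ecc_overhead(DATA_BITS, ECC_BITS)
print(f"{relative_to_data:.1%} of the data bits, {share_of_total:.1%} of data plus ECC")
```

Either way of counting lands close to the "about 10%" stated in the post.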
 
Folkert Rienstra

Al Dykes said:

Which obviously you didn't bother to read or didn't understand a word of:

"
When a sector is written to the hard disk, the appropriate ECC codes
are generated and stored in the bits reserved for them. When the
sector is read back, the user data read, combined with the ECC bits,
*can tell the controller if any errors occurred during the read*.
Errors that can be corrected using the redundant information
are corrected before passing the data to the rest of the system.
* The system can also tell when there is too much damage to the *
* data to correct, and will issue an error notification in that event. *
The sophisticated firmware present in all modern drives uses
ECC *as part of its overall error management protocols*.
This is all done * "on the fly" * with no intervention from the user
required, and no slowdown in performance even when errors are
encountered and must be corrected.
"

http://www.storagereview.com/guide2000/ref/hdd/geom/errorRead.html

"
The hard disk's controller employs a sequence of sophisticated techniques
to manage errors that occur when reading data from the disk.
In a way, the system is kind of like a *troubleshooting flowchart*.
When a problem occurs, the simplest techniques are tried first, and
*if they don't work, the problem is escalated to a higher level*.
Every manufacturer uses different techniques, so this is just a rough
example guideline of how a hard disk will approach error management:

ECC Error Detection: The sector is read, and error detection is applied
to *check for any read errors*. If there are no errors, the sector is
passed on to the interface and the read is concluded successfully.

ECC Error Correction: The controller will attempt to correct the error
using the ECC codes read for the sector. The data can be corrected
very quickly using these codes, normally "on the fly" with no delay.
* If this is the case, the data is fixed and the read considered successful. *
Most drive manufacturers consider this occurrence common enough that
* it is not even considered a "real" read error. *
An error corrected at this level can be considered "automatically corrected".

Automatic Retry: The next step is usually to wait for the disk to spin
around again, and retry the read. Sometimes the first error can be
caused by a stray magnetic field, physical shock or other non-repeating
problem, and the retry will work. If it doesn't, more retries may be done.
Most controllers are programmed to retry the sector a certain number
of times before giving up.
* An error corrected after a straight retry is often *
* considered "recovered" or "corrected after retry". *

Advanced Error Correction: Many drives will, on subsequent retries
after the first, invoke more advanced error correction algorithms that
are slower and more complex than the regular correction protocols, but
have an increased chance of success.
* These errors are "recovered after multiple reads" *
* or "recovered after advanced correction". *

Failure: If the sector still cannot be read, the drive will signal a read
error to the system.
* These are "real", unrecoverable read errors, the kind *
* that result in a dreaded error message on the screen. *
"
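
The escalation ladder quoted above can be written down as a short sketch. It follows the quoted "rough example guideline" only; real firmware differs per manufacturer. The callback attempt_read, its return strings and the default of three retries are hypothetical, chosen just to make the example runnable.

```python
from enum import Enum


class ReadOutcome(Enum):
    CLEAN = "no error"
    CORRECTED_ON_THE_FLY = "automatically corrected"
    RECOVERED_AFTER_RETRY = "corrected after retry"
    RECOVERED_ADVANCED = "recovered after advanced correction"
    UNRECOVERABLE = "unrecoverable read error"


def read_sector(attempt_read, max_retries=3):
    """Classify one sector read.

    attempt_read(attempt, advanced) is a hypothetical callback returning
    "ok", "correctable" or "bad" for a single pass over the sector.
    """
    result = attempt_read(0, False)
    if result == "ok":
        return ReadOutcome.CLEAN                 # ECC detection found nothing
    if result == "correctable":
        return ReadOutcome.CORRECTED_ON_THE_FLY  # not counted as a "real" error
    for attempt in range(1, max_retries + 1):
        advanced = attempt > 1                   # advanced correction on retries after the first
        result = attempt_read(attempt, advanced)
        if result in ("ok", "correctable"):
            if advanced:
                return ReadOutcome.RECOVERED_ADVANCED
            return ReadOutcome.RECOVERED_AFTER_RETRY
    return ReadOutcome.UNRECOVERABLE             # error is signalled to the host


# A sector that only reads once advanced correction kicks in.
print(read_sector(lambda attempt, advanced: "ok" if advanced else "bad"))
```

Only the last outcome is visible to the operating system; the others are handled inside the drive.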

http://www.storagereview.com/guide2000/ref/hdd/geom/errorMapping.html

"Many drives are smart enough to realize that if a sector can only be read
after retries, the chances are good that something bad may be happening
to that sector, and the next time it is read it might not be recoverable.
For this reason, the drive will usually do something when it has to use
retries to read a sector
*(but usually not when ECC will correct the problem on the fly)*.
What the drive does depends on how it is designed."

From the Hitachi Deskstar 180 GXP manual :

"
9.11.1 Auto Reassign function
The sectors which show some errors may be reallocated automatically
when specific conditions are met.
The spare tracks for reallocation are located at regular intervals from
Cylinder 0. The conditions for auto-reallocation are described below.

9.11.1.1 Nonrecovered write errors
When a write operation cannot be completed after the Error Recovery
Procedure (ERP) is fully carried out, the sector(s) are reallocated to the
spare location. An error is reported to the host system only when the write
cache is disabled and the auto reallocation has failed.
If the Write Cache function is ENABLED when the number of available
spare sectors reaches 0 sector, both Auto Reassign function and Write
Cache function are automatically disabled.

9.11.1.2 Nonrecovered read errors
When a read operation has failed after defined ERP is fully carried out,
a hard error is reported to the host system.
* This location is registered internally as a candidate for reallocation *.
When a registered location is specified as a target of a write operation,
a sequence of media verification is performed automatically. When the
result of this verification meets the criteria, this sector is reallocated.

9.11.1.3 Recovered read errors
When a read operation for a sector has failed once and then has recovered
*at the specific ERP step*, this sector of data is automatically reallocated.
A media verification sequence may be run prior to the reallocation according
to the predefined conditions.
"
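
The three cases from the manual excerpt above boil down to a small decision table: what the host sees and what the drive does. The sketch below is a reading of the quoted text only; the function name, event strings and parameters are invented for illustration and do not come from the manual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    host_sees_error: bool
    action: str


def reassign_policy(event, write_cache_enabled=True, reallocation_failed=False):
    """Map an error event to the behaviour described in sections 9.11.1.1 to 9.11.1.3."""
    if event == "nonrecovered_write":
        # Reported only when write cache is disabled AND the reallocation failed.
        visible = (not write_cache_enabled) and reallocation_failed
        return Policy(visible, "reallocate to spare")
    if event == "nonrecovered_read":
        # Hard error to the host; reallocation waits for a later write to this sector.
        return Policy(True, "register as reallocation candidate")
    if event == "recovered_read_at_erp_step":
        # Applies only to reads recovered at the specific ERP step.
        return Policy(False, "reallocate to spare")
    raise ValueError(f"unknown event: {event}")


for name in ("nonrecovered_write", "nonrecovered_read", "recovered_read_at_erp_step"):
    print(name, reassign_policy(name))
```

Only the nonrecovered read is always visible to the host, which is why the OS can see a bad block while spares are still available.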

Note: Advanced Error Correction = ERP (Error recovery procedure)
(Actually I believe

Your beliefs are obviously dangerous.
data CDs have two layers of ECC, one in the media
layer, and one at something below the filesystem level.
I can't get a reference for this.)

How fortunate for you as it probably would prove you wrong again.

So to conclude:

ECC Error Correction:

*If this is the case, the data is fixed and the read considered successful*.
*it is not even considered a "real" read error*.
An error corrected at this level can be considered "automatically corrected".

Sparing:

Many drives are smart enough to realize that if a sector can only be read
after *retries*, the chances are good that something bad may be happening
to that sector, and the next time it is read it might not be recoverable.
For this reason, the drive will usually do something when it has to use
retries to read a sector.

Recovered read error bad sectors are spared only after the (a specific)
ERP step (Advanced Error Correction step) has been entered.

A complete fabrication by someone who obviously hasn't got a clue.
 
