A Question About SMART numbers


Al Dykes

I've just installed a SMART utility (everest) on a Y/O laptop
and have questions about the SMART numbers it produces.
(I know that disks do ECC error recovery routinely, and
an individual event isn't a reason to replace the disk.)

I can't make sense of the relationship between the "Threshold", "Value",
"Worst", and "Data" columns, because the "Data" value is frequently in
excess of "Worst", but the status is still OK.

I see a few numbers here that might worry me. How is my disk doing?

[ HITACHI_DK23DA-20 (14L6TL) ]

Attribute                            Threshold  Value  Worst  Data  Status
Raw Read Error Rate                         50    100    100   101  OK: normal
Throughput Performance                      50    100    100  4010  OK: normal
Start/Stop Count                             0     98     98  2422  OK: passing
Reallocated Sector Count                    10    100    100     5  OK: normal
Seek Error Rate                             50    100    100   452  OK: normal
G-Sense Error Rate                           0    100     99   145  OK: passing
Hardware ECC Recovered                       0    100     90   113  OK: passing
Reallocation Event Count                     0    100    100     5  OK: passing
Current Pending Sector Count                 0     99     99     1  OK: passing
Off-Line Uncorrectable Sector Count          0     99     99     4  OK: passing
Ultra ATA CRC Error Rate                     0    200    200    13  OK: passing
Load Retry Count                             0    100    100   384  OK: passing
Read Error Retry Rate                        0    100      1   514  OK: passing

Thanks
 
Joep

Al Dykes said:
I've just installed a SMART utility (everest) on a Y/O laptop
and have questions about the SMART numbers it produces.
(I know that disks do ECC error recovery routinely, and
an individual event isn't a reason to replace the disk.)

I can't make sense of the relationship between the "Threshold", "Value",
"Worst", and "Data" columns, because the "Data" value is frequently in
excess of "Worst", but the status is still OK.

The 'Worst' values are meant to be compared with the 'Threshold' values,
not with the 'Data' values.
 
Robert Nichols

The numbers in the "Data" column are raw values. Their exact meaning is
arbitrary, but can sometimes be inferred. For example, 5 defective
sectors have been reallocated to spares. Those raw numbers are then
normalized, by formulas known only to the manufacturer, usually to a
range of 0 (worst) to 100 (best). That result is what is shown in the
"Value" column, and an alarm condition is indicated when that number
drops below the "Threshold" number. The "Worst" column shows the worst
(lowest) normalized value seen during the life of the device.

What I see in the above numbers is that in the past something happened
to the drive that caused a high Read Error Retry Rate (Worst = 1). I
would guess that resulted in the reallocation of 5 sectors, with 1
additional bad sector currently flagged for reallocation the next time
that sector is written. The Read Error Retry Rate is now back to a
normalized value of 100 (good). I'd worry if the Reallocated Sector
Count continues to grow, but otherwise the drive appears to be in good
shape.
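
The relationship between the columns described above can be sketched in a few lines of Python. This is only an illustration: the class and method names are made up here, the rows are copied from the Everest table in the original post, and the "strictly below the threshold" rule is taken from the description above rather than from any specification. Note that the raw "Data" number is never compared with anything.

```python
from dataclasses import dataclass


@dataclass
class SmartAttribute:
    name: str
    threshold: int
    value: int  # current normalized value
    worst: int  # lowest normalized value seen so far
    raw: int    # vendor-specific "Data" column, not used in any comparison

    def failing_now(self) -> bool:
        # Alarm rule as described above: normalized value drops below threshold.
        return self.value < self.threshold

    def dipped_in_past(self) -> bool:
        # A "Worst" lower than the current value means things were worse once.
        return self.worst < self.value


# Rows copied from the table in the original post.
ROWS = [
    SmartAttribute("Raw Read Error Rate", 50, 100, 100, 101),
    SmartAttribute("Throughput Performance", 50, 100, 100, 4010),
    SmartAttribute("Start/Stop Count", 0, 98, 98, 2422),
    SmartAttribute("Reallocated Sector Count", 10, 100, 100, 5),
    SmartAttribute("Seek Error Rate", 50, 100, 100, 452),
    SmartAttribute("G-Sense Error Rate", 0, 100, 99, 145),
    SmartAttribute("Hardware ECC Recovered", 0, 100, 90, 113),
    SmartAttribute("Reallocation Event Count", 0, 100, 100, 5),
    SmartAttribute("Current Pending Sector Count", 0, 99, 99, 1),
    SmartAttribute("Off-Line Uncorrectable Sector Count", 0, 99, 99, 4),
    SmartAttribute("Ultra ATA CRC Error Rate", 0, 200, 200, 13),
    SmartAttribute("Load Retry Count", 0, 100, 100, 384),
    SmartAttribute("Read Error Retry Rate", 0, 100, 1, 514),
]

failing = [a.name for a in ROWS if a.failing_now()]
dipped = [a.name for a in ROWS if a.dipped_in_past()]

print("failing now:", failing)
print("worse in the past:", dipped)
```

With these rows nothing is failing now, and the attributes that were worse at some point are the ones where "Worst" sits below "Value".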
 
Al Dykes

Bingo. Right on.

I had a BSOD crash that resulted in an unbootable XP system. It
would come half-way up, crash, and reboot. It smelled like a disk
problem.

I did a low level format and ran the procedure that Compaq wanted, and
it gave an OK, so I didn't have a way to get Compaq to give me a new
disk. Then I booted Linux and ran badblocks overnight and it didn't
show any problems, so I reimaged from a backup and it's been running
fine. That was months ago.

Thanks
 
Folkert Rienstra

: (I know that disks do ECC error recovery routinely, and

You do now, do you? It's about time.
Oh, and while ECC 'on the fly' error recovery is routine, 'routinely'
isn't as often as it sounds, but it is more often than the 113 in the statistics.

: an individual event isn't a reason to replace the disk.)

Yeah, you would be replacing them on a daily basis if it were.
The 'Hardware ECC Recovered' count appears to be linked to the
ERP count (Read error retries).
: The numbers in the "Data" column are raw values. Their exact meaning is
: arbitrary, but can sometimes be inferred.

'Guessed at', since it is 'vendor specific and proprietary', and hopefully every
manufacturer uses the same position and data width in the 'Device Attribute Data
Structure'.

: an alarm condition is indicated when that number
: drops below the "Threshold" number.

It depends on the value of the 'Pre-failure/Advisory bit' what type of
alarm is indicated.
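
To make the two points above concrete, here is a rough Python sketch of decoding one attribute entry and picking the alarm type. The 12-byte layout and the position of the Pre-failure/Advisory bit (bit 0 of the flags word) are assumptions based on how the structure is commonly documented, not something stated in this thread, and the raw field remains vendor specific. The function names, the flags value, and the sample bytes are hypothetical.

```python
import struct

# Assumed 12-byte entry layout (not given in this thread):
# id (1 byte), flags (2 bytes, little-endian), value (1), worst (1),
# raw (6 bytes, vendor specific), reserved (1)
ENTRY = struct.Struct("<BHBB6sB")
PREFAIL_BIT = 0x0001  # assumed: bit 0 of flags is the Pre-failure/Advisory bit


def decode_entry(entry: bytes) -> dict:
    """Split one attribute entry into its fields."""
    attr_id, flags, value, worst, raw, _reserved = ENTRY.unpack(entry)
    return {
        "id": attr_id,
        "prefail": bool(flags & PREFAIL_BIT),
        "value": value,
        "worst": worst,
        # Treating the 6 raw bytes as one little-endian integer is a guess
        # that happens to work for simple counters.
        "raw": int.from_bytes(raw, "little"),
    }


def alarm_type(attr: dict, threshold: int) -> str:
    """Name the kind of alarm, if any, for a decoded attribute."""
    if attr["value"] >= threshold:
        return "ok"
    return "pre-failure" if attr["prefail"] else "advisory"


# Hypothetical entry resembling the Reallocated Sector Count row
# (attribute ID 5, value 100, worst 100, raw 5); the flags are made up.
sample = ENTRY.pack(5, 0x0033, 100, 100, (5).to_bytes(6, "little"), 0)
attr = decode_entry(sample)
print(attr, alarm_type(attr, threshold=10))
```

The thresholds live in a separate structure, which is why `alarm_type` takes the threshold as its own argument here.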

: I'd worry if the Reallocated Sector Count continues to grow,

Why? Why not worry now? It has happened before, so it can happen
again (unless you happen to know what it was and can make sure that it
won't happen again).
It may happen again and also stop again, and how will that then be different
from the first time?

Or did you mean to say 'keeps growing steadily'? That would make more sense.

: but otherwise the drive appears to be in good shape.

Yup, it looks like a temporary event that went by, after which the predictive
statistics returned to safe values, either all by themselves over time or through the LLF.
The question is:
what did happen, and can it happen again if you don't do anything about it?
Maybe the 'G-Sense Error Rate' has something to do with it?

: Bingo. Right on.
:
: I had a BSOD crash that resulted in an unbootable XP system.
: It would come half-way up, crash, and reboot. It smelled like a disk
: problem.
:
: I did a low level format and ran the procedure that Compaq wanted, and
: it gave an OK, so I didn't have a way to get Compaq to give me a new disk.
: Then I booted Linux and ran badblocks overnight and it didn't show any
: problems, so I reimaged from a backup and it's been running fine.
: That was months ago.

Who was that again who said:
"Yank the drive. Life's too short to have unexpected total failures."
 
Al Dykes

: Who was that again who said:
: "Yank the drive. Life's too short to have unexpected total failures."


Me.

Since it was a personal machine, I wanted to experiment, and I had the
time, and didn't have a spare (which I always have on a business
site). So I took the time to experiment. The fact that it was passing
tests and Compaq wanted it to fail before they'd swap it meant I'd be
out $150.

If it comes to keeping an employee productive I'm set up to swap a
machine out and reimage it. Fast and cost effective, but I don't
learn much about anything but imaging.

(And you must be mistaking me for someone else. I've always stated
that ECC/FEC is heavily used on disks and has been for
decades. There's Maximal Probability stuff that makes reading a data
track a lot like demodulating a radio signal on a noisy channel.)
 
Folkert Rienstra

Al Dykes said:
You really enjoy posting people's Reply Addresses on the internet, don't you, Al?
Maybe the abuse department at Panix.com has more influence on you than I have
to get you to conform to good Usenet practice.

: Who was that again who said:
: "Yank the drive. Life's too short to have unexpected total failures."
:
: Me.
:
: Since it was a personal machine, I wanted to experiment, and I had the
: time, and didn't have a spare (which I always have on a business site).
: So I took the time to experiment. The fact that it was passing tests and
: Compaq wanted it to fail before they'd swap it meant I'd be out $150.

Right.
Your beliefs vary with whose pocket the replacement cost has to come out of.

: If it comes to keeping an employee productive I'm set up to swap a
: machine out and reimage it. Fast and cost effective, but I don't
: learn much about anything but imaging.

Yes, except that is not at all what you meant when you said that quote.

: (And you must be mistaking me for someone else, I've always stated
: that ECC/FEC is heavily used on disks and has been for decades.

So it is your opinion then that the "Al Dykes" <[email protected]> who
said in the thread:
"Re: More life from hard disk with bad sectors"

was an impostor; it can't possibly have come from you?

Or, alternatively, are you now saying that 'heavily used' ECC recovery,
resulting (by your own words) in 'heavily redirection' to spares 'creating' many
bad blocks, is daily practice (about once every minute) and therefore quite OK?

Quite a change for someone whose life motto is:

"
IMHO once I see bad blocks the disk gets yanked and replaced on any
PC that's being used for business purposes. Life's too short to have
unexpected total failures.
"

You often haven't got a clue about what you are saying, have you, Al?

: There's Maximal Probability stuff that makes reading a data
: track a lot like demodulating a radio signal on a noisy channel.)

That about confirms it.
 
