Seagate's Seek Error Rate, Raw Read Error Rate, & Hardware ECC Recovered SMART attributes



Franc Zabkar

Seagate's Seek Error Rate, Raw Read Error Rate, and Hardware ECC
Recovered SMART attributes create a lot of anxiety amongst Seagate
users. This is because the raw values are typically very high, and the
normalised values (Current / Worst / Threshold) are usually quite low.
Despite this, the numbers in most cases are perfectly OK.

The anxiety arises because we intuitively expect that the normalised
values should reflect a "health" score, with 100 being the ideal
value. Similarly, we would expect that the raw values should reflect
an error count, in which case a value of 0 would be most desirable.
However, Seagate calculates and applies these attribute values in a
counterintuitive way.

In fact the normalised values of Seagate's Seek Error Rate, Raw Read
Error Rate, and Hardware ECC Recovered attributes are logarithmic, not
linear, and the raw values are sector counts or seek counts, not error
counts.

Seagate's SMART documentation is not publicly available. The following
information has not been gleaned from any official source, but is
based on my own testing and observation, and on testing by others.
Therefore it may contain errors.


Seek Error Rate
---------------

The raw value of each SMART attribute occupies 48 bits. Seagate's Seek
Error Rate attribute consists of two parts -- a 16-bit count of seek
errors in the uppermost 4 nibbles (hexadecimal digits), and a 32-bit
count of seeks in the lowermost 8 nibbles. To see these data, we need
a SMART utility that reports all 48 bits, preferably in hexadecimal.
Two such utilities are HD Sentinel and HDDScan.
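
The split described above can be sketched in a few lines of Python.
This is only an illustration, assuming the raw value is available as a
12-digit hexadecimal string; the function name split_seek_raw is
hypothetical and not part of any SMART utility. The sample value is
the one from the 13GB drive discussed later in this article.

```python
def split_seek_raw(raw_hex):
    """Split a 48-bit Seek Error Rate raw value into (seek errors, seeks)."""
    raw = int(raw_hex, 16)
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value must fit in 48 bits")
    errors = raw >> 32          # uppermost 16 bits (4 nibbles)
    seeks = raw & 0xFFFFFFFF    # lowermost 32 bits (8 nibbles)
    return errors, seeks

print(split_seek_raw("052E0E3000EC"))   # (1326, 238026988)
```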

I believe the relationship between the raw and normalised values of
the SER attribute is given by ...

normalised SER = -10 log (lifetime seek errors / lifetime seeks)

In the above formula, log is the base-10 logarithm. If the drive has
recorded no errors, we still need to set the number of errors to 1,
otherwise the result would be undefined (the logarithm of 0 does not
exist).
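
As a rough sketch, the hypothesised formula can be written as a small
Python function. The name normalised_ser is hypothetical, and the
substitution of 1 for a zero error count is the convention proposed
above, not documented Seagate behaviour. The seek and error counts in
the two calls are made-up figures chosen for illustration.

```python
import math

def normalised_ser(seek_errors, seeks):
    """Hypothesised normalised SER: -10 * log10(errors / seeks)."""
    if seeks <= 0:
        raise ValueError("seek count must be positive")
    errors = max(seek_errors, 1)   # treat 0 errors as 1 so the log is defined
    return -10 * math.log10(errors / seeks)

print(round(normalised_ser(0, 10_000_000), 2))   # 70.0
print(round(normalised_ser(5, 10_000_000), 2))   # 63.01
```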

The following table correlates the normalised SER against the actual
error rate:

90 = <= 1 error per 1000 million seeks
80 = <= 1 error per 100 million
70 = <= 1 error per 10 million
60 = <= 1 error per million
50 = 10 errors per million
40 = 100 errors per million
30 = 1000 errors per million
20 = 10 errors per thousand
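
The rows of this table follow from inverting the formula, ie error
rate = 10^(-normalised / 10). The short Python sketch below
regenerates them; the function name error_rate is hypothetical, and
the calculation assumes that the SER formula proposed above is
correct.

```python
def error_rate(normalised):
    """Invert normalised = -10 * log10(rate) to recover the error rate."""
    return 10 ** (-normalised / 10)

for normalised in range(90, 10, -10):
    seeks_per_error = 1 / error_rate(normalised)
    print(f"{normalised} = 1 error per {seeks_per_error:,.0f} seeks")
```

For example, a normalised value of 20 corresponds to 1 error per 100
seeks, which is the same as 10 errors per thousand.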

A drive that has not yet recorded 1 million seeks will show 100 and
253 for the Current and Worst values. I believe this is because the
data are not considered to be statistically significant until the
drive has recorded 1 million seeks. When this target is reached, the
values drop to 60 and 60, assuming there have been no errors, since
-10 log (1 / 1 000 000) = 60.

By way of example, here are the SMART data for my 13GB Seagate HDD:
http://www.users.on.net/~fzabkar/SmartUDM/13GB.RPT

Attribute         ID  Threshold  Value  Worst  Raw
===============================================================
Seek Error Rate    7         30     53     38  052E0E3000EC

The number of lifetime seek errors = 0x052E (uppermost 4 nibbles)

The number of lifetime seeks = 0x0E3000EC (lowermost 8 nibbles)

Using Google's calculator ...

0x052E = 1326
0x0E3000EC = 238 026 988

http://www.google.com/search?q=0x052E+in+decimal
http://www.google.com/search?q=0x0E3000EC+in+decimal

Applying the formula ...

normalised SER = -10 log (0x052E / 0x0E3000EC)

http://www.google.com/search?q=-10+log+(0x052E+/+0x0E3000EC)

.... we get a result of 52.54.

Here is a second example:
http://www.users.on.net/~fzabkar/SmartUDM/120GB.RPT

Attribute         ID  Threshold  Value  Worst  Raw
===============================================================
Seek Error Rate    7         30     79     60  00000580A6AC

The above drive is in fact error free. It has recorded 0x0580A6AC
seeks (= 92 million) without error.

Applying the formula ...

normalised SER = -10 log (1 / 0x0580A6AC)

.... we get a result of 79.65

Note that we have used 1 instead of 0 for the error count (because log
0 is undefined).
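
Both worked examples can be reproduced from the raw values alone. The
following Python sketch decodes each 48-bit raw value and applies the
hypothesised formula; the function name ser_from_raw is hypothetical,
and the "reported" figures are the Value column from the two tables
above.

```python
import math

def ser_from_raw(raw_hex):
    """Decode a 48-bit raw value and apply -10 * log10(errors / seeks)."""
    raw = int(raw_hex, 16)
    errors = raw >> 32           # uppermost 4 nibbles
    seeks = raw & 0xFFFFFFFF     # lowermost 8 nibbles
    return -10 * math.log10(max(errors, 1) / seeks)

# (raw value, normalised value reported by the drive)
examples = [("052E0E3000EC", 53), ("00000580A6AC", 79)]
for raw_hex, reported in examples:
    print(raw_hex, round(ser_from_raw(raw_hex), 2), "reported:", reported)
```

The computed figures are 52.54 and 79.65, each within 1 of the value
reported by the drive.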


Raw Read Error Rate and Hardware ECC Recovered
----------------------------------------------

The raw values of the Raw Read Error Rate (RRER) and Hardware ECC
Recovered (HER) attributes represent a sector count, not an error
count. This figure rolls over to 0 once the count reaches about 250
million. I suspect that the drive records the total
number of errors in each block of 250 million sectors, and then
recalculates the normalised values of each attribute accordingly. This
means that RRER and HER would be updated according to a rolling
average rather than on a lifetime basis. I'm almost certain that the
normalised values are also logarithmic, but I'm not sure how they are
calculated. The above figure of 250 million sectors applies to the
7200.11 and DiamondMax 22 models, but may not apply to all.

While writing this article I came upon a Seagate document entitled
"Diagnostic Commands". It doesn't discuss SMART attributes, but it
refers to "Error Recovery Usage Rate" and defines it as ...

Error Recovery Usage Rate =

-log10 {(Number of sectors in which controller invoked specified error
recovery scheme)/[(Number of sectors transferred) * (512 bytes/sector)
* (8 bits/byte)]}

This lends support to my Seek Error Rate formula, and suggests that
the RRER and HER attributes may be similarly calculated.
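
For illustration, the quoted Error Recovery Usage Rate definition
translates directly into Python. The function name and the sample
figures (1 recovered sector in 1000 million sectors transferred) are
hypothetical; only the formula itself comes from the Seagate document.

```python
import math

BITS_PER_SECTOR = 512 * 8   # 512 bytes/sector * 8 bits/byte, as in the formula

def error_recovery_usage_rate(recovered_sectors, sectors_transferred):
    """-log10(recovered sectors / total bits transferred)."""
    if recovered_sectors <= 0 or sectors_transferred <= 0:
        raise ValueError("both counts must be positive")
    bits = sectors_transferred * BITS_PER_SECTOR
    return -math.log10(recovered_sectors / bits)

print(round(error_recovery_usage_rate(1, 1_000_000_000), 2))   # 12.61
```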

In fact the document mentions (but does not discuss) 5 different error
recovery schemes:

HARD = multiple retries invoked and failed
FIRM = multiple retries invoked
SOFT = 5 retries invoked
OTF = 1 retry invoked (On The Fly)
RAW = OTF ECC invoked

"On The Fly" means that errored data is corrected using the ECC bytes,
without an additional access of the platters.

Based on the abovementioned Error Recovery Usage Rate formula, I now
postulate that the normalised value of the Raw Read Error Rate
attribute could be calculated as follows:

normalised RRER = -10 log (number of errored sectors / total bits
transferred)

The total number of bits is ...

(250 million sectors) x (512 bytes/sector) x (8 bits/byte) = 1.024 x
10^12

It seems to me that it makes more sense to use a round figure, say
10^12.

If we now let the number of errors equal 0 (or 1), then we have ...

max normalised RRER = -10 log (1 / 10^12) = 120

Similarly, if we let the number of errors equal 250 million (i.e. every
sector is errored), then we have ...

min normalised RRER = -10 log (1 / 4096) = 36

Therefore, if my hypothesis is correct, we would expect that the
threshold value of the RRER attribute would be 36, and its maximum
possible value would be 120. In fact my Internet research tends to
confirm a maximum of 120 for 7200.11 models, but the threshold figure
is 34.
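
The two bounds above can be checked numerically. This Python sketch
assumes the postulated RRER formula and the approximate 250 million
sector block size; the names are hypothetical. It also shows that the
exact bit count (1.024 x 10^12) would give a maximum of about 120.1
rather than the round figure of 120.

```python
import math

SECTORS = 250_000_000             # approximate rollover point from the text
BITS_EXACT = SECTORS * 512 * 8    # 1.024 x 10^12
BITS_ROUND = 10 ** 12             # the round figure suggested in the text

def normalised_rrer(errored_sectors, total_bits):
    """Postulated normalised RRER: -10 * log10(errored sectors / bits)."""
    return -10 * math.log10(max(errored_sectors, 1) / total_bits)

print(round(normalised_rrer(0, BITS_ROUND), 2))         # 120.0
print(round(normalised_rrer(SECTORS, BITS_EXACT), 2))   # 36.12
print(round(normalised_rrer(0, BITS_EXACT), 2))         # 120.1
```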

FWIW, here are the numbers for my own Seagate drives:

Attribute                ID  Threshold  Value  Worst  Raw           Model
==========================================================================
Raw Read Error Rate       1          6    114    100  00000386EBBA  ST3320620A
Raw Read Error Rate       1          6     64     62  00000AFD20E3  ST3120026A
Raw Read Error Rate       1         34     77     66  000007820F8F  ST340016A
Raw Read Error Rate       1          0     79     78  00000753BA8E  ST313021A

Hardware ECC recovered  195          0    100     63  00000C62F66E  ST3320620A
Hardware ECC recovered  195          0     64     62  00000AFD20E3  ST3120026A
Hardware ECC recovered  195          0     77     66  000007820F8F  ST340016A

http://www.users.on.net/~fzabkar/SmartUDM/320GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/120GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/40GB.RPT
http://www.users.on.net/~fzabkar/SmartUDM/13GB.RPT
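
If the raw values are indeed sector counts, they can be read more
easily in decimal. The Python sketch below simply converts the raw
values listed above; the dictionary layout is hypothetical. All seven
counts come out below 250 million, although that rollover figure is
only claimed for the 7200.11 and DiamondMax 22 models.

```python
raw_values = {
    ("Raw Read Error Rate", "ST3320620A"): "00000386EBBA",
    ("Raw Read Error Rate", "ST3120026A"): "00000AFD20E3",
    ("Raw Read Error Rate", "ST340016A"): "000007820F8F",
    ("Raw Read Error Rate", "ST313021A"): "00000753BA8E",
    ("Hardware ECC recovered", "ST3320620A"): "00000C62F66E",
    ("Hardware ECC recovered", "ST3120026A"): "00000AFD20E3",
    ("Hardware ECC recovered", "ST340016A"): "000007820F8F",
}

# Print each raw value as a decimal sector count.
for (attribute, model), raw_hex in raw_values.items():
    print(f"{attribute:24}{model:12}{int(raw_hex, 16):>12,}")
```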


References:
-----------

Here are several Usenet discussions where I have posted the results of
my experiments:

Seagate - SMART Raw Read Error Rate test:
http://groups.google.com/group/comp.sys.ibm.pc.hardware.storage/browse_thread/thread/b6eb8aa2476f9cac/030c515959145d44#030c515959145d44

SER, RRER, and HER discussion:
http://groups.google.com/group/comp.sys.ibm.pc.hardware.storage/browse_thread/thread/54b8ad6d34549e95/ae6ca014b3ff211a#ae6ca014b3ff211a

Seek Error Rate discussion:
http://groups.google.com/group/comp.sys.ibm.pc.hardware.storage/browse_thread/thread/87001db5c567fb9a/63ccf100808bc3f6#63ccf100808bc3f6

A report from a Seagate user regarding the RRER attribute:
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracuda/New-Maxtor-STM3500320AS-500GB-S-M-A-R-T-Problem/m-p/22276

HD Sentinel (DOS / Windows / Linux):
http://www.hdsentinel.com/

HDDScan for Windows:
http://hddscan.com/

Explanation of SMART attributes:
http://en.wikipedia.org/wiki/S.M.A.R.T.

- Franc Zabkar
 
