Seagate ST3120827AS 7200.7 Problem?


Phoenix AG

Hi,

I've just bought a brand new Seagate 7200.7 hard disk. It seems to be
working fine but Active SMART reports that the Raw read error rate and
the ECC on the fly count are fluctuating like crazy.

They both started going up from 55 and went up to 62. Now they are back
to 60. They change about every 5 minutes.
Do I have a bad disk? Should I get it replaced immediately?

Or is this normal on Seagate hard disks? I have never used Seagate
HDDs in the past so I don't have much experience with them.

Can anyone please help? I would be very grateful.

Thank you.
If more information is required, please let me know and I will provide
it.


***
....the Phoenix shall rise...
 

Arno Wagner

AFAIK this behaviour is normal. I have seen similar readings on
Seagate for both attributes and on Samsung for "Hardware_ECC_Recovered".
Whether the actual numbers mean trouble depends on the threshold set
on the disk.

On a ST3120026A it looks like this here:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 055 052 006 Pre-fail Always - 24109901
3 Spin_Up_Time 0x0003 097 096 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 33
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 085 060 030 Pre-fail Always - 347404439
9 Power_On_Hours 0x0032 088 088 000 Old_age Always - 11140
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 250
194 Temperature_Celsius 0x0022 047 056 000 Old_age Always - 47
195 Hardware_ECC_Recovered 0x001a 055 052 000 Old_age Always - 24109901
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 1
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

Raw_Read_Error_Rate will only be problematic as its normalized value
(the VALUE column) approaches the threshold of 6.
Hardware_ECC_Recovered is an "old-age" attribute, meaning it
does not indicate imminent failure in any case.

One attribute that is critical and should be watched for its
raw value increasing is Reallocated_Sector_Ct, since it counts
sectors that were found bad and remapped. Also Temperature_Celsius
is important to make sure the disk will live long. My 47C
reading here is a bit too hot for real comfort, but the disk
is non-critical and cooling it better would be difficult.
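
The rule of thumb above (compare the normalized VALUE against THRESH, and watch the raw count of Reallocated_Sector_Ct) can be sketched in Python. This is only an illustration, assuming the ten-column attribute layout shown in the sample output. The names `parse_attributes` and `check` are made up for this sketch and are not part of smartmontools.

```python
# Three rows copied from the sample output above.
SAMPLE = """\
1 Raw_Read_Error_Rate 0x000f 055 052 006 Pre-fail Always - 24109901
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
195 Hardware_ECC_Recovered 0x001a 055 052 000 Old_age Always - 24109901
"""


def parse_attributes(text):
    """Parse attribute rows into a dict keyed by attribute name (hypothetical helper)."""
    attrs = {}
    for line in text.splitlines():
        # Split into at most 10 fields so extra text in RAW_VALUE stays together.
        fields = line.split(None, 9)
        if len(fields) < 10 or not fields[0].isdigit():
            continue  # skip the header row and anything that is not an attribute
        attrs[fields[1]] = {
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "raw": fields[9].strip(),
        }
    return attrs


def check(attrs):
    """Return a list of warnings following the rule of thumb described above."""
    warnings = []
    for name, a in attrs.items():
        # A threshold of 0 means the attribute can never be flagged as failed.
        if a["thresh"] > 0 and a["value"] <= a["thresh"]:
            warnings.append(
                f"{name}: value {a['value']} is at or below threshold {a['thresh']}"
            )
    realloc = attrs.get("Reallocated_Sector_Ct")
    if realloc and int(realloc["raw"].split()[0]) > 0:
        warnings.append(f"Reallocated_Sector_Ct: raw count is {realloc['raw']}")
    return warnings


if __name__ == "__main__":
    print(check(parse_attributes(SAMPLE)))
```

With the sample rows, no warning is produced: 55 is far above the threshold of 6, and the raw reallocation count is 0.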

Arno
 

Phoenix AG

Thank you for the answer :)

My Reallocated Sector Ct is 100, Worst value is 100, Thresh is 36.
After checking just now with Active SMART, it seems another value has
fluctuated. The Seek Error Rate, which was 100 earlier, has gone down
to 61, Worst value is 60, Thresh is 30. Is this another thing to worry
about?

Which program did you use to generate those numbers? I can't seem to
export a log from Active SMART. Maybe if I got the other program, I
could post my SMART details and you could see them?

Also, as earlier, the Raw read error rate and ECC count has fluctuated
by a value of about 2. It seems to be going up, rather than down.


***
....the Phoenix shall rise...
 

Arno Wagner

Previously Phoenix AG said:
Thank you for the answer :)
My Reallocated Sector Ct is 100, Worst value is 100, Thresh is 36.

Actually interesting is the raw number. (Last one on the right in
my sample output)
After checking just now with Active SMART, it seems another value has
fluctuated. The Seek Error Rate, which was 100 earlier, has gone down
to 61, Worst value is 60, Thresh is 30. Is this another thing to worry
about?

Might be. Is your PSU o.k. and is the drive firmly mounted?
Which program did you use to generate those numbers? I can't seem to
export a log from Active SMART. Maybe if I got the other program, I
could post my SMART details and you could see them?

"smartmontools", also ported to Windows. Available here:

http://smartmontools.sourceforge.net/

And the output above is not a log, but the program's output.
Also, as earlier, the Raw read error rate and ECC count has fluctuated
by a value of about 2. It seems to be going up, rather than down.

Well, maybe there is some external factor causing this...
But not necessarily. Could just be variations in usage
pattern.

Arno
 

Phoenix AG

Actually interesting is the raw number. (Last one on the right in
my sample output)


Might be. Is your PSU o.k. and is the drive firmly mounted?

Well, I think so. On a separate note, my computer seems to crash
whenever I run any 3d graphics program on it, like a game or
something. It's just a month old, actually...and I've been trying to
sort out the problem. Have already changed my PSU 3 times, the RAM
twice and the graphics card once. It went away for a while, but now is
back in full force and I can't play any game at all without it
crashing after a half hour.

The drive seems firmly mounted.

Also, Active SMART now gives me a T.E.C. date of May 2005 for the Raw
Read Error rate. This definitely can't be good?

Thanks for the reply, once again.

Here is a complete log of the smartmontools output.

=== START OF INFORMATION SECTION ===
Device Model: ST3120827AS
Serial Number: 5MS07R09
Firmware Version: 3.42
User Capacity: 120,034,123,776 bytes
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 2
Local Time is: Wed Apr 06 19:31:48 2005 India Standard Time
SMART support is: Available - device has SMART capability.
SMART support is: Disabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
                                       was completed without error.
                                       Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
                                 without error or no self-test has ever
                                 been run.
Total time to complete Offline
data collection: ( 430) seconds.
Offline data collection
capabilities: (0x5b) SMART execute Offline immediate.
                     Auto Offline data collection on/off support.
                     Suspend Offline collection upon new command.
                     Offline surface scan supported.
                     Self-test supported.
                     No Conveyance Self-test supported.
                     Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
                             power-saving mode.
                             Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
                                 No General Purpose Logging support.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 71) minutes.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 061 052 006 Pre-fail Always - 228800583
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 6
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 062 060 030 Pre-fail Always - 1750450
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 27
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 8
194 Temperature_Celsius 0x0022 042 046 000 Old_age Always - 42 (Lifetime Min/Max 0/26)
195 Hardware_ECC_Recovered 0x001a 061 052 000 Old_age Always - 228800583
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0000 100 253 000 Old_age Offline - 0
202 TA_Increase_Count 0x0032 100 253 000 Old_age Always - 0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 10 -
# 2 Short offline Completed without error 00% 8 -
# 3 Short offline Completed without error 00% 8 -

SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

Hope this information is of some use.

Active SMART giving me a TEC date can't be good, can it?


***
....the Phoenix shall rise...
 

Arno Wagner

Well, I think so. On a separate note, my computer seems to crash
whenever I run any 3d graphics program on it, like a game or
something.
Aha!

It's just a month old, actually...and I've been trying to
sort out the problem. Have already changed my PSU 3 times, the RAM
twice and the graphics card once. It went away for a while, but now is
back in full force and I can't play any game at all without it
crashing after a half hour.

That sounds very much like an overloaded PSU or inadequate cooling.
Both would affect a HDD as well. You should correct this problem
first, it may be the root cause.

What PSU do you have? And what CPU and graphics card?
The drive seems firmly mounted.

Then that is not the cause.
Also, Active SMART now gives me a T.E.C. date of May 2005 for the Raw
Read Error rate. This definitely can't be good?

No idea, I don't use/know Active SMART.
Thanks for the reply, once again.
Here is a complete log of the smartmontools output.
Hope this information is of some use.

Looks pretty normal to me. Maybe the seek-error rate is a bit high.

Arno
 

Phoenix AG

That sounds very much like an overloaded PSU or inadequate cooling.
Both would affect a HDD as well. You should correct this problem
first, it may be the root cause.

Well, its a 350w PSU. I have 2 SATA drives, an Intel 3.0ghz 530J CPU.
And an ATI X700 Pro 128mb card. Also, a dvd writer and a cd writer. Do
you think thats a little overloaded?

I have 3 fans in the cabinet. Earlier, only 1 fan used to do its job
and keep the system cabinet temperature at about 36C (I have a cabinet
with an LCD on its front which tells me the temp). After I installed
the 2nd drive, it constantly goes to about 38C, after which the
cabinet's other 2 side fans take over and start running and cool it
down to about 35-36.

I have tested the cooling issue quite a lot. The system used to run
quite cool earlier and still crash.

They gave me the best PSU they had in the store, I don't know the
make. But I had them change it thrice, and they assured me that this
one was the best. So it can't be that bad. Can't be the Ram, got that
changed and tested. Got the graphics card changed from a Connect3D to
a Powercolor card. That worked, the crashing stopped, but now its back
in full force.

In fact, I ran an AVG virus scan on my system right now and then I did
it another time. Both times, the system crashed.

At first I thought it might be the onboard sound, so I tried running
3D tests with no sound and even disabled the onboard sound. But that
didn't stop the crashing.

My other HDD (Samsung) gives great SMART numbers, no changes ever.
Except for maybe the temperature a little here and there.

Actually, I am going a bit nuts with all this crashing. It can't be
the PSU, I had that changed a lot of times and tested. Same with ram,
gfx card. Only things I can think of now are the motherboard or the
CPU, very unlikely as it may seem. The motherboard is a Gigabyte.

Now Active SMART gives me a TEC Date of Apr 2005. Which means it
predicts the drive is gonna fail this month. LOL. I think I should
definitely get it changed.

It's been a hellish experience, getting a new comp :(

***
....the Phoenix shall rise...
 

Michael Hawes

According to http://www.jscustompcs.com/power_supply/ your system
should draw about 300W. What is the make and model of the PSU? Dodgy PSUs
are often from China and are very light in weight, as they skimp on
heatsinks. What is the CPU temp? Open the case, check that the cooler on
the graphics card is running, and feel the temperature of the heatsink
after a crash. If it is too hot to touch, it is too hot. What is the
spec/make of the memory? Unbranded memory can be unreliable. Download the
Microsoft memory tester and run it. Have you overclocked the system?
Mike.
 

Phoenix AG

Hi, thanks for the reply :)

I am not sure about the make and model of the PSU. I will open up the
case tomorrow and let you guys know. I did think it was a dodgy PSU as
I have experienced crashes and etc with a bad PSU. But I have changed
it quite a lot now and still no difference. I guess I can ask for it
to be changed again.

The CPU temp on the normal load seems to be about 47-51 C. I haven't
checked it after playing a game. I will also check the CPU heatsink,
thats a great idea.
The fan on the graphics card is running fine.

The memory is PC3200 400mhz memory from Kingston. It's a 512MB stick.

I haven't overclocked anything. Everything is running at defaults.

In fact, to eliminate a software problem, I just formatted my system
recently.

I am now going to let it run for the night converting a load of mp3
files into a different format. Just to test if the CPU is the problem,
and not the graphics card.
Because now I can create a reproducible crash when I try to run AVG's
system scan.

It feels so bad, having some games I want to play and having a good
system to play them. And not being able to play :( Hehe.. :(


***
....the Phoenix shall rise...
 

Arno Wagner

Well, its a 350w PSU. I have 2 SATA drives, an Intel 3.0ghz 530J CPU.
And an ATI X700 Pro 128mb card. Also, a dvd writer and a cd writer. Do
you think thats a little overloaded?

Might be. Especially if it is a low-quality PSU. In some tests
I have seen, some reached less than 80% of their stated load
before failing. Also the PSU might have a limit on 12V current
that is too low for your hardware. Especially older designs
are not able to supply the 12V load a modern 3D card generates.

Also a PSU loaded nearly 100% is going to die fast. Usually
they are designed for something like 70% continuous load.
I have 3 fans in the cabinet. Earlier, only 1 fan used to do its job
and keep the system cabinet temperature at about 36C (I have a cabinet
with an LCD on its front which tells me the temp). After I installed
the 2nd drive, it constantly goes to about 38C, after which the
cabinet's other 2 side fans take over and start running and cool it
down to about 35-36.
I have tested the cooling issue quite a lot. The system used to run
quite cool earlier and still crash.

Your numbers sound reasonable. Does not seem to be an overheated system.
They gave me the best PSU they had in the store, I don't know the
make. But I had them change it thrice, and they assured me that this
one was the best. So it can't be that bad.

I would not trust this statement. I am curious what these people
think is a good brand for PSUs. What it cannot be is a single
faulty PSU out of a series of good ones. Or did the people in the shop
only tell you they changed it, but did nothing?
Can't be the Ram, got that
changed and tested. Got the graphics card changed from a Connect3D to
a Powercolor card. That worked, the crashing stopped, but now its back
in full force.

Typical for overload: The problem gets worse over time.
In fact, I ran an AVG virus scan on my system right now and then I did
it another time. Both times, the system crashed.
At first I thought it might be the onboard sound, so I tried running
3D tests with no sound and even disabled the onboard sound. But that
didn't stop the crashing.

No. Should not crash things.
My other HDD (Samsung) gives great SMART numbers, no changes ever.
Except for maybe the temperature a little here and there.

It may be more tolerant. But still, I think the Seagate is also running
reasonably. You should worry about the crashes first.
Actually, I am going a bit nuts with all this crashing. It can't be
the PSU, I had that changed a lot of times and tested.

It can. An average design run close to the design limit will
be unreliable. The 300W quote from Mike sounds reasonable.
With normal load only 70% of max. capacity you should use at
least a 380W quality PSU. For not so good PSUs you might have
to go up to 450W or 480W for a stable system.
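
The 70% rule of thumb above is simple arithmetic and can be sketched in Python. This is only an illustration: it assumes the 300W calculator estimate is the load to be sustained, and the function names are made up for this sketch. Taking the full 300W gives 300 / 0.70, roughly 429W, which is higher than the 380W figure quoted above, so that recommendation presumably assumes a typical draw somewhat below the calculator's estimate.

```python
def min_psu_rating(load_watts, max_load_fraction=0.70):
    """Smallest PSU rating (in watts) that keeps the load at or below the given fraction."""
    return load_watts / max_load_fraction


def load_fraction(load_watts, rating_watts):
    """Fraction of the PSU's rated capacity that a given load uses."""
    return load_watts / rating_watts


if __name__ == "__main__":
    estimated_load = 300  # watts, the calculator estimate quoted in this thread

    print(f"Rating needed for 70% loading: {min_psu_rating(estimated_load):.0f} W")

    # How heavily each PSU size mentioned in the thread would be loaded.
    for rating in (350, 380, 450, 480):
        print(f"{rating} W PSU: {load_fraction(estimated_load, rating):.1%} of rated capacity")
```

On these assumptions the 350W unit would run at about 86% of its rating, well above the 70% design point mentioned above.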
Same with ram,
gfx card. Only things I can think of now are the motherboard or the
CPU, very unlikely as it may seem. The motherboard is a Gigabyte.
Now Active SMART gives me a TEC Date of Apr 2005. Which means it
predicts the drive is gonna fail this month. LOL. I think I should
definitely get it changed.

I think you should cure the crashes first. The drive may be
perfectly fine. As long as the crashes are there, the drive
may just suffer some collateral effects.
It's been a hellish experience, getting a new comp :(

Sometimes. Quality components are key. PSUs are often
overlooked, but I once read that something like 40% of
all electronics equipment failures in the US Navy are
bad switching-mode PSUs (the design used in the PC).
And with modern CPUs and 3D cards drawing a lot on the
12V line, the design requirements also have changed
significantly and some PSU manufacturers are slow to adapt.
Classically, a lot more load was on 5V.

Arno
 

Phoenix AG

Might be. Especially if it is a low-quality PSU. In some tests
I have seen, some reached less than 80% of their stated load
before failing. Also the PSU might have a limit on 12V current
that is too low for your hardware. Especially older designs
are not able to supply the 12V load a modern 3D card generates.

Also a PSU loaded nearly 100% is going to die fast. Usually
they are designed for something like 70% continuous load.

Yes, I understand all that :(
The problem is, 350w PSUs are prevalent all over the place here. It's
really hard getting a 400w or above PSU here. I will go to the shop
and have them change it to an Antec or another good company one
tomorrow. I think enough is enough.

I would not trust this statement. I am curious what these people
think is a good brand for PSUs. What it cannot be is a single
faulty PSU out of a series of good ones. Or did the people in the shop
only tell you they changed it, but did nothing?

No no. I sat at the shop for 8 hours, right with my system and made
them change and test everything. In fact, most of the time, I tested
things myself. They did change the parts, but now I think, maybe they
went from one crappy PSU to the next?

It may be more tolerant. But still, I think the Seagate is also running
reasonably. You should worry about the crashes first.

Yes, well the numbers have stopped fluctuating in Active SMART. It
still gives me a TEC Date of Apr 2005. So I have copied all my data to
the other drive. Just in case. Let's see. Maybe Seagate takes a little
while to adjust and then fixes the numbers?
It can. An average design run close to the design limit will
be unreliable. The 300W quote from Mike sounds reasonable.
With normal load only 70% of max. capacity you should use at
least a 380W quality PSU. For not so good PSUs you might have
to go up to 450W or 480W for a stable system.

Yeah. It all comes down to the PSU. I think tomorrow will be the day I
go PSU hunting.
I think you should cure the crashes first. The drive may be
perfectly fine. As long as the crashes are there, the drive
may just suffer some collateral effects.

Exactly. I am really at my wit's end solving the crashing. But slowly
and steadily, I have been narrowing down things which may be faulty.
Yesterday night was a kind of a breakthrough, I think.
It now crashes whenever I overwork the CPU. Like, if I run the virus
scan, it crashes in the middle.
I also took a bunch of mp3 files and tried to convert them. After
about 25%, it crashed.
The system stays stable as long as I am doing general work like word
processing, internet surfing, downloading, etc. But as soon as the CPU
is overworked, crash.
Now I believe this is the reason for the graphics crashes too. It
wasn't the graphics card, its the CPU.

I now wonder what could be the problem with the CPU. It's been
installed with the standard fan which comes in the box with it. Do you
think thats too little?
Sometimes. Quality components are key. PSUs are often
overlooked, but I once read that something like 40% of
all electronics equipment failures in the US Navy are
bad switching-mode PSUs (the design used in the PC).
And with modern CPUs and 3D cards drawing a lot on the
12V line, the design requirements also have changed
significantly and some PSU manufacturers are slow to adapt.
Classically, a lot more load was on 5V.

You know, I tried installing all quality components in this system.
The system would have been much cheaper but I got it for quite a lot.
But I guess I overlooked the PSU :(
Is there a way to check if a PSU has better support for 12v lines?

I must say I am really very grateful for all this help. It's good to
be able to discuss this with people who are aware of these things.


***
....the Phoenix shall rise...
 

Phoenix AG

On a separate note, the spin up time has gone from 100 to 96. And now
it tells me that it will fail in May 2005, not Apr 2005 as it had
mentioned earlier (Active SMART).


***
....the Phoenix shall rise...
 

Rod Speed

Phoenix AG said:
Yes, well the numbers have stopped fluctuating in Active SMART. It
still gives me a TEC Date of Apr 2005. So I have copied all my data to
the other drive. Just in case. Let's see. Maybe Seagate takes a little
while to adjust and then fixes the numbers?

Its not the drive that determines that number,
its the smart ute using the data from the drive.
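To illustrate the point that the date comes from the utility rather than the drive: a monitoring tool could produce a "TEC date" by extrapolating the normalized value toward the threshold. The sketch below is only a guess at that kind of calculation, not Active SMART's actual method. The function name, the sample dates, and the threshold of 6 (borrowed from the ST3120026A table earlier in the thread, a different model) are all assumptions.

```python
from datetime import date, timedelta

def estimate_tec_date(samples, threshold):
    """Linearly extrapolate when a normalized SMART value would reach its threshold.

    samples: list of (date, normalized_value) pairs, oldest first (hypothetical format).
    Returns a date, or None if the value is not trending downward.
    """
    (d0, v0), (d1, v1) = samples[0], samples[-1]
    days = (d1 - d0).days
    if days <= 0 or v1 >= v0:
        return None  # no elapsed time or no decline: nothing to extrapolate
    slope = (v0 - v1) / days              # points lost per day
    days_left = (v1 - threshold) / slope  # days until the threshold is reached
    return d1 + timedelta(days=round(days_left))

# Made-up readings: the value drops from 62 to 60 over four days.
readings = [(date(2005, 1, 1), 62), (date(2005, 1, 5), 60)]
print(estimate_tec_date(readings, threshold=6))
```

With a value that bounces up and down every few minutes, an estimate like this will swing widely, which would explain a predicted date that moves between months.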
Yeah. It all comes down to the PSU. I think tomorrow will be the day I
go PSU hunting.
Exactly. I am really at my wit's end solving the crashing. But slowly
and steadily, I have been narrowing down things which may be faulty.
Yesterday night was a kind of a breakthrough, I think.
It now crashes whenever I overwork the CPU. Like, if I run the virus
scan, it crashes in the middle.
I also took a bunch of mp3 files and tried to convert them. After
about 25%, it crashed.
The system stays stable as long as I am doing general work like word
processing, internet surfing, downloading, etc. But as soon as the CPU
is overworked, crash.
Now I believe this is the reason for the graphics crashes too. It
wasn't the graphics card, its the CPU.
I now wonder what could be the problem with the CPU.

Yeah, very likely given that you said elsewhere that its an Intel
3.0ghz 530J CPU. They're a tad notorious for that currently.
It's been installed with the standard fan which comes in the box with it.
Do you think thats too little?

More likely the problem is getting the heat away from that.
You know, I tried installing all quality components in this system.

Looks like you might well have gotten fangs
in the arse with that particular Intel CPU.
The system would have been much cheaper but I got it for quite a lot.
But I guess I overlooked the PSU :(

Its much more likely the cpu is the problem.
Is there a way to check if a PSU has better support for 12v lines?

Yes, but not with what the average consumer has available.
I must say I am really very grateful for all this help. It's good to
be able to discuss this with people who are aware of these things.


***
...the Phoenix shall rise...

Or sink |-(
 
P

Phoenix AG

Yeah, very likely given that you said elsewhere that its an Intel
3.0ghz 530J CPU. They're a tad notorious for that currently.

Are you sure about this? You have any links on the internet where I
can check it out? Because I am going to look a little stupid trying to
convince the store guy that the 3.0 ghz CPU is buggy.

On a separate note, the Spin up time on the HDD dropped to 96.

Well, some guy told me to disable HT and see. So I disabled it and it
crashed after a really long time. Like, if it was crashing in 20
minutes on load, it took about an hour to crash.
I am also using a Gigabyte motherboard and now someone tells me that
those are also known to crash???

Is everything known to crash or have I just been really unlucky?


***
....the Phoenix shall rise...
 
R

Rod Speed

Are you sure about this?
Yep.

You have any links on the internet where I can check it out?
Because I am going to look a little stupid trying to convince
the store guy that the 3.0 ghz CPU is buggy.

Just monitor the cpu temp with something like SpeedFan,
it should be obvious that you are seeing the problem
when the cpu temp is getting too high.

The trick with those intel cpus is the inlet air temp. Its best to
get the air from the outside of the case with those, otherwise
the cpu temp does get too high when working hard when you
use the higher temp air from the inside of the case for the cpu.
On a separate note, the Spin up time on the HDD dropped to 96.
Well, some guy told me to disable HT and see. So I disabled
it and it crashed after a really long time. Like, if it was crashing
in 20 minutes on load, it took about an hour to crash.

It would be interesting to monitor the cpu temp in those two configs.

Dont bother with the bios temperature, thats not working the cpu.
I am also using a Gigabyte motherboard and now
someone tells me that those are also known to crash???

Its bullshit.
Is everything known to crash or have I just been really unlucky?

The evidence you have is that working the cpu hard causes it
to crash, and I bet you will see a correlation with the cpu temp
and thats not hard to fix by using a case which allows outside
air to be used to cool the cpu.
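One way to check for the correlation described above is to log the CPU temperature while the machine is under load and look at the readings just before each crash. The sketch below assumes a simple list of (seconds, temperature) samples; that format, the function name, and the sample data are made up for illustration and do not reflect SpeedFan's actual log format.

```python
def temps_before_crashes(samples, crash_times, window=60):
    """For each crash, return the peak temperature seen in the preceding window.

    samples: list of (seconds_since_start, temp_c) pairs (hypothetical format).
    crash_times: list of seconds_since_start at which the machine crashed.
    window: how many seconds before the crash to inspect.
    """
    peaks = []
    for crash in crash_times:
        recent = [temp for ts, temp in samples if crash - window <= ts <= crash]
        peaks.append(max(recent) if recent else None)
    return peaks

# Made-up log: one reading every 30 s, load starts at 300 s, crash at 1200 s.
log = [(s, 59 if s >= 300 else 45) for s in range(0, 1230, 30)]
print(temps_before_crashes(log, [1200]))
```

If the peak before each crash sits well below the CPU's rated limit, heat becomes a less likely explanation and other causes deserve a look.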
 
P

Phoenix AG

Just monitor the cpu temp with something like SpeedFan,
it should be obvious that you are seeing the problem
when the cpu temp is getting too high.

Thank you for the answer, everyone. I am sorry for taking this thread
off topic myself :)
Well, I have great news :) I finally managed to solve the problem :)

Gigabyte has this utility called EasyTune to monitor the motherboard.
I used that. With HT, the temperature did not go above 59C while the
CPU was on full load (AVG virus scan, whole system + conversion of 500
mp3 files).
It crashed in 20 mins, while the temp was still at 59C. Which was
definitely weird, as I was so sure that the CPU was overheating.

Then I disabled HT and did the same test. Monitored the temp. It did
not go beyond 58C. Worked for 1 hour. I had almost thought I had
solved the problem when CRASH! It was really disappointing :(

Then my BIOS has that CPU Enhanced Halt (C1E) setting. This is
supposed to power save and dissipate heat from the CPU. I
enabled/disabled this, nothing happened.

Finally, I searched around on google and I found a guy on some gamer
forum who was having EXACTLY the same problem I was. He solved it by
changing his motherboard. And I also read that the 530J was supported
only by v1002 of some motherboard.

So then I looked at the motherboard and tried to see what version it
was. Definitely not 1001 or anything along that. The version was F1,
while the current version on the Gigabyte site is F4. And F4 adds
support for EIST CPUs, whatever that means.

So I flashed the BIOS and turned on HT and everything. And it worked
:)
After it finished converting the files and running the whole scan, it's
been running 3DMark03 in a constant loop the whole night. And it
hasn't crashed :)

Thank you all for your suggestions. All of them helped in narrowing
down the problem.
The trick with those intel cpus is the inlet air temp. Its best to
get the air from the outside of the case with those, otherwise
the cpu temp does get too high when working hard when you
use the higher temp air from the inside of the case for the cpu.



It would be interesting to monitor the cpu temp in those two configs.

Dont bother with the bios temperature, thats not working the cpu.


Its bullshit.


The evidence you have is that working the cpu hard causes it
to crash, and I bet you will see a correlation with the cpu temp
and thats not hard to fix by using a case which allows outside
air to be used to cool the cpu.

Frankly, I couldn't have agreed with you more. If you look at it
logically, thats what it seems. The solution turned out to be quite
unexpected, but a pleasant surprise :)

On the HDD, the Spin Up Time went to 97. Seek error rate went to 64.
And now it says TEC Date is Apr 2005 next to Raw read error rate,
which is at 60, with a worst value of 52.

Should I wait and see if this HD fails? I already have my important
data on the other drive. Or should I exchange it immediately? Or
should I live with it because its normal?

Thanks :)


***
....the Phoenix shall rise...
 
R

Rod Speed

Phoenix AG said:
Thank you for the answer, everyone. I am sorry for taking this thread
off topic myself :)
Well, I have great news :) I finally managed to solve the problem :)
Gigabyte has this utility called EasyTune to monitor the motherboard.
I used that. With HT, the temperature did not go above 59C while the
CPU was on full load (AVG virus scan, whole system + conversion of 500
mp3 files).
It crashed in 20 mins, while the temp was still at 59C. Which was
definitely weird, as I was so sure that the CPU was overheating.

Then I disabled HT and did the same test. Monitored the temp. It did
not go beyond 58C. Worked for 1 hour. I had almost thought I had
solved the problem when CRASH! It was really disappointing :(

Then my BIOS has that CPU Enhanced Halt (C1E) setting. This is
supposed to power save and dissipate heat from the CPU. I
enabled/disabled this, nothing happened.

Finally, I searched around on google and I found a guy on some gamer
forum who was having EXACTLY the same problem I was. He solved it by
changing his motherboard. And I also read that the 530J was supported
only by v1002 of some motherboard.

So then I looked at the motherboard and tried to see what version it
was. Definitely not 1001 or anything along that. The version was F1,
while the current version on the Gigabyte site is F4. And F4 adds
support for EIST CPUs, whatever that means.

So I flashed the BIOS and turned on HT and everything. And it worked
:)
After it finished converting the files and running the whole scan, it's
been running 3DMark03 in a constant loop the whole night. And it
hasn't crashed :)

Thank you all for your suggestions. All of them helped in narrowing
down the problem.

Thanks for the feedback, too rare IMO.

That also confirms why I avoid Gigabyte motherboards,
they release them to the field too early in my opinion,
you see FAR too many rev levels at the physical board
level and too many bios flashes too.
Frankly, I couldn't have agreed with you more. If you look
at it logically, thats what it seems. The solution turned out
to be quite unexpected, but a pleasant surprise :)

Yeah, its always logical when you do end up nailing it.
On the HDD, the Spin Up Time went to 97. Seek error rate
went to 64. And now it says TEC Date is Apr 2005 next to
Raw read error rate, which is at 60, with a worst value of 52.
Should I wait and see if this HD fails?

I doubt they'll exchange it on that basis alone.
I already have my important data on the other drive.

Then you should have it completely backed up.

Its never a good idea to rely on a hard drive not failing.

DVD burners are so cheap now that there is no excuse.
Or should I exchange it immediately? Or
should I live with it because its normal?

Not really possible to say because you'd need to know if that
sort of variation is common with that particular drive model.

I'd certainly be concerned, but it may be normal with that model.
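For reference, the usual way to read the normalized SMART columns is that an attribute counts as failed only when its value drops to or below the threshold. The sketch below applies that rule to the numbers quoted above (value 60, worst 52). The threshold of 6 is an assumption taken from the ST3120026A listing earlier in the thread, which is a different model, and the function name is made up.

```python
def attribute_status(value, worst, threshold):
    """Classify a normalized SMART attribute from its VALUE, WORST and THRESH columns."""
    if threshold == 0:
        return "informational"       # a zero threshold can never be crossed
    if value <= threshold:
        return "failing now"         # current value has reached the threshold
    if worst <= threshold:
        return "failed in the past"  # it dipped to the threshold at some point
    return "ok"

# Raw read error rate as reported in this thread, with an assumed threshold of 6.
print(attribute_status(value=60, worst=52, threshold=6))
```

Under that assumption the attribute is a long way from its threshold, so any failure date a utility predicts rests on the trend rather than on the drive having flagged itself as failing.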
 
P

Phoenix AG

Thanks for the feedback, too rare IMO.

That also confirms why I avoid Gigabyte motherboards,
they release them to the field too early in my opinion,
you see FAR too many rev levels at the physical board
level and too many bios flashes too.

Yes, this is my first experience with Gigabyte's boards too. I have
always used Asus. Frankly, except for this Bios flashing, and the
trouble and grey hair inducing nightmares I've had with all this
crashing, I find it to be quite decent.
It has absolutely excellent features, has overclocking a baby can do,
support for DDR and DDR2 ram, support for RAID, in fact, support for
everything under the sun :)
Flashing the Bios was really simple, specially considering I've never
done it before.

But yeah, they do need to test them out thoroughly before releasing
them.
Yeah, its always logical when you do end up nailing it.



I doubt they'll exchange it on that basis alone.

Oh, I am sure they will exchange it when it fails. What I am worried
about is whether they'll exchange it just on my word that its faulty.
After all, I am having no problems with it at all except for the SMART
errors.
Then you should have it completely backed up.

Its never a good idea to rely on a hard drive not failing.

DVD burners are so cheap now that there is no excuse.

Yeah, I do have a dvd burner. But somehow, due to fate's unkindness
towards me, it doesn't work very well. LOL. It doesn't seem to read
half the CDs and doesn't burn half the time. Otherwise, its a good
fast burner. I am planning to ditch it and buy a new dual layer one as
soon as I get some cash.
Not really possible to say because you'd need to know if that
sort of variation is common with that particular drive model.

I'd certainly be concerned, but it may be normal with that model.

Yes, well, I am not sure what to do. I can go through the
unpleasantness of getting the drive exchanged, arguing with the
dealer...Or I can sit it out and use it...waiting for it to fail, if
it does.


***
....the Phoenix shall rise...
 
R

Rod Speed

Yes, this is my first experience with Gigabyte's
boards too. I have always used Asus.

Yeah, that's what I normally use.
Frankly, except for this Bios flashing, and the trouble
and grey hair inducing nightmares I've had with all
this crashing, I find it to be quite decent.

Sure, they normally do get their act into gear eventually, but
I'm not interested in wasting my time with the level of hassle
you experienced, they should have sorted that particular
problem out before it was released. It aint rocket science.

And should have made it a lot clearer that a later bios flash
fixed that particular problem too. In fact it should have been
announced very unambiguously indeed on the support page
for that particular motherboard so you could have saved
yourself an unbelievable amount of time by just checking
that when it was clear that the system had a problem
initially. In spades with your supplier.
It has absolutely excellent features, has overclocking a
baby can do, support for DDR and DDR2 ram, support
for RAID, in fact, support for everything under the sun :)

Sure, their products generally are quite good feature wise,
the problem is that they get released too early before all
the warts are excised, and they are much too coy about
admitting to problems that have been fixed.
Flashing the Bios was really simple, specially
considering I've never done it before.
But yeah, they do need to test them
out thoroughly before releasing them.

Yeah, no excuse for not testing that particular situation that bit you.

Presumably they do it that way to get the jump on the competition.
Oh, I am sure they will exchange it when it fails.

Yeah, I meant exchange it now with just those symptoms, before it fails.
What I am worried about is whether they'll
exchange it just on my word that its faulty.

Thats what I doubt they will do.
After all, I am having no problems with
it at all except for the SMART errors.

Yep, thats why they are unlikely to agree that its defective.

And I wouldnt just send it back to Seagate and just expect to
get a different drive back either, it could well be another drive
returned for the same reason and found to be fine by Seagate.
Yeah, I do have a dvd burner. But somehow, due to fate's
unkindness towards me, it doesn't work very well. LOL.

You were warned about that grave dancing, you wouldnt listen |-(
It doesn't seem to read half the CDs and doesn't burn half the time.
Otherwise, its a good fast burner. I am planning to ditch it and buy
a new dual layer one as soon as I get some cash.

Sure, but I'd still backup that data with that one
given that the SMART data is a bit of a worry.
Yes, well, I am not sure what to do.

I'd backup the data with the current DVD burner, more than
one copy, and keep monitoring the SMART data myself.
I can go through the unpleasantness of getting
the drive exchanged, arguing with the dealer...

Only you can really say how likely it is that you
can monster them using just the SMART data.

I would do that if you can get another unused drive.
Or I can sit it out and use it...waiting for it to fail, if it does.

I wouldnt do that without a backup of that data.
Even if I had to buy a new burner on the credit card.
 
P

Phoenix AG

Yeah, that's what I normally use.
Yeah, they have always been rock solid, Asus. My old P4 1.6 runs
great, hasn't ever had a problem.
Sure, they normally do get their act into gear eventually, but
I'm not interested in wasting my time with the level of hassle
you experienced, they should have sorted that particular
problem out before it was released. It aint rocket science.

And should have made it a lot clearer that a later bios flash
fixed that particular problem too. In fact it should have been
announced very unambiguously indeed on the support page
for that particular motherboard so you could have saved
yourself an unbelievable amount of time by just checking
that when it was clear that the system had a problem
initially. In spades with your supplier.

Exactly. I would have expected the bios update to say clearly what it
updates and fixes. But all it gave was a line or 2 about some
stupid acronym I've never heard of in my life.
Would have expected better.
Sure, their products generally are quite good feature wise,
the problem is that they get released too early before all
the warts are excised, and they are much too coy about
admitting to problems that have been fixed.



Yeah, no excuse for not testing that particular situation that bit you.

Presumably they do it that way to get the jump on the competition.

Exactly :)
Yeah, I meant exchange it now with just those symptoms, before it fails.

Yes. In fact, the more I think about it, the more I am sure they
won't. I'll take it back to them, he'll pop it into his computer, test
it and tell me its running fine. I am sure half of them won't know
what SMART even is.
Thats what I doubt they will do.


Yep, thats why they are unlikely to agree that its defective.

And I wouldnt just send it back to Seagate and just expect to
get a different drive back either, it could well be another drive
returned for the same reason and found to be fine by Seagate.

Yeah, didn't think of that. Well, guess I am ok with this drive for
now.
You were warned about that grave dancing, you wouldnt listen |-(

LOL. Well, 2005 has not been a very good year for my computers.
Everything seems to be a bit shaky. I'm gonna go tomorrow and get my
dvd burner fixed too, if they can do it. It's still under warranty so
hopefully they should. I actually won this one; otherwise, I would
never have got it. It's a Samsung and they are notorious for not
reading CD-Rs and a lot of other media.
Sure, but I'd still backup that data with that one
given that the SMART data is a bit of a worry.

Yeah, I've been backing up stuff. Not much to backup, though. It's my
2nd drive and my 2nd drive has mostly the dump of all things. Like
Joey episodes, some movies, some games, temp space for everything,
other misc stuff downloaded, games installed, etc.

It won't be that big a loss if this drive goes, but yes, it'll be
annoying because it'll mean a loss of a lot of time. And game saves :D
Which is never good...
I'd backup the data with the current DVD burner, more than
one copy, and keep monitoring the SMART data myself.


Only you can really say how likely it is that you
can monster them using just the SMART data.

I would do that if you can get another unused drive.


I wouldnt do that without a backup of that data.
Even if I had to buy a new burner on the credit card.



***
....the Phoenix shall rise...
 
