Nforce4 chip woes

John

Anyway, just in case someone else is interested, here's some info.

Here's a post from the forums at PC Perspective that claims the
temperatures measured with a probe are very different from what the
board's sensors report.

-----------------------------------------------------------------
Hello.

I was just wondering why my sensors show such low temperatures (~25
Celsius CPU and ~40 Celsius on the other sensor; I have a passively
cooled VNF4), so I measured it with a temperature probe (sonde).

I found that the BIOS CPU temp is somewhat different from the
temperature measured with the probe in the middle of the CPU cooler
(which should be a little bit less warm than the CPU core itself):

Code:
BIOS CPU sensor (C)   Probe (C)
31                    33
33                    37
34                    41
35                    43
37                    45
38                    46
39                    47
40                    49
42                    50
43                    51
44                    52

Then I measured the second BIOS temp sensor and the temperature in the
middle of the chipset cooler (I have a passive one):

Code:
BIOS sensor (C)   Probe (C)
30                43
32                46
34                48
35                49
36                50

And then I turned off the CPU fan and realized that the chipset cooler
depends heavily on it:

Code:
BIOS sensor (C)   Probe (C)
39                83
40                86
41                88

So, can someone name me one reason for the board to give such FAKE
temperature readings?

I really feel like someone has been fooling me for the last 10 months:
the CPU at probably over 60 when it appears to be under 50 Celsius, and
the chipset burning at probably ~100 Celsius when the board reads 45
Celsius!

Why? Do they want us to burn our boards down?

-----------------------------------------------------------------

See, this, IF TRUE and IF it's typical and not just this one board,
would explain everything.
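
To put rough numbers on the gap, here is a minimal Python sketch that uses only the readings from the three tables in the quoted post. The helper names (linear_fit, mean_gap, predict) are made up for illustration, and the straight-line extrapolation is an assumption: with only three fan-off points it is a sanity check on the poster's "~100 Celsius at a reading of 45" guess, not a calibration.

```python
# Readings copied from the quoted post: (BIOS reading, probe reading) in Celsius.
cpu = [(31, 33), (33, 37), (34, 41), (35, 43), (37, 45), (38, 46),
       (39, 47), (40, 49), (42, 50), (43, 51), (44, 52)]
chipset_fan = [(30, 43), (32, 46), (34, 48), (35, 49), (36, 50)]
chipset_no_fan = [(39, 83), (40, 86), (41, 88)]


def linear_fit(pairs):
    """Ordinary least-squares line: probe = slope * bios + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_gap(pairs):
    """Average of (probe - BIOS) over one table."""
    return sum(probe - bios for bios, probe in pairs) / len(pairs)


def predict(pairs, bios_reading):
    """Extrapolate the fitted line to a BIOS reading (assumes linearity)."""
    slope, intercept = linear_fit(pairs)
    return slope * bios_reading + intercept


for name, table in [("CPU", cpu),
                    ("chipset, CPU fan on", chipset_fan),
                    ("chipset, CPU fan off", chipset_no_fan)]:
    print(f"{name}: mean gap {mean_gap(table):.1f} C")

print(f"fan off, BIOS reads 45 -> probe about {predict(chipset_no_fan, 45):.0f} C")
```

The thing to notice is that the gap is not a fixed offset. It grows as the board heats up, and it jumps when the CPU fan stops, which is what you would expect if the BIOS sensor is not sitting on the chipset at all.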


Here's the claim that nForce4 boards have problems, specifically data
corruption problems.

http://forums.nvidia.com/lofiversion/index.php?t8171.html

People here and at other sites tend to blame hard disks, especially
Maxtor, but you can see that it's NOT because of that: the thread
covers lots of HDs. You can also see that any claim of problems
attracts lots of people who have other problems. Many people initially
attracted to this thread find out they had bad memory sticks or other
problems. However, not all problems are explained away here.

If the above is true, or even if it's not, I could see improper
cooling causing almost all of these widespread problems.
I had the same problems with an old KT133A board which had MB chip
cooling problems, and on my current nForce4 very similar problems
happened when I changed my MB cooler from passive to active.

Right now I'm having problems and I'm running mostly Seagates with one
WD and one Hitachi.


So far I've also tested memory overnight with memtest and got no
errors, like I expected.
 
Paul

I suppose someone could write a book on this topic, but
what good would it do :)

AMD recommends a temperature measurement method for the CPU diode.
No motherboard manufacturer follows these instructions. There will
be an error in the temperature readout as a result. I am not aware
of a monitor chip that uses a two current measurement method.

See section 7.7 on PDF page 77. In particular note 3.
http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/31411.pdf
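
For anyone wondering why a two-current method matters, here is a minimal Python sketch of the textbook diode relation, delta_V = n * (k*T/q) * ln(I_high/I_low). It is not a transcription of the AMD document. The function names, the 10:1 current ratio and the ideality factor of 1.008 are assumptions picked for illustration.

```python
import math

BOLTZMANN = 1.380649e-23          # J/K
ELECTRON_CHARGE = 1.602176634e-19  # C


def delta_v(temp_kelvin, current_ratio, ideality=1.0):
    """Change in diode forward voltage between two drive currents."""
    return (ideality * BOLTZMANN * temp_kelvin
            * math.log(current_ratio) / ELECTRON_CHARGE)


def diode_temp_kelvin(delta_v_volts, current_ratio, ideality=1.0):
    """Invert the relation: temperature from the measured voltage difference."""
    return (ELECTRON_CHARGE * delta_v_volts
            / (ideality * BOLTZMANN * math.log(current_ratio)))


# Hypothetical case: die at 60 C, diode with ideality 1.008, 10:1 current ratio.
true_kelvin = 60.0 + 273.15
dv = delta_v(true_kelvin, 10.0, ideality=1.008)

# A monitor that assumes the wrong ideality factor reads a different temperature.
reported = diode_temp_kelvin(dv, 10.0, ideality=1.0)
print(f"delta V: {dv * 1000:.2f} mV")
print(f"readout error with wrong ideality: {reported - true_kelvin:.2f} K")
```

Taking the difference between two currents cancels the diode's saturation current, which varies from chip to chip. A single-current readout has no such cancellation, so it depends on a fixed calibration that may not match the part.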

In the past, there have been boards that measured "socket temperature".
An NTC thermistor was mounted under the socket. There is a resistance
versus temperature curve (not a straight line), and a conversion
equation is used to convert from measured voltage to temperature.
At least at the BIOS level, there seemed to be a "fudge factor"
applied to the result, to make it seem "real". Whether the fudge factor
was determined by lab work with a temperature probe or simply picked
from the air, nobody knows. The fudge value would change from one
BIOS release to the next.
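
As a rough illustration of that conversion, here is a Python sketch using the common Beta-parameter thermistor equation. Every component value in it is an assumption (a 10k NTC with Beta 3435, a 10k pull-up, a 3.3 V reference, thermistor on the ground side of the divider), and fudge_c is a hypothetical stand-in for whatever offset a BIOS might add. No real board's values are known here.

```python
import math

# All values below are illustrative assumptions, not taken from any board.
V_REF = 3.3          # divider supply, volts
R_PULLUP = 10_000.0  # fixed resistor from V_REF to the sense node, ohms
R_NOMINAL = 10_000.0 # thermistor resistance at 25 C, ohms
BETA = 3435.0        # thermistor Beta constant, kelvin
T_NOMINAL_K = 25.0 + 273.15


def ntc_temp_c(v_sense, fudge_c=0.0):
    """Convert the divider voltage to Celsius, then add an optional offset."""
    # Thermistor sits between the sense node and ground.
    r_ntc = R_PULLUP * v_sense / (V_REF - v_sense)
    # Beta equation: 1/T = 1/T0 + ln(R/R0) / B  (a curve, not a straight line).
    inv_t = 1.0 / T_NOMINAL_K + math.log(r_ntc / R_NOMINAL) / BETA
    return 1.0 / inv_t - 273.15 + fudge_c


for volts in (2.0, 1.65, 1.0):
    print(f"{volts:.2f} V -> {ntc_temp_c(volts):.1f} C raw, "
          f"{ntc_temp_c(volts, fudge_c=10.0):.1f} C with a 10 C fudge")
```

The point of the fudge_c parameter is that the same voltage can be reported as two different temperatures depending on a number somebody picked, which is why readings can shift between BIOS releases with no hardware change.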

The description on page 77 of the above document seems to suggest
the diode temperature measurement is no better in this regard.
There are still "too many fingers in the pie" when it comes to
establishing what the actual temperature is. AMD can mess with it.
The BIOS designer can choose to ignore the AMD offset stored in the
processor. The temp utility writer is free to do his/her own thing.

The poster in that thread takes the right approach. If you
really want to know what the temp is, purchase a multichannel
digital thermometer (I got a two channel one for $30 at my
local computer store and it was overpriced). The digital
thermometer is easy to calibrate if you want, against some
other thermometer you own. And there are no "fudge factors"
involved. You still have the "displacement error", of not
being right at the source of the heat, but I don't see that
as a killer issue.

As for the observations about the chipset cooling: I've seen
suggestions for what the power dissipation is for an Nvidia
chipset. I've looked at theta_R numbers for various heights
of 35mmx35mm heatsinks. And based on that information, it just
doesn't seem reasonable to be passively cooling an Nvidia
chipset that consists of just the one (North-South) chip.
The temperature of the thing with no airflow should be high,
too high for comfort.
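
The arithmetic behind that judgment is just steady-state thermal resistance: chip temperature is roughly air temperature plus power times theta. A minimal Python sketch follows. The wattage and the two theta values are made-up placeholders to show the shape of the calculation, not measured or published figures for any Nvidia chipset or heatsink.

```python
def chip_temp_c(air_temp_c, power_w, theta_c_per_w):
    """Steady-state estimate: T_chip = T_air + P * theta."""
    return air_temp_c + power_w * theta_c_per_w


# Placeholder numbers for illustration only.
CASE_AIR_C = 35.0
CHIP_POWER_W = 10.0
THETA_STILL_AIR = 5.0   # C/W, small heatsink with no airflow (assumed)
THETA_WITH_FAN = 2.0    # C/W, same heatsink with forced air (assumed)

print("no airflow:", chip_temp_c(CASE_AIR_C, CHIP_POWER_W, THETA_STILL_AIR), "C")
print("with fan:  ", chip_temp_c(CASE_AIR_C, CHIP_POWER_W, THETA_WITH_FAN), "C")
```

Plug in the real dissipation figure and the heatsink's theta at your actual airflow and you can see quickly whether a passive cooler has any margin.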

The problem with passive cooling is the amount of air actually
hitting the "passive" cooler. If an enthusiast reads threads like
this and removes the chipset cooling solution, one guy may have
an XP-120 with side-spill air blowing over the replacement
cooler, while some other genius may have a water block for the
CPU and no side-spill air at all. I think it is better to be
telling these people to put the damn fan back on the chipset
than to leave it to chance how much cooling air is being
forced onto the "passive" cooler via side-spill.

It would seem, looking at the measured results above, that the
measured BIOS value is the computer case air temperature, and
not the chipset. There could be a mixup as to which sensor is
which.

In any case, I'm going to rush off now, and buy some stock
in a digital thermometer company :)

Paul
 
John

So I guess I'm not way off track when I suspect that some of these
problems may be caused by marginal cooling. But who knows. It must
largely be OK with most systems or there would be a never-ending wail
of problems flooding all the websites and newsgroups.

Thankfully it seems finally fixed. However, I've only had it working
for 1 hour so far, so maybe I'm fooling myself again. It seems
reasonably OK now.

Finally, on the 4th or 5th time I've taken it apart and reinstalled the
nForce4 cooling fan, it seems to be working. I checked pics of other
MBs and they all have the same type of fan and small heatsink.

So far I've been transferring data from HD to HD and haven't had any
lockups, though I'm still paranoid about really stressing it.

I'm assuming some of the Agent newsreader problems I've been having
(it hangs when trying to get new headers) are mainly caused by
corrupted data. My HDs show file structure errors with Seagate's HD
diag util.

Now I have to figure out how to fix all that. I've taken all the data
off the Seagate 200 gig, repartitioned and reformatted it twice, even
on a different system, and it still shows file structure errors with
Seagate's diag util. I'm thinking of low-level formatting it if I can
find the util. I don't see it anywhere in Seagate Tools like it says
on their website.
 
