Big problem - CPU says it is -128 degrees!

AdenOne

System is a Pentium D 925 3GHz (D0 revision) on an ECS Alhena5 mainboard with
the latest BIOS update. (It is a Compaq Presario SR5020AN, but I swapped the
Celeron 3.33GHz for a P-D about 5 months ago.) It has worked fine ever since.

This took me by surprise. The computer was in sleep mode, as it
often is, and when I moved the mouse to wake it, nothing happened: it
turned on but never resumed Windows.

So I reset it and suddenly the CPU fan failure warning came up,
initially the fan spun up but then just stopped a few seconds into
POST. According to BIOS, CPU was -128 degrees C. Obviously an error.

So I held a molex powered fan over the CPU fan to at least get some
air flowing, restarted, and overrode the CPU fan failure message and
started Windows Vista. Once at the desktop, SpeedFan started as usual,
I have been using it for about 3 weeks to adjust fan speeds and so on
to cool the PC a bit more efficiently. It has always worked fine.
According to speedfan, the CPU was at -128. I manually set CPU fan
speed to 100%, and it spun up! So now I can at least use windows
without holding a molex fan.

Now, the second strange thing: it seems part of SpeedStep has also
failed, as the CPU switches from 3 to 2.4GHz but stays at a voltage of
1.324v, where usually at idle it is 1.20v.

Thirdly, according to speedfan the +3.3v is actually 4.08v, +5v is
6.85v, +12v is a massive 16.32v while -12v is 4.01v ?

Fourth, CPUID's HWMonitor can no longer read CPU temp, Case temp, or
ANY of the voltages!

As long as I keep speedfan set on "Software Controlled" then I can
manually set the fan speed - the second I put it onto
"SmartGuardian" (the original setting) the fans stop - due most likely
to a CPU temp of -128 C !!!

What's going on? Could it be that the Pentium D's temperature and/or
voltage sensor is broken? Or is it the mainboard's hardware sensor, an
ITE8718F?

Help would be much appreciated, as I have no idea what's going on now.
Never had this sort of error before. And it has been working fine for
months without even one crash or hang.
 
Paul

Occasionally, parts do not get soldered properly to the motherboard.
Sometimes the legs on a chip are "dry", or there is an intermittent
connection. Using a strong light and some magnification, inspect
the ITE8718F in the corner.

(upper right hand corner of the picture here)
http://h10025.www1.hp.com/ewfrf/wc/...4922&cc=us&dlc=en&lc=en&jumpid=reg_R1002_USEN

Paul
 
AdenOne

Yeah, I have looked at the ITE chip already. It has hundreds of
little L-shaped legs, and I cannot tell whether they are soldered on
correctly or not. It has been fine for about 6 or 8 months, and I have been
using the Pentium D for 5 of those. Does this chip control CPU voltages as
well, or is it only fan speeds and temperature sensors?

If it is broken, I suppose the only solution would be a new
motherboard, right?
 
AdenOne

Would flashing the BIOS with the current version help, or would it not
flash as it is already the current version? I can't believe how out of
range the voltages are. I mean, 16v for the +12v lines is way
over spec, unless the ITE chip is faulty and not sensing them
correctly, because the PC works fine, no crashes or hangs, even with
the voltages reported as so out of range. It makes me think it must be
the ITE chip. As I said, I wonder what else the ITE chip actually does.
 
AdenOne

Well - I removed all AC power, unclipped the mobo battery, and left it
for about 30 mins, reseated the battery, connected the AC and now all
of a sudden everything is all fine again, reading the CPU temp
correctly, voltages are back to normal, fans are fine, very strange, I
think I will do a full backup though just in case it goes again.
 
Paul

How is BIOS flashing related to the hardware monitor reading process?

I don't see any relationship there, so I don't see what you
gain by doing that.

*******
The IT8718F datasheet is here.

http://www.ite.com.tw/product_info/file/pc/IT8718F_V0 3_(for C version).zip

There are more examples of Super I/O chips here.

http://www.winbond.com/hq/enu/Produ...ogicIC/SuperIO/LPCSuperIOforDesktopAndServer/

Try page 33 of this one, a spec sheet for W83627HG SuperI/O plus monitor.
It shows how the inputs are attenuated with external resistors, so that
the resulting voltage signal stays within the bounds that the chip can
handle.

http://web.archive.org/web/20061110.../e-winbondhtm/partner/PDFresult.asp?Pname=182

Your voltage readings almost suggest that VREF is lower than normal. VREF
might also be tied in some way, to the voltage used as a reference by
the internal ADC (analog to digital converter).

The monitor doesn't read all the voltages directly. Only voltages safely
within the dynamic range of the ADC can be measured without scaling
resistors being placed in the input path. So a 3.3V power supply rail,
or Vcore, can be measured without scaling resistors in place. For
something like a 12V signal, the signal needs to be scaled by a
factor of 4, so the input sees a 3V signal instead. The software
is then responsible for multiplying the measured value by a
factor of 4, so you get 12V on the readout. But the software
won't use the right scale value, unless the hardware designer adheres
to whatever the datasheet suggests for resistor values. If the hardware
and software disagree on scaling, then errors result. (Which is why
my motherboard has a permanently high 12V readout, while the
other measurement channels are fine.)
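
To make the scaling concrete, here is a minimal Python sketch of the conversion described above. It assumes an 8-bit ADC with a 16 mV step (about 4.08V at full scale) and a divider ratio of 1.68 on the +5V input; those figures are illustrative assumptions, and only the factor of 4 on +12V comes from the explanation above. Under those assumptions, a raw code stuck at the maximum of 255 works out to roughly 4.08V, 6.85V and 16.32V, which are the figures reported in the first post, so the channels may simply have been reading full scale.

```python
# Sketch only: how monitoring software might turn a raw ADC code into volts.
# ASSUMPTIONS (not from the thread): 8-bit ADC, 16 mV per step, and a
# 1.68 divider ratio on +5V. The x4 ratio on +12V is from the post above.

ADC_MAX_CODE = 255
LSB_VOLTS = 0.016

# (rail name, nominal volts, divider ratio the software multiplies by)
RAILS = [("+3.3V", 3.3, 1.0), ("+5V", 5.0, 1.68), ("+12V", 12.0, 4.0)]


def displayed_voltage(code, ratio):
    """Value the software shows for a raw ADC code on a scaled input."""
    return code * LSB_VOLTS * ratio


def adc_code(rail_volts, ratio, reference_scale=1.0):
    """Raw code for a rail; reference_scale < 1.0 models a sagging reference."""
    pin_volts = rail_volts / ratio  # external resistors divide the rail down
    code = round(pin_volts / (LSB_VOLTS * reference_scale))
    return max(0, min(ADC_MAX_CODE, code))  # the converter saturates at 255


if __name__ == "__main__":
    for name, nominal, ratio in RAILS:
        healthy = displayed_voltage(adc_code(nominal, ratio), ratio)
        saturated = displayed_voltage(ADC_MAX_CODE, ratio)
        print(f"{name}: healthy ~{healthy:.2f} V, stuck at 255 -> {saturated:.2f} V")
```

Calling `adc_code(12.0, 4.0, reference_scale=0.5)` shows the other failure mode discussed here: a reference that sags far enough pushes the code up to the 255 ceiling, and the software, still assuming the nominal step size, reports an inflated voltage.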

You could be right. It could be a failure of the SuperI/O chip,
or something could be loading VREF and throwing everything off.
Using the pinout in the ite.com.tw datasheet, look around for
any conductive crap around the VREF pin. Or damage to any of
the resistor networks in the vicinity of the chip. Or even a
standoff or screw shorting to an adjacent resistor. The scaling
resistors will be sprinkled around that chip.

I used to spend hours in the lab, with a magnifier and halogen
lamp, looking for stuff like that. You'd be surprised what
you can find that way.

Paul
 
Paul

I don't see how the two are related. (CMOS battery versus misbehaving
SuperI/O chip hardware monitor.) Just another mystery to ponder :)

There is an electrical connection between the battery and the hardware monitor.
The VBAT pin on the SuperI/O, provides the ability for the SuperI/O
to measure the CMOS battery voltage. But that isn't typically
shown in hardware monitoring programs. And I don't see a good
mechanism, for it to upset things.

I think portions of the CMOS memory have checksum protection. A
checksum is not bulletproof, but if there was a problem with the
data contained in the CMOS memory, you might see an error message
during the BIOS POST.

So what we know, is the battery removal fixed it, but as yet, no
plausible theory as to how the battery can upset the chip.

Does the CMOS battery run down and need frequent replacement ?

Does the computer lose its BIOS settings (other than when you remove
the battery) ?

Paul
 
AdenOne

No. It is less than a year old, has a July 2007 BIOS, battery is fine,
the only thing I can think of is that SpeedFan v4.33 upset the CMOS
settings somehow - I did not fiddle with voltages or anything, only
the fan speed settings. I tried using the CMOS_CLR jumper, but it did
not work, only by removing the battery and leaving for a while did it
work.
 
~misfit~

Somewhere on teh intarweb "AdenOne" typed:
System is Pentium D 925 3GHz D0 revision, ECS Alhena5

ECS??? Why in the world would you have ECS?

AdenOne said:
Never had this sort of error before. And it has been working fine for
months without even one crash or hang.

Then you've been lucky. It seems it just ran out.
 
Paul

Well, the only connection between the two, is the connection of
the battery to the VBAT pin. And I don't understand how that would
break anything. Unless a couple things are shorting together
somewhere.

Paul
 
AdenOne

Yes, that's right, but would the CMOS not store some sort of info about
voltages, SpeedStep, temperatures related to the SmartGuardian feature?
 
Paul

Sure, the BIOS settings are saved. I don't know what format is used
for the data, or which particular bytes are used. Some of the bytes
at the beginning of the CMOS RAM have standard definitions.

You already explained, though, that with the temperature reading as -128,
Smart Guardian misbehaves because the temperature reads so low.
That doesn't imply a Smart Guardian setting is corrupted, merely
that Smart Guardian takes the low temperature into its algorithm
and determines that the fan can be turned off.
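
As an illustration of that chain of events, here is a small Python sketch. -128 is the most negative value a signed 8-bit number can hold (raw byte 0x80), so it is a plausible "no valid reading" value rather than a real temperature. The fan curve and its 30/70 degree thresholds are hypothetical; the actual Smart Guardian algorithm is not described in this thread.

```python
# Sketch only: a bogus -128 reading fed into an automatic fan curve.
# ASSUMPTIONS: the temperature register is one signed byte, and the fan
# curve below (off under 30 C, full speed at 70 C) is invented for the demo.

def to_signed_byte(raw):
    """Interpret an 8-bit register value as two's complement."""
    return raw - 256 if raw >= 0x80 else raw


def fan_duty_percent(temp_c, fan_off_below=30, full_speed_at=70):
    """Hypothetical linear fan curve between two thresholds."""
    if temp_c < fan_off_below:
        return 0  # "cold enough", so the fan is stopped
    if temp_c >= full_speed_at:
        return 100
    span = full_speed_at - fan_off_below
    return round(100 * (temp_c - fan_off_below) / span)


if __name__ == "__main__":
    for raw in (0x2D, 0x46, 0x80):
        temp = to_signed_byte(raw)
        print(f"raw 0x{raw:02X} -> {temp} C -> fan {fan_duty_percent(temp)}%")
```

With a curve like this nothing needs to be corrupted in the settings for the fan to stop: the automatic mode is doing what it was told with a reading that happens to be nonsense, which is why forcing a manual duty cycle in SpeedFan gets the fan spinning again.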

I'm not sure exactly how Speedstep info is obtained. Yes, there
is a high multiplier and a low multiplier. Maybe the details are
inside the processor, as privileged registers or something. I doubt
those details would need to be stored. A flag that says whether
Speedstep is to be used or not should be stored in the CMOS.

But the thing is, as I said before, the contents of the CMOS
are protected by checksums. There is more than one checksum.
If the checksum is bad, there should be an error message on
the BIOS screen, and then the settings are returned to defaults.
So if the CMOS RAM was corrupted, the checksum scheme should pick
that up. I don't see a way of corrupting it over and over again,
without it being eventually caught by the check during early
POST.
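
For readers unfamiliar with the idea, here is a rough Python sketch of a CMOS-style checksum. It assumes the conventional PC layout in which bytes 0x10 to 0x2D are summed and the 16-bit result is stored at 0x2E and 0x2F; a given BIOS may protect other ranges with additional checksums, and the layout on this ECS board is not known.

```python
# Sketch only: additive checksum over a block of CMOS RAM.
# ASSUMPTION: the conventional layout (sum of 0x10..0x2D, stored big-endian
# at 0x2E/0x2F). Vendor-specific regions and extra checksums are not modeled.

CHECKSUM_FIRST = 0x10
CHECKSUM_LAST = 0x2D
CHECKSUM_HIGH = 0x2E
CHECKSUM_LOW = 0x2F


def compute_checksum(cmos):
    """16-bit sum of the protected bytes."""
    return sum(cmos[CHECKSUM_FIRST:CHECKSUM_LAST + 1]) & 0xFFFF


def store_checksum(cmos):
    """Write the current checksum back, as a BIOS would after saving settings."""
    value = compute_checksum(cmos)
    cmos[CHECKSUM_HIGH] = value >> 8
    cmos[CHECKSUM_LOW] = value & 0xFF


def checksum_ok(cmos):
    """The comparison a BIOS could make early in POST."""
    stored = (cmos[CHECKSUM_HIGH] << 8) | cmos[CHECKSUM_LOW]
    return compute_checksum(cmos) == stored


if __name__ == "__main__":
    cmos = bytearray(128)
    cmos[0x10:0x2E] = bytes(range(30))  # stand-in settings
    store_checksum(cmos)
    print("fresh settings valid:", checksum_ok(cmos))
    cmos[0x15] -= 1  # a single corrupted byte is caught
    print("after one bad byte:", checksum_ok(cmos))
    cmos[0x16] += 1  # a compensating change slips through
    print("after a compensating change:", checksum_ok(cmos))
```

The last case is what "not bulletproof" means for a simple sum: two errors that cancel out leave the total unchanged. A single flipped setting, though, would normally be caught and reported during POST.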

Could the Vcore voltage error, be a measurement error only ?
And not an actual failure to program the correct voltage ?

Paul
 
AdenOne

Well I think vCore was roughly correct, as at full load my CPU goes
from 1.200v all the way up to 1.346v, and the Pentium D is safe up to
about 1.400v. What was weird was that it stepped down to 2.4GHz, but
did not decrease the voltage accordingly.

I agree with you on the checksum issue, it should have picked it up. I still
think the ITE chip is somehow to blame for all of this weird stuff.

The only anomaly still present is that CPU-Z and HWMonitor, both from
CPUID, take much longer than usual to load, instead of a few seconds
it's more like minutes now, and the only reason I can think of is that they are
having difficulty reading SMBus information or something.
 
Paul

I think the datasheet for 8718 says it connects to LPC bus. That bus
is also used for flash chips sometimes as well. LPC stands for low pin
count, and the data part is perhaps 4 bits wide. If the clock on it
was 33MHz, then 16MB/sec might be an upper limit on the bus.
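
The arithmetic behind that estimate, as a quick Python sketch. The 33MHz clock and 4-bit width are the figures guessed at above, and the result is a raw ceiling only; protocol overhead on a real bus would bring the usable figure down.

```python
# Sketch only: raw upper bound for a narrow parallel bus.
# ASSUMPTION: one transfer of `data_bits` per clock, no protocol overhead.

def raw_bandwidth_mb_per_s(clock_hz, data_bits):
    """Bytes per second moved if every clock carried data, in MB/s."""
    return clock_hz * data_bits / 8 / 1_000_000


if __name__ == "__main__":
    print(raw_bandwidth_mb_per_s(33_000_000, 4), "MB/s raw ceiling")
```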

SMBUS (system management bus) would still be used for the DIMMs.

One advantage of the LPC bus, is there should no longer be the problem
with multiple monitoring programs, corrupting each other's SMBUS reads.
If the SMBUS was used, then the slow serial read of one program, could
be interrupted by an attempt to read the serial bus by another program.
The LPC operations are more likely to be atomic, so if you want
to run two monitor programs, it should work. The only time multiple
programs get in trouble, is when writing to registers. (For example,
it wouldn't be very good, to have two programs regularly changing
the fan RPM scaling register.)

Paul
 
AdenOne

I see what you mean. I only use SpeedFan v4.33 to change fan speeds -
HWMonitor is used to verify temperatures or to check my GPU temp. It
does not transmit instructions to anything. By the way, since
restarting the PC last night, everything is fine and HWMonitor and
CPU-Z are starting as normal.

One strange thing is that my vCore at idle is now 1.184v, whereas
before I never saw it at less than 1.200v. I know it's pretty close,
and it is not causing issues, so it's fine with me.
 
