Reboot several times a day

  • Thread starter: Guest
Leland

In December 2005 I did a clean install with my Windows XP
Home Edition CD purchased in December 2002.

The Health Reports from HD Tune and the surface scan are
more germane to the issue in hand. However, what purpose do
all these hard drives serve? Usage is not high on any of them.

What is the make and model of your motherboard?

--

Hope this helps.

Gerry
~~~~
FCA
Stourport, England

Enquire, plan and execute
~~~~~~~~~~~~~~~~~~~
 
Hi Gerry,

December 2002 - is that prior to SP2 or with it? I am amazed that you got
the "signed" numbers you got with a CD that, I believe, is close to mine in
terms of age... I'm at a loss to explain the difference between our systems
in that regard. Most of the unsigned drivers I have are from MS...

I just looked at my CD and it says "version 2002" but no month.

I ran HDTune, all options, against all drives. Surface scans showed no
problems with any of the drives including the USB attached drives which were
my prior C and D drives (boot and backup).

The two larger drives are for video and slides. I lost the setup once (RAID
stripe) and so lost all of the video and slides I had digitized. I'm
digitizing 50 years' worth of my dad's 16mm movies and 35mm slides.

I also take motorcycle movies and digitize them.

One 232gb drive will be primary, the other will be backup.

MB is ASUS P4S8X. I like them and their products and have been using their
MBs for about 15 years now.

I got the Everest Home edition and produced all of the reports. Tons of
info. Is there anything you wanted to see?

I used the link for the HP driver and let HP check what I have installed and
it came back that I have the latest driver for the PSC750XI.

Everest discovered that I have my AGP disabled. Forgot about that. I did
that early during the process of trying to eliminate what was causing the
problem. Not a heavy graphics user (except for movies) so that has been
barely noticeable to me.

Let me know if there is any more information you would like to see.

Thanks.
 
Leland

The SP2 update was issued in August 2004.

I intended to mention this but forgot: my figures are only for unsigned drivers.
In sigverif, if you click on Advanced you can get a list of other unsigned
files.

What about the Health Reports from HD Tune?

Is automatic restart still turned off? Can you copy the latest Stop Error
message? Are they always identical? How long after booting the machine do
they occur? It could be a hardware problem, e.g. overheating, and may not
be a driver. How much dust is there inside the box? Use compressed air
to remove it. Do all fans appear to be working OK?

--

Hope this helps.

Gerry
~~~~
FCA
Stourport, England

Enquire, plan and execute
~~~~~~~~~~~~~~~~~~~
 
Hi Gerry,

The sigverif run was the default. System files only. The overall counts
looked like yours, just reversed as far as signed and unsigned.

Auto restart is still off. I am planning to leave it off, at least for the
time being.

I haven't had a stop error since the one I last sent you. But this has
happened before. I can go for a week or two with everything running and
then, wham. It starts again. I have tried over and over to deduce a pattern
but have found none.

Here are the HD Tune Health reports on the 4 HDs that are on all the time
(the USB units are rarely powered on):

HD Tune: WDC WD1200JB-75CRA0 Health

ID   Attribute                 Current  Worst  Threshold  Data  Status
(01) Raw Read Error Rate           200    200         51     0  Ok
(03) Spin Up Time                   97     94         21  5916  Ok
(04) Start/Stop Count               99     99         40  1382  Ok
(05) Reallocated Sector Count      199    199        140    10  Ok
(07) Seek Error Rate               200    200         51     0  Ok
(09) Power On Hours Count           92     92          0  5985  Ok
(0A) Spin Retry Count              100    100         51     0  Ok
(0B) Calibration Retry Count       100    100         51     0  Ok
(0C) Power Cycle Count              99     99          0  1295  Ok
(C4) Reallocated Event Count       198    198          0     2  Ok
(C5) Current Pending Sector        200    200          0     0  Ok
(C6) Offline Uncorrectable         200    200          0     0  Ok
(C7) Ultra DMA CRC Error Count     200    253          0     0  Ok
(C8) Write Error Rate              200    200         51     0  Ok

Power On Time : 5985
Health Status : Ok

HD Tune: WDC WD1200JB-32FUA0 Health

ID   Attribute                 Current  Worst  Threshold  Data  Status
(01) Raw Read Error Rate           200    200         51     0  Ok
(03) Spin Up Time                  145    141         21  3266  Ok
(04) Start/Stop Count               99     99         40  1276  Ok
(05) Reallocated Sector Count      200    200        140     0  Ok
(07) Seek Error Rate               200    200         51     0  Ok
(09) Power On Hours Count           93     93          0  5699  Ok
(0A) Spin Retry Count              100    100         51     0  Ok
(0B) Calibration Retry Count       100    100         51     0  Ok
(0C) Power Cycle Count              99     99          0  1218  Ok
(C2) Temperature                   106    253          0    44  Ok
(C4) Reallocated Event Count       200    200          0     0  Ok
(C5) Current Pending Sector        200    200          0     0  Ok
(C6) Offline Uncorrectable         200    200          0     0  Ok
(C7) Ultra DMA CRC Error Count     200    253          0     0  Ok
(C8) Write Error Rate              200    155         51     0  Ok

Power On Time : 5699
Health Status : Ok

HD Tune: WDC WD2500JB-00GVA0 Health

ID   Attribute                 Current  Worst  Threshold  Data  Status
(01) Raw Read Error Rate           200    200         51     0  Ok
(03) Spin Up Time                  125    120         21  6250  Ok
(04) Start/Stop Count              100    100         40   489  Ok
(05) Reallocated Sector Count      200    200        140     0  Ok
(07) Seek Error Rate               200    200         51     0  Ok
(09) Power On Hours Count           98     98          0  1974  Ok
(0A) Spin Retry Count              100    100         51     0  Ok
(0B) Calibration Retry Count       100    100         51     0  Ok
(0C) Power Cycle Count             100    100          0   489  Ok
(C2) Temperature                   111     93          0    39  Ok
(C4) Reallocated Event Count       200    200          0     0  Ok
(C5) Current Pending Sector        200    200          0     0  Ok
(C6) Offline Uncorrectable         200    200          0     0  Ok
(C7) Ultra DMA CRC Error Count     200    253          0    31  Ok
(C8) Write Error Rate              200    200         51     0  Ok

Power On Time : 1974
Health Status : Ok

HD Tune: WDC WD2500JB-00GVA0 Health

ID   Attribute                 Current  Worst  Threshold  Data  Status
(01) Raw Read Error Rate           200    200         51     0  Ok
(03) Spin Up Time                  122    117         21  6400  Ok
(04) Start/Stop Count              100    100         40   494  Ok
(05) Reallocated Sector Count      200    200        140     0  Ok
(07) Seek Error Rate               200    200         51     0  Ok
(09) Power On Hours Count           98     98          0  1975  Ok
(0A) Spin Retry Count              100    100         51     0  Ok
(0B) Calibration Retry Count       100    100         51     0  Ok
(0C) Power Cycle Count             100    100          0   494  Ok
(C2) Temperature                   106     91          0    44  Ok
(C4) Reallocated Event Count       200    200          0     0  Ok
(C5) Current Pending Sector        200    200          0     0  Ok
(C6) Offline Uncorrectable         200    200          0     0  Ok
(C7) Ultra DMA CRC Error Count     200    253          0    24  Ok
(C8) Write Error Rate              200    200         51     0  Ok

Power On Time : 1975
Health Status : Ok

The only thing unusual about the health reports that I noticed is that the C
drive shows no temperatures. The other 3 do. The C drive is the first one
in the above list.
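
For anyone who wants to go through reports like these without reading every row, here is a minimal Python sketch. It assumes the row layout shown in the reports (ID, attribute name, current, worst, threshold, raw data, status). The 40C limit and the "any non-zero raw reallocation count deserves a look" rule are assumptions chosen for illustration, not HD Tune's own criteria, and the names parse_rows and warnings_for are made up for this example.

```python
import re

# A few rows copied from the HD Tune reports in this thread.
REPORT = """\
(05) Reallocated Sector Count 199 199 140 10 Ok
(C4) Reallocated Event Count 198 198 0 2 Ok
(C5) Current Pending Sector 200 200 0 0 Ok
(C2) Temperature 106 253 0 44 Ok
(C2) Temperature 111 93 0 39 Ok
"""

# ID in parentheses, attribute name, then four integers and a status word.
ROW = re.compile(
    r"^\((?P<id>[0-9A-F]{2})\)\s+(?P<name>.+?)\s+"
    r"(?P<current>\d+)\s+(?P<worst>\d+)\s+(?P<threshold>\d+)\s+"
    r"(?P<data>\d+)\s+(?P<status>\w+)$"
)

MAX_TEMP_C = 40  # assumption: comfort limit, not a vendor specification


def parse_rows(text):
    """Turn report lines into dicts; lines that do not match are skipped."""
    rows = []
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            row = match.groupdict()
            for key in ("current", "worst", "threshold", "data"):
                row[key] = int(row[key])
            rows.append(row)
    return rows


def warnings_for(rows):
    """Flag remapped sectors and temperatures above MAX_TEMP_C."""
    notes = []
    for row in rows:
        if row["id"] in ("05", "C4") and row["data"] > 0:
            notes.append(f"{row['name']}: raw value {row['data']} is non-zero")
        elif row["id"] == "C2" and row["data"] > MAX_TEMP_C:
            notes.append(f"{row['name']}: {row['data']}C is above {MAX_TEMP_C}C")
    return notes


if __name__ == "__main__":
    for note in warnings_for(parse_rows(REPORT)):
        print(note)
```

The raw Data column is what gets checked here, because the normalised Current/Worst values can still read "Ok" after sectors have been remapped.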

There seem to be two errors and from looking at the event log, 2 stop codes.
One is for a device driver and the other is for random access memory. No
way to predict the amount of time after bootup. I have been up for about 12
hours now without a failure. Have gone up to 20 hours after boot before a
failure occurred. Other times it's 10 minutes.

Can't remember if I mentioned but I have downloaded the MS memory exercise
utility and it shows no failures. I know I did mention that I swapped memory.

Thanks.
 
On Sat, 21 Jan 2006 22:31:02 -0800, Leland Sheppard wrote:

Here are the HD Tune Health reports on the 4 HDs that are on all the time
(the USB units are rarely powered on):

Being powered off and on regularly can be worse than running all the
time, so I wouldn't be surprised to see the USBs fail first.

HD Tune: WDC WD1200JB-75CRA0 Health
ID Current Worst Threshold Data Status
(05) Reallocated Sector Count 199 199 140 10 Ok
(C4) Reallocated Event Count 198 198 0 2 Ok

Those aren't so good - looks like the HD has already started to "fix"
bad sectors on the fly...

(C5) Current Pending Sector 200 200 0 0 Ok
(C6) Offline Uncorrectable 200 200 0 0 Ok

...though so far it's succeeded in doing so. I don't see a
temperature reading; is this an old HD, say 10G or smaller?

HD Tune: WDC WD1200JB-32FUA0 Health
ID Current Worst Threshold Data Status
(C2) Temperature 106 253 0 44 Ok

44C is too hot for comfort - fan that HD so it stays under 40C, would
be my instinct here.

HD Tune: WDC WD2500JB-00GVA0 Health
HD Tune: WDC WD2500JB-00GVA0 Health
(C2) Temperature 106 91 0 44 Ok

Another hottie... (and not in a good way)

There seem to be two errors and from looking at the event log, 2 stop codes.
One is for a device driver and the other is for random access memory.

Do you have ECC RAM?
Wondering how a RAM error could be trapped, otherwise.

Can't remember if I mentioned but I have downloaded the MS memory exercise
utility and it shows no failures. I know I did mention that I swapped memory.

Try an overnight, or even over-weekend, run in MemTest86...

www.memtest.org - OK

www.memtest86.com - OK

www.memtest . com - domain-squatter with pop-ups, avoid


---------- ----- ---- --- -- - - - -
Don't pay malware vendors - boycott Sony
 
Hi,

Thanks for responding.

I don't know why that first drive has no temp. reading. I noticed that too.
It and the one listed after it are both 120gb WD drives which I purchased
the same day.

The last two are 250gb WD also purchased the same day as each other.

I buy drives in pairs because I use one to back up the other so I always get
2 at the same time.

Regarding memory - no. It's not ECC. Don't know how XP detects corruption...

Interestingly, I went through the device manager list and did an 'update
driver' on EVERY device or pseudo device on the system. One device showed a
driver problem - my HP PSC750XI - and I let XP update the driver from MS.
Then I went to HP and asked it to check and see if I had the latest driver
and it said I do.

Since that driver was installed, I have not had a crash (knocking on the
nearest piece of wood). I have gone as much as a week or 10 days without a
crash before only to have them return so I'm not rejoicing yet. Cautiously
optimistic is the phrase that comes to mind... It's been 6 or 7 days now, I
think...

I will go get the memory test you mentioned. I'm not really too concerned
about the memory. I got the error with the original memory and swapped it
for a new batch and still got the error. The MS diagnostic says it is
usually hardware; not always. I'm guessing that this whole thing has been
software. The only pieces of hardware that haven't been changed are the MB
and the CPU...

Thanks for your thoughts.
 
On Thu, 26 Jan 2006 21:01:02 -0800, Leland Sheppard wrote:

I don't know why that first drive has no temp. reading. I noticed that too.
It and the one listed after it are both 120gb WD drives which I purchased
the same day.

Wow, that is strange! Do they have the same board part numbers and
model numbers? Are they connected to the same sort of controller? By
120G, I'd expect all HDs to monitor temperature; it's usually 20G, 10G
and smaller that come up as --C.

I buy drives in pairs because I use one to back up the other so I always get
2 at the same time.

OK - it's RAID- and board-swap-friendly too ;-)

Regarding memory - no. It's not ECC. Don't know how XP detects corruption...

Me neither. Even when there's parity error checking, it's nearly
always the BIOS that steps in with a Black Screen of Death, rather
than a polite Windows error dialog and log entry. That's why I'm
thinking that "memory error" is spurious, or possibly an "out of
memory" error that is almost equally likely to be spurious.

Interestingly, I went through the device manager list and did an 'update
driver' on EVERY device or pseudo device on the system. One device showed a
driver problem - my HP PSC750XI - and I let XP update the driver from MS.

Ew... I wouldn't do that (as discussed a while back in another
thread). MS is the wrong place to get drivers, unless desperate.

Since that driver was installed, I have not had a crash (knocking on the
nearest piece of wood). I have gone as much as a week or 10 days without a
crash before only to have them return so I'm not rejoicing yet. Cautiously
optimistic is the phrase that comes to mind... It's been 6 or 7 days now, I
think...

It could be, either by genuinely fixing some driver error, or simply
by replacing damaged code files for the same (or even later) driver
version. This is particularly likely if the HP device was added at a
time you had bad RAM; the original driver code files could have been
corrupted as they were being created. Bad RAM = baaad news.

I will go get the memory test you mentioned. I'm not really too concerned
about the memory. I got the error with the original memory and swapped it
for a new batch and still got the error. The MS diagnostic says it is
usually hardware; not always. I'm guessing this thing has been software.

Secondary damage.

Primary effects of bad RAM are transient, random, poorly
reproducible, and typically cover all scopes and contexts.

But when this causes secondary corruption of the hard drive contents,
this secondary damage will often be limited to certain contexts, and
reproducible within those contexts, e.g. mileage such as "whenever I
do abc, the system does xyz".

These effects will persist after the bad RAM is replaced, and then be
even more "static" (limited to specific scopes and contexts,
reproducible) as the primary effects of bad RAM are removed.

Secondary damage can affect anything, because any disk read can be
bit-flipped to a disk write, and any disk address to be written to can
be bit-flipped to write to an arbitrary raw disk address instead.
However, it will most likely affect code created (installed) since the
RAM went bad - making "just re-install Windows" a disaster.
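
To make the bit-flip point concrete, here is a tiny Python sketch. The sector number and the flipped bit position are made-up values, and flip_bit is a hypothetical helper for illustration only, not part of any disk tool.

```python
def flip_bit(value: int, bit: int) -> int:
    """Return value with the bit at position `bit` inverted."""
    return value ^ (1 << bit)


# Hypothetical example: the sector the OS meant to write to.
intended_sector = 1_000_000

# Assume bad RAM inverts bit 20 of the address while it sits in memory.
corrupted_sector = flip_bit(intended_sector, 20)

print("intended :", intended_sector)
print("written  :", corrupted_sector)
print("distance :", abs(corrupted_sector - intended_sector), "sectors")
```

A single flipped bit moves the write by a power-of-two number of sectors, which is how the damage can land far away from the file that was actually being saved.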

Doing a defrag can have the same effect.

In the case of a recently added hardware item, the drivers for that
item may be corrupted. A "repair install" done after the RAM is fixed
won't replace these code files, and if these code files are always in
effect, the result can be ongoing, broadly-scoped hell.

So you may well have "whacked the rat" - well done, if so!


---------- ----- ---- --- -- - - - -
Don't pay malware vendors - boycott Sony
 
Hi,

Thanks for your thoughts.

I never have had bad ram on this machine either. I swapped memory just in
case; I was swapping everything else, figured I might as well do that too. I
had memory available for another PC I'm building and just used that.

FYI, I downloaded the memory test you mentioned and let it do a couple of
passes on the current memory. Showed no errors which I really expected.

Keep your fingers crossed.

Thanks again.

Leland
 