Occasional boot failure

Jeff

...Anyone know what to check if I get an occasional boot failure on a new
machine? I have a WD 10K Raptor and a Gigabyte GA-M59SLI mainboard with an
AMD FX-62 processor. All works well, but every few weeks the machine will
POST fine, go through the regular loading of the machine's BIOS and the SCSI
card's BIOS, and then search for boot drives and put up a boot failure
warning, apparently not detecting the WD drive, or at least not detecting
that it's bootable. The BIOS is already set to boot from the WD drive first.
There is another large storage drive in the machine, but I don't see that
this matters. I've checked the connections from the mainboard to the drive
and changed the SATA cable, but the same thing still happens occasionally.
If I turn the machine off and then reboot, it usually boots fine the second
time, although today it took two more attempts (with nothing done differently
between attempts). Again, this is very occasional. Perhaps it is somehow
related to my running Vista RC1, but I can't imagine how the OS itself might
be responsible. Any ideas?

Jeff
 
Rod Speed

Jeff said:
> ...Anyone know what to check if I get an occasional boot failure on a new machine? I have a WD
> 10K Raptor and a Gigabyte GA-M59SLI mainboard with an AMD FX-62 processor. All works well, but
> every few weeks the machine will POST fine, go through the regular loading of the machine's BIOS
> and the SCSI card's BIOS, and then search for boot drives and put up a boot failure warning,
> apparently not detecting the WD drive, or at least not detecting that it's bootable. The BIOS is
> already set to boot from the WD drive first. There is another large storage drive in the
> machine, but I don't see that this matters.

It may be significant.
> I've checked the connections from the mainboard to the drive and changed the SATA cable, but the
> same thing still happens occasionally.
> If I turn the machine off and then reboot, it usually boots fine the second time, although today
> it took two more attempts (with nothing done differently between attempts).

What happens if you hit the reset button instead of turning it off when
it fails to boot initially? Not easy to test with the failure rate so low, though.

> Again, this is very occasional. Perhaps it is somehow related to my running Vista RC1, but I
> can't imagine how the OS itself might be responsible.

Yeah, you wouldn't normally expect that sort of success-on-reboot effect with the OS.

> Any ideas?

List the power supply details. It may be marginal and not be reliably getting the drives
spun up fully in the failure-to-boot situations. The extra hard drive may be enough to push
it over the edge in terms of power supply loading.

That's the reason for trying a reset: a reset doesn't spin the drives down and up again,
so if it always restarts fine on a reset, it's likely a marginal power supply.

List the SMART data for the drives using Everest.
That may indicate longer than normal drive spin-up times.
http://www.majorgeeks.com/download.php?det=4181
 
Jeff

> What happens if you hit the reset button instead of turning it off when
> it fails to boot initially? Not easy to test with the failure rate so low,
> though.


It's a Lian Li case, which doesn't have a reset button.


> List the power supply details. It may be marginal and not be reliably getting the
> drives spun up fully in the failure-to-boot situations. The extra hard drive may be
> enough to push it over the edge in terms of power supply loading.

...The power supply shouldn't be a problem in my case, but after thinking about it I would
agree with you that this would be a first consideration. The power supply is a new Corsair
625 watt unit that is powering only the single [dual-core] processor, mid-range video, one
10K Raptor and one 750 gig SATA drive. ...Also a dual-channel Adaptec 39160 SCSI card and a
TV tuner card. That SCSI card delays the boot for about 30 seconds or so, which would give
the drives additional time to spin up.

> List the SMART data for the drives using Everest.
> That may indicate longer than normal drive spin-up times.
> http://www.majorgeeks.com/download.php?det=4181


I'm new to Everest. ...I just downloaded and installed it, but under Storage > SMART
there is no information.
Under other areas in Storage, both drives and all partitions of those
drives are correctly noted. What is it that Everest is supposed to show that
would be helpful here?
 
Rod Speed

> Lian Li case that doesn't have a reset button.

It's a bit academic anyway with the failure rate so low.
> ...The power supply shouldn't be a problem in my case, but after thinking about it I would agree
> with you that this would be a first consideration. The power supply is a new Corsair 625 watt
> unit that is powering only the single [dual-core] processor, mid-range video, one 10K Raptor and
> one 750 gig SATA drive. ...Also a dual-channel Adaptec 39160 SCSI card and a TV tuner card.

Should be fine, that's a pretty decent power supply.

> That SCSI card delays the boot for about 30 seconds or so, which would give the drives
> additional time to spin up.

True.
> I'm new to Everest. ...I just downloaded and installed it, but under Storage > SMART there is no
> information.

You do get that sometimes; Everest can't see all drives in all systems.

Try with smartctl from a bootable Linux live CD like Knoppix.

Not worth bothering with now, though, given that it's a decent power supply.

> Under other areas in Storage, both drives and all partitions of those drives are correctly
> noted. What is it that Everest is supposed to show that would be helpful here?

The SMART stats for the drives include a spin-up time.
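
For reference, the spin-up figure is normally SMART attribute 3 (`Spin_Up_Time`), which `smartctl -A` from the smartmontools package prints in its attribute table. Below is a minimal Python sketch, not anything posted in this thread: the function names `parse_spin_up_time` and `read_spin_up_time` are hypothetical, `/dev/sda` is a placeholder device path, and it assumes smartctl's default table layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). How the raw value is encoded varies by drive vendor, so the normalized VALUE/WORST/THRESH columns are the safer thing to compare.

```python
import subprocess
from typing import Optional


def parse_spin_up_time(smartctl_output: str) -> Optional[dict]:
    """Return the Spin_Up_Time row (SMART attribute ID 3) from `smartctl -A` text.

    Assumes smartctl's default attribute-table layout:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    Returns None if no such row is present (e.g. the drive doesn't report it).
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows begin with the numeric ID; the header row begins with "ID#".
        if len(fields) >= 10 and fields[0] == "3" and fields[1] == "Spin_Up_Time":
            return {
                "value": int(fields[3]),   # normalized current value (higher is better)
                "worst": int(fields[4]),   # lowest normalized value the drive has recorded
                "thresh": int(fields[5]),  # vendor failure threshold for the normalized value
                # RAW_VALUE may contain spaces, e.g. "4583 (Average 4600)"; units are vendor-specific.
                "raw": " ".join(fields[9:]),
            }
    return None


def read_spin_up_time(device: str) -> Optional[dict]:
    """Run `smartctl -A` on a device (e.g. "/dev/sda", a placeholder) and parse the spin-up row.

    Needs smartmontools installed and usually root privileges. smartctl's exit
    status is a bit mask that can be non-zero even when the table printed fine,
    so the exit code is deliberately not treated as an error here.
    """
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True)
    return parse_spin_up_time(result.stdout)
```

From a live CD you would call something like `read_spin_up_time("/dev/sda")` as root. A WORST value that has dropped noticeably below VALUE might hint at occasional slow spin-ups; a clean reading doesn't rule out the power supply, motherboard, or short-to-case explanations discussed in this thread.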

Looking more like a fault now. It's not going to be easy to
work out where it is, though, with the fault being so rarely seen.

If you have an obliging supplier, they may be
prepared to swap out the motherboard and
power supply to see if the problem goes away.

If not, I'd basically put up with it and see if it gets worse. There's some hint that it will
get worse, given that you did need to reboot twice most recently to get it to boot.

I have seen that effect with a system that had an intermittent short to case.
The best test for that possibility is to run the motherboard loose on the
desktop, but that isn't really feasible with the fault so rarely seen.
 
Jeff

> The SMART stats for the drives include a spin-up time.

I'm curious: how could an application running in Windows measure the spin-up
time of the drive that contains the boot partition for the Windows OS?
...Not sure that I understand how that's possible... but I suppose that if
all necessary files were already in RAM, and the disk was stopped and
started, such a thing might be possible.

> If you have an obliging supplier they may be
> prepared to swap out the motherboard and
> power supply to see if the problem goes away.

I don't think that my supplier would be so obliging. As you mentioned, the problem
isn't all that frequent. ...It's probably a bigger deal if I'm away and my wife
attempts to boot the machine herself.

Your suggestion about the problem being the spin-up does make sense,
however. I probably could do something to slightly delay booting to give the
drives longer to spin up and see if that does anything. ...Simply enabling
the BIOS on the SCSI card would do this (which isn't otherwise needed, since I
don't boot from a drive connected to that card).

...And I probably won't get too excited until after I load the final version
of Vista, in the unlikely event that the problem could be with the
pre-release version of the OS.

Jeff
 
Rod Speed

> I'm curious: how could an application running in Windows measure the spin-up time of the drive
> that contains the boot partition for the Windows OS? ...
> Not sure that I understand how that's possible...

It doesn't. The drive measures the spin-up time itself, and the Windows app just gives
you access to the spin-up times that the drive stores in the SMART data.

> but I suppose that if all necessary files were already in RAM, and the disk was stopped and
> started, such a thing might be possible.

Nah, it doesn't do it like that.
> I don't think that my supplier would be so obliging.

Those who know anything about decent customer service would.
Of course, that isn't likely to be those that sell at the lowest price.

> As you mentioned, the problem isn't all that frequent. ...It's probably a bigger deal if I'm
> away and my wife attempts to boot the machine herself.

It shouldn't be hard to tell her to just keep trying when it doesn't
boot. And if she's likely to forget that by the time it happens
again, just stick a note on the PC before going away.
> Your suggestion about the problem being the spin-up does make sense, however.

I doubt it's the problem, given that the SCSI card provides enough delay for
the drives to spin up fine, and you have a decent power supply.

It's much more likely to be a power supply or
motherboard fault, or even an intermittent short to case.

> I probably could do something to slightly delay booting to
> give the drives longer to spin up and see if that does anything.
> ...Simply enabling the BIOS on the SCSI card would do this (which
> isn't otherwise needed, since I don't boot from a drive connected to that card).

I think it's unlikely to be the problem now that it's clear
that the power supply is pretty adequate spec-wise.
> ...And I probably won't get too excited until after I load the final version of Vista, in the
> unlikely event that the problem could be with the pre-release version of the OS.

And it will be interesting to see if it's getting worse, now that you have
seen one example of needing to start three times to get it up. That's
absolutely classic hardware-fault-type symptoms, but currently the
technical term that's appropriate is 'pathetically inadequate sample'.

Time will tell, and fortunately it's just a nuisance; it
doesn't stop you from using the system.
 
Jeff

> It's much more likely to be a power supply or
> motherboard fault, or even an intermittent short to case.

I actually was having more problems right after I finished building. The
Raptor did not come with any jumpers attached. The Seagate 750 gig had a
jumper that limited it to SATA 1 transfer rates, which I didn't realize
was there initially. After I removed that jumper, the problem I was
having, where the system wasn't seeing either the entire drive or one of the
drive's partitions (I can't remember which), disappeared completely. I'm not
sure that I understand why, but I haven't had that problem once since
removing the jumper, and that was about a month ago.

Could a motherboard or power supply issue, including the case short,
allow everything prior to the OS boot to work properly? In other words, the
BIOS has always functioned perfectly, and the SCSI card's BIOS starts just
fine even when I was getting the occasional boot failure.


Jeff
 
Rod Speed

> I actually was having more problems right after I finished building.

OK, this is critical info if the symptoms were similar.

> The Raptor did not come with any jumpers attached. The Seagate 750 gig had a jumper that limited
> it to SATA 1 transfer rates, which I didn't realize was there initially. After I removed that
> jumper, the problem I was having, where the system wasn't seeing either the entire drive or one
> of the drive's partitions (I can't remember which), disappeared completely. I'm not sure that I
> understand why, but I haven't had that problem once since removing the jumper, and that was
> about a month ago.

On the other hand, your current problem is with the drive
not being seen. It may be that that problem was always
there and was complicated by the SATA mode jumper
question, particularly with the fault rate so low now.

Those uncommon faults can be a hell of a cow to diagnose, just
because you can't really test much to see if a change fixes the problem.

> Could a motherboard or power supply issue, including the case short, allow everything prior
> to the OS boot to work properly?

Yes, I did see that with an intermittent short to case, and a motherboard
or power supply fault can certainly make drives invisible to the system.

> In other words, the BIOS has always functioned perfectly, and the SCSI card's BIOS starts just
> fine even when I was getting the occasional boot failure.

Sure, but the BIOS may well not be finding the drive when it polls
for drives, due to the fault. Can you remember if the drive showed
up in the black BIOS screen at boot time when it failed to boot?

If you can't, watch for that the next time it fails to boot.
 
