For the love of God.. Help me!

A

Andrew McLaren

Michichael said:
Anyway, little update. Got another PAGE_FAULT_IN_NONPAGED_AREA from
dxgkrnl.sys, tried calling Microsoft. The rep effectively repeated a mantra
This time it was trying to play Counterstrike: Source.
So this isn't just limited to Halo... BSODs so far today (got on it 3 hours
ago...): 6.

I guess you're running latest ForceWare drivers (163.69 as of today). I'm
also assuming you've trawled the knowledgebase at Nvidia, read their online
help files, etc and not found any solutions yet. So we'll go straight to the
heavy-duty troubleshooting. You can take 2 approaches. They're not entirely
mutually exclusive.

Firstly, take the system back to a minimal config baseline that works. From
there, you incrementally add or adjust configuration items, one by one, to
bring the system back in line with its present configuration. After each
adjustment, exercise the system in a way which should reproduce the error
(ie play Halo2, I guess). The point at which you start to see the problem
re-appear, will give you a good clue where the cause of the problem may lie.

Here's what you'd do, in a serious industrial setting. Since it's a home
machine, you might choose to be a bit less disciplined; although each
departure diminishes the fidelity of the exercise (and possibly, turns it
into a waste of time if you get too cavalier). Much patience is required.

- first, back up your user data

- remove as many peripheral devices as you can - printers, cameras, sound
cards, scanners etc. We want CPU, memory, graphics card, and one hard disk;
that's all

- if the machine has been overclocked, take *everything* back to the default
factory settings - CPU, bus, graphics processor, the lot.

- re-install Vista from scratch, from original media, reformatting the hard
disk and avoiding any third party drivers during the installation process
(only use the Microsoft-supplied drivers).

- you now have a very plain, vanilla installation of Vista. Performance
might be less than what you'd like; but our goal here is stability, not
performance! (not yet, anyway).

- install Halo2, as the tool with which to exercise the system.

- reproduce the problem scenario; eg, play Halo2 for >30 minutes and verify
that it does not crash (this is the painful part: you need to play games for
at least 30 minutes :)

If it still crashes in this very vanilla environment, then you have a
fundamental hardware problem with your machine. It needs to be examined by a
skilled computer technician. I mean someone with a certificate in
electronics engineering, or similar, who can use an oscilloscope, logic
probes etc - not just a PC enthusiast who reads Maximum PC (fine publication
that it is).

Assuming that Halo2 does run okay, start changing your config back to how it
was. It is very important you only change one thing at a time, and then test
after each change. For example:
- confirm the new, clean install of Vista runs okay.
- then run Windows Update to patch your machine to the current revision
level. Test again
- run DXDiag and export a report of your settings ("Save all Information"),
for future reference and comparison.
- next, install the current Nvidia-supplied Forceware drivers. Test again.
- next, re-attach your peripheral devices, one by one. Exercise the system
in between each, to verify the system continues to run normally. You might
need to spread this over a few days.
- install any additional vendor-supplied drivers for your various devices.
Test system again.
- install your normal user applications eg Office, Photoshop, etc. Avoid
installing any apps which install kernel-mode drivers; we want to stick to
user mode stuff, for now.
- exercise the system. This is a fairly good baseline: a plain installation
of Vista, Nvidia-supplied graphics drivers and general user apps. Hopefully
the system is still stable, at this point.
- now install any apps which include kernel mode drivers. Test the system
again.
- assuming you want to return to an overclocked configuration, you can start
overclocking again, now. But, don't leap straight to the maximum overclock -
just ramp up the CPU a little bit, and then test. Then increase a little bit
more, and test the system. Then change your memory timing settings, if
that's what you wish ... but again, don't go straight to an aggressive
setting, just moderate - and test the system again.

At some point, the system may start to fail. Note the last change you
made to the system. If possible, roll back that change (eg uninstall driver,
decrease OC setting, etc) and check that the system returns to stability. Be
aware that not all changes are reversible - in other words, they might be
one-way: even uninstalling the change won't return the system back to a
working state. If that's what you encounter, you may need to repeat the
whole loop, stopping at the point just short of where problems appeared last
time round.
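
The loop described above (apply one change, test, stop at the first failure) can be sketched in a few lines of Python. This is only an illustration: the step names and the stability check are simulated placeholders, and the "culprit" below is invented for the demo, not a diagnosis of this machine.

```python
from typing import Callable, List, Optional, Tuple

# A step is a label plus a function that applies exactly one change.
Step = Tuple[str, Callable[[], None]]


def first_failing_step(steps: List[Step],
                       is_stable: Callable[[], bool]) -> Optional[str]:
    """Apply changes one at a time and test after each one.

    Returns the name of the first change after which the stability
    test fails, or None if the system stays stable throughout.
    """
    for name, apply_change in steps:
        apply_change()
        if not is_stable():
            return name
    return None


# Simulated run: "installed" stands in for the real machine state.
installed: List[str] = []


def make_step(name: str) -> Step:
    return (name, lambda: installed.append(name))


steps = [make_step(n) for n in
         ["Windows Update", "ForceWare driver", "sound card"]]

# Hypothetical test: pretend the system breaks once the driver is on.
culprit = first_failing_step(
    steps, lambda: "ForceWare driver" not in installed)
print("first failing change:", culprit)
```

In real life the stability test is the slow part (30+ minutes of gaming per step), which is why the order of the steps matters: put the cheapest and most likely suspects first.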

This approach is empirical, and draws on the traditions of root cause
analysis (in the precise engineering sense, not the loose vernacular sense
of "root cause").

The second approach is to be analytically diagnostic: get a memory dump of
the crash, and analyse it.

For this, you need to install the Windows Debugging Tools. You can download
these from here:
http://www.microsoft.com/whdc/devtools/debugging/default.mspx

You'll also need a symbol path, so WinDBG can find the debug symbols from
Microsoft's public symbol server. In Control Panel, System, Advanced System
Settings, define an environment variable called "_NT_SYMBOL_PATH" (with an
initial underscore). Assign it the value of
"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols". This will tell
debuggers on your system to download symbols from
http://msdl.microsoft.com/download/symbols, and store them in a directory on
your hard disk called "C:\Symbols". If you do a SET command at the prompt,
you should see this in the output:

_NT_SYMBOL_PATH=srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
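
If you'd rather check the variable from a script than eyeball the SET output, something like the following Python sketch would do. It assumes the single "srv*cache*server" entry shown above; a symbol path with several entries separated by semicolons is not handled here.

```python
import os


def parse_symbol_path(value: str) -> dict:
    """Split one 'srv*<local cache>*<symbol server>' entry into its parts."""
    parts = value.split("*")
    if len(parts) != 3 or parts[0].lower() != "srv":
        raise ValueError("not a single srv*cache*server entry: " + value)
    return {"cache": parts[1], "server": parts[2]}


expected = r"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols"
info = parse_symbol_path(expected)
print("local cache  :", info["cache"])
print("symbol server:", info["server"])

# Compare against whatever is actually set in this session.
current = os.environ.get("_NT_SYMBOL_PATH")
if current == expected:
    print("_NT_SYMBOL_PATH is set as described")
else:
    print("_NT_SYMBOL_PATH is missing or different:", current)
```

Remember that a console opened before you changed the variable will not see the new value; open a fresh one.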

For more background on configuring the Debug Tools, see:
http://www.microsoft.com/whdc/devtools/debugging/debugstart.mspx

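If you're lucky, the system will still have the mini-dumps from your
previous crashes. Blue screen (kernel) crashes are written by default to
"C:\Windows\Minidump" (small dumps) or "C:\Windows\MEMORY.DMP" (kernel or
complete dump). Application crashes caught by Windows Error Reporting are
stored in a location like
"C:\Users\<username>\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp", where
the "90FE" part of the path will vary, for each different crash. Note that
AppData is normally a hidden directory.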

If you have no minidump.mdmp files currently on your system, go to Control
Panel, System, Advanced System Settings, Startup and Recovery, Settings, and
configure a specific location for your memory dumps in the "System Failure"
box (eg C:\Dumps, or similar).
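
To hunt for existing dump files without clicking through hidden folders, a small Python sketch like this one can list them, newest first. The folders searched are assumptions: "C:\Windows\Minidump" is the usual default for small kernel dumps, "C:\Dumps" is just the example name used above, and the Temp folder under LOCALAPPDATA is where the WER user-mode dumps tend to land.

```python
import os
from pathlib import Path
from typing import Iterable, List

DUMP_SUFFIXES = {".dmp", ".mdmp"}


def find_dumps(roots: Iterable[Path]) -> List[Path]:
    """Return dump files found under the given folders, newest first."""
    found = []
    for root in roots:
        if not root.is_dir():
            continue  # folder does not exist on this machine
        for path in root.rglob("*"):
            if path.is_file() and path.suffix.lower() in DUMP_SUFFIXES:
                found.append(path)
    return sorted(found, key=lambda p: p.stat().st_mtime, reverse=True)


# Assumed locations; adjust to whatever you configured.
candidate_roots = [Path(r"C:\Windows\Minidump"), Path(r"C:\Dumps")]
local_app_data = os.environ.get("LOCALAPPDATA")
if local_app_data:
    candidate_roots.append(Path(local_app_data) / "Temp")

for dump in find_dumps(candidate_roots):
    print(dump)
```

A full kernel dump, if one was written, normally sits directly in the Windows folder as MEMORY.DMP and is not covered by the folders above. You may need an elevated prompt to read the Minidump folder.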

A full debug of a memory dump is a complex task, which requires extensive
specialised knowledge. Fortunately, some of this knowledge has been
automated in WinDBG's "analyze" command.
- Run WinDBG from the Windows Debugging Tools in Start menu;
- go to File menu, Open Crash Dump, to open one of the minidump.mdmp or
MEMORY.DMP files on your machine.
- when the dump file is opened, WinDBG will display a message similar to the
following (you'll have a different exception code):


Microsoft (R) Windows Debugger Version 6.6.0007.5
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Users\Someguy\AppData\Local\Temp\WER90FE.tmp\minidump.mdmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Windows Vista Version 6000 MP (2 procs) Free x64
Product: WinNt, suite: SingleUserTS
Debug session time: Sun Aug 5 15:55:28.000 2007 (GMT+10)
System Uptime: 0 days 18:45:04.965
Process Uptime: 0 days 0:00:07.000
Symbol search path is: srv*C:\Symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
.............
This dump file has an exception of interest stored in it.
The stored exception information can be accessed via .ecxr.
(a84.1014): Access violation - code c0000005 (first/second chance not available)
00000000`002e4a30 c3 ret


Now, at the command line at the bottom of the WinDBG window, enter the
command "!analyze -v". That's an exclamation mark, followed by the word
"analyze" spelt in the American fashion with a "z", then a space, and a
dash, and a lower-case v.

WinDBG will chug away for a minute or two - you will also see some network
activity, as it downloads the debug symbols from the symbol server. It will
then display a diagnostic report, making a reasonable guess at the faulty
module. To give a big head start to any troubleshooting, include this report
in any problem reports to Nvidia, Microsoft, newsgroup forums etc.
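
With half a dozen dumps to get through, it may be easier to run the same analysis from a script. The console debuggers that ship alongside WinDBG (kd.exe for kernel dumps, cdb.exe for user-mode dumps) accept -z for the dump file, -y for the symbol path and -c for the commands to run at startup. The Python sketch below only builds and prints the command line by default; the dump path is a made-up example, and the debugger is assumed to be on your PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_analyze_command(dump_path: str, symbol_path: str,
                          debugger: str = "kd.exe") -> List[str]:
    """Command line that opens a dump, runs '!analyze -v' and quits."""
    return [debugger,
            "-z", dump_path,        # dump file to open
            "-y", symbol_path,      # symbol search path
            "-c", "!analyze -v; q"]  # commands run at startup


def analyze(dump_path: str, symbol_path: str,
            debugger: str = "kd.exe") -> Optional[str]:
    """Run the debugger if it can be found; return its text output."""
    exe = shutil.which(debugger)
    if exe is None:
        return None  # Debugging Tools not on PATH
    result = subprocess.run(
        build_analyze_command(dump_path, symbol_path, exe),
        capture_output=True, text=True)
    return result.stdout


symbols = r"srv*C:\Symbols*http://msdl.microsoft.com/download/symbols"
example_dump = r"C:\Dumps\example.dmp"  # hypothetical file name
print(" ".join(build_analyze_command(example_dump, symbols)))
```

Calling analyze() on each dump and saving the text gives you reports you can paste straight into a support ticket.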

Dxgkrnl.sys, which is referred to in several of your crashes, is one half of
a port/miniport driver pair. That's to say, the DirectX graphics have 2 main
components - a bunch of functionality which is common to all drivers from
all vendors, and so is written one time for everyone by Microsoft (that's
dxgkrnl.sys); and a vendor-supplied miniport driver, which contains the
functionality specific to each vendor's hardware (for Nvidia, this will be
nvlddmkm.sys). As with all miniport drivers, there is an unusually close
symbiosis of the Microsoft-supplied and vendor-supplied drivers - so a crash
in one can easily be caused by a problem in the other. The fact you're
seeing crashes in dxgkrnl.sys is interesting, but ... this is some of the
most heavily exercised code out there. Every Vista machine is hammering this
driver all day, every day. There could certainly be many as-yet undiscovered
bugs in this driver! But if the cause of the crashes is a bug in dxgkrnl.sys,
there would need to be some fairly unusual condition on your machine which
is exposing the bug, when it does not seem to occur with anything like the
same frequency on most other machines. Isolating that unusual condition may
also provide you with a workaround solution, even if there isn't a hotfix
(yet) for the bug.

PAGE_FAULT_IN_NONPAGED_AREA can often be caused by faulty hardware. But
since you're seeing a combination of STOP 0x50 and STOP 0x3B, I think it's
more likely to be a buggy driver, passing bad data as it makes the
transition from User Mode to Kernel Mode (graphics drivers are especially
prone to this). That's the "System Service" referred to in the STOP 0x3B
message, SYSTEM_SERVICE_EXCEPTION (ie, not a "service" as in a process
controlled by the Windows Service Manager, but a "service call" by the
operating system, to request a kernel function).
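
For keeping a crash log, it helps to translate the hex STOP codes into their symbolic names. A minimal Python lookup covering only the two codes discussed here might look like this; the table is deliberately tiny and is not a complete list of bug checks.

```python
# Only the two bug checks mentioned above; extend as you meet new ones.
STOP_CODES = {
    0x3B: "SYSTEM_SERVICE_EXCEPTION",
    0x50: "PAGE_FAULT_IN_NONPAGED_AREA",
}


def stop_name(code: int) -> str:
    """Return the symbolic name for a STOP code, if it is in the table."""
    return STOP_CODES.get(code, "unknown bug check 0x%X" % code)


for code in (0x50, 0x3B):
    print("STOP 0x%02X = %s" % (code, stop_name(code)))
```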

Obviously there's a lot of work here ... but if there's no ready-to-go answer
to your problem, this is the way I'd tackle it. Other folks may have
additional ideas.

Good luck,
 
G

Guest

Alright, I'll give that a shot tomorrow. I also just installed
http://support.microsoft.com/kb/940105.

I will note though, this *is* a vanilla install of Vista. So far the only
things on it are the updates which got it to stop crashing in the first place
on bootup, and Source/Halo2/AVG. All settings are at defaults. No additional
programs have been installed beyond that. This is a virgin system, as far as
hardware problems go, all brand new parts except for the RAM, which will be
replaced come Tuesday. I will give the debug utility a shot though.

I always go through the methodical "change one thing, test, change another
thing, test" procedure, which is why this one is frustrating me so. I can't
reliably replicate the error, it just appears at random. Nothing in the
Application, Security, or other Event logs shows me the problem. I'll keep
you posted, however. Since there's no real documented method of fixing this
that is easily found, I'll keep a log of what I do and when we do get it
nailed, post it as an aid to others.

 
A

Andrew McLaren

Michichael said:
Alright, I'll give that a shot tomorrow. I also just installed
http://support.microsoft.com/kb/940105.

I will note though, this *is* a vanilla install of Vista.

Are you running with the Microsoft-supplied video drivers, or Nvidia
Forceware drivers?

I'd start with vanilla-vanilla ... so vanilla, you can taste it! :) So
vanilla, Homer Hudson knocks on your door and asks to buy your PC. As I
say: if you have a truly plain, out-of-the-box installation of Vista, with
no 3rd party drivers or updates, and the system is routinely crashing, then
I'd think you have a hardware problem.

I've worked on many cases where the customer said "No no, everything is
completely plain". Then I found a 3rd party driver (or the like) in the
dump's call stack. When I confront the customer about it, they say "Oh but
*that* driver doesn't matter, it never causes any problems". Uh-huh ... so,
what's it doing in the memory dump, then??? Basically I just don't trust
anyone, anymore (sad ... but true).

Let us know how it goes.
 
D

dennis@home

Seven said:
Yeah moron.
Most gamers who know their memory timings don't know you can OC the GPU !
And if he is monitoring his GPU temps, I'll wager he understands.

I'll wager that the OP has forgotten that different programs use different
features in the GPU, and just because it will run one program indefinitely
while overclocked doesn't mean it will run another, even at the same
temperature.
You run stuff out of spec and you accept that odd things may, no make that,
will happen.
No manufacturer of hardware or software is going to be interested in fixing
problems caused by users using stuff out of spec with the possible exception
of a few PC box shifters who will do anything for a bit of cash.
 
D

dennis@home

Michichael said:
Hard to do when you have a full time job as a Systems Administrator and your
hours are the exact same as theirs! =P

But yeah I'm getting sick of Vista really fast. Can't even get my music to
play from my speakers, and voice from my headset in Halo 2. But that's going
in another thread in a few moments. Had another PAGE_FAULT_IN_NONPAGED_AREA
error, as well as a SYSTEM_SERVICE_EXCEPTION when I EXITED counterstrike
source.... Hmmm... -.-;

Now my GPU is reading at most 81C at really intensive scenes of Halo 2, so I
was worried it'd be a heat issue, but it just doesn't seem to click, as I've
played it at upwards of 100C without any issues. The card is just designed
for higher heat.

The card may have bigger fans and heat sinks but there is still a limit on
how hot any bit of the *silicon* can get and still perform as the software
expects.
Measuring heatsink temps is at best a guide to the total power in the chip
and not to that in any particular bit of the chip, so local temperatures can
be much higher than what you measure. These can cause the hardware to do
things the software can't cope with (do ATI still have the reset GPU catch
all in their drivers? Their drivers failed so often they had a watchdog
reset the GPU if it timed out.). Even a minor change in a bit of driver code
can affect the heat distribution on a GPU. If you have it working the way
you like it then don't change things, as there is a good chance something
will change, probably for the worse. Even updating the graphics drivers
causes overclockers problems without changing the whole OS.

Michichael said:
Since both the mobo and the card are from XFX, whose tech
support is great, I'm going to grill their tech support team about the
issues.

As it's probably a hardware problem that would be a good idea.
As they supply cards intended to be operated out of spec they may have a
returns policy so you can get a new bit of hardware that behaves in a
different way when out of spec ... if you are lucky this will work for you
but I doubt if they guarantee it. Also, there's a good chance it's the RAM
causing the problem, so you need to run some intensive RAM tests to
eliminate that too (that will take days to do properly BTW).

Michichael said:
Specially since Microsoft would rather make you pay for support in
getting it to work =P

Unless you get the problem with all the hardware running in spec there isn't
much point in talking to M$ ... they have no way to fix it.
They certainly don't test Windows with all games running on all out of spec
systems.
 
G

Guest

I will note I'm using the defaults for everything. 0 overclock involved. Why
overclock an already unstable system?
 
G

Guest

Well we've narrowed it down to a driver issue. Here's where we stand:

I did about 4 hours of extensive testing in a 64 bit XP installation. No
issues with any game.

Vista, about 2 hours into Halo 2 or Bioshock, or even Counterstrike, the
dxgkrnl.sys error pops up:

0x0000007E (0XFFFFFFFFC0000005, 0XFFFFF98004BDB81A, 0XFFFFF9800D941838,
0XFFFFF9800D941210)

dxgkrnl.sys FFFFF98004BDB81A base at FFFFF98004B22000, Datestamp 46a9480b.

That's an example of one of them.
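
For anyone reading along, here is how those numbers can be pulled apart with a few lines of Python. For bug check 0x7E (SYSTEM_THREAD_EXCEPTION_NOT_HANDLED) the first parameter carries the exception code and the second the address where the exception happened; the datestamp is the driver's build time in seconds since 1970 (UTC). The values are copied from the STOP message above; the helper names are just for this sketch.

```python
from datetime import datetime, timezone

# Values copied from the STOP message above.
bugcheck_code = 0x0000007E
params = [0xFFFFFFFFC0000005, 0xFFFFF98004BDB81A,
          0xFFFFF9800D941838, 0xFFFFF9800D941210]
fault_address = 0xFFFFF98004BDB81A   # address reported for dxgkrnl.sys
module_base = 0xFFFFF98004B22000     # load address of dxgkrnl.sys
datestamp = 0x46A9480B               # driver build timestamp


def ntstatus(param: int) -> int:
    """Low 32 bits of the first parameter: the exception code."""
    return param & 0xFFFFFFFF


def module_offset(address: int, base: int) -> int:
    """Offset of the faulting address inside the module image."""
    if address < base:
        raise ValueError("address lies below the module base")
    return address - base


def build_time(stamp: int) -> datetime:
    """Interpret the datestamp as seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(stamp, tz=timezone.utc)


print("exception code: 0x%08X" % ntstatus(params[0]))
print("fault location: dxgkrnl+0x%x"
      % module_offset(fault_address, module_base))
print("driver built  :", build_time(datestamp).isoformat())
```

The exception code 0xC0000005 is an access violation. The module-plus-offset form stays the same across reboots even though the load address moves, so it is the useful thing to compare between crashes and to quote in a bug report.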

I've contacted Microsoft and they're going to help me fix it.

At this point, I'm inclined to believe one of two issues: That DX10 is
inherently unstable, or the fact that my NVidia 8800 GTX is faulty in the
DX10 runtime, and I should RMA it. I'll keep you folks posted on the
resolution!
 
T

TOP POSTING iTARDS

Michichael <[email protected]>
Whilst having a limited moment of clarity in an otherwise cloudy
existence the drunkard typed
Well we've narrowed it down to a driver issue. Here's where we stand:

<this is where the post goes>
 
A

Andrew McLaren

Michichael said:
Well we've narrowed it down to a driver issue. Here's where we stand:
I've contacted Microsoft and they're going to help me fix it.

Outstanding. Thanks for the update.

BTW (and just out of curiosity, not questioning your methodological rigour)
were you running dxgkrnl.sys in conjunction with the Forceware driver from
the Nvidia website, or with the plain, Microsoft-supplied driver?

Michichael said:
At this point, I'm inclined to believe one of two issues: That DX10 is
inherently unstable,

I'm hoping DX10 is not "inherently" unstable. Maybe it just needs a bit of
real-world buffing, to knock the last of the rough edges off. If Microsoft
need to go and redesign a whole graphics architecture, it'd be painful for
all of us.

Mind you ... in my world, computers are used for numerical analysis, complex
stochastic algorithms, and databases. Computers are *not* meant for playing
games, or looking at pictures! :)

Michichael said:
DX10 runtime, and I should RMA it. I'll keep you folks posted on the
resolution!

I'll be interested to hear what happens ...
 
G

Guest

I had to do a complete wipe/reinstall of Vista. I was running it from the
base Windows Update provided drivers, no sound card, nothin.

I will also note, it's still possible it's the combination of old RAM and
DX10 requiring extensive resources. The new modules will be installed today,
so we'll see.

Also, I noticed something odd. The 8800 GTX has 768 MB of DDR3 RAM. However
dxdiag reports 1499 MB.

It does not show this under XP 64-bit, however. Odd.... I remember that you
can disable the page file under the XP OS, but can you also, if you have 4G
of RAM, disable it in Vista? Or does it still require it for memory dumping?
 
A

Andrew McLaren

Michichael said:
I had to do a complete wipe/reinstall of Vista. I was running it from the
base Windows Update provided drivers, no sound card, nothin.

Interesting. Hmmm. Okay, thanks - that's good to know.

Michichael said:
Also, I noticed something odd. The 8800 GTX has 768 MB of DDR3 RAM. However
dxdiag reports 1499 MB.

Prior to the Windows Display Driver Model (WDDM) introduced in Vista, old
display drivers could generally only use the real, dedicated memory on the
graphics card. In WDDM however, graphics memory is virtualised, just like
main system memory. For main system memory, the virtual backing store is the
page file. For graphics memory, the virtual backing store is system memory.
So DXDiag is correctly reporting the memory available to the WDDM driver,
which is the sum of the real, semiconductor memory on the graphics card,
plus a portion of the global RAM available to the system. On XP, only the
dedicated graphics memory on the graphics card is available to the graphics
driver. On Vista, the graphics driver can use all the dedicated graphics
memory plus, if necessary, a slice of the total system memory sitting on the
motherboard. So DXDiag on Vista reports a higher figure. DXDiag is reporting
correctly, in both cases.
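
As a back-of-envelope check on the figures quoted, the arithmetic can be written out in Python. It assumes the card's dedicated memory is reported as exactly 768 MB, which may be off by a few MB on a real system; the function name is just for this sketch.

```python
def shared_system_memory_mb(total_reported_mb: int,
                            dedicated_mb: int) -> int:
    """System RAM counted towards graphics memory under WDDM.

    total reported by DXDiag = dedicated (on the card) + shared (system RAM)
    """
    if total_reported_mb < dedicated_mb:
        raise ValueError("total cannot be smaller than dedicated memory")
    return total_reported_mb - dedicated_mb


# Figures from the post: 1499 MB reported, 768 MB on the 8800 GTX.
shared = shared_system_memory_mb(1499, 768)
print("shared system memory: %d MB" % shared)
```

That remainder is the slice of system RAM the driver is allowed to draw on, not memory permanently taken away from Windows.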

Note that the WDDM driver does not naively "steal" this RAM from your
system. There are quite elaborate algorithms to ensure fair sharing of
memory, balanced with best overall system performance. You can read all
about it in this paper from Microsoft:
http://download.microsoft.com/download/9/c/5/9c5b2167-8017-4bae-9fde-d599bac8184a/GraphicsMemory.doc

Michichael said:
It does not show this under XP 64-bit, however. Odd.... I remember that you
can disable the page file under the XP OS, but can you also, if you have 4G
of RAM, disable it in Vista? Or does it still require it for memory dumping.

Well, to have a page file or no, is really a different discussion, not
relevant to your graphics memory. Except that, I guess, if the system was
running a WDDM driver and was really under heavy load, it's possible that
both global system memory, *and* graphics memory, would end up swapping
pages of memory to disk. But the system doesn't page graphics memory to disk
under normal conditions - see the paper, for details of the allocation
mechanisms used here. Not easily summarised in a few words.

I was fortunate to have the chance to discuss the page file question with
Microsoft guys, at a couple of conferences, etc ... I mean, technical guys
such as Lou Perazzoli, Bruce Worthington and Adrian Marinescu - not just the
marketing clowns with great hair and khaki chinos. I doubt I could fully
reproduce the subtlety of their thinking! But the conclusions were always
clear and unanimous. NT (including NT, 2000, XP, 2003, Vista and Server
2008) was designed from the ground up as a virtual memory operating system.
The page file isn't just an after-thought, to work around limited real
memory. For example, many of the copy-on-write algorithms used to load EXE
and DLL files when a process loads, will make intelligent use of the page
file, if it is there. They have performance-tuned the crap out of the page
replacement mechanisms for nearly 20 years now (starting in 1988). Despite
the urban myths which circulate in some PC enthusiast forums, NT does *not*
lose performance by naively swapping pages to disk, unnecessarily. Even if
you have 4GB of RAM and 32 bit Windows, the NT kernel can find useful uses
for the page file. Even if your committed memory never even approaches the
limit of your semiconductor memory, NT can find good uses for the page file.
If you're really worried about disk space, drop the file size to 256MB or
the like; but do not remove the page file altogether. That seems to be the
consensus of the cognoscenti.

And as you note, a page file on the boot disk is required, if you want to
get a memory dump.
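
As a rough illustration of how page file size interacts with dump type, here
is a small sketch. The thresholds are assumptions based on Microsoft's
published guidance as I recall it (about 2 MB on the boot volume for a small
dump, physical RAM plus 1 MB for a complete dump); check the current
documentation before relying on them. Kernel dumps are left out because
their size varies with what the kernel has loaded. The function name is
hypothetical.

```python
def pagefile_supports_dump(pagefile_mb, ram_mb, dump_type):
    """Return True if a boot-volume page file looks big enough for the dump.

    Assumed thresholds (verify against Microsoft's documentation):
      "small"    -> at least 2 MB
      "complete" -> at least physical RAM + 1 MB
    """
    if dump_type == "small":
        return pagefile_mb >= 2
    if dump_type == "complete":
        return pagefile_mb >= ram_mb + 1
    raise ValueError("unsupported dump type: " + repr(dump_type))


# A 256 MB page file on a machine with 4 GB of RAM:
print(pagefile_supports_dump(256, 4096, "small"))     # True
print(pagefile_supports_dump(256, 4096, "complete"))  # False
```

In other words, under these assumptions a 256 MB page file would still allow
small dumps, but not a complete dump of 4 GB of RAM.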
 
A

Andrew McLaren

***** Debugger could not find nt in module list, module list might be
Now I'm no engineer, but maybe, just maybe, this is pointing to a
problem with my RAM?

"Module" as used here, is referring to a Loadable Module - an EXE, DLL, SYS
etc file. The Stack in the dump is completely trashed, so either there was
stack corruption; or the memory hardware didn't preserve the right data and
it evaporated.

You'd probably need to try walking the stack in the dump to see if it looked
like it had been corrupted by something (ie something over-wrote the stack
area with data, apparently all 0s). That is certainly feasible for an
experienced debug user, but far beyond what can be achieved via newsgroup
support.

You can give your RAM a pretty good workout by using the built-in memory
test. Boot up from the Vista DVD and go to repair options. Hit memory test.
Somewhere in the memory dialogue, there's an "Advanced" option where you can
crank up the number of passes to pretty punishing levels. That might help
isolate any hardware issues. Of course there are also many 3rd party memory
test tools as well.

Apart from that ... it's probably up to the Microsoft guys to study your
dump, now.
 
D

dennis@home

Michichael said:
Yeah, after reading through it after another crash I noticed it
wasn't exactly working. I'm really starting to think this is either heat
or RAM related now, because with the X-Fi card OUT, my video card's
operating temperature jumped 10*C (It's idling at 65-70*C, load 80-85*C)
and I'm seeing more frequent crashes. I completely uninstalled
everything to do with the card, as well as many of the other
"Unconnected" devices in the device manager, just to rule them out.
Though I'm getting concerned with the fact that it's citing USBPORT.sys
as a cause now too.

Have you checked the fans are the correct way around?
I ask this because I was evaluating a very expensive server once and the
disks were getting >55C which is just too hot.
They sent the engineers out and eventually the designers as it was a brand
new design.
I was poking around inside with a temp probe while they were there and
noticed one of the fans was backwards. The result: not a lot of airflow, even
though there were no fan alarms. It turned out that they were building them
all incorrectly, so it's just as well it was found early. It pays to check
the simple stuff first.
 
D

dennis@home

Michichael said:
Yup. The two 250mm fans are blowing cool air straight into the case from
the side, I have a 120mm fan in the front as an exhaust, I'm going to
flip it and make it an intake, so all exhaust would go through the
passive vents on the sides, as well as out the back of the case.

You have to be careful if you have exhaust and inlet fans to make sure the
air doesn't just flow straight in one and out the other by the shortest
path.
 
