pjp said:
Getting odd problems once in a while, all seem related.
Word will occasionally not allow me to save a document, insisting the original is
read-only. I have to save under a new name, delete the original and rename the update.
Can't delete/move/rename a file after a program has created it, usually Any Video
Converter. Reopen and close the program and then the file's OK.
Get very high but very sporadic DPC latencies. System seems fine after a
reboot, with a random period before the problem re-emerges. Seldom does the same
sequence of events lead to a repetition.
Chkdsk occasionally needs boot option, e.g. Chkdsk /f.
Discs burn fine initially after a reboot, but errors occur later that seem related
to DPC latency at the time.
PC is loaded with things. Dual-head video and TV all connected, Hauppauge TV
input, printer, onboard ethernet, two USB hubs (both self-powered, one USB 1,
the other USB 2), webcam, wheel, flight stick and joypad all connected, a MIDI
USB piano, MIDI USB electronic drums, USB mic, three external hard drives,
multi-card reader (4 drives). Also has an add-on 5.1 sound card with onboard
sound disabled in the BIOS. Keyboard is PS/2, mouse is USB.
Any ideas where to start and/or how to proceed?
The problems don't seem related to me. The DPC one is the one that stands alone.
DPC (Deferred Procedure Call) is part of the response to interrupts. When there
is a hardware interrupt, the system runs at interrupt level while servicing it.
Only a small portion of the servicing is done at that interrupt level. To keep
the computer responsive, a DPC is scheduled to handle any of the "heavy lifting"
required as part of the hardware interrupt servicing. So if an interrupt needed
1 millisecond of total processing, perhaps 0.1 millisecond is spent at interrupt
level, and the other 0.9 millisecond might be a DPC, serviced in kernel mode at a
lower level (DISPATCH_LEVEL on Windows), below device interrupts but above
ordinary threads. Less time spent at interrupt level means another hardware
interrupt coming in sees a relatively low latency to get serviced (a new hardware
interrupt will preempt a DPC if needed, and get serviced).
DPCs sit in a queue. The queue is checked and processed at that dispatch level,
once no interrupt service routine is running. Because that level sits above the
level where user programs (and normal kernel threads) run, a pending DPC runs
ahead of any user program on that processor.
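To put rough numbers on that split, here is a toy Python model. It has nothing to do with how Windows actually implements this; the function name is made up, and the 1 ms / 0.1 ms figures are just the illustrative ones from above. It assumes a single CPU, and that a second interrupt of the same or lower priority has to wait for work running at interrupt level but can preempt a DPC.

```python
def worst_case_wait_ms(total_ms, isr_ms, use_dpc):
    """Worst-case delay seen by a second interrupt that arrives just as
    servicing of the first one starts (toy model, single CPU).

    Assumption: a new hardware interrupt can preempt a DPC, but must wait
    for work running at interrupt level to finish.
    """
    if use_dpc:
        # Only the short interrupt-level portion blocks the next interrupt;
        # the rest runs as a DPC, which can be preempted.
        return isr_ms
    # Without a DPC, all of the work runs at interrupt level.
    return total_ms


total_ms = 1.0  # total processing needed for one interrupt
isr_ms = 0.1    # portion done at interrupt level

print("all work at interrupt level:", worst_case_wait_ms(total_ms, isr_ms, False), "ms")
print("interrupt + DPC split      :", worst_case_wait_ms(total_ms, isr_ms, True), "ms")
print("work deferred to the DPC   :", round(total_ms - isr_ms, 3), "ms")
```

The point of the split is the second line: the next interrupt waits only as long as the short interrupt-level portion, not the whole job.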
Using a tool such as the "DPC Latency" tool, you can check the difference between
a DPC's arrival time in the queue and the time it's serviced. Normally, the queue
service time is relatively low: DPCs are serviced in a timely manner, and the
latency is in the hundreds of microseconds.
If you see "DPC latency spikes", it implies something is keeping the system
running at interrupt level (or in a long-running DPC), and activities at user
level aren't getting any processor time. And that isn't good for any software
that has real-time processing requirements (like multimedia movie playback or
sound recording).
An example of a "normal" spike is when a 3D game is entering 3D mode. That
seems to make the system unresponsive for a significant time, until 3D starts
rendering.
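The "arrival time versus service time" idea can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how any real latency tool works internally. The timestamps are made up, the function names are hypothetical, and the 1000 microsecond spike threshold is an arbitrary assumption.

```python
def dpc_latencies_us(samples):
    """samples: list of (queued_at_us, serviced_at_us) pairs.
    Returns the queue-to-service latency of each DPC in microseconds."""
    return [serviced - queued for queued, serviced in samples]


def find_spikes(latencies_us, threshold_us=1000):
    """Return (index, latency) for every sample above the threshold.
    The default threshold is an assumed value, not a standard."""
    return [(i, lat) for i, lat in enumerate(latencies_us) if lat > threshold_us]


# Made-up timestamps in microseconds, for illustration only.
samples = [(0, 120), (1000, 1150), (2000, 6500), (3000, 3200)]

latencies = dpc_latencies_us(samples)
print("latencies (us):", latencies)
print("spikes        :", find_spikes(latencies))
```

In this made-up data, three DPCs are serviced within a couple of hundred microseconds, and one waits several milliseconds. That one is the kind of spike that would upset audio or video software.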
In the case of a few Gigabyte motherboards, the spikes in the DPC Latency tool
graphs are caused by SMM (System Management Mode) code in the BIOS. The BIOS is
able to interrupt the OS at regular intervals (many times a second), and the SMM
mechanism allows the OS to be completely pre-empted. (The OS cannot even tell it is
happening. There is no log.) Normally, the SMM code has a short runtime, and
the computer's performance isn't compromised. But if the SMM code takes too
long, regular latency spikes will be seen in the DPC Service Latency graph.
A BIOS update will fix that, in cases where Gigabyte has been alerted to the problem
and figured it out. Sometimes the SMM code is used to configure the VCore voltage
converter, turning converter phases on or off as required (as part of some
cheesy "green" power conversion strategy). Asus also does VCore converters like
that (dynamic phases, with some phases being turned off during moments when the
system is idle).
You could use the Performance plugin (Performance Monitor, perfmon) and look at
an interrupt counter, such as Interrupts/sec under the Processor object, to
see if the hardware is generating a high rate of interrupts.
You could use Process Explorer from Sysinternals, which has an entry for DPC
activity. The number of DPCs per second should have some relationship to the
interrupt count in the Performance plugin.
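If you'd rather log those numbers than watch a graph, the same counters can be read from the command line with typeperf, which ships with Windows. Below is a rough Python sketch that runs it and averages the samples. It assumes an English-language Windows install (counter names are localized) and Python 3.7 or later. The function names are my own, and the parsing is based on the CSV layout typeperf normally prints, so treat it as a starting point rather than a finished tool.

```python
import csv
import io
import subprocess

# Counter paths from the Windows "Processor" performance object.
COUNTERS = [
    r"\Processor(_Total)\Interrupts/sec",
    r"\Processor(_Total)\DPCs Queued/sec",
]


def parse_typeperf_csv(text):
    """Turn typeperf's CSV output into a list of rows of floats.

    Skips the header row and any non-data lines (blank lines, status messages).
    """
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if len(fields) != len(COUNTERS) + 1:
            continue  # not a data row (timestamp + one value per counter)
        try:
            rows.append([float(value) for value in fields[1:]])
        except ValueError:
            continue  # header row, or a sample with missing data
    return rows


def sample_counters(samples=5):
    """Run typeperf (Windows only), one sample per second by default."""
    cmd = ["typeperf", *COUNTERS, "-sc", str(samples)]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_typeperf_csv(result.stdout)


def averages(rows):
    """Average each counter column; returns a dict keyed by counter path."""
    if not rows:
        return {}
    return {
        name: sum(row[i] for row in rows) / len(rows)
        for i, name in enumerate(COUNTERS)
    }
```

On the Windows machine itself, `print(averages(sample_counters(10)))` would give a ten-second average of both rates, which is handy for "before and after" comparisons when unplugging hardware.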
With tools in place like that, including the DPC Latency tool, remove some
hardware and retest.
Just going from my poor memory, I might see hundreds of interrupts a second
on an "idle" system. If I alt-tab out of a 3D game, the video card continues
to run in the background, and I might see two thousand interrupts per second.
And if I use a cheap GbE LAN card and do link rate testing, I've seen as
high as around 20,000 interrupts per second. And the computer was still able
to operate when that was going on. 20,000 is too high a number for that
activity; there are too many interrupts per packet on that card. An
Intel LAN chip, by comparison, had only a fraction of that level of interrupt
activity when running at a 117 MB/sec transfer rate. With the cheap card, the
level of interrupts is high enough that the maximum link rate you can
transmit/receive is CPU limited, and would require a 4 GHz Core 2 to run flat out.
Which is ridiculous, as a design. Anyway, that's to give some idea of the spectrum
of values you might see. Interrupts and DPCs should never really drop to
zero, because there is always a small amount of regular interrupt activity.
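For a feel of what those rates mean, here is some back-of-the-envelope arithmetic in Python. The 300 figure for "hundreds" on an idle system is just a stand-in I picked. The packet rate assumes full-size 1500-byte frames, 1 MB = 1,000,000 bytes, and ignores protocol overhead, so it's a ballpark only.

```python
def us_between_interrupts(interrupts_per_sec):
    """Average time between interrupts, in microseconds."""
    return 1_000_000 / interrupts_per_sec


def packets_per_sec(throughput_mb_s, frame_bytes=1500):
    """Approximate packet rate for a given throughput.
    Assumes 1 MB = 1,000,000 bytes and full-size frames (an assumption)."""
    return throughput_mb_s * 1_000_000 / frame_bytes


# 300 stands in for "hundreds"; the other two are the figures quoted above.
for rate in (300, 2_000, 20_000):
    gap = us_between_interrupts(rate)
    print(f"{rate:>6} interrupts/sec -> one every {gap:8.1f} us on average")

print(f"117 MB/sec is roughly {packets_per_sec(117):,.0f} full-size packets/sec")
```

At 20,000 interrupts per second, the processor gets interrupted on average every 50 microseconds, and whatever handling each interrupt needs has to fit into that budget along with everything else the machine is doing.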
By removing hardware, you might be able to isolate a high interrupt issue.
But identifying the cause of "DPC Latency spikes" is much more difficult, because
you have absolutely no control over SMM (short of changing BIOS versions
and praying something good happens).
*******
Your other symptoms are pretty strange. I don't see anything wrong with
testing memory, and it's a good suggestion even on an otherwise working
computer.
I've seen some pretty strange things here when it comes to
RAM errors, such as a RAM problem popping up out of the blue after
running VirtualBox. (Moving RAM sticks to alternate slots, setting
command latency from 1 to 2, no adjustment to voltage, and it was
all fixed again. Very strange. The RAM had been stable and error free
for a year or a year and a half before that. The memory was vetted
with memtest86+, the errors were visible after the VirtualBox runs,
and the errors disappeared after the slight tweaks. Since memtest86+
boots the computer, VirtualBox or its drivers cannot be running
at that time.) The reason I was running a RAM test is that I was
having problems installing a guest OS in VirtualBox and, out of
frustration, checked memory, and was shocked to see errors on
what is normally rock solid memory.
In terms of RAM test coverage, no memory test program can test the BIOS
reserved area. Getting around that, and making it possible to isolate errors
to a single stick, requires running in a special configuration. If
you have a dual channel RAM motherboard, you arrange two memory sticks
in single channel configuration (i.e. two sticks sitting on the one
channel, none on the other channel). That means one of the two
sticks will provide the BIOS storage area, while the second stick can
be fully tested. Then, if you shut down and swap those two sticks
in their slots, the stick that was doing the BIOS storage gets
moved to high memory, where it's fully exposed to memtest86+. By doing
two memtest86+ runs in single channel mode, with two sticks of RAM,
it's possible to completely test both sticks. If errors show up,
then because single channel is in use, the addresses shown can be
easily correlated with a particular stick of the two. (No tricky
address calculation, and no guessing which stick it might be.) If you
own four sticks of memory, you test them as two groups of two,
with two runs per group, for a total of four memtest86+ runs
to completely cover the four sticks.
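The run order for that procedure can be written out as a short Python sketch. The stick labels and function names are made up for illustration. The address mapping assumes two equal-size sticks mapped back to back with the low stick first, which is the simple single channel case described above; a board that remaps memory around the PCI hole could complicate that.

```python
def memtest_plan(sticks):
    """Yield (run_number, low_stick, high_stick) for single channel testing.

    Assumes an even number of sticks, tested two at a time on one channel.
    The low stick hosts the BIOS reserved area (not fully testable);
    the high stick is fully exposed to the memory tester.
    """
    if len(sticks) % 2:
        raise ValueError("this sketch expects an even number of sticks")
    run = 1
    for i in range(0, len(sticks), 2):
        a, b = sticks[i], sticks[i + 1]
        for low, high in ((a, b), (b, a)):  # the second run swaps the slots
            yield run, low, high
            run += 1


def stick_for_address(addr, low_stick, high_stick, stick_bytes):
    """Map a failing address to a stick, assuming two equal-size sticks
    mapped back to back with the low stick first."""
    return low_stick if addr < stick_bytes else high_stick


for run, low, high in memtest_plan(["A", "B", "C", "D"]):
    print(f"run {run}: low slot = {low} (BIOS area), fully tested = {high}")
```

With two 1 GiB sticks, for example, `stick_for_address(0x50000000, "A", "B", 1 << 30)` points at the stick in the high position, since that address is above the first gigabyte.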
You would think that if the RAM was bad, and bad data was being
written into the file system, the computer would be bricked in no
time at all. Or a chkdsk run would be showing "spaghetti" if
something like that was going on. (Using chkdsk to "repair errors"
in a system with flaky hardware can absolutely ruin a file system.)
If you're experiencing some things going "read-only" on you, that's
too specific for a simple explanation, at least to my way of thinking.
Something like that requires more intelligence, a more specific
interference of some sort (like a software issue). I tried Googling
on that, but didn't see any good candidate matches for a cause.
Paul