Multicore and scheduling

  • Thread starter Erik Wikström

Erik Wikström

I have a dual-core processor and run Vista. When I run a single-threaded,
very CPU-heavy number-crunching application, I notice strange behaviour:
Vista schedules the application on both cores, resulting in a 50% load on
each core.

While I'm not an expert, I would think that this behaviour is suboptimal,
at least from a cache perspective. If the app were to run on only one
core, the number of cache misses ought to be reduced quite significantly.

As far as I know, just about any other OS will try to schedule an app on
the core that it last ran on, if possible. Does anyone have any idea why
this is not the case in Vista? (And yes, I know I can set the affinity in
Task Manager.)
 

Carey Frisch [MVP]

Dual Core Processing: Over-simplified, demystified and explained:
http://icrontic.com/articles/dual_core

Intel Dual-Core Processing Technology Explained:
http://www.intel.com/personal/computing/emea/eng/dualcore/demo/index.htm

--
Carey Frisch
Microsoft MVP
Windows Shell/User


Andrew McLaren

Hi Erik,

Off the top of my head, I'm not sure what changes in Vista would result in a
different scheduling strategy, compared to previous Windows versions. There
was some work in the Vista kernel to add better support for NUMA machines,
and for multimedia applications; I dunno if those changes would play a role.

But overall, there are many variables which could affect the processor
affinity; it's a bit of a leap to say "This is a regression in Vista". You'd
probably need to do a bit of analysis to understand exactly why you get the
scheduling you're seeing (eg: run the app in VTune or similar profiler).

Applications can elect to have control over their own scheduling, by using
SetThreadAffinityMask() and related functions. If an app elects *not* to
configure its own affinity, it is saying (in effect) "I don't care, I'll
just run with whatever mask the operating system provides". So perhaps it's
a case of "caveat emptor".
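
For illustration, here is a minimal sketch of an app setting its own thread affinity. It calls the Win32 SetThreadAffinityMask() through Python's ctypes; the helper names cpu_mask and pin_current_thread are made up for this example, and the call is skipped on anything other than Windows.

```python
import ctypes
import sys


def cpu_mask(cpus):
    """Build an affinity bit mask: bit n set means logical CPU n is allowed."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return mask


def pin_current_thread(cpus):
    """Restrict the calling thread to the given CPUs (hypothetical helper).

    Returns the previous affinity mask on Windows, or None elsewhere.
    """
    if sys.platform != "win32":
        return None  # SetThreadAffinityMask is a Win32-only API
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentThread.restype = wintypes.HANDLE
    # DWORD_PTR is pointer-sized and unsigned, so c_size_t is used for it here.
    kernel32.SetThreadAffinityMask.argtypes = [wintypes.HANDLE, ctypes.c_size_t]
    kernel32.SetThreadAffinityMask.restype = ctypes.c_size_t

    previous = kernel32.SetThreadAffinityMask(
        kernel32.GetCurrentThread(), cpu_mask(cpus)
    )
    if previous == 0:  # the API returns 0 on failure
        raise ctypes.WinError(ctypes.get_last_error())
    return previous


if __name__ == "__main__":
    # Pin the number-crunching thread to CPU 0 before starting the heavy work.
    print(pin_current_thread([0]))
```

A native application would make the same call directly from C or C++; the mask arithmetic is identical.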

Even if the application doesn't programmatically control the processor
affinity, the machine's operator can set affinity when they launch the app.
The START command takes an optional parameter of "/AFFINITY <hex affinity
mask>". For example:

C:\>START /affinity 0x00000001 Notepad.exe

.... to make Notepad run on a single CPU. Obviously you can also put this
command in a batch file or Shortcut command line, for convenience.
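
The hex mask is just a bit field: bit 0 stands for the first CPU, bit 1 for the second, and so on. The sketch below shows how the value in the command above can be derived. The function names affinity_hex and start_command are invented for this example, and the 8-digit "0x" format simply mirrors the command line shown above.

```python
def affinity_hex(cpus):
    """Return the hex affinity mask string for a list of logical CPU indices."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu  # set the bit that corresponds to this CPU
    return f"0x{mask:08X}"


def start_command(program, cpus):
    """Build a START command line that confines `program` to `cpus`."""
    return f"START /affinity {affinity_hex(cpus)} {program}"


if __name__ == "__main__":
    print(start_command("Notepad.exe", [0]))     # first CPU only
    print(start_command("Notepad.exe", [1]))     # second CPU only
    print(start_command("Notepad.exe", [0, 1]))  # both cores of a dual-core box
```

So on a dual-core machine the useful masks are 1 (first core), 2 (second core) and 3 (both).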

So, no matter what scheduling Windows applies to the application by default,
you are by no means constrained to accept this as inevitable. You can either
set the affinity in the source code (if you have it) or you can set the
affinity as the operator of the system, when you launch the application. As
you observed, you can also vary the affinity during runtime, using Task
Manager.

I *thought* that Microsoft was planning to release an updated WSRM (Windows
System Resource Manager) which would be compatible with Vista ... but when I
looked at microsoft.com just now I didn't see it anywhere. Maybe it has been
held over for the release of Windows Server 2008. But in general terms, WSRM
provides fairly sophisticated job control, as you'd typically use for
high-end enterprise and scientific/engineering applications. There are also
some cool tools in the Windows Compute Cluster
(http://www.microsoft.com/windowsserver2003/ccs/default.aspx); but Compute
Cluster is only useful for jobs with a high degree of parallelism; if your
app is single-threaded, it is not a candidate.

There may be more info available on processor scheduling when David Solomon
releases his updated MS Press "Inside Windows" book, for Vista - still a few
months away I believe.

I was going to make a bitchy remark about "how high-performance can this app
really be, if it is single-threaded?" but then decided that'd be silly :)

Hope it helps,
 

Andrew McLaren

Oh, a further thought about this ...

A process will not necessarily run on all available processors, just because
its Processor Mask is configured for all processors. The Processor Mask and
Thread Mask merely set the "outer constraints" for thread scheduling. But
the Windows scheduler might keep a single thread running entirely on one
processor, even if the processor mask covers all processors.

To put that another way around - if the processor mask is set to restrict
execution to a single CPU, then the process will only ever run on a single
CPU. If you expand the mask to cover all CPUs, the thread might still only
run on a single CPU. But it has the *option* to shift across to other CPUs,
if that seems appropriate at the time.

So, don't assume your process is running on all available processors, just
because it has affinity set to all available processors. For a single-threaded
process, it is highly likely that most execution will indeed be on a single
processor.

I think Windows sets the mask to "ALL" by default, so that the system can
freely move threads around if it appears warranted. But the scheduler is not
going to actively load-balance threads across CPUs, just because the
affinity mask says it can.
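
If you want to check the default rather than take my word for it, the following sketch reads the current process's mask and the system mask with the Win32 GetProcessAffinityMask() call via ctypes. The helper names mask_to_cpus and process_and_system_masks are made up for this example, and the query is skipped on non-Windows systems.

```python
import sys


def mask_to_cpus(mask):
    """Decode an affinity mask into the sorted list of allowed CPU indices."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def process_and_system_masks():
    """Return (process_mask, system_mask) for this process; None off Windows."""
    if sys.platform != "win32":
        return None  # GetProcessAffinityMask is a Win32-only API
    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetCurrentProcess.restype = wintypes.HANDLE
    kernel32.GetProcessAffinityMask.argtypes = [
        wintypes.HANDLE,
        ctypes.POINTER(ctypes.c_size_t),  # receives the process mask
        ctypes.POINTER(ctypes.c_size_t),  # receives the system mask
    ]
    kernel32.GetProcessAffinityMask.restype = wintypes.BOOL

    process_mask = ctypes.c_size_t()
    system_mask = ctypes.c_size_t()
    ok = kernel32.GetProcessAffinityMask(
        kernel32.GetCurrentProcess(),
        ctypes.byref(process_mask),
        ctypes.byref(system_mask),
    )
    if not ok:
        raise ctypes.WinError(ctypes.get_last_error())
    return process_mask.value, system_mask.value


if __name__ == "__main__":
    masks = process_and_system_masks()
    if masks is not None:
        process_mask, system_mask = masks
        print("process may run on CPUs:", mask_to_cpus(process_mask))
        print("system has CPUs:        ", mask_to_cpus(system_mask))
```

If the two lists match, the process is allowed on every CPU, which says nothing yet about where its threads actually run.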

You can measure the extent of cache misses by using PerfMon. There's no
single PerfMon counter to measure how long a thread spends on each available
CPU, but you could derive approximate figures from some of the other
counters. Or use Intel's VTune, or AMD's CodeAnalyst, to track the actual
execution of your process on the CPU(s).
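
As a rough alternative to a profiler, a thread can sample which CPU it is on while it works. The sketch below spins in a busy loop and polls the Win32 GetCurrentProcessorNumber() call (available from Vista onwards) through ctypes. The function names and the sampling interval are choices made for this example, and the result is only an approximation, not a precise measurement.

```python
import sys
import time
from collections import Counter


def current_cpu():
    """Return the logical CPU the calling thread is on, or None if unsupported."""
    if sys.platform != "win32":
        return None  # GetCurrentProcessorNumber is a Win32-only API
    import ctypes

    return ctypes.WinDLL("kernel32").GetCurrentProcessorNumber()


def cpu_shares(samples):
    """Turn a list of sampled CPU indices into {cpu: fraction of samples}."""
    counts = Counter(samples)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {cpu: n / total for cpu, n in sorted(counts.items())}


def sample_busy_loop(seconds=2.0, interval=0.01):
    """Spin for `seconds`, recording the current CPU roughly every `interval`."""
    samples = []
    now = time.perf_counter()
    deadline = now + seconds
    next_sample = now
    while now < deadline:
        if now >= next_sample:
            cpu = current_cpu()
            if cpu is not None:
                samples.append(cpu)
            next_sample += interval
        now = time.perf_counter()
    return samples


if __name__ == "__main__":
    # Prints nothing useful off Windows, where no samples are collected.
    print(cpu_shares(sample_busy_loop()))
```

A result close to an even split would match the 50% per core picture from the original question, while a heavily lopsided one would mean the thread mostly stayed put.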
 
