Explain this about threads


Jon Slaughter

"Instead of just waiting for its time slice to expire, a thread can block
each time it initiates a time-consuming activity in another thread until the
activity finishes. This is better than spinning in a polling loop waiting
for completion because it allows other threads to run sooner than they would
if the system had to rely solely on expiration of a time slice to turn its
attention to some other thread."



I don't get the "a thread can block each time...". What does it mean by
blocking? Does it mean that if thread B needs something from thread A that
thread A stops thread B from running until it's finished but doesn't interfere
with some other thread C?

Thanks,

Jon
 

Peter Duniho

Jon said:
"Instead of just waiting for its time slice to expire, a thread can block
each time it initiates a time-consuming activity in another thread until the
activity finishes. This is better than spinning in a polling loop waiting
for completion because it allows other threads to run sooner than they would
if the system had to rely solely on expiration of a time slice to turn its
attention to some other thread."

Without the context of the quote, it's hard to know for sure the details
of the scenario being discussed in the quote. However...

Generally speaking, a thread is runnable or not. If it is not, it will
"block". There are a variety of things that can cause a thread to
block, but they generally fall into two categories: waiting for some
resource; and waiting explicitly (i.e. calling Sleep()).

A thread that becomes unrunnable will immediately yield its current
timeslice, allowing some other runnable thread to start executing. If a
thread doesn't become unrunnable (that is, if it neither explicitly sleeps
nor makes some function call that involves having to wait on some resource,
for example a synchronization object or some sort of i/o), it will
continue to execute for as long as its timeslice.

So, in the quote, they appear to be explaining that polling a resource
is much worse than allowing the operating system to block the thread
until the resource is available, because polling causes that thread to
consume its entire timeslice, rather than allowing other threads to run
during that time.

Assuming the other threads are of the same priority, they will
eventually get some time. It's just that you can waste a lot of time
executing a thread that uses up its entire timeslice without
accomplishing any actual work. This is especially true if the other
threads are better behaved and use blocking techniques to deal with
resource acquisition, since those threads _won't_ generally use their
entire timeslice. This results in the one thread that's not actually
doing anything useful winding up getting the lion's share of the CPU
time, which is exactly the opposite of what you normally would want.
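
To make the contrast concrete, here is a minimal C# sketch (the class and
names are invented for illustration) of the blocking alternative: the
waiting thread calls WaitOne() and is simply not scheduled until another
thread signals the event.

using System;
using System.Threading;

class BlockVsPoll
{
    static readonly ManualResetEvent resourceReady = new ManualResetEvent(false);

    static void Worker()
    {
        // Blocks here: the thread is unrunnable and consumes no CPU time
        // until some other thread calls resourceReady.Set().
        resourceReady.WaitOne();
        Console.WriteLine("resource is ready, doing work");
    }

    static void Main()
    {
        new Thread(Worker).Start();
        Thread.Sleep(1000);    // simulate the time the resource is busy
        resourceReady.Set();   // unblocks the worker
    }
}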

So, with that background, your specific question:
I don't get the "a thread can block each time...". What does it mean by
blocking? Does it mean that if thread B needs something from thread A that
thread A stops thread B from running until it's finished but doesn't interfere
with some other thread C?

Thread A doesn't stop thread B explicitly, no. But assuming thread B
uses a blocking technique to wait on a resource being held by thread A,
thread B would be blocked by thread A implicitly. And importantly,
thread B would not be run at all until the resource was released by
thread A.

This means that as long as thread B is blocked, thread A and thread C
can share the CPU without having thread B using any CPU time. Assuming
you just have the three threads, then basically thread A and thread C
get their usual share of the CPU, plus they both get to use the time
thread B otherwise would have used.

For example, let's say the timeslice is one second. Then without any
blocking and with each thread consuming its entire timeslice, in a
three-second period, each thread would run for one continuous second.

Now, if thread B needs something thread A has, it can either poll for
the resource, continuing to use one continuous second for each three
second period, or it can block. If it uses some blocking mechanism,
then in a single three second period, thread B will use ZERO CPU time,
while threads A and C will use, on average, 1.5 seconds (but in reality,
for any given three second period, one of those threads will get two one
second timeslices, while the other will get one; over a six second
period, both threads will each get three one second timeslices though).

Of course, if thread A has to block as well, then thread C gets even
more CPU time, since it's the only one runnable.

This is obviously an oversimplification: timeslices aren't ever nearly
as long as one second on Windows, you never have only three threads, and
the above completely ignores the overhead of context switching between
threads. But it does illustrate the basic point.

The bottom line here is that polling is bad. Really bad. It takes the
one thread that actually has no work to do, and causes it to use the
most CPU time out of any thread running on the system. Polling is
almost always counter-productive. It's almost never the right way to
solve a problem.

Blocking, on the other hand, is a very nice way to solve a problem. The
operating system almost always has some mechanism for allowing a thread
to sit in an unrunnable state until whatever resource it actually needs
is available. After all, the OS is in charge of those resources, so it
naturally knows when they become available. So by using a blocking
technique, a thread that cannot do any useful work with the CPU is
never allowed to use the CPU, which allows for much more efficient use
of the CPU and much greater net throughput.

As with everything, there are exceptions to the general rule. In very
specific situations, a "spin wait" can improve performance. That is, if
you know that a particular resource will for sure become available
within some period of time less than your timeslice, it can be better to
spin wait for it, because if the thread blocks it could be quite a while
before it gets a chance to run again.

Once the resource it's waiting on becomes available, it still has to
wait its turn in the round-robin thread scheduling to get CPU time
again. In addition, there is of course the overhead in switching
between threads. So if you need a particular thread to be very
responsive AND (and this is very important) you know for sure it won't
have to wait longer than the timeslice, spinning can work well.

On this last point: if the thread will have to wait longer than the
timeslice for the resource, then spinning doesn't do any good at all.
The thread _will_ be preempted; there is no way to prevent that. So all
that spinning in that case does is waste CPU time that could be used by
all the other threads.

The spinning thread will still get interrupted, and put at the end of
the round-robin list so that it has to wait for all the other threads to
get their chance to execute. In fact, because the other threads, having
been kept from running longer, may wind up being able to do more work
when they finally do get to run, and because the spinning thread itself
just wasted a bunch of time doing nothing, the net result of having a
thread spin wait like that often is much _reduced_ performance even for
the spinning thread, never mind the issue of overall system throughput I
mentioned above.

So, even in these very specific scenarios where a spin wait may help,
you have to be very careful. If you aren't an expert in managing thread
scheduling, you can easily make your program a lot worse by attempting a
technique like that.

Pete
 

Rick Lones

Jon said:
"Instead of just waiting for its time slice to expire, a thread can block
each time it initiates a time-consuming activity in another thread until the
activity finishes. This is better than spinning in a polling loop waiting
for completion because it allows other threads to run sooner than they would
if the system had to rely solely on expiration of a time slice to turn its
attention to some other thread."



I don't get the "a thread can block each time...". What does it mean by
blocking? Does it mean that if thread B needs something from thread A that
thread A stops thread B from running until it's finished but doesn't interfere
with some other thread C?
More like thread B stops itself from running until A has made some resource
available or raised some other kind of event. There is usually no reason to sit
and spin in the meantime, so normally the remainder of B's time slice would be
yielded to the scheduler in hopes that some other task has a use for the CPU
resource. B is now said to be "blocked" on some event and will not be
rescheduled until the event occurs. It's less exotic than it may sound - if A
is the operating system, e.g., this happens every time your task B does a
synchronous wait for any OS resource, for example Console.ReadLine().
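
For instance, a minimal sketch of that ReadLine() case in C#:

using System;

class Demo
{
    static void Main()
    {
        // The calling thread blocks inside ReadLine() until the user
        // presses Enter; it consumes no CPU time while it waits.
        string line = Console.ReadLine();
        Console.WriteLine("got: " + line);
    }
}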

HTH,
-rick-
 

Willy Denoyette [MVP]

Jon Slaughter said:
"Instead of just waiting for its time slice to expire, a thread can block
each time it initiates a time-consuming activity in another thread until
the activity finishes. This is better than spinning in a polling loop
waiting for completion because it allows other threads to run sooner than
they would if the system had to rely solely on expiration of a time slice
to turn its attention to some other thread."



I don't get the "a thread can block each time...". What does it mean by
blocking? Does it mean that if thread B needs something from thread A that
thread A stops thread B from running until it's finished but doesn't interfere
with some other thread C?

Thanks,

Jon


Adding to what others have said in this thread:
1) You should never SpinWait on a single processor machine; doing so
prevents other threads in the system from making progress (unless this is
exactly what you are looking for).
Waiting for an event (whatever) from another thread in a SpinWait loop
prevents the other thread from signaling the event, so basically you are wasting
CPU cycles for nothing.
2) Define the count such that you spin for less than the time needed to
perform a transition to the kernel and back, when waiting for an event.
Spinning for a longer period is just a waste of CPU cycles; you'd better give
up your quantum by calling Sleep(1) or PInvoking the Kernel32
"SwitchToThread" API in that case.

Willy.
 

Jon Slaughter

Rick Lones said:
More like thread B stops itself from running until A has made some
resource available or raised some other kind of event. There is usually
no reason to sit and spin in the meantime, so normally the remainder of
B's time slice would be yielded to the scheduler in hopes that some other
task has a use for the CPU resource. B is now said to be
"blocked" on some event and will not be rescheduled until the event
occurs. It's less exotic than it may sound - if A is the operating
system, e.g., this happens every time your task B does a synchronous wait
for any OS resource, for example Console.ReadLine().


Ok, but I don't understand the "blocked on some event and will not be
rescheduled until the event occurs". How does that happen? I can't picture
how the machinery is set up to do this.

If we use your example of ReadLine() then essentially what happens is that
eventually the call works its way down to a hardware driver. But this
requires a task switch somewhere to go from user mode to kernel mode. Now
wouldn't the scheduler try and "revive" the user mode code that was task
switched, because it wouldn't know that it's waiting for the kernel mode code
to finish?

Or is there some information in a task switch, like a lock or something, that
tells the scheduler not to revive a process, and the kernel mode code would
determine that?

Hence I suppose in the task switch state one might have something like
"IsBlocked". When kernel mode is entered, if it's asynchronous it might
set IsBlocked momentarily but then release it. If it's synchronous then it
would set IsBlocked until it is completely finished.

Am I way off? ;) It's just hard for me to understand how the internal
machinery accomplishes this. Maybe it's as simple as a lock though?

Thanks,
Jon
 

Jon Slaughter

Peter Duniho said:
Without the context of the quote, it's hard to know for sure the details
of the scenario being discussed in the quote. However...

Generally speaking, a thread is runnable or not. If it is not, it will
"block". There are a variety of things that can cause a thread to block,
but they generally fall into two categories: waiting for some resource;
and waiting explicitly (i.e. calling Sleep()).

A thread that becomes unrunnable will immediately yield its current
timeslice, allowing some other runnable thread to start executing. If a
thread doesn't become unrunnable (that is, if it neither explicitly sleeps
nor makes some function call that involves having to wait on some
resource, for example a synchronization object or some sort of i/o), it
will continue to execute for as long as its timeslice.

So, in the quote, they appear to be explaining that polling a resource is
much worse than allowing the operating system to block the thread until
the resource is available, because polling causes that thread to consume
its entire timeslice, rather than allowing other threads to run during
that time.

Assuming the other threads are of the same priority, they will eventually
get some time. It's just that you can waste a lot of time executing a
thread that uses up its entire timeslice without accomplishing any actual
work. This is especially true if the other threads are better behaved and
use blocking techniques to deal with resource acquisition, since those
threads _won't_ generally use their entire timeslice. This results in the
one thread that's not actually doing anything useful winding up getting
the lion's share of the CPU time, which is exactly the opposite of what
you normally would want.

So blocking is something done on the thread side? I don't really get it. If
I've created a thread how can I "block" while I wait for a resource to become
available?

You're essentially saying it's better to block than

while (resource_unavailable)
    Thread.Sleep(10);

or something like that... but then how can one actually block? Is it special
API routines?

It seems that blocking would have to be done by the code, such as a driver,
that actually can tell when the resource is ready. So there really couldn't
be self blocking? (It seems like it could never wake up.)

It sounds as if it's like a lock that the scheduler uses to know when to put
a thread back into a queue? The thread that controls the resource is the one
that controls the lock? In this way, say, if I'm some driver with resource X
and thread Q requests X but X is not ready, I can set a lock on X w.r.t. Q
and the scheduler will block Q.

Basically you are using the term blocking in a self-referential way like "I
can either block myself or not"... but I don't understand how that could
work.
So, with that background, your specific question:


Thread A doesn't stop thread B explicitly, no. But assuming thread B uses
a blocking technique to wait on a resource being held by thread A, thread
B would be blocked by thread A implicitly. And importantly, thread B
would not be run at all until the resource was released by thread A.

Ok, I didn't realize there were such blocking techniques that a program
could use. I don't even know if I can comprehend how this could work.

How could thread B block itself and know when to "unblock" since it has no
idea when the resource will be available? The only thing that knows this is
the code that sits on top of the resource and it would seem that it would
have to do the blocking and unblocking.

Now if there is a way to set a block on a specific resource like

In Thread B:

Ask for resource,
Block until resource available.

But internally it would seem that the blocking really occurs by the
controller of the resource, and the scheduler actually deals with that?
This means that as long as thread B is blocked, thread A and thread C can
share the CPU without having thread B using any CPU time. Assuming you
just have the three threads, then basically thread A and thread C get
their usual share of the CPU, plus they both get to use the time thread B
otherwise would have used.

Yeah, I understand this. I don't understand how B can actually do any
blocking because the only way it could do this is to poll the resource until
it's ready or hook some interrupt. The first case then isn't blocking and
the second isn't available to normal Windows applications.

The way I see it, Thread B would have to ask to be blocked until the resource is
available (well, essentially this would be implicit if it's a synchronous
call).

So the flow might go like

In Thread B,

Ask for resource,
Please Block me until resource is ready
----

Thread A,
Thread B asked to be blocked until resource is ready,
Tell kernel to block B,
Wait until resource is ready (either through polling or interrupt)
Tell kernel to unblock B
return

For example, let's say the timeslice is one second. Then without any
blocking and with each thread consuming its entire timeslice, in a
three-second period, each thread would run for one continuous second.

Now, if thread B needs something thread A has, it can either poll for the
resource, continuing to use one continuous second for each three second
period, or it can block. If it uses some blocking mechanism, then in a
single three second period, thread B will use ZERO CPU time, while
threads A and C will use, on average, 1.5 seconds (but in reality, for any
given three second period, one of those threads will get two one second
timeslices, while the other will get one; over a six second period, both
threads will each get three one second timeslices though).

Of course, if thread A has to block as well, then thread C gets even more
CPU time, since it's the only one runnable.

This is obviously an oversimplification: timeslices aren't ever nearly as
long as one second on Windows, you never have only three threads, and the
above completely ignores the overhead of context switching between
threads. But it does illustrate the basic point.

The bottom line here is that polling is bad. Really bad. It takes the
one thread that actually has no work to do, and causes it to use the most
CPU time out of any thread running on the system. Polling is almost
always counter-productive. It's almost never the right way to solve a
problem.

Right, I realize that, because for my application I am doing something like
polling.

I was wondering if somehow I could get around it by blocking but I am
unfamiliar with this.

In fact though I don't think I can get around polling the way I am doing it
because I'm working directly with hardware. Basically I'm just using a proxy
to talk to the hardware. Not only do I have to poll but I also have to introduce
spin waits to slow down the process. This is bad but I think I have no
choice.

I just don't understand how one can do orderly communications in Windows. In
DOS, say, if you wanted to communicate at some desired rate you could hook
the timer interrupt and it would be called on every tick of the timer... You
could configure the timer for several frequencies. So if I wanted to
communicate with the parallel port in a timely fashion I could do this quite
easily by hooking the timer.

I could also use interrupts to get input when a resource is available (such
as data on the parallel port).

But both of these methods are impossible to do in user mode code.... I'm
trying to read about kernel mode drivers and see if this is possible to do
in a driver (I know I can hook interrupts but I'm not sure about the timing, so
I still might have to introduce spin waits).


But if I could block for a very precise amount of time in a user mode
program I could simulate the timer interrupt. Essentially having some
Thread.Sleep but for high resolution. I know this is impossible on the PC
though, but it would be nice.

(Actually what would be nice is to have a small separate CPU that is
specifically designed for timed communications, so one could load some code
there and it would always run at some specified rate, independent of
the main CPU and OS.)
Blocking, on the other hand, is a very nice way to solve a problem. The
operating system almost always has some mechanism for allowing a thread to
sit in an unrunnable state until whatever resource it actually needs is
available. After all, the OS is in charge of those resources, so it
naturally knows when they become available. So by using a blocking
technique, a thread that cannot do any useful work with the CPU is
never allowed to use the CPU, which allows for much more efficient use of
the CPU and much greater net throughput.

Ok, So I guess I was right in that blocking ultimately is part of the OS. I
still don't know what the blocking techniques are though but maybe they are
the synchronous calls?
As with everything, there are exceptions to the general rule. In very
specific situations, a "spin wait" can improve performance. That is, if
you know that a particular resource will for sure become available within
some period of time less than your timeslice, it can be better to spin
wait for it, because if the thread blocks it could be quite a while before
it gets a chance to run again.

Once the resource it's waiting on becomes available, it still has to wait
its turn in the round-robin thread scheduling to get CPU time again. In
addition, there is of course the overhead in switching between threads.
So if you need a particular thread to be very responsive AND (and this is
very important) you know for sure it won't have to wait longer than the
timeslice, spinning can work well.

On this last point: if the thread will have to wait longer than the
timeslice for the resource, then spinning doesn't do any good at all. The
thread _will_ be preempted; there is no way to prevent that. So all that
spinning in that case does is waste CPU time that could be used by all the
other threads.

The spinning thread will still get interrupted, and put at the end of the
round-robin list so that it has to wait for all the other threads to get
their chance to execute. In fact, because the other threads, having been
kept from running longer, may wind up being able to do more work when they
finally do get to run, and because the spinning thread itself just wasted
a bunch of time doing nothing, the net result of having a thread spin wait
like that often is much _reduced_ performance even for the spinning
thread, never mind the issue of overall system throughput I mentioned
above.

So, even in these very specific scenarios where a spin wait may help, you
have to be very careful. If you aren't an expert in managing thread
scheduling, you can easily make your program a lot worse by attempting a
technique like that.

I think the problem is though that in some cases spinning is the only way to
do something. For example, I have a kernel mode driver that I use to
communicate with the port. It emulates in and out. So if I want to send a
sequence of bits to the port I might do

out a;
out b;
out c;
etc..

but then this runs as fast as possible. It's actually not that fast, as it
takes about 7us, at least on my computer, to send just one out. So as you
can see, excluding task switch interruptions, this code runs at about 150kHz or
so. It's not all that slow actually, but it would be nice to have it run as fast
as possible (there is an upper limit on the speed of the port but I forgot what
it is... I think it's around 100kHz or so but depends on the chip used). But
let's suppose it's too fast and I want to slow it down?

The only way is to introduce delays. Since the only way to delay for
microseconds is to introduce a spinwait, I have no choice. Now the good thing
is that 90% of the time I only have to send a small number of bits (< 100) and
the delay is probably pretty short (< 100us).

This means that maximally I would have about 10ms delay from start to
finish in sending one command. If it's just interrupted once then that might
be ok. (Actually it's more like 2.5ms on average, I think.)

In any case I've switched to the idea that it might be best to learn about
kernel mode programming, so I'm reading up on that.

Thanks,
Jon
 

Mads Bondo Dydensborg

Willy Denoyette [MVP] wrote:

Spinning for a longer period is just a waste of CPU cycles; you'd better give
up your quantum by calling Sleep(1) or PInvoking the Kernel32
"SwitchToThread" API in that case.

Actually, IIRC, Sleep(0) is enough to get the scheduler going.

Regards,

Mads

 

Willy Denoyette [MVP]

Mads Bondo Dydensborg said:
Willy Denoyette [MVP] wrote:



Actually, IIRC, Sleep(0) is enough to get the scheduler going.

No it's not; Sleep(0) will not relinquish its timeslice if there are no
"equal priority" ready threads to run, please read the Sleep API description
on MSDN. That means that specifying 0 as the sleep count might lead to
starvation of lower priority threads.
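
In code form (just the two calls, with the behavior described above as
comments):

// Sleep(0) yields the rest of the quantum only to ready threads of
// equal priority; it may return immediately and can starve
// lower-priority threads.
Thread.Sleep(0);

// Sleep(1) always gives up the remainder of the quantum, for at least
// one timer tick, regardless of other threads' priorities.
Thread.Sleep(1);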

Willy.
 

Jon Slaughter

Willy Denoyette said:
Adding to what others have said in this thread:
1) You should never SpinWait on a single processor machine; doing so
prevents other threads in the system from making progress (unless this is
exactly what you are looking for).
Waiting for an event (whatever) from another thread in a SpinWait loop
prevents the other thread from signaling the event, so basically you are
wasting CPU cycles for nothing.
2) Define the count such that you spin for less than the time needed to
perform a transition to the kernel and back, when waiting for an event.
Spinning for a longer period is just a waste of CPU cycles; you'd better
give up your quantum by calling Sleep(1) or PInvoking the Kernel32
"SwitchToThread" API in that case.

Ok, tell me then how I can do clocked IO in a timely fashion without
spinning?

Let's suppose I have to slow down the rate because it's simply too fast for
whatever device I'm communicating with if I do not insert delays.

I would be really interested in knowing how I can do this without using
spinwaits because it is a problem that I'm having. It seems like it's a
necessary evil in my case.

Basically I'm trying to do synchronous communication with the parallel port.
I have the ability to use in and out, which is supplied by a kernel mode
driver and dll wrapper. So if I want to output some data to the port I can
do "out(data)" and it will output it.

If I have something like

for (int i = 0; i < data.Length; i++)
    out(data[i]);

Then this will run at about 100kHz or so (on my machine). Now what if I need to
slow it down to 20kHz? How can I do this without using spin waits but still
do it in a timely fashion? IGNORE ANY DELAYS FROM TASK SWITCHING! I cannot
control the delays that other processes and task switching introduce, so
since it's beyond my control I have to ignore it. What's important is the
upper bound that I can get and the average, not the lower bound. So when
I say it needs to run at 20kHz it means as an upper bound.

for (int i = 0; i < data.Length; i++)
{
    out(data[i]);
    Thread.SpinWait(X);
}

Where X is something that slows this down enough to run at 20kHz. I can
figure out X on average by doing some profiling. I.e., if I know how long
out takes and how long Thread.SpinWait(1) takes (on average) then I can get
an approximate value for X.

But how can I do this without using spin waits?
 

Willy Denoyette [MVP]

Jon Slaughter said:
Willy Denoyette said:
Adding to what others have said in this thread:
1) You should never SpinWait on a single processor machine; doing so
prevents other threads in the system from making progress (unless this is
exactly what you are looking for).
Waiting for an event (whatever) from another thread in a SpinWait loop
prevents the other thread from signaling the event, so basically you are
wasting CPU cycles for nothing.
2) Define the count such that you spin for less than the time needed
to perform a transition to the kernel and back, when waiting for an
event. Spinning for a longer period is just a waste of CPU cycles; you'd
better give up your quantum by calling Sleep(1) or PInvoking the Kernel32
"SwitchToThread" API in that case.

Ok, tell me then how I can do clocked IO in a timely fashion without
spinning?

Let's suppose I have to slow down the rate because it's simply too fast for
whatever device I'm communicating with if I do not insert delays.

I would be really interested in knowing how I can do this without using
spinwaits because it is a problem that I'm having. It seems like it's a
necessary evil in my case.

Basically I'm trying to do synchronous communication with the parallel
port. I have the ability to use in and out, which is supplied by a kernel
mode driver and dll wrapper. So if I want to output some data to the port
I can do "out(data)" and it will output it.

If I have something like

for (int i = 0; i < data.Length; i++)
    out(data[i]);

Then this will run at about 100kHz or so (on my machine). Now what if I need
to slow it down to 20kHz? How can I do this without using spin waits but
still do it in a timely fashion? IGNORE ANY DELAYS FROM TASK SWITCHING!
I cannot control the delays that other processes and task switching
introduce, so since it's beyond my control I have to ignore it. What's
important is the upper bound that I can get and the average, not the
lower bound. So when I say it needs to run at 20kHz it means as an upper
bound.

for (int i = 0; i < data.Length; i++)
{
    out(data[i]);
    Thread.SpinWait(X);
}

Where X is something that slows this down enough to run at 20kHz. I can
figure out X on average by doing some profiling. I.e., if I know how long
out takes and how long Thread.SpinWait(1) takes (on average) then I can get
an approximate value for X.

But how can I do this without using spin waits?



The data transfer rate on a parallel port is a matter of the handshake protocol
between the port and the device; basically it's the device that decides the
(maximum) rate. The exact transfer rates are defined in the IEEE 1284
protocol standards (IEEE 1284.1, 2, 3, 4...) and the modes (like Compatible,
Nibble, Byte and ECP mode) supported by the parallel port peripheral
controller chips. All these kinds of protocols (parallel port, serial port,
network, USB, other peripheral protocols) were invented exactly to be able
to control the signaling rates between the system and the device; the PC
hardware and the Windows OS are simply not designed for this, they are not
real-time capable.
Now, if you don't have a device connected that negotiates or respects one of
the IEEE 1284 protocol modes, you have a problem. You can't accurately time
the IO transfer rate; all you can do is insert waits in your code (user mode
or in a driver) and as such define a top rate but no lower
rate!
The easiest way (but still a dirty way to do it) is by inserting delays like
you do in your code; this is not a problem for small bursts (say a few hundred
bytes) on a single processor box, and a few KB on multi-cores, assuming that
you don't further peg the CPU between each burst, so that other threads
don't starve.


for (int i = 0; i < data.Length; i++)
{
    out(data[i]);
    Thread.SpinWait(X);
}

Say that the above code uses a SpinWait of 50µsec (say X = 150000); with a
data.Length of 200, the entire loop will take at least 10 msec.
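
Since the cost of one SpinWait unit is machine-specific, a rough C#
calibration sketch (this gives averages only; it puts no upper bound on
the delay, for the reasons discussed above):

using System;
using System.Diagnostics;
using System.Threading;

class Calibrate
{
    static void Main()
    {
        // Measure the average cost of one SpinWait iteration unit.
        const int iterations = 10000000;
        Stopwatch sw = Stopwatch.StartNew();
        Thread.SpinWait(iterations);
        sw.Stop();

        double usecPerUnit = sw.Elapsed.TotalMilliseconds * 1000.0 / iterations;
        // For a target delay of ~50 microseconds:
        int x = (int)(50.0 / usecPerUnit);
        Console.WriteLine("X for ~50us: " + x);
    }
}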
 

Mads Bondo Dydensborg

Willy said:
No it's not; Sleep(0) will not relinquish its timeslice if there are no
"equal priority" ready threads to run, please read the Sleep API
description on MSDN. That means that specifying 0 as the sleep count might
lead to starvation of lower priority threads.

We agree - I was sloppy - if there are no other threads ready to run,
nothing will happen.

Regards,

Mads

 

Jon Slaughter

Willy Denoyette said:
Jon Slaughter said:
Willy Denoyette said:
"Instead of just waiting for its time slice to expire, a thread can
block each time it initiates a time-consuming activity in another
thread until the activity finishes. This is better than spinning in a
polling loop waiting for completion because it allows other threads to
run sooner than they would if the system had to rely solely on
expiration of a time slice to turn its attention to some other thread."



I don't get the "a thread can block each time...". What does it mean by
blocking? Does it mean that if thread B needs something from thread A
that thread A stops thread B from running until it's finished but doesn't
interfere with some other thread C?

Thanks,

Jon




Adding to what others have said in this thread:
1) You should never SpinWait on a single processor machine; doing so
prevents other threads in the system from making progress (unless this is
exactly what you are looking for).
Waiting for an event (whatever) from another thread in a SpinWait loop
prevents the other thread from signaling the event, so basically you are
wasting CPU cycles for nothing.
2) Define the count such that you spin for less than the time needed
to perform a transition to the kernel and back, when waiting for an
event. Spinning for a longer period is just a waste of CPU cycles; you'd
better give up your quantum by calling Sleep(1) or PInvoking the Kernel32
"SwitchToThread" API in that case.

Ok, tell me then how I can do clocked IO in a timely fashion without
spinning?

Let's suppose I have to slow down the rate because it's simply too fast for
whatever device I'm communicating with if I do not insert delays.

I would be really interested in knowing how I can do this without using
spinwaits because it is a problem that I'm having. It seems like it's a
necessary evil in my case.

Basically I'm trying to do synchronous communication with the parallel
port. I have the ability to use in and out, which is supplied by a kernel
mode driver and dll wrapper. So if I want to output some data to the port
I can do "out(data)" and it will output it.

If I have something like

for (int i = 0; i < data.Length; i++)
    out(data[i]);

Then this will run at about 100kHz or so (on my machine). Now what if I need
to slow it down to 20kHz? How can I do this without using spin waits but
still do it in a timely fashion? IGNORE ANY DELAYS FROM TASK SWITCHING!
I cannot control the delays that other processes and task switching
introduce, so since it's beyond my control I have to ignore it. What's
important is the upper bound that I can get and the average, not the
lower bound. So when I say it needs to run at 20kHz it means as an upper
bound.

for (int i = 0; i < data.Length; i++)
{
    out(data[i]);
    Thread.SpinWait(X);
}

Where X is something that slows this down enough to run at 20kHz. I can
figure out X on average by doing some profiling. I.e., if I know how long
out takes and how long Thread.SpinWait(1) takes (on average) then I can get
an approximate value for X.

But how can I do this without using spin waits?



The data transfer rate on a parallel port is a matter of the handshake
protocol between the port and the device; basically it's the device that
decides the (maximum) rate. The exact transfer rates are defined in the
IEEE 1284 protocol standards (IEEE 1284.1, 2, 3, 4...) and the modes (like
Compatible, Nibble, Byte and ECP mode) supported by the parallel port
peripheral controller chips. All these kinds of protocols (parallel port,
serial port, network, USB, other peripheral protocols) were invented
exactly to be able to control the signaling rates between the system and
the device; the PC hardware and the Windows OS are simply not designed for
this, they are not real-time capable.


No, this is only for ECP and EPP. There is no handshaking or hardware
protocol in SPP, which is what I'm using. It is also necessary for me to use
SPP because the device that is attached does not use the same protocol that
EPP/ECP uses.
Now, if you don't have a device connected that negotiates or respects one
of the IEEE 1284 protocol modes, you have a problem. You can't accurately
time the IO transfer rate; all you can do is insert waits in your code
(user mode or in a driver) and as such define a top rate but
no lower rate!
The easiest way (but still a dirty way to do it) is by inserting delays like
you do in your code; this is not a problem for small bursts (say a few hundred
bytes) on a single processor box, and a few KB on multi-cores, assuming
that you don't further peg the CPU between each burst, so that other
threads don't starve.


Well, that's what I'm doing, but I'm trying to find the optimal method. This
is also the method used by most programs that do things similar to what I'm
trying to do.

I think I'm going to write a simple kernel mode driver that does all the
communications using direct port access (instead of the IOCTL methodology).
It's more of a hack but is probably as fast as I can get it. Of course that
method will cause problems with other drivers and stuff, but I don't have to
worry about that.

I can also use the interrupt to get information on a regular basis, but I'm not
sure how well this will work.

I was thinking that maybe I could use an interrupt and then an external
clock that would trigger the interrupt very precisely, and that would probably
give me a pretty accurate method, but it would probably starve the system
because of all the task switching per clock. I guess I have no choice but
to either use something like DOS or some hardware proxy that can deal with
the latency issues.
 

Peter Duniho

Jon said:
So blocking is something done on the thread side? I don't really get it. If
I've created a thread how can I "block" while I wait for a resource to become
available?

You call a method that will block on the resource. Which method you
need to call depends on the resource.

For what it's worth, you seem to already have the basic understanding of
what has to happen. That is, you've already deduced the requirements
for being able to have something that allows a thread to block. So it
seems the main thing lacking is practical experience or knowledge of
the concrete mechanisms involved. So I'll try to focus more on that.

As a simple example, consider Console.ReadLine(). This is a blocking
call. The thread that calls it will block until the user has entered a
full line of data. In reality, behind the scenes there's some work to
deal with checking for the end-of-line, but the basic idea is that as
long as the user isn't entering any data, that thread isn't doing any
work. It's not runnable. It's blocked.
You're essentially saying it's better to block than

while (resource_unavailable)
    Thread.Sleep(10);

or something like that... but then how can one actually block? Is it special
API routines?

Well, Sleep() is also a blocking method. In your example above, your
thread will be unrunnable for (roughly) 10 milliseconds. This is your
thread blocking.

But, no...I wouldn't say that blocking happens via "special API
routines". It is true that blocking is a consequence of calling
specific methods, but the number and variety of those methods is so
great, I'm not sure the term "special" really applies. Blocking is
almost always more a natural consequence of a specific design than it is
something you do specifically to block.

If _all_ you really want to do specifically is block, then I'd say that
Sleep() would be the function that does that. So in that respect,
sure...that's the "special API routine" that blocks. But there's many
many other ways for a thread to block.
It seems that blocking would have to be done by the code, such as a driver,
that actually can tell when the resource is ready. So there really couldn't
be self blocking? (It seems like it could never wake up.)

See above. But yes, generally speaking, when a thread calls a method
that blocks, it has to be using some sort of resource that the OS knows
about so that the OS can wake the thread back up.

In some cases, this resource is some kind of synchronization object,
like a WaitHandle. This would be something one thread would use to
communicate with another, where one thread waits on the object, and
another thread sets the object when it's done with whatever the first thread
was waiting for.

Another example would be the Monitor class, or even just the lock()
statement. Again, these are synchronization mechanisms, but this time
rather than having one thread explicitly waiting and another explicitly
signaling, they are ways of having both threads indicate to the OS "hey,
I want this resource" and then allowing the OS to do the hard work of
ensuring that only one thread is runnable at a time while they are using
that resource.
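
As a minimal sketch of that second case (the class and field names here are
invented for illustration), lock() makes the blocking implicit:

class Account
{
    readonly object gate = new object();
    int balance;

    public void Deposit(int amount)
    {
        lock (gate)              // if another thread holds the lock, this
        {                        // thread blocks (unrunnable, no CPU) until
            balance += amount;   // the holder leaves its lock() block
        }
    }
}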

Another broad class of blocking methods are i/o methods, such as
Console.ReadLine() I mentioned above. Pretty much any form of i/o you
can do involves at some level a call to a low-level OS function that
uses some internal signaling method to allow the OS to make your thread
unrunnable until the i/o completes.

IMHO, this is probably the most applicable example in your own case.
You seem to be doing i/o, and so this is a natural scenario in which to
take advantage of blocking behavior.
It sounds as if it's like a lock that the scheduler uses to know when to put
a thread back into a queue? The thread that controls the resource is the one
that controls the lock? In this way, say, if I'm some driver with resource X
and thread Q requests X but X is not ready, I can set a lock on X w.r.t. Q
and the scheduler will block Q.

Yes, to some extent that's a fair description of how it works. The
exact mechanism varies, but it always comes down to some means of the
thread registering itself (in practically all cases, this "registering"
is implicit in whatever blocking technique being used) with the OS so
that the OS itself knows under what condition the thread will become
runnable again. Then the OS manages all aspects, including moving the
thread back to the runnable state when appropriate so that the scheduler
can allow the thread to start executing again.
Basically you are using the term blocking in a self-referential way like "I
can either block myself or not"... but I don't understand how that could
work.

Well, you are correct when you observe that if a thread is blocked, it
cannot unblock itself. A blocked thread _always_ depends on some other
thread to unblock it. Sometimes that's as simple as the time period
given in the Sleep() call expiring, sometimes it's something else, like
some i/o completing, or another thread releasing a shared resource.

Saying that a thread can "block itself" is only self-referential in the
way that any reflexive verb is. That is, I suppose it technically is in
fact self-referential, but I don't see that as a problem.
[...]
How could thread B block itself and know when to "unblock" since it has no
idea when the resource will be available?

It can't. It relies on cooperation between the OS and whatever has the
resource in order to signal a state that will allow the OS to know that
the thread can be unblocked.
The only thing that knows this is
the code that sits on top of the resource and it would seem that it would
have to do the blocking and unblocking.

That's right. Because of the way the OS is designed, the holder of the
resource doesn't need to know who else wants it though. All it needs to
do is signal to the OS that it's done with the resource, and the OS will
take care of the rest.

Keeping in mind that synchronizing a shared resource isn't the _only_
way to block, it's just one way to block. But in any other way, there
is a similar mechanism involved that allows the OS to know all of the
details necessary in order to unblock the blocked thread when appropriate.
Now if there is a way to set a block on a specific resource like

In Thread B:

Ask for resource,
Block until resource available.

But internally it would seem that the blocking really occurs by the
controller of the resource, and the scheduler actually deals with that?

Define "controller of the resource". In some respects, the OS is the
"controller". That is, ultimately it is the OS managing who gets the
resource at any given time. On the other hand, you could say that
whatever thread currently has acquired the resource is the "controller".
That is, until that thread releases the resource, the resource is
controlled by that thread and unavailable to any other thread.
Yeah, I understand this. I don't understand how B can actually do any
blocking because the only way it could do this is to poll the resource until
it's ready or hook some interrupt. The first case then isn't blocking and
the second isn't available to normal Windows applications.

Ah, but it is. The second case, that is. Normal Windows applications
don't have direct access to the interrupts, no. But they do have access
to methods that allow the OS itself to use the interrupts, which
implicitly provides a mechanism for the application itself to use the
interrupts.

This is, in fact, how a lot of the various i/o methods work.
The way I see it, Thread B would have to ask to be blocked until the resource is
available (well, essentially this would be implicit if it's a synchronous
call).

Yes, that's exactly what happens. By making the synchronous call, the
thread is implicitly telling the OS "block my thread until the resource
is available".

Like I said, it seems to me you already know how it works. You just
don't realize it. :) You've deduced what must happen; the only missing
part is that you don't seem to realize that these mechanisms do in fact
already exist in the OS.
So the flow might go like

In Thread B,

Ask for resource,
Please Block me until resource is ready

So far, so good.
----

Thread A,
Thread B asked to be blocked until resource is ready,
Tell kernel to block B,
Wait until resource is ready (either through polling or interrupt)
Tell kernel to unblock B
return

Not quite right here. Thread A doesn't tell the OS to block B. The OS
already knows, by the semantics of whatever synchronization mechanism is
at work, that B needs to be blocked.

The only involvement thread A might have is in managing the resource
that the OS already knows thread B needs to be blocked on. This might
mean that thread A is holding the resource at the moment thread B
requests it. Or thread A might be a device driver thread managing i/o,
and by asking for some specific i/o on that device, thread B implicitly
tells the OS to block itself until thread A has completed whatever i/o
task is required to provide the result thread B wants.

Thread A might have some implicit relationship to thread B, but it's the
OS that manages which threads block and which ones get to run.
[...]
The bottom line here is that polling is bad. Really bad. It takes the
one thread that actually has no work to do, and causes it to use the most
CPU time out of any thread running on the system. Polling is almost
always counter-productive. It's almost never the right way to solve a
problem.

Right, I realize that because for my application I am doing something like
polling.

I was wondering if somehow I could get around it by blocking but I am
unfamiliar with this.

I have seen your comments in other threads. I have yet to see anything
in your comments that suggests that you really need polling.

I can't rule it out, but because of the way Windows works, polling is
usually not going to solve a problem in the way you might hope it would.

For practically any application, using some form of blocking i/o is the
appropriate solution. If you have an application that has such
time-critical needs that some sort of polling mechanism might be
required, it's almost always the case that that application just will
never work properly on Windows, because of its lack of real-time features.
In fact though I don't think I can get around polling the way I am doing it
because I'm working directly with hardware. Basically I'm just using a proxy
to talk to the hardware. Not only do I have to poll but I also have to introduce
spin waits to slow down the process. This is bad but I think I have no
choice.

Without knowing the full details of your project, I can't really offer
much specific advice. While I have taken note of some of what you've
written in the other threads, I admit that I haven't been following the
conversation closely. If you've posted all of the gory details, I
didn't happen to catch that.

If you have no managed access to the i/o from the hardware, and no
integrated unmanaged access to the i/o (that is, via one of the
higher-level i/o API in Windows), then I suppose it's possible polling
is your only option. However, even there what you should do is use a
call to Sleep(), with a 1 millisecond timeout, any time you poll and
don't have any work to do, to ensure that your thread immediately yields
its timeslice.
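
A sketch of that fallback pattern (CheckHardware() is hypothetical,
standing in for whatever status read your proxy driver exposes):

using System.Threading;

class PolitePoller
{
    static void WaitForHardware()
    {
        // Poll, but yield the rest of the timeslice on every miss so the
        // thread never burns a whole quantum doing nothing.
        while (!CheckHardware())
            Thread.Sleep(1);
    }

    // Hypothetical placeholder for the real hardware status read.
    static bool CheckHardware() { return false; }
}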

If you do have managed access to the i/o, then it will be much better to
take advantage of the blocking behavior of the synchronous API, or in
many cases even better would be to use the asynchronous API (typically,
this would involve using the methods with the words "Begin..." and
"End..." at the start of the name).
I just don't understand how one can do orderly communications in Windows. In
DOS, say, if you wanted to communicate at some desired rate you could hook
the timer interrupt and it would be called on every tick of the timer... You
could configure the timer for several frequencies. So if I wanted to
communicate with the parallel port in a timely fashion I could do this quite
easily by hooking the timer.

Well, yes and no. If you've done DOS programming, then you know that
you can get problems when multiple pieces of code all try to hook the
same timer. Either the hooks get properly chained, in which case you
have the potential for performance issues, or they don't, in which case
you simply wind up with some code not getting timer notifications.

So it's not a panacea. :) But yes, DOS provides a low-level way to do
this sort of thing.

On the other hand, DOS isn't really a multi-tasking OS. Yes, there are
DOS programs that use various techniques to implement a form of
multi-tasking, but these are always fragile and rely on cooperation
between the various pieces of code all trying to run at once.

Windows, on the other hand, is a true multi-tasking OS. Code running on
Windows does not get to explicitly decide if and when it will run, but
Windows does provide other mechanisms to deal with efficiently allowing
multiple pieces of code all to share the same CPU resources.

As is the case in all consumer-level multi-tasking OS's though, Windows
doesn't provide any sort of real-time management. So the price of being
able to use this higher-level, more efficient multi-tasking API is that
control over exactly when your thread will execute is lost.

This doesn't prevent orderly communications. But it does prevent you
from knowing _exactly when_ communications will take place, yes.

Fortunately, for practically all i/o that a Windows application might be
asked to do, the OS switches between threads quickly enough that the
end-user never will notice any difference.
I could also use interrupts to get input when a resource is available (such
as data on the parallel port).

But both of these methods are impossible to do in user mode code.... I'm
trying to read about kernel mode drivers and see if this is possible to do
in a driver (I know I can hook interrupts but I'm not sure about the timing,
so I still might have to introduce spin waits).

A kernel mode driver does have access to the same or similar mechanisms
you're familiar with from DOS, including interrupts. And in fact, if
for some reason you need this low-level access to the hardware, writing
a driver is often the best solution, especially if you can use
interrupts (even in a driver, polling has similar problems).

But, it's important to keep in mind that even if you put that sort of
logic into a driver, there's no way for that driver to interact with a
user-mode piece of code in a way that takes the thread-scheduling issues
out of the picture. The driver itself can be less-affected (though it's
subject to the same thread-scheduling rules, so... :) ), but in the end
if you're writing this code on Windows, presumably the data is
eventually presented to the user, or written to a file, or whatever, and
at that point it still has to go through user-mode code and will suffer
the same timing idiosyncrasies that user-mode code always has.
But if I could block for a very precise amount of time in a user mode
program I could simulate the timer interrupt. Essentially having some
Thread.Sleep but for high resolution. I know this is impossible on the PC
though, but it would be nice.

Sure, it would be nice. You can in fact use other timer mechanisms.
For example, the multimedia timers in WinMM provide higher-resolution
timing. I don't know if there are similar high-resolution timers in
.NET, but there might be.

But even with higher-precision timers, your thread will be subject to
the thread-scheduling rules. There will always be limits to just how
much precision you get under Windows.
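
For what it's worth, one WinMM mechanism is the timer-resolution request;
a minimal PInvoke sketch (note this raises the timer granularity
system-wide, and it is still no real-time guarantee):

using System.Runtime.InteropServices;
using System.Threading;

class TimerResolution
{
    [DllImport("winmm.dll")]
    static extern uint timeBeginPeriod(uint uPeriod);

    [DllImport("winmm.dll")]
    static extern uint timeEndPeriod(uint uPeriod);

    static void Main()
    {
        timeBeginPeriod(1);      // request ~1 ms scheduler granularity
        try
        {
            Thread.Sleep(1);     // now wakes up much closer to 1 ms
        }
        finally
        {
            timeEndPeriod(1);    // always undo the request when done
        }
    }
}
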
(Actually what would be nice is to have a small separate CPU that is
specifically designed for timed communications, so one could load some code
there and it would always run at some specified rate, independent of
the main CPU and OS.)

Well, IMHO that's known as "DMA". :)
Ok, So I guess I was right in that blocking ultimately is part of the OS. I
still don't know what the blocking techniques are though but maybe they are
the synchronous calls?

Synchronous i/o calls are a form of blocking, yes.
I think the problem is though that in some cases spinning is the only way to
do something. For example, I have a kernel mode driver that I use to
communicate with the port. It emulates in and out. So if I want to send a
sequence of bits to the port I might do

out a;
out b;
out c;
etc..

but then this runs as fast as possible.

I guess part of the question is why do you need this to run "as fast as
possible"? Or is it more a matter if you _don't_ want it to run "as
fast as possible"? Is there something about the i/o where you need to
control the exact timing of the use of the port?

Typically, there is some buffering available for i/o ports. The buffers
aren't huge, but they are large enough for the driver managing that i/o
port. Then the driver itself has larger buffers that it uses to manage
data on a timing frequency appropriate to user-mode thread scheduling.
Of course, typically there's also some sort of handshaking on the i/o
port so that the driver can signal that the buffers are full (forcing
the appropriate end of the i/o connection to stop sending, whether that's
the hardware or the user-mode code).

But it's true that not all i/o devices support this sort of handshaking
(in particular, in the hardware-to-computer direction...software should
always be able to stop sending when a buffer is full, though no doubt
there are implementations out there that don't), and in those cases it's
actually possible to lose data if it's not read fast enough.

But I'm not clear on how that applies here.
It's actually not that fast, as it
takes about 7us, at least on my computer, to send just one out. So as you
can see, excluding task switch interruptions, this code runs at about 150kHz or
so. It's not all that slow actually, but it would be nice to have it run as fast
as possible (there is an upper limit on the speed of the port but I forgot what
it is... I think it's around 100kHz or so but depends on the chip used). But
let's suppose it's too fast and I want to slow it down?

If what is too fast? Like I said, most i/o devices deal with this
inherently. They use buffering and handshaking to ensure that data is
not transmitted too quickly in either direction. If data is being sent
by the hardware so fast that the software can't keep up, i/o is simply
stopped until the software catches up. Likewise the other direction.

For what it's worth, it's hard for me to imagine hardware so fast that
the software can't keep up. The memory controller is one of the fastest
i/o devices in a computer, if not THE fastest, and software still winds
up waiting on memory on a regular basis. So _on average_, all code can
execute plenty fast enough to keep up with any i/o device attached to a
computer.

You seem to be asking about the other direction; software being too fast
for the hardware. But that raises the question of why the hardware
driver is not taking care of this already. I seem to recall this was a
regular parallel port; am I misremembering? A standard parallel port
driver should handle all of the buffering you need for it to work
correctly with whatever parallel device is attached to the hardware.
The only way is to introduce delays. Since the only way to delay for microseconds is
to introduce a spinwait, I have no choice. Now the good thing is that 90%
of the time I only have to send a small number of bits (< 100) and the delay
is probably pretty short (< 100us).

I guess I'm still not clear on why the timing is so critical. Perhaps
this is a consequence of me not paying close enough attention to the
other threads.
This means that maximally I would have about a 10ms delay from start to
finish in sending one command. If it's just interrupted once then that might
be ok. (Actually it's more like 2.5ms on average, I think.)

I don't recall the exact length of a timeslice on Windows (and if I
remember correctly, it can vary according to the specific version and
configuration of Windows you're using). However, I think that typically
10ms is less than the timeslice.

So, assuming your code starts sending immediately at the beginning of
its timeslice, it should be able to send all of the data within a single
timeslice. One way to manage this would be to have the thread blocked
until you are ready to send, and then unblock it. For example, use a
WaitHandle, where the thread uses the WaitHandle.WaitOne() method. Some
other thread would set the WaitHandle, and the thread waiting on the
WaitHandle would begin sending immediately after returning from WaitOne().

Assuming you've made sure the handle is not set before calling
WaitOne(), this will ensure that the sending starts right at the
beginning of a timeslice, and as long as 10ms is less than a timeslice
(which it generally should be), you can finish all of the work within a
timeslice, ensuring that your thread is not interrupted while sending
the data.
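
Roughly, something like this (a sketch only; SendCommandBits() is a
placeholder for your actual port writes):

using System.Threading;

class PortSender
{
    // AutoResetEvent resets itself after waking one waiter, so each
    // Set() call releases the sending thread exactly once.
    static AutoResetEvent readyToSend = new AutoResetEvent(false);

    static void SendLoop()
    {
        while (true)
        {
            // Thread blocks here, consuming no CPU, until Set() is called.
            readyToSend.WaitOne();

            // Ideally we wake at the start of a fresh timeslice, so the
            // whole command can go out without being preempted.
            SendCommandBits();  // placeholder: out a; out b; out c; ...
        }
    }

    // Called from some other thread when data is ready to go.
    static void RequestSend()
    {
        readyToSend.Set();
    }

    static void SendCommandBits() { /* ... */ }
}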

Now, as far as inserting delays into the data sending code, that's
outside the scope of this thread IMHO. You asked about how threads
block, and that's what I've been trying to explain.

But none of the blocking mechanisms allow you to block a thread and then
start executing it again with the sort of resolution you want. Even if
you raise the priority of your thread, just making it runnable won't
cause it to start executing again until there's a free CPU unit. This
won't happen until any threads currently executing either yield their
timeslice, or use up their timeslice entirely (and so are preempted by
the OS).
In any case, I've switched to the idea that it might be best to learn about
kernel mode programming, so I'm reading up on that.

If you are dealing with some sort of custom hardware that has very
specific timing needs with regards to its i/o, then yes...it's possible
you'll need to write a driver, and it's possible that driver might need
to be a kernel-mode driver. I don't have enough details to answer that
question.

But even drivers are subject to the thread-scheduling rules. There are
mechanisms provided to drivers to help manage timing issues, but at the
end of the day, Windows is simply not a real-time OS and so getting
precise timing of code execution is not always possible.

You can insert delays into your code to try to ensure a specific
_minimum_ delay between operations, but you have much less control over
the maximum delay, and using the built-in thread-scheduling mechanisms,
including those that block threads, isn't likely to be a good way to
implement the minimum delay aspect.
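
That said, if you do need a sub-millisecond _minimum_ delay in user mode,
about the only tool available is a spin on a high-resolution timer. A
sketch (Stopwatch wraps the performance counter; the conversion math is
approximate):

using System.Diagnostics;

static void SpinDelayMicroseconds(long microseconds)
{
    long targetTicks = microseconds * Stopwatch.Frequency / 1000000;
    Stopwatch sw = Stopwatch.StartNew();
    while (sw.ElapsedTicks < targetTicks)
    {
        // Busy-wait: guarantees at least the requested delay, but burns
        // CPU the whole time and puts no ceiling on the actual delay if
        // the scheduler preempts this thread mid-spin.
    }
}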

Pete
 
P

Peter Duniho

Jon said:
[...]
Basically I'm trying to do synchronous communication with the parallel port.
I have the ability to use in and out which is supplied by a kernel mode
driver and dll wrapper.

I guess I'm curious at this point as to why you want to use this kernel
mode driver that provides you direct access to the ports? Windows has a
usable higher-level i/o API that should handle all the i/o timing and
buffering required. I'm not specifically aware of a managed code API,
but the unmanaged parallel port access via CreateFile, etc. ought to
work I would think.

Can you explain why it is the usual buffered i/o mechanisms aren't suitable
for your needs? I'm not sure it will lead to a specific solution, but
it would at least help us to understand the scenario better.
[...]
Then this will run at about 100khz or so (on my machine). Now what if I need to
slow it down to 20khz? How can I do this without using spin waits but still
do it in a timely fashion?

I'm not aware of any non-spin-wait mechanism that will allow you to time
the interval between individual calls to out. The best you can do with
the mechanisms available is to ensure an _average_ data rate and even
there, without some kind of buffering support, you still have the
problem of having to not send data too fast.

But spin-waits are potentially going to cause other problems that will
actually cause your implementation to perform worse. In some cases, it
might actually perform worse than just calling Sleep(1) between each
call to "out" (which itself, as you probably know, would kill
performance if you're looking for a 20khz sending rate).

If you have to use spin-waits, then I suppose you have to. But it would
be helpful to at least make sure you really have to. So far, it's not
clear why you have to (at least, not to me). Spin-waiting is bad enough
that it's definitely a last resort.

Pete
 
J

Jon Slaughter

Peter Duniho said:
Jon said:
[...]
Basically I'm trying to do synchronous communication with the parallel
port. I have the ability to use in and out which is supplied by a kernel
mode driver and dll wrapper.

I guess I'm curious at this point as to why you want to use this kernel
mode driver that provides you direct access to the ports? Windows has a
usable higher-level i/o API that should handle all the i/o timing and
buffering required. I'm not specifically aware of a managed code API, but
the unmanaged parallel port access via CreateFile, etc. ought to work I
would think.

Can you explain why it is the usual buffered i/o mechanisms aren't suitable
for your needs? I'm not sure it will lead to a specific solution, but it
would at least help us to understand the scenario better.

Because it uses a specific protocol, AFAIK, so you cannot use just any device
attached to the port (only those devices designed to communicate on it).
[...]
Then this will run at about 100khz or so (on my machine). Now what if I need
to slow it down to 20khz? How can I do this without using spin waits but
still do it in a timely fashion?

I'm not aware of any non-spin-wait mechanism that will allow you to time
the interval between individual calls to out. The best you can do with
the mechanisms available is to ensure an _average_ data rate and even
there, without some kind of buffering support, you still have the problem
of having to not send data too fast.

huh? How can you ensure an average data rate but not ensure it will not be
too fast? It's very easy to ensure a maximum speed... it's impossible to
ensure a minimum speed.

If I do spinwait(100) then I'm guaranteed at least whatever time
spinwait takes to execute (assuming I know how long it takes). The cpu cannot
execute any faster than its clock rate... although maybe some of the
internal optimizations could, in theory, execute things a little faster.

Essentially any instruction executed that takes 1 cycle will always run with
a minimal time of 1/clock_speed. It can never be any faster except for those
possible optimizations the cpu does.
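
(Rough numbers to make that concrete: at a 2 GHz clock, one cycle is 0.5 ns,
so a loop of 100 single-cycle iterations can't finish in under ~50 ns. That
lower bound is solid; the upper bound isn't, since a preemption can stretch
the same loop out to milliseconds.)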
But spin-waits are potentially going to cause other problems that will
actually cause your implementation to perform worse. In some cases, it
might actually perform worse than just calling Sleep(1) between each call
to "out" (which itself, as you probably know, would kill performance if
you're looking for a 20khz sending rate).

My tests have shown that it's pretty good... at least good enough for my
application.

Because I'm using clocked/synchronous communications with my external
device, it ultimately doesn't really matter about the speed or variations in
speed (except on a few devices that will time out). What I'm trying to
achieve is a way to send data at an average rate so this can be
quantitatively adjusted by the user. So if the user wants 20khz he will get
about 20khz, and not 50khz or 5khz (on average).

If he decides to try it a little faster, like 21khz, then on average it will
be about 1khz faster (or at least no slower, on average, than 20khz).

If you have to use spin-waits, then I suppose you have to. But it would
be helpful to at least make sure you really have to. So far, it's not
clear why you have to (at least, not to me). Spin-waiting is bad enough
that it's definitely a last resort.


Sure... but if you can come up with a better way then I'm all ears. This is
the method used by all the programs I am aware of that do similar
stuff to what I need.

Although I have no idea how SPP mode works with clocked data for, say, a
printer. I'm not sure if they just bit-banged or what. Obviously the newer
modes get around this by having the hardware deal with it, but they then
introduce their own protocols that do not work for me because the devices I'm
attaching do not understand them.
 
J

Jon Slaughter

Rick Lones said:
For an example of the type of low-level "machinery" that can be used to
synchronize tasks on resources, see "semaphore". A semaphore can be
viewed as a data structure which represents a resource or event. One key
aspect of a classic OS semaphore structure is a queue to which tasks
awaiting the resource or event can chain their task control blocks.

Ok, these are locks... but who implements them? I can't see how a program
can block itself with respect to task scheduling... except maybe it blocks but
then something else has to unblock it.
But "it" does know - because your task has called ReadLine()! That
announces that your task cannot continue until an Environment.NewLine has
been registered at the keyboard. You have blocked yourself until an I/O
operation completes.

Ok, so you're saying essentially that when a synchronous call is made that
requires a resource, you block yourself? Somehow you say "I'll wait until the
resource is free"... but then something else must unblock you. But it would
seem then that to do any type of blocking you have to know explicitly the
resource to block on so whatever else can unblock you. Seems more complicated
than just having the other thing block and unblock.


What I mean is that it makes more sense to me for whatever is
controlling the resource to control blocking and unblocking.

Example.

Program:
Hey, Save this file

Resource Handler:
(Internally to scheduler: Block Program,
Save the file,
Return status msg and unblock program)
Ok, I saved it.
------

instead of

Program:
Hey, Save this file
(tell scheduler to block me)

Resource Handler:
(Internally to scheduler:
Save the file,
Return status msg and unblock program)
Ok, I saved it.


I guess though it doesn't quite matter where it does it, and maybe it's
actually better for the program to block itself... it just seems like there's no
context there, and it could block itself for any reason even if the resource
handler doesn't need to block. (Which I guess would be asynchronous comm.)

so for something like text output,


Program:
Hey, print x to screen

Resource Handler:
Queue x to be printed
------


instead of

Program:
Hey, print x to screen
(tell scheduler to block me)

Resource Handler:
(Internally to scheduler:
Queue x to be printed
Return status msg and unblock program)
----

As for the scheduler: you could think of it as managing a bunch of queues
whose contents are task control blocks. One of these is the Ready queue,
consisting of tasks which are runnable. A task is removed from the Ready
queue when it blocks and is put back onto this queue when the event it is
awaiting occurs. While not on the Ready queue, it is never even looked at
for scheduling. Caveat: I am describing a simplified and idealized OS
here, NOT the down and dirty details of any particular Windows OS. But
the model is accurate enough to be useful.

And for a quick handwave at the driver level: Somewhere somehow in some
form there is a semaphore or some equivalent which represents the event "a
line of text was entered at the keyboard". Your task's control block was
placed on that semaphore's queue when you called ReadLine() and there it
will sit until some interrupt handler recognizes that the event has
occurred and "posts" the event to the semaphore. This "post" operation
causes a task (yours) which has been waiting on this semaphore to be
removed from the semaphore's queue and placed back on the scheduler's
Ready queue. Voila - your awaited event has occurred and you are now
runnable again. (Again, a generic but useful model of the kinds of things
that take place.)


You sound like someone who is ready to appreciate a good book on Operating
System basics. :) If you want to see the guts of a simple and
approachable OS explained at a very useful level of detail you might start
with LaBrosse's description of his uC/OS. This is a bare-bones
multitasking OS intended for embedded systems.

Well, it would be nice if I had time but I've got so much other stuff to do. I
know the basics but I need to understand how it all fits together. Maybe
one day I'll actually read a book on it. (I used to program protected mode
back in the day when it was a big deal... but I forgot a lot of the stuff and
never really got much past writing a task switcher.)

I think eventually I will take a look at some embedded operating systems
because I'll probably need one in the future, but at this point I just want
to write a program to communicate with some devices I have that use some
protocols (ICSP, I2C and SPI). I want it to be general enough that I can program
these protocols in a nice way (instead of hard coding them). That way, if in the
future I want to add another one such as modbus or rs-232 (emulated on the
parallel port by polling... or just end up using the serial port), I can
without too much trouble.
 
R

Rick Lones

Jon said:
Ok, but I don't understand the "blocked on some event and will not be
rescheduled until the event occurs". How does that happen? I can't picture
how the machinery is setup to do this.

For an example of the type of low-level "machinery" that can be used to
synchronize tasks on resources, see "semaphore". A semaphore can be viewed as a
data structure which represents a resource or event. One key aspect of a
classic OS semaphore structure is a queue to which tasks awaiting the resource
or event can chain their task control blocks.
If we use your example of ReadLine() then essentially what happens is
eventually the call works its way down to a hardware driver. But this
requires a task switch somewhere to go from user mode to kernel mode. Now
wouldn't the scheduler try and "revive" the user mode code that was task
switched, because it wouldn't know that it's waiting for the kernel mode code
to finish?

But "it" does know - because your task has called ReadLine()! That announces
that your task cannot continue until an Environment.NewLine has been registered
at the keyboard. You have blocked yourself until an I/O operation completes.

As for the scheduler: you could think of it as managing a bunch of queues whose
contents are task control blocks. One of these is the Ready queue, consisting
of tasks which are runnable. A task is removed from the Ready queue when it
blocks and is put back onto this queue when the event it is awaiting occurs.
While not on the Ready queue, it is never even looked at for scheduling.
Caveat: I am describing a simplified and idealized OS here, NOT the down and
dirty details of any particular Windows OS. But the model is accurate enough to
be useful.

And for a quick handwave at the driver level: Somewhere somehow in some form
there is a semaphore or some equivalent which represents the event "a line of
text was entered at the keyboard". Your task's control block was placed on that
semaphore's queue when you called ReadLine() and there it will sit until some
interrupt handler recognizes that the event has occurred and "posts" the event
to the semaphore. This "post" operation causes a task (yours) which has been
waiting on this semaphore to be removed from the semaphore's queue and placed
back on the scheduler's Ready queue. Voila - your awaited event has occurred
and you are now runnable again. (Again, a generic but useful model of the
kinds of things that take place.)
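
If it helps to see the shape of that in C#, here's a toy version of the
wait/post handshake, using the framework's Semaphore class in place of the
kernel-internal one (a sketch of the model, not how ReadLine() is actually
implemented):

using System.Threading;

class KeyboardLines
{
    // Count starts at 0: no lines available yet, so waiters block.
    static Semaphore linesAvailable = new Semaphore(0, int.MaxValue);

    static void WaitForLine()     // the ReadLine() side
    {
        // Task blocks here, queued on the semaphore, until a "post".
        linesAvailable.WaitOne();
        // ...dequeue the completed line and use it...
    }

    static void OnEnterPressed()  // the interrupt-handler side
    {
        // "Post" the event: one waiting task moves back to the Ready queue.
        linesAvailable.Release();
    }
}
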
Or is there some information in a task switch like a lock or something that
tells the scheduler not to revive a process, and the kernel mode code would
determine that.

Hence I suppose in the task switch state one might have something like
"IsBlocked". When kernel mode is entered, if it's asynchronous it might
set IsBlocked momentarily but then release it. If it's synchronous then it
would set IsBlocked until it is completely finished.

Am I way off? ;) It's just hard for me to understand how the internal
machinery accomplishes this. Maybe it's as simple as a lock, though?

You sound like someone who is ready to appreciate a good book on Operating
System basics. :) If you want to see the guts of a simple and approachable OS
explained at a very useful level of detail you might start with LaBrosse's
description of his uC/OS. This is a bare-bones multitasking OS intended for
embedded systems.

Regards,
-rick-
 
P

Peter Duniho

Jon said:
Because it uses a specific protocol, AFAIK, so you cannot use just any device
attached to the port (only those devices designed to communicate on it).

Have you tried? The basic parallel port driver should be protocol
agnostic, AFAIK. You open it with CreateFile(), and it's just a
read/write stream.

The driver should take care of all the data integrity stuff, while your
application can worry about the application protocol.

I don't know how to do this using managed code, but I think that using
p/invoke to get at the unmanaged API would be a lot easier than the
hoops you're trying to jump through now.
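
As a rough sketch of what I mean (untested, error handling omitted; the
flag values are the standard CreateFile constants):

using System;
using System.IO;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

class ParallelPort
{
    const uint GENERIC_READ = 0x80000000;
    const uint GENERIC_WRITE = 0x40000000;
    const uint OPEN_EXISTING = 3;

    [DllImport("kernel32.dll", SetLastError = true, CharSet = CharSet.Auto)]
    static extern SafeFileHandle CreateFile(string name, uint access,
        uint share, IntPtr security, uint creation, uint flags, IntPtr template);

    static void Main()
    {
        // Open the port through the standard parallel port driver...
        SafeFileHandle h = CreateFile("LPT1", GENERIC_READ | GENERIC_WRITE,
            0, IntPtr.Zero, OPEN_EXISTING, 0, IntPtr.Zero);

        // ...and from there it's just a read/write stream.
        using (FileStream port = new FileStream(h, FileAccess.ReadWrite))
        {
            byte[] data = { 0x01, 0x02, 0x03 };
            port.Write(data, 0, data.Length);
        }
    }
}
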
huh? How can you ensure an average data rate but not ensure it will not be
too fast? It's very easy to ensure a maximum speed... it's impossible to
ensure a minimum speed.

Ensuring an average data rate is easy. You send some fixed amount of
data, and then you wait an appropriate amount of time. The amount of
time you wait and/or measure may be long, but there will always be an
amount of time you can select that provides an accurate-enough average
data rate even using the standard Windows timing mechanisms.

Whether this is practical in your case, I can't say. It really depends
on why the timing is so critical to you. But generally speaking, it's
just not a problem. Average is just that: average. If you average over
a long enough time period, it's trivial to achieve any arbitrary
average. You just need to be able to select a long enough time period.
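
In code, the idea might look something like this (a sketch; SendBurst() is
a placeholder for the raw, full-speed send):

using System.Diagnostics;
using System.Threading;

static void SendAtAverageRate(int totalBits, int bitsPerSecond)
{
    const int burst = 1000;  // bits sent at full speed per burst
    Stopwatch sw = Stopwatch.StartNew();

    for (int sent = 0; sent < totalBits; sent += burst)
    {
        SendBurst(burst);  // placeholder: the raw "out" loop

        // Where the clock *should* be if the average rate held exactly:
        long dueMs = (sent + (long)burst) * 1000 / bitsPerSecond;
        long aheadMs = dueMs - sw.ElapsedMilliseconds;
        if (aheadMs > 0)
            Thread.Sleep((int)aheadMs);  // coarse wait; only the average matters
    }
}

static void SendBurst(int bits) { /* out 1; out 0; ... */ }

The shorter the burst, the smoother the instantaneous rate; the longer the
burst, the less Sleep()'s coarse resolution matters to the average.
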
[...] What I'm trying to
achieve is a way to send data at an average rate so this can be
quantitatively adjusted by the user. So if the user wants 20khz he will get
about 20khz and not 50khz or 5khz (on average).

Well, over what time period is it necessary that this average data rate
be achieved? Will the user care if the data transmission is in short,
rapid bursts that over a second or more still average out to the rate
they've selected?

Or is the requirement more sensitive than that?
[...]
Sure... but if you can come up with a better way then I'm all ears. This is
the method used by all the programs I am aware of that do similar
stuff to what I need.

For example? What Windows programs are you talking about that use this
library to access the CPU's i/o port directly, rather than going through
CreateFile to access the parallel port? How have you verified that they
use this technique? And if they use this technique, how do _they_ deal
with the data throttling issues?

Knowing answers to those questions would go a long way to better
understanding your specific situation.
Although I have no idea how SPP mode works with clocked data for, say, a
printer. I'm not sure if they just bit-banged or what. Obviously the newer
modes get around this by having the hardware deal with it, but they then
introduce their own protocols that do not work for me because the devices I'm
attaching do not understand them.

As far as I know, a parallel printer driver does not implement its own
parallel port i/o. It uses the built-in Windows parallel port driver,
and relies on that driver to deal with the low-level issues. This is
true even for the older parallel port modes (often selectable in BIOS,
so Windows has to support them).

If it were me, I would use unmanaged code based on CreateFile using the
parallel port as my first attempt. Only if that didn't allow me to
achieve what I wanted would I mess around with this lower-level stuff.

Pete
 
J

Jon Slaughter

Peter Duniho said:
You call a method that will block on the resource. Which method you need
to call depends on the resource.

For what it's worth, you seem to already have the basic understanding of
what has to happen. That is, you've already deduced the requirements for
being able to have something that allows a thread to block. So it seems
the main thing lacking is a practical experience or knowledge of the
concrete mechanisms involved. So I'll try to focus more on that.

As a simple example, consider Console.ReadLine(). This is a blocking
call. The thread that calls it will block until the user has entered a
full line of data. In reality, behind the scenes there's some work to
deal with checking for the end-of-line, but the basic idea is that as long
as the user isn't entering any data, that thread isn't doing any work.
It's not runnable. It's blocked.

I think I'm starting to understand it. I guess I've always just thought of a
program as sorta getting all the cpu, so I have some preconceived notions
that are getting in the way.
Well, Sleep() is also a blocking method. In your example above, your
thread will be unrunnable for (roughly) 10 milliseconds. This is your
thread blocking.

Oh right... I think I meant a spinwait there instead of sleep.
But, no...I wouldn't say that blocking happens via "special API routines".
It is true that blocking is a consequence of calling specific methods, but
the number and variety of those methods is so great, I'm not sure the term
"special" really applies. Blocking is almost always more a natural
consequence of a specific design than it is something you do specifically
to block.

If _all_ you really want to do specifically is block, then I'd say that
Sleep() would be the function that does that. So in that respect,
sure...that's the "special API routine" that blocks. But there's many
many other ways for a thread to block.

Well, sleep would be a perfect solution for me if I could do it for less
than 1ms. 1ms means that if I interlaced every output with sleep(1) I
could run at a maximum of 1khz. This is extremely slow for my application.
If I could get 10us resolution then it would be much better. I know that
this would be counterproductive, though, because it would require a task switch
every 10us, which is very costly.
See above. But yet, generally speaking when a thread calls a method that
blocks, it has to be using some sort of resource that the OS knows about
so that the OS can wake the thread back up.

It seems there are 3 parts here.

who blocks, who unblocks, and what triggers the unblock.

I suppose the first one doesn't matter, because in either case the thread
will get blocked. The second one cannot be the thread, because it makes no
sense to unblock itself since it would have to be running. So this
must be the controller of the resource, because the scheduler has no idea
about the resource.

So really I think the problem is the last one. I imagine that the
controller of the resource (well, the code that interfaces with it) says
"Hey, the resource is available" and tells the scheduler that it can now
unblock the thread. Just not sure how that works. Maybe the details don't
matter too much, though, because it's kinda getting off on a tangent from my
original problem.
In some cases, this resource is some kind of synchronization object, like
a WaitHandle. This would be something one thread would use to communicate
with another, where one thread waits on the object, and another thread
sets the object when it's done whatever the first thread was waiting for.

Another example would be the Monitor class, or even just the lock()
statement. Again, these are synchronization mechanisms, but this time
rather than having one thread explicitly waiting and another explicitly
signaling, they are ways of having both threads indicate to the OS "hey, I
want this resource" and then allowing the OS to do the hard work of
ensuring that only one thread is runnable at a time while they are using
that resource.

Yeah, I guess I need to read about these in the context of an operating
system. I know why they are used but I guess I'm not clear on exactly how
they are implemented... of course it's probably not such a simple topic.
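
For what it's worth, the lock() form Pete mentions looks like this in
practice (a minimal sketch; the byte queue is just a made-up shared
resource):

using System.Collections.Generic;

class SharedBuffer
{
    static readonly object gate = new object();
    static Queue<byte> queue = new Queue<byte>();

    static void Put(byte b)
    {
        lock (gate)  // blocks here if another thread holds the lock
        {
            queue.Enqueue(b);
        }   // leaving the block releases the lock; a waiter becomes runnable
    }

    static bool TryTake(out byte b)
    {
        lock (gate)
        {
            if (queue.Count > 0) { b = queue.Dequeue(); return true; }
            b = 0;
            return false;
        }
    }
}
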
Another broad class of blocking methods are i/o methods, such as
Console.ReadLine() I mentioned above. Pretty much any form of i/o you can
do involves at some level a call to a low-level OS function that uses some
internal signaling method to allow the OS to make your thread unrunnable
until the i/o completes.

IMHO, this is probably the most applicable example in your own case. You
seem to be doing i/o, and so this is a natural scenario in which to take
advantage of blocking behavior.

Yes... but ultimately it doesn't matter, because I have to spinwait to get
the timing right... So if I write a kernel mode driver then I have to
spinwait in there, and sure, I end up blocking the calling thread but also
everything else... but I don't seem to have any choice, because I have to
time things and pc's don't have the ability to do such high resolution
timing (AFAIK).
Yes, to some extent that's a fair description of how it works. The exact
mechanism varies, but it always comes down to some means of the thread
registering itself (in practically all cases, this "registering" is
implicit in whatever blocking technique being used) with the OS so that
the OS itself knows under what condition the thread will become runnable
again. Then the OS manages all aspects, including moving the thread back
to the runnable state when appropriate so that the scheduler can allow the
thread to start executing again.


Well, you are correct when you observe that if a thread is blocked, it
cannot unblock itself. A blocked thread _always_ depends on some other
thread to unblock it. Sometimes that's as simple as the time period given
in the Sleep() call expiring, sometimes it's something else, like some i/o
completing, or another thread releasing a shared resource.

Yeah. I think I was grouping unblocking with blocking. It's easy to block
yourself (just terminating will work ;) but you cannot unblock yourself (unless
you were not blocked in the first place).

So I guess when I call sleep I put a block out, but whatever mechanism is
behind sleep (the os, I guess) has the code that will unblock my code after
the time has elapsed?
Saying that a thread can "block itself" is only self-referential in the
way that any reflexive verb is. That is, I suppose it technically is in
fact self-referential, but I don't see that as a problem.

Yeah, I was just misunderstanding the term or misusing it. I was, for some
reason, thinking that if you blocked yourself you had to unblock yourself. Not
sure why ;/
[...]
How could thread B block itself and know when to "unblock" since it has
no idea when the resource will be available?

It can't. It relies on cooperation between the OS and whatever has the
resource in order to signal a state that will allow the OS to know that
the thread can be unblocked.
The only thing that knows this is the code that sits on top of the
resource and it would seem that it would have to do the blocking and
unblocking.

That's right. Because of the way the OS is designed, the holder of the
resource doesn't need to know who else wants it though. All it needs to
do is signal to the OS that it's done with the resource, and the OS will
take care of the rest.

Keeping in mind that synchronizing a shared resource isn't the _only_ way
to block, it's just one way to block. But in any other way, there is a
similar mechanism involved that allows the OS to know all of the details
necessary in order to unblock the blocked thread when appropriate.

Yeah. I'm sure there are many ways, but I think fundamentally it's pretty much
all the same? I mean, if ya gotta block then ya gotta block?
Define "controller of the resource". In some respects, the OS is the
"controller". That is, ultimately it is the OS managing who gets the
resource at any given time. On the other hand, you could say that
whatever thread currently has acquired the resource is the "controller".
That is, until that thread releases the resource, the resource is
controlled by that thread and unavailable to any other thread.

I mean the lowest level code that works with the resource like a driver.

Yes, that's exactly what happens. By making the synchronous call, the
thread is implicitly telling the OS "block my thread until the resource is
available".

Like I said, it seems to me you already know how it works. You just don't
realize it. :) You've deduced what must happen; the only missing part is
that you don't seem to realize that these mechanisms do in fact already
exist in the OS.

lol Yeah, well. I'm just not clear on it. I do have the basics to some
degree, but I've never read how this stuff worked (well, just a little) so I
have no idea if what I think is correct (although now I am much more
confident).

I mean, basically I reason it out as if I had to implement the stuff from
scratch... but that doesn't mean I would end up getting it right.
So far, so good.


Not quite right here. Thread A doesn't tell the OS to block B. The OS
already knows, by the semantics of whatever synchronization mechanism is
at work, that B needs to be blocked.

The only involvement thread A might have is in managing the resource that
the OS already knows thread B needs to be blocked on. This might mean
that thread A is holding the resource at the moment thread B requests it.
Or thread A might be a device driver thread managing i/o, and by asking
for some specific i/o on that device, thread B implicitly tells the OS to
block itself until thread A has completed whatever i/o task is required to
provide the result thread B wants.

Thread A might have some implicit relationship to thread B, but it's the
OS that manages which threads block and which ones get to run.

Ok. I guess I would need to see exactly how it's implemented to feel
comfortable with it, but I do see how it could work. I guess it's just hard
for me to see what kinda code has to be behind such seemingly simple calls
as ReadLine to get all that stuff to work.

I guess though it's very similar to how dos works? If you wanted to output text
you would call an interrupt routine (although you could write directly to
video memory, I suppose)... the act of interrupting is blocking, and the
unblocking is when the interrupt returned from the call?

Sounds like this is very similar but on a more complicated level because of
the multitasking environment.
I have seen your comments in other threads. I have yet to see anything in
your comments that suggest that you really need polling.

I can't rule it out, but because of the way Windows works, polling is
usually not going to solve a problem in the way you might hope it would.

For practically any application, using some form of blocking i/o is the
appropriate solution. If you have an application that has such
time-critical needs that some sort of polling mechanism might be required,
it's almost always the case that that application just will never work
properly on Windows, because of its lack of real-time features.


Ok, I cannot use the parallel port in ECP or EPP mode because the devices
attached are not normally for the parallel port (they are PICs, analog
switches, etc.).

Basically they run on protocols that have nothing to do with the protocols
used by those modes of the port.

But I can still use the port to do the communications, because it's just
simply sending bits in a predefined way. It's clocked/synchronous
communication, so there is a clock and a data line. I can use two pins on
the port and just output bits in the correct order to do my communications.

Problem is, in some cases the protocol specifies that the slave will need to
take over the communications (such as the acknowledgement part). So maybe I
can do something like

out 1
out 0
out 1
out 1

but then I need to wait for an acknowledgement from the slave.

How can I do this without polling or interrupts?

If the slave sends the acknowledgement and I'm not listening then I've lost
it. If I can't use interrupts then the only other option is to poll. (And when
it sends the acknowledgement it only lasts for several microseconds.)

Without knowing the full details of your project, I can't really offer
much specific advice. While I have taken note of some of what you've
written in the other threads, I admit that I haven't been following the
conversation closely. If you've posted all of the gory details, I didn't
happen to catch that.

If you have no managed access to the i/o from the hardware, and no
integrated unmanaged access to the i/o (that is, via one of the
higher-level i/o API in Windows), then I suppose it's possible polling is
your only option. However, even there what you should do is use a call to
Sleep(), with a 1 millisecond timeout, any time you poll and don't have
any work to do, to ensure that your thread immediately yields its
timeslice.

Too slow ;/ I would love to use sleep as it's the easiest method, but it's just
too slow. 1ms resolution would just kill my performance.

For one of the protocols each command is about 30 bits long and there are
minimum wait times of about 40ns between each bit sent. This means I would
have to wait ~30ms to send just one command. For sending 10's of kb, such as
a program dump, it would take 1000's of seconds (because each time a code must
be sent), and that's the best case (that's assuming sleep(1) takes exactly 1ms).

It won't hurt to have large delays sparsely injected into the
transmission (it's not the best way, but it doesn't cause too many problems with
the slave device) and it mainly just slows everything down... but doing it
for every bit makes the program useless.
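
(For reference, the shape of what Pete suggested earlier - poll, and yield
with Sleep(1) whenever nothing is pending. Fine for idle waiting, too coarse
for per-bit timing. DataIsPending and ReadAndProcess are stand-ins:)

using System.Threading;

static void PollLoop()
{
    while (true)
    {
        if (DataIsPending())      // placeholder: actual port/status check
        {
            ReadAndProcess();     // placeholder: drain whatever is ready
        }
        else
        {
            // Nothing to do: yield the rest of the timeslice instead of
            // spinning. Wakeup resolution is ~1ms at best.
            Thread.Sleep(1);
        }
    }
}

static bool DataIsPending() { return false; }  // stub for illustration
static void ReadAndProcess() { }               // stub for illustration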

If you do have managed access to the i/o, then it will be much better to
take advantage of the blocking behavior of the synchronous API, or in many
cases even better would be to use the asynchronous API (typically, this
would involve using the methods with the words "Begin..." and "End..." at
the start of the name).

I think the problem is that blocking does nothing for my app, because it's
not a "timed" block. I'll end up blocking the thread anyway, but I'll do it
at the command level.

So essentially, instead of inserting sleep between every bit, I'll insert
short spinwaits and then unblock after the full command is sent.

Well, yes and no. If you've done DOS programming, then you know that you
can get problems when multiple pieces of code all try to hook the same
timer. Either the hooks get properly chained, in which case you have the
potential for performance issues, or they don't, in which case you simply
wind up with some code not getting timer notifications.

So it's not a panacea. :) But yes, DOS provides a low-level way to do
this sort of thing.

Well, I mean without having all that extra stuff running in the system. I
also plan on not running my application alongside others, to get maximum
performance.
On the other hand, DOS isn't really a multi-tasking OS. Yes, there are
DOS programs that use various techniques to implement a form of
multi-tasking, but these are always fragile and rely on cooperation
between the various pieces of code all trying to run at once.

Windows, on the other hand, is a true multi-tasking OS. Code running on
Windows does not get to explicitly decide if and when it will run, but
Windows does provide other mechanisms to deal with efficiently allowing
multiple pieces of code all to share the same CPU resources.

As is the case in all consumer-level multi-tasking OS's though, Windows
doesn't provide any sort of real-time management. So the price of being
able to use this higher-level, more efficient multi-tasking API is that
control over exactly when your thread will execute is lost.

This doesn't prevent orderly communications. But it does prevent you from
knowing _exactly when_ communications will take place, yes.

Fortunately, for practically all i/o that a Windows application might be
asked to do, the OS switches between threads quickly enough that the
end-user never will notice any difference.


Well, that's a problem though, because windows has the benefit of
multi-tasking... which makes it much better than dos but falls short for
timed communications. Would be nice if they added some ability for both.

I mean, what would be really nice is to have a small device dedicated to
doing such a thing. Actually they are all over the pc, but they are all
hardware and not programmable.
A kernel mode driver does have access to the same or similar mechanisms
you're familiar with from DOS, including interrupts. And in fact, if for
some reason you need this low-level access to the hardware, writing a
driver is often the best solution, especially if you can use interrupts
(even in a driver, polling has similar problems).

But, it's important to keep in mind that even if you put that sort of
logic into a driver, there's no way for that driver to interact with a
user-mode piece of code in a way that takes the thread-scheduling issues
out of the picture. The driver itself can be less-affected (though it's
subject to the same thread-scheduling rules, so... :) ), but in the end if
you're writing this code on Windows, presumably the data is eventually
presented to the user, or written to a file, or whatever, and at that
point it still has to go through user-mode code and will suffer the same
timing idiosyncrasies that user-mode code always has.

Well, the thing is though, I can send the commands at the kernel mode level
and it will take a lot of the performance hit out of it (from the task
switching between privilege levels).
Sure, it would be nice. You can in fact use other timer mechanisms. For
example, the multimedia timers in WinMM provide higher-resolution timing.
I don't know if there are similar high-resolution timers in .NET, but
there might be.

But even with higher-precision timers, your thread will be subject to the
thread-scheduling rules. There will always be limits to just how much
precision you get under Windows.

Yes, I'm not trying to take control of the system, just trying to maximize
performance. When I run the program I will run it in an environment without
any other programs if I need the performance.
Well, IMHO that's known as "DMA". :)

Well, but it's then a "dumb" piece of hardware. What I mean is something
that sorta runs a separate piece of code at a specific timed rate... like a
small thread, but one that is not part of the main cpu... maybe just 1 register
and a little memory for the stack and buffers that are also not part of the
main memory.


So, for example, it could be used, say, to read the ports of the mouse (just
like a normal mouse driver) and be completely separate from the main cpu,
except for the ability to transfer the data to main memory and maybe send
interrupts for things too.

Not sure if that would work though ;)
I guess part of the question is why do you need this to run "as fast as
possible"? Or is it more a matter if you _don't_ want it to run "as fast
as possible"? Is there something about the i/o where you need to control
the exact timing of the use of the port?

Because if I'm transferring large amounts of data then the faster the better.
I don't want to wait 10 mins to transfer updated code to a pic, find out
that it doesn't work because I made a stupid bug, and then have to spend
another 10 mins to transfer it. (When it might be just several kbs long.)
Typically, there is some buffering available for i/o ports. The buffers
aren't huge, but they are large enough for the driver managing that i/o
port. Then the driver itself has larger buffers that it uses to manage
data on a timing frequency appropriate to user-mode thread scheduling. Of
course, typically there's also some sort of handshaking on the i/o port so
that the driver can signal that the buffers are full (forcing the
appropriate end of the i/o connecton to stop sending, whether that's the
hardware or the user-mode code).

But it's true that not all i/o devices support this sort of handshaking
(in particular, in the hardware-to-computer direction...software should
always be able to stop sending when a buffer is full, though no doubt
there are implementations out there that don't), and in those cases it's
actually possible to lose data if it's not read fast enough.

But I'm not clear on how that applies here.

I can't use that, because those methods use handshaking and stuff that is
incompatible with the protocols of the devices I'm using.
If what is too fast? Like I said, most i/o devices deal with this
inherently. They use buffering and handshaking to ensure that data is not
transmitted too quickly in either direction. If data is being sent by the
hardware so fast that the software can't keep up, i/o is simply stopped
until the software catches up. Likewise the other direction.

Depends on the device. There is no handshaking, no initialization or anything
for communications. You just clock bits to and from the device, except on
some occasions where I think, but am not entirely sure, the slave device
will take over the data and clock lines.

But there still is a maximum rate.

You can find some more info here. These are for PIC18's which are similar to
the PIC24's for ICSP.

http://ww1.microchip.com/downloads/en/DeviceDoc/30277d.pdf

You'll have to scroll quite a bit for some timing diagrams.

For example, in one the maximum clock is 10mhz most of the time, but in some
parts of the command you must stop for several us. (In some cases it's much
larger, like 400ms for some commands that require a long wait (such as erasing
memory).)


Of course, if outputting to the port takes more than the minimum wait time,
bit-banging will take care of the timing itself. This is what most people
do. For example, my original code that used Inpout32, which is a kernel
driver that lets you simulate out and in, took about 3us per call. So in
this case I would never have to insert any delays. But unfortunately I know
that this isn't always the case, and it does have to be slowed down a great
deal in some instances.
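
Roughly, the bit-banging looks like this (a sketch only; the port address
and pin masks are made up, Out32 is the export Inpout32 provides, and
SpinDelayMicroseconds is the spin-delay sketch from earlier in the thread):

using System.Runtime.InteropServices;

class BitBang
{
    [DllImport("inpout32.dll")]
    static extern void Out32(short port, short data);

    const short PORT = 0x378;   // typical SPP base address; varies by machine
    const short CLK = 0x01;     // made-up pin assignments
    const short DAT = 0x02;

    static void ClockOutBit(bool bit, int settleUs)
    {
        short d = bit ? DAT : (short)0;
        Out32(PORT, d);                  // set data line
        Out32(PORT, (short)(d | CLK));   // raise clock
        if (settleUs > 0)
            SpinDelayMicroseconds(settleUs);  // only needed if the Out32
                                              // calls themselves are faster
                                              // than the device's minimum
                                              // wait time
        Out32(PORT, d);                  // drop clock
    }

    static void SpinDelayMicroseconds(int us) { /* see earlier sketch */ }
}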



For what it's worth, it's hard for me to imagine hardware so fast that the
software can't keep up. The memory controller is one of the fastest i/o
devices in a computer, if not THE fastest, and software still winds up
waiting on memory on a regular basis. So _on average_, all code can
execute plenty fast enough to keep up with any i/o device attached to a
computer.

The reason I want it fast is two-fold. One is to get the data to the
device quickly, and the other is so there is less blocking going on.
You seem to be asking about the other direction; software being too fast
for the hardware. But that begs the question: why is the hardware driver
not taking care of this already? I seem to recall this was a regular
parallel port; am I misremembering? A standard parallel port driver
should handle all of the buffering you need for it to work correctly with
whatever parallel device is attached to the hardware.

Maybe it does? I didn't know the parallel port buffered anything in SPP, or
if it does it's a very small buffer. But even if it did, then it might run too
fast?

The problem, at least in my experience with other programs,
is that if you send bits too fast then the device on the other end doesn't
pick them up... even if it's supposed to support extremely high data rates.

I know this because I've used a program called WinPic that does this, and
when I programmed several pic18's I had to reduce the speed because they
would not program at those high data rates... unfortunately I have no idea
how fast that was, because it didn't mention the rate, as it was just a
slider. But it was somewhat slow already, because it took several mins just
to program the chip.

I want to completely configure this. I want the ability to configure the
speed (average speed) and report that to the user as an actual speed, and not
just some slider that corresponds to the waits in SpinWait. (So I'll do some
profiling and determine what that means in frequency.)
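
The profiling could be as simple as this (a sketch):

using System.Diagnostics;
using System.Threading;

static double MicrosecondsPerSpinWait(int iterations)
{
    const int samples = 1000;
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < samples; i++)
        Thread.SpinWait(iterations);
    sw.Stop();

    // Average cost of one Thread.SpinWait(iterations) call, in microseconds.
    return sw.ElapsedTicks * 1000000.0 / Stopwatch.Frequency / samples;
}

Then the frequency reported to the user is roughly 1e6 / (measured out time
+ measured delay) per bit, instead of an arbitrary slider position.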

Of course I don't want to run it at 50khz if I can run it at 100khz; I
want it to be as fast as possible because it gives the user more headroom.
(Why limit it to such a lower speed if it's not necessary?) In either case
it's probably still blocking the cpu, because the calls are synchronous to the
kernel mode driver (I believe; I could be wrong).

I guess I'm still not clear on why the timing is so critical. Perhaps
this is a consequence of me not paying close enough attention to the other
threads.

It's not critical, luckily. But I want to maximize the speed. If it were
critical then I couldn't do it (some devices actually time out if it
takes too long to send a command, but luckily most do not).

Basically what is "critical" about the speed is that it makes the user wait
a long time. It's analogous to printing without a spooler. Would you rather
wait 10 mins to print or 1 min?
I don't recall the exact length of a timeslice on Windows (and if I
remember correctly, it can vary according to the specific version and
configuration of Windows you're using). However, I think that typically
10ms is less than the timeslice.

So, assuming your code starts sending immediately at the beginning of its
timeslice, it should be able to send all of the data within a single
timeslice. One way to manage this would be to have the thread blocked
until you are ready to send, and then unblock it. For example, use a
WaitHandle, where the thread uses the WaitHandle.WaitOne() method. Some
other thread would set the WaitHandle, and the thread waiting on the
WaitHandle would begin sending immediately after returning from WaitOne().

Assuming you've made sure the handle is not set before calling WaitOne(),
this will ensure that the sending starts right at the beginning of a
timeslice, and as long as 10ms is less than a timeslice (which it
generally should be), you can finish all of the work within a timeslice,
ensuring that your thread is not interrupted while sending the data.

Now, as far as inserting delays into the data sending code, that's outside
the scope of this thread IMHO. You asked about how threads block, and
that's what I've been trying to explain.

But none of the blocking mechanisms allow you to block a thread and then
start executing it again with the sort of resolution you want. Even if
you raise the priority of your thread, just making it runnable won't cause
it to start executing again until there's a free CPU unit. This won't
happen until any threads currently executing either yield their timeslice,
or use up their timeslice entirely (and so are preempted by the OS).

Those things are out of my control. If windows has to interrupt my thread
when I'm halfway through sending a command then I can't do anything about it...
it just slows down the data rate... this is one reason why the speed is
"critical". (Not that it's critical in the sense that it has to be fast to
work, but it has to be fast to work well, and in this case that means more
productivity.)
If you are dealing with some sort of custom hardware that has very
specific timing needs with regards to its i/o, then yes...it's possible
you'll need to write a driver, and it's possible that driver might need to
be a kernel-mode driver. I don't have enough details to answer that
question.

But even drivers are subject to the thread-scheduling rules. There are
mechanisms provided to drivers to help manage timing issues, but at the
end of the day, Windows is simply not a real-time OS and so getting
precise timing of code execution is not always possible.

You can insert delays into your code to try to ensure a specific _minimum_
delay between operations, but you have much less control over the maximum
delay, and using the built-in thread-scheduling mechanisms, including
those that block threads, isn't likely to be a good way to implement the
minimum delay aspect.


I don't really see what choice I have. Basically: either use the hardware to
do the communications, which AFAIK isn't going to work because it's specific
to some protocol that the devices I use don't use; communicate at a very
slow rate, which is so slow that no one, including myself, will use my
program; or do it as fast as possible, as an upper bound, so that even if it
slows down due to whatever uncontrollable factors, it will still be fast
enough (on average).


Wow, I think this is the longest post I've ever been a part of ;/

lol..

Thanks for your time and writing all that ;)

Maybe I should write up specifically what I'm trying to do in another thread
so it's more clear?

Jon
 
P

Peter Duniho

Jon said:
Ok, these are locks... but who implements them? I can't see how a program
can block itself with respect to task scheduling... except maybe it blocks but
then something else has to unblock it.

That's right.
Ok, so you're saying essentially that when a synchronous call is made that
requires a resource, you block yourself?

I don't think that's what he's saying at all. All he's pointing out is
that by calling the synchronous read, you signal to the OS that you
can't proceed until the appropriate event occurs (in this case, a
newline character is read from the console input stream). Your thread
is not the one doing the read; it's waiting on some OS component that
does the read and returns the data to you. And the OS handles
maintaining your thread in an unrunnable state and unblocking the thread
once it's runnable.

(Caveat: the ReadLine case is a little misleading, because in fact I
believe that what blocks is the really low-level single character input;
once even one character is available, that thread gets to run and look
at it, and it is in fact the thread that decides whether the character
is a newline or not, and whether the method returns to the caller. But
the basic blocking behavior itself, being based on the individual
character reads and not the newline per se, _is_ in fact managed by the
OS, and not your thread).
Somehow you say "I'll wait until the
resource is free"... but then something else must unblock you.

Yup. Like I said, a thread cannot ever unblock itself. It's not
running, so it can't do anything itself that would change its state.
Some other thread has to.
But it would
seem then that to do any type of blocking you have to know explicitly the
resource to block on so whatever else can unblock you.

No. As I've said, in many situations the blocking is implicit in the
function you've called. ReadLine (or Console.ReadKey) is a great
example. You have no idea what resource you're blocking on. You just
know that your thread will block as desired until the desired data has
arrived.
Seems more complicated
than just having the other thing block and unblock.

It's actually the other way around. Requiring a thread holding a
resource to be aware of every other thread depending on that resource is
the more complicated solution.

There's a good reason that all OS's use this design. It's simple and
works well.
What I mean is that it makes more sense to me for whatever is
controlling the resource to control blocking and unblocking.

The OS controls the blocking. Neither the thread holding the resource,
nor the thread desiring the resource manages the blocking explicitly.
Those threads simply use the appropriate OS objects to allow the OS to
understand the state of each thread and manage the blocking behavior.
Example.

Program:
Hey, Save this file

Resource Handler:
(Internally to scheduler: Block Program,
Save the file,
Return status msg and unblock program)
Ok, I saved it.
------

instead of

Program:
Hey, Save this file
(tell scheduler to block me)

Resource Handler:
(Internally to scheduler:
Save the file,
Return status msg and unblock program)
Ok, I saved it.

Well, one major problem with the above pseudo-code is that at _some_
point the "Program" has to stop and wait. You don't provide any
indication in the first example of where that might happen. Without
something to block the thread, a thread will just keep executing code
until the thread simply exits. So, once the program has executed the
code that does the "Hey, Save this file" step, where does it go from there?

Without something blocking it, it will just keep on executing code.
What code is it going to continue to execute until the "Resource
Handler" part gets a chance to block it? Are you going to put dummy
code in there, just so the thread has something to do until the Resource
Handler gets a chance to block it?

You see, it actually makes a lot of sense for the thread itself to do
the blocking. The thread knows _exactly_ where an appropriate
stop-and-wait point is, and so having it work so that the thread signals
the OS via some function call that it's ready to wait makes a lot of
sense and provides a much more orderly arrangement than what you're
suggesting.

Again, there's a reason this is how all OS's do this.
I guess though it doesn't quite matter where it does it, and maybe it's
actually better for the program to block itself... it just seems like there's no
context there, and it could block itself for any reason even if the resource
handler doesn't need to block. (Which I guess would be asynchronous comm.)

Yes, a thread can block itself for any reason. That said, some
synchronous API's are designed to return immediately if they can. For
example, a call to recv() in Winsock will return immediately if there is
data available, and it will block if there's not. As long as data is
available, the thread doesn't give up its timeslice and there's no
context switch slowing things down.

Just because something can block, that doesn't mean it always does. And
usually when something does block, it's because the thread really does
need to stop and wait. Blocking is the most efficient use of the CPU
resource, because the alternative is for the thread to waste time doing
nothing while some other thread could be doing something useful but isn't.
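
In managed code the same behavior shows up in, e.g., Socket.Receive() (a
sketch; connection setup omitted):

using System.Net.Sockets;

static int ReadSome(Socket socket, byte[] buffer)
{
    // Like recv(): returns immediately with whatever is already buffered;
    // only if nothing is available does the thread block (yielding its
    // timeslice) until data arrives.
    return socket.Receive(buffer);
}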

Pete
 
