The Interlocked on the Edge of Forever

  • Thread starter Chris Mullins [MVP]

Jon Skeet [C# MVP]

William Stacey said:
| Hmm. That sounds like it's a broken implementation then. Basically, if
| the .NET CLR executes code in a way which violates the spec, the
| implementation is flawed. If the processor could reorder things, the
| CLR should make sure that that is invisible to the user, beyond what is
| possible within the CLI spec.
|
| I'd be interested to hear Joe Duffy's opinion on that part of the
| article.

| I would like to see Joe comment on that as well. If that article is
| correct, it sounds like "volatile" may not work the way people expect
| it to.

Well, not just how people expect it to work, but how it's *specified*
to work.

I've now seen your post about it basically being a potential bug in the
.NET 1.1 CLR.
| No - because only the thread which *calls* the Interlocked method knows
| that interlocked is involved, whereas with volatile both the reading
| thread *and* the writing thread know to use memory barriers. Of course,
| if you use Interlocked in both threads, to both read *and* write the
| value, then everything will be okay.

| The reader does not need to know, as it gets the read fence
| automatically (at the hardware layer) because Interlocked puts up a
| *full* memory barrier fence, so reads cannot be reordered at the
| instruction level.

But the read could *already* have been reordered by the JIT, ages
before this occurs. How could the JIT know whether a variable would
*ever* be used with Interlocked, and avoid reordering?
| Vista has some other versions with separate read acquire/write acquire
| methods, but the ones in the framework use the full fence. I am not
| sure how volatile is implemented under the covers, but it would not
| surprise me if it also uses an interlocked operation.

But there's more to it than that - there's the restriction on JIT
reordering, and *that* can't be known about ahead of time.
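
To make the "use Interlocked in both threads" suggestion concrete, here
is a minimal sketch (the type and member names are mine, not from the
thread): the reading thread gets its own full fence by going through
Interlocked as well, so neither its JIT-compiled code nor its CPU can
move the load.

using System.Threading;

class InterlockedBothSides
{
    static long x;

    static void Writer()
    {
        // Full fence on the writing thread.
        Interlocked.Increment(ref x);
    }

    static long Reader()
    {
        // Reading via Interlocked gives the *reading* thread its own
        // full fence. Interlocked.CompareExchange(ref x, 0, 0) is the
        // equivalent idiom for types without an Interlocked.Read.
        return Interlocked.Read(ref x);
    }
}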
| If we care about atomicity, there needs to be some kind of locking.
| If we only care that the change to y is seen after the change to x
| (i.e. you can't see x=0, y=1) then volatile will do the job but I don't
| believe that changing the variable with Interlocked and then reading it
| directly in another thread is guaranteed to work.

| Hmm. We seem to have covered the same ground. Everything I have seen
| (more than that link) says it does. I would be interested if you find
| something to the contrary, as that would be good info. Also, most (if
| not all) locks are based on Interlocked in their guts, so if it does
| not work, then everything in the history of the world is broken. OK,
| that is a bit drastic... tia

I'm not saying that Interlocked doesn't work. I'm saying that it only
works if you use it consistently across all threads.

I'll try to get round to mailing Joe about it...
 

William Stacey [C# MVP]

| But the read could *already* have been reordered by the JIT, ages
| before this occurs. How could the JIT know whether a variable would
| *ever* be used with Interlocked, and avoid reordering?

I assume it knows the same way it knows a var is volatile - by inspecting
the code as it compiles and keeping some state: if it sees a var wrapped
with Interlocked or volatile, it does no reordering. Not sure. I am not
sure when and where it actually does these optimizations. I think I read
something about tight local loops, but not sure. It would be interesting
to see a good article on the subject.

| I'll try to get round to mailing Joe about it...

Thanks. That would be great to see what he says.
 

Jon Skeet [C# MVP]

William Stacey said:
| But the read could *already* have been reordered by the JIT, ages
| before this occurs. How could the JIT know whether a variable would
| *ever* be used with Interlocked, and avoid reordering?

| I assume it knows the same way it knows a var is volatile - by
| inspecting the code as it compiles and keeping some state: if it sees
| a var wrapped with Interlocked or volatile, it does no reordering. Not
| sure. I am not sure when and where it actually does these
| optimizations. I think I read something about tight local loops, but
| not sure. It would be interesting to see a good article on the subject.

When the JIT has to create native code for one method, it might not
even have loaded all the assemblies which will manipulate the same
variables as that method does. I can't see that it could *possibly*
know whether or not some code, somewhere, is going to use it with the
Interlocked class.

Compare that with finding out whether or not the variable is volatile -
you look in a single place, and that's it.
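
To illustrate with a minimal sketch (the field name is mine): the single
declaration site below is all the JIT has to inspect, and every load and
store of the field then gets volatile semantics wherever that code is
compiled.

class Flag
{
    // One declaration, visible to the JIT whenever any method touching
    // _ready is compiled.
    private volatile bool _ready;

    public void Set() { _ready = true; }

    public bool IsSet() { return _ready; }
}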
| I'll try to get round to mailing Joe about it...

| Thanks. That would be great to see what he says.

Yeah. This relies on me finding the time, of course :)
 

William Stacey [C# MVP]

"Ecma-335:2006
...
12.6.5

5. Explicit atomic operations... These operations (e.g. Increment,
Decrement, Exchange, and CompareExchange) perform implicit acquire/release
operations. ..."

So by my read, this contract would require the CLI to respect the same
optimization rules as volatile. System.Threading.Thread.MemoryBarrier
would seem to have to respect proper ordering too, even if the vars are
not decorated with volatile. Do you read it differently? Thanks.

--
William Stacey [C# MVP]

|
| <snip>
|
| > > | I'll try to get round to mailing Joe about it...
| > >
| > > Thanks. That would be great to see what he says.
| >
| > Yeah. This relies on me finding the time, of course :)
|
| I've found the time:
|
| http://msmvps.com/blogs/jon.skeet/archive/2007/05/09/non-volatile-reads-and-interlocked-and-how-they-interact.aspx
|
| All comments welcome. I'm sorry that it's clearly written from my
| viewpoint rather than that of an impartial observer - I hope I haven't
| misrepresented any views. I'm happy to edit where appropriate.
|
| --
| Jon Skeet - <[email protected]>
| http://www.pobox.com/~skeet Blog: http://www.msmvps.com/jon.skeet
| If replying to the group, please do not mail me too
 

Jon Skeet [C# MVP]

William Stacey said:
"Ecma-335:2006
...
12.6.5

5. Explicit atomic operations... These operations (e.g. Increment,
Decrement, Exchange, and CompareExchange) perform implicit acquire/release
operations. ..."

So by my read, this contract would require the CLI to respect the same
optimization rules as volatile. System.Thread.MemoryBarrier would seem to
have to respect proper order too even if the vars are not decorated with
volatile. Do you read it different? Thanks.

The operations on Interlocked *do* perform implicit acquire/release
operations - but the *reading* operations don't. That's the problem.

To put it another way, it is (in some ways - not all) a bit like
acquiring a lock while writing a value, but not acquiring the same lock
when reading. Yes, you've made sure that the data is available to be
read with the memory barrier on the writing thread, but you haven't
made sure that the reading thread actually performs the read when you
expect it to.

Basically, both threads need a memory barrier in order to be effective.
Using Interlocked puts a memory barrier on the calling thread, but
that's all - IMO :)
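
That analogy can be sketched like this (names are mine; this shows the
broken pattern, not a recommendation):

class BrokenHalfLocking
{
    private readonly object _gate = new object();
    private int _x, _y;

    public void Write()
    {
        lock (_gate)    // the writer gets the lock's fences...
        {
            _x = 1;
            _y = 1;
        }
    }

    public bool ReadLooksWrong()
    {
        // ...but this thread never acquires the lock, so it gets no
        // fence at all and can legally observe _y == 1 while _x == 0.
        return _y == 1 && _x == 0;
    }
}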
 

William Stacey [C# MVP]

| Basically, both threads need a memory barrier in order to be effective.
| Using Interlocked puts a memory barrier on the calling thread, but
| that's all - IMO :)

Hi Jon. Two issues here I think:
1) Interlocked and barrier. Interlocked does provide a full memory
barrier. At least from the docs, this is clear.
2) Does it prevent reads from being reordered by the CLI? This is not as
clear. It seems that if it implicitly expresses volatile read/write as it
claims in ECMA, then it would also prevent such optimizations. Same way
you would expect Thread.MemoryBarrier() to do. IIRC, these CLI
optimization rules must be very localized and are very strict. They
can't just do them all over the place. They can only do them when they
*know* there can be no side effects.

http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx
"We've spent a lot of time thinking about what the correct memory model for
the CLR should be. If I had to guess, we're going to switch from the ECMA
model to the following model. I think that we will try to persuade other
CLI implementations to adopt this same model, and that we will try to change
the ECMA specification to reflect this.



1. Memory ordering only applies to locations which can be globally
visible or locations that are marked volatile. Any locals that are not
address exposed can be optimized without using memory ordering as a
constraint since these locations cannot be touched by multiple threads
in parallel.
2. Non-volatile loads can be reordered freely.
3. Every store (regardless of volatile marking) is considered a release.
4. Volatile loads are considered acquire.
5. Device oriented software may need special programmer care. Volatile
stores are still required for any access of device memory. This is
typically not a concern for the managed developer."
This was written back in May 2003, and I don't know if this made it into
2.0 or later, or at all. But he seems to touch on this very issue we are
talking about. Order changes can only be applied to locals that are not
address exposed and do not have some volatile semantics around them. As
Interlocked and MemoryBarrier have an implicit volatile contract, I
would assume CLI inspection would have to honor those the same as with
the volatile keyword. Maybe not. Would be nice to get a clearer writing
on this.
 

Barry Kelly

William Stacey [C# MVP] wrote:

I think we're starting to lose some context. Here's your original reply
to Chris:
Chris wrote:
| ... but, if I just want to read the value, I CANNOT do:
|
| int myValue = _firstTime;
| or
| if (_firstTime==0)
|
| Doing these requires a memory barrier of some type (volatile variable,
| monitor, etc) to have any degree of reliability.

| You can. The Interlocked write ensures this across all CPUs and caches.

This is the basic point that Jon (and I, and the standard, and Chris
Brumme's post that you quote) disagree with you on.

An interlocked operation is a full fence (read and write barrier, both
acquire and release) on the thread that it was *executed* on. It doesn't
affect reads from *other* threads.
| Basically, both threads need a memory barrier in order to be effective.
| Using Interlocked puts a memory barrier on the calling thread, but
| that's all - IMO :)

| Hi Jon. Two issues here I think:
| 1) Interlocked and barrier. Interlocked does provide a full memory
| barrier. At least from the docs, this is clear.
| 2) Does it prevent reads from being reordered by the CLI? This is not
| as clear.

There is no way it could perform its advertised purpose if it didn't
prevent related reads moving before it. The CLI needs to pass the
location of the target to the interlocked operation, so it can't have
related values cached in (e.g.) a register or it would be plainly faulty
with only a *single* thread of operation. Since that discounts compiler
optimizations that would move related reads before the interlocked
operation, the only thing left is CPU semantics - and that's what the
Interlocked operation guarantees, it guarantees a full fence for the
calling thread.
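
Barry's single-threaded correctness point, as a minimal sketch (names
are mine):

using System.Threading;

class AddressExposed
{
    static long x;

    static long Demo()
    {
        long before = x;               // might come from a register
        Interlocked.Increment(ref x);  // "ref x" address-exposes x, so
                                       // a cached copy of x cannot
                                       // survive across this call
        long after = x;                // must be a real re-read: even
                                       // single-threaded, after must
                                       // equal before + 1
        return after - before;
    }
}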
http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx :
[extra quoted for clarity -- Barry]
"We've spent a lot of time thinking about what the correct memory model for
the CLR should be. If I had to guess, we're going to switch from the ECMA
model to the following model. I think that we will try to persuade other
CLI implementations to adopt this same model, and that we will try to change
the ECMA specification to reflect this.

1. Memory ordering only applies to locations which can be globally
visible or locations that are marked volatile. Any locals that are not
address exposed can be optimized without using memory ordering as a
constraint since these locations cannot be touched by multiple threads
in parallel.
2. Non-volatile loads can be reordered freely.
3. Every store (regardless of volatile marking) is considered a release.
4. Volatile loads are considered acquire.
5. Device oriented software may need special programmer care. Volatile
stores are still required for any access of device memory. This is
typically not a concern for the managed developer."
| This was written back in May 2003, and I don't know if this made it
| into 2.0 or later, or at all. But he seems to touch on this very issue
| we are talking about.

Interlocked operations pass the address (reference to the location), so
they are "address exposed" and therefore the read can't move before the
interlocked operation. Volatile semantics aren't required for this; in
fact, not even threads are required for this. An optimization that moved
the read before the interlocked call (or any other call by reference) is
wrong in every case.
| Order changes can only be applied to locals that are not address
| exposed and do not have some volatile semantics around them.


That is to say, order changes are irrelevant (and in fact, can't even be
detected, if the compiler has no bugs) for locals that are not address
exposed, since only a single thread of execution can ever see their
values. That is to say, he says almost exactly the *opposite* of what
you said - that order changes can *only* be detected, are *only*
relevant, for locations that are globally visible, address exposed or
marked volatile.
| As Interlocked and MemoryBarrier have an implicit volatile contract, I
| would assume CLI inspection would have to honor those the same as with
| the volatile keyword. Maybe not. Would be nice to get a clearer
| writing on this.

All this is irrelevant to your original point at the root of the thread
though, in that, contrary to what you claim, a memory barrier is still
required for reads from threads *other* than the one that performed the
interlocked operation, if you're going to guarantee that you read what's
really there and not some stale value.

-- Barry
 

William Stacey [C# MVP]

| All this is irrelevant to your original point at the root of the thread
| though, in that, contrary to what you claim, a memory barrier is still
| required for reads from threads *other* than the one that performed the
| interlocked operation, if you're going to guarantee that you read what's
| really there and not some stale value.

Hi Barry. I address this part first as it is key to everything else.
There *is* a full fence on Interlocked methods (save the newer Win32
ones like InterlockedIncrementAcquire that provide acquire and release
semantics). It is *guaranteed* by the OS and hardware. It is documented
in many places (see some below). There can never be stale data, as
hardware ensures this across all CPUs and caches. After an Interlocked
operation completes, any and all threads will see only the last data.
This is one reason why it is a relatively expensive call. Where is it
documented that there is not a full barrier?

Here are some links:
http://msdn2.microsoft.com/en-us/library/ms684122.aspx
http://www.gamasutra.com/features/20060630/paquet_01.shtml
http://blogs.msdn.com/oldnewthing/archive/2004/05/28/143769.aspx
http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx
 

Jon Skeet [C# MVP]

William Stacey said:
| All this is irrelevant to your original point at the root of the thread
| though, in that, contrary to what you claim, a memory barrier is still
| required for reads from threads *other* than the one that performed the
| interlocked operation, if you're going to guarantee that you read what's
| really there and not some stale value.

| Hi Barry. I address this part first as it is key to everything else.
| There *is* a full fence on Interlocked methods (save the newer Win32
| ones like InterlockedIncrementAcquire that provide acquire and release
| semantics). It is *guaranteed* by the OS and hardware. It is documented
| in many places (see some below). There can never be stale data, as
| hardware ensures this across all CPUs and caches. After an Interlocked
| operation completes, any and all threads will see only the last data.
| This is one reason why it is a relatively expensive call. Where is it
| documented that there is not a full barrier?

There *is* a full barrier, *within that thread*. Not within all others.

Any thread which actually *performs* a read after the Interlock
completes will indeed see the new value. The point is that with
reordering in the reading thread, there's no guarantee that it will
make any attempt to read the value after the Interlock completes - it
could have read it before the Interlocked operation executes, even if a
value is later read which looks like it should have been read
beforehand. Suppose the source code is:

Thread 1
int b = y;
int a = x;

Thread 2
Interlocked.Increment (ref x);
Interlocked.Increment (ref y);

The first thread's code can be reordered by the JIT, and then executed
as:

Thread 1                             Thread 2
int a = x;
                                     Interlocked.Increment (ref x);
                                     Interlocked.Increment (ref y);
int b = y;

Note the reordering. The guarantee that threads reading after
Interlocked.Increment see the new values is still intact - it's just
that x has been read before y, which appears to be wrong according to
the original source code.

This is because there aren't any memory barriers in the reading thread.
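
To sketch what the reading thread would need (one option; declaring the
fields volatile is the other):

using System.Threading;

class OrderedReader
{
    static int x, y;

    static void ReadInOrder()
    {
        int b = y;
        Thread.MemoryBarrier();  // stops the load of x below from being
                                 // moved before the load of y above, by
                                 // either the JIT or the CPU
        int a = x;
    }
}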


Could you point to where in the ECMA spec it says anything about a
memory barrier on one thread affecting what is possible in terms of
reordering on another thread?

It's important to understand that this *isn't* just a case of hardware
caches and memory fences etc - the JIT itself can reorder the reads,
which is what I've been trying to show above.

I don't see anything in those pages to prevent the JIT from reordering
the reads on the reading thread. Half of them aren't even about .NET,
which means they won't be addressing the JIT-reordering issue at all.

I haven't heard back from Joe yet - I might mail Chris Brumme to see if
he'd care to comment.
 

William Stacey [C# MVP]

| There *is* a full barrier, *within that thread*. Not within all others.

What would be the point of a full barrier if it did not work across
threads? You would not need it for a single thread. The full barrier
ensures the CPU cannot reorder reads and all reads will see the last
write. But I think you're talking about reordering by the CLI, as below.


| Any thread which actually *performs* a read after the Interlock
| completes will indeed see the new value. The point is that with
| reordering in the reading thread, there's no guarantee that it will
| make any attempt to read the value after the Interlock completes - it
| could have read it before the Interlocked operation executes, even if a
| value is later read which looks like it should have been read
| beforehand. Suppose the source code is:
|
| Thread 1
| int b = y;
| int a = x;
|
| Thread 2
| Interlocked.Increment (ref x);
| Interlocked.Increment (ref y);
|
| The first thread's code can be reordered by the JIT, and then executed
| as:
|
| Thread 1                             Thread 2
| int a = x;
|                                      Interlocked.Increment (ref x);
|                                      Interlocked.Increment (ref y);
| int b = y;
|
| Note the reordering. The guarantee that threads reading after
| Interlocked.Increment see the new values is still intact - it's just
| that x has been read before y, which appears to be wrong according to
| the original source code.
|
| This is because there aren't any memory barriers in the reading thread.

I understand your point. I am playing devil's advocate with myself,
trying to find proof either way. But by my read (reading through the
sparse lines) the JIT is *not* allowed to make this optimization here.
It can only reorder when:
1) There is no volatile
2) The var is not address exposed (e.g. local).
3) There is no other explicit or implicit memory barrier protecting the
var (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc.)

So this would fail test #3 and not allow reordering by the JIT. The ECMA
text I posted hints at this, but is far from detailed on this. Chris B
talks more about this, but I don't know when/if he implemented it in the
CLR. Naturally, I could be nuts. It would not be the first time. Thank
you.
 

Jon Skeet [C# MVP]

| 2. Non-volatile loads can be reordered freely.

And that's the absolutely key one. As far as the reading thread is
concerned, it's performing non-volatile loads. It doesn't know that an
Interlocked operation will occur anywhere else, and it's not loading a
volatile variable - therefore it's a non-volatile load, and the
operations can be reordered as shown in the post I've just made.

<snip>
 

Jon Skeet [C# MVP]

William Stacey said:
| There *is* a full barrier, *within that thread*. Not within all others.

| What would be the point of a full barrier if it did not work across
| threads?

The same point as locking has. Both threads need to try to acquire a
lock before it becomes particularly useful, but I don't think you'd
argue that it *isn't* useful.
| You would not need it for a single thread. The full barrier ensures
| the CPU cannot reorder reads and all reads will see the last write.

... if they perform a genuine read *after* that point, yes.

| But I think you're talking about reordering by the CLI, as below.

Yes - although from the memory model point of view (which doesn't know
about CPU fences) it's irrelevant. The memory model makes certain
guarantees without saying whether the optimisations which can be
performed are due to the JIT or the CPU.

| Note the reordering. The guarantee that threads reading after
| Interlocked.Increment see the new values is still intact - it's just
| that x has been read before y, which appears to be wrong according to
| the original source code.
|
| This is because there aren't any memory barriers in the reading thread.

| I understand your point. I am playing devil's advocate with myself,
| trying to find proof either way. But by my read (reading through the
| sparse lines) the JIT is *not* allowed to make this optimization here.
| It can only reorder when:
| 1) There is no volatile
| 2) The var is not address exposed (e.g. local).
| 3) There is no other explicit or implicit memory barrier protecting
| the var (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc.)

You seem to be treating these as an "and", but they're not a conjunction
- otherwise you could never have reordering of *any* non-local reads.

I agree with 1. I believe you're misreading the spec on point 2 - it's
not a condition, it's just that *any* reads/writes can be reordered.
| So this would fail test #3 and not allow reordering by the JIT. The
| ECMA text I posted hints at this, but is far from detailed on this.
| Chris B talks more about this, but I don't know when/if he implemented
| it in the CLR. Naturally, I could be nuts. It would not be the first
| time. Thank you.

I don't see where it hints at this at all. Out of the 5 rules you
mentioned in the spec, the important one is this: "Non-volatile loads
can be reordered freely."

The statement:

int a = x;

is *not* a volatile load when x isn't a volatile variable. Just because
some other thread may or may not perform an interlocked operation
doesn't make loading it volatile. I think that's the crux of our
disagreement.

So, having reduced it to this (if you agree with me on that much) can
you provide any indication that accessing a variable at any point on
any thread with an Interlocked operation makes every load of that
variable on every thread count as being volatile? I haven't seen any
evidence of that.
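
As an aside, the framework does provide a way to make one *particular*
load volatile without marking the field, but the reading code has to
ask for it at the load site - which is exactly the point. A minimal
sketch (the field name is mine):

using System.Threading;

class SingleVolatileLoad
{
    static int _x;  // deliberately not declared volatile

    static int ReadAcquire()
    {
        // Makes *this* load volatile (acquire semantics); a plain read
        // of _x elsewhere remains an ordinary, freely reorderable load.
        return Thread.VolatileRead(ref _x);
    }
}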
 

Ben Voigt

| CPUs and caches. After an Interlocked operation completes, any and all
| threads will see only the last data. This is one reason why it is a
| relatively expensive call. Where is it documented there is not a full
| barrier?

It is a full memory barrier. That means that the new value is actually in
RAM, coherent across all processors (some architectures might choose write
snooping instead of invalidating caches in other cores) before the
Interlocked call returns. This falls far short of forcing other threads to
"see the last data", because another thread could have cached the value in a
CPU register (hoisting a complex expression outside a loop, anyone?).
Access to the memory address is guaranteed to yield the new value, but only
volatile variables result in memory being addressed on every use of the
variable name. Otherwise the compiler can perform alias analysis,
determine that the reading thread can't change the variable in the
interim, and thus use the cached value.
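
The register-caching case Ben describes is the classic hoisted
spin-wait. A minimal sketch (names are mine):

class HoistedSpinWait
{
    static bool _stop;  // not volatile

    static void Worker()
    {
        // The JIT is free to read _stop once, keep it in a register,
        // and turn this into an infinite loop. Declaring _stop volatile
        // forces a genuine memory read on every iteration.
        while (!_stop)
        {
        }
    }
}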
 

Chris Mullins [MVP]

There's quite a bit of discussion going on with regard to memory models
and interlocked writes and standard reads.

I figured I would write a little program and see if I can make the problem
show up.
The sample app I wrote below does, every once in a while, go "Boom".

This means the memory model is indeed reordering the reads of test and
test2.

My environment for running this test is:
Dual-Core AMD Athlon
Windows XP x64
Compiled in Release Mode, using "Any CPU"
Running without a debugger attached.

I've run the test using "Any CPU", "x86" and "x64" as specific compile
targets. Same result each time.

Sometimes it runs for a long time without failing, other times I get a ton
of failures. For example:
...
Boom : 287884447, 287884446
Boom : 287885420, 287885419
Boom : 287885780, 287885779
Boom : 287886016, 287886015
Boom : 287887440, 287887439
Boom : 287887798, 287887797



using System;
using System.Collections.Generic;
using System.Text;
using System.Threading;

namespace ConsoleApplication4
{
    class Program
    {
        static long test = 0;
        static long test2 = 0;

        static void Main(string[] args)
        {
            Thread t1 = new Thread(ReadThreadProc);
            Thread t2 = new Thread(ReadThreadProc);
            Thread t3 = new Thread(WriteThreadProc);

            t1.IsBackground = true;
            t2.IsBackground = true;
            t3.IsBackground = true;

            t1.Start();
            t2.Start();
            t3.Start();

            Console.WriteLine("Press any key to exit");
            Console.ReadLine();
        }

        private static void WriteThreadProc()
        {
            while (true)
            {
                Interlocked.Increment(ref test);
                Interlocked.Increment(ref test2);
            }
        }

        private static void ReadThreadProc()
        {
            while (true)
            {
                long v1 = test;
                long v2 = test2;

                if (v1 > v2)
                    Console.WriteLine("Boom : {0}, {1}", v1, v2);
            }
        }
    }
}
 

Jon Skeet [C# MVP]

Chris Mullins said:
| There's a (very obvious) bug in my test.
|
| Please ignore that last post...

If it's any consolation, I made the exact same mistake when I first
made the blog post :)
 

Barry Kelly

William said:
| There *is* a full barrier, *within that thread*. Not within all others.

| What would be the point of a full barrier if it did not work across
| threads?

A full barrier is a combination of both a read barrier and a write
barrier.

A write barrier makes sure that all pending writes are retired, and is
useful just before publishing a value, so that when it is read on a
different thread / CPU, there are no pending writes that could catch any
readers by surprise if they were to end up being retired a little later
(out of order). The hardware-level concept of a write barrier affects
only the CPU executing the write barrier, ensuring that all pending
writes are retired before publishing the final value. It's an ordering
primitive for that CPU. Yes it does affect other threads, but only in so
far as what they can observe from memory, in conjunction with the
*other* half of the equation - a read barrier.

A read barrier makes sure that no reads / prefetches / what have you
have snuck in before you retrieve a particular value from a location, a
value that you expect to have been published by another thread. The read
barrier only affects the CPU performing the read. It's directly
complementary to a write barrier on the publishing thread: it ensures
that you read what you think you read, and not what the CPU inferred you
were going to indirectly read by (e.g.) sneaking ahead and peeking at
the instruction stream.

Read and write barriers are complementary. Write barriers stop writes
moving forward in time, while read barriers stop reads moving backwards
in time. If you don't have balanced write and read barriers on threads
that are communicating via shared memory without locks, then their reads
and writes will overlap, and consequently linear, intuitive, von Neumann
expectations of invariants will be broken [unless, of course, a finite
state machine model for all possible combinations has been worked out
and tested; see e.g. Cliff Click's work on a lock-free hash table for
Java:

http://blogs.azulsystems.com/cliff/2007/05/hardware_visibi.html

]

Again, a read barrier only affects the thread / CPU that it's executed
on. It doesn't affect any writers on other threads. And read barriers of
some kind (even the implicit ones with 'lock') aren't optional even when
the writing thread is using volatile / interlocked / write barriers /
other mechanisms.
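
A sketch of the balanced pairing Barry describes (names are mine;
Thread.MemoryBarrier() is a full fence, so it serves as the write
barrier on one side and the read barrier on the other):

using System.Threading;

class PublishConsume
{
    static int _data;
    static bool _ready;

    static void Writer()
    {
        _data = 42;
        Thread.MemoryBarrier();  // write barrier: _data is retired
        _ready = true;           // before _ready is published
    }

    static bool TryRead(out int value)
    {
        bool ready = _ready;
        Thread.MemoryBarrier();  // read barrier: the load of _data
        value = _data;           // below cannot move before the load
        return ready;            // of _ready above
    }
}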
| You would not need it for a single thread.

Barriers aren't needed for single threads because then there's only one
thread context, one CPU, and the OS would have responsibility for any
read / write barriers before context switching into / out of your
application's thread of control, since there's no concurrency possible
with only a single thread.
| The full barrier ensures the CPU cannot reorder reads and all reads
| will see the last write. But I think you're talking about reordering
| by the CLI, as below.

Memory barriers / fences only affect ordering of reads and writes of the
thread / CPU they were executed on, though.

| I understand your point. I am playing devil's advocate with myself,
| trying to find proof either way. But by my read (reading through the
| sparse lines) the JIT is *not* allowed to make this optimization here.
| It can only reorder when:
| 1) There is no volatile
| 2) The var is not address exposed (e.g. local).
| 3) There is no other explicit or implicit memory barrier protecting
| the var (i.e. Thread.MemoryBarrier(), Interlocked, Monitor, etc.)

I believe you're misreading something big. Quoting from Chris Brumme's
post:

http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx:
1. Memory ordering only applies to locations which
can be globally visible or locations that are marked
volatile. Any locals that are not address exposed can be
optimized without using memory ordering as a constraint
since these locations cannot be touched by multiple threads
in parallel.

This implies that loads and stores can be reordered freely when it is
non-exposed locals that are being dealt with. ("Reordered freely" is
always subject to correctness on a single-threaded basis, of course, so
there are limits to this reordering.)
2. Non-volatile loads can be reordered freely.

Now here's something that you seem to have missed: any and all reads
(loads) that aren't volatile can be reordered *freely*, even though the
location being loaded *isn't* local.
3. Every store (regardless of volatile marking) is considered a release.

This was an upgrade in the CLR's memory model over the ECMA memory
model, IIRC; the Intel x86 architecture gives you this anyway, but it
did need specifying for IA64. It implies that a write barrier isn't
required when publishing a value, because the CLR (but not ECMA) will
handle that situation automatically.
4. Volatile loads are considered acquire.

And by contrast, from point 2, non-volatile loads are not considered
acquire, so they can freely move backwards in time. This is what makes
your original post to the thread incorrect.
5. Device oriented software may need special programmer care.

I've little to say about devices in CLR :)
| So this would fail test #3 and not allow reordering by the JIT.

I believe the rules you write are mistaken and not supported by the
standards or links you post, and in any case, interlocked operations
only affect a *single* thread, not all threads.

-- Barry
 

Jos Scherders

Hi,

One of the posts in this very interesting thread referred to this article:

http://msdn2.microsoft.com:80/en-us/library/ms686355.aspx

This article includes the following piece of code (to fix the race
condition):

volatile int iValue;
volatile BOOL fValueHasBeenComputed = FALSE;
extern int ComputeValue();

void CacheComputedValue()
{
    if (!fValueHasBeenComputed)
    {
        iValue = ComputeValue();
        fValueHasBeenComputed = TRUE;
    }
}

BOOL FetchComputedValue(int *piResult)
{
    if (fValueHasBeenComputed)
    {
        *piResult = iValue;
        return TRUE;
    }
    else return FALSE;
}

My (completely ignorant) question is this:

Must fValueHasBeenComputed really be declared volatile for the program
to produce correct results? Isn't it true that, because iValue is
declared volatile, it is impossible that fValueHasBeenComputed == TRUE
and the result of ComputeValue() is not stored in iValue?
Of course, because fValueHasBeenComputed is not declared volatile it may
be that FetchComputedValue() returns FALSE when in fact iValue does
contain the computed value, but the other case can never occur, i.e.
FetchComputedValue returns TRUE and piResult is incorrect.

Are my assumptions correct?

Jos
 

Stelrad Kypski

Not obvious to me. What is it?

Chris Mullins said:
There's a (very obvious) bug in my test.

Please ignore that last post...

--
Chris Mullins, MCSD.NET, MCPD:Enterprise, Microsoft C# MVP
http://www.coversant.com/blogs/cmullins

<snip - quoted test program from the earlier post>
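
[For readers wondering along with Stelrad: the thread never spells the
bug out, but a likely reading is the order of the reads. The writer
increments test before test2, and the reader also reads test before
test2, so simply catching the writer between its two increments yields
v1 == v2 + 1 under perfectly ordered execution - which matches the
off-by-one pairs in the output exactly. Reading in the *opposite* order
of the writes makes a "Boom" meaningful; a sketch of a corrected reader,
as a drop-in for the method above:]

private static void ReadThreadProc()
{
    while (true)
    {
        // Read in the opposite order of the writes: test2 <= test at
        // any instant, so under sequentially consistent execution
        // v2 <= v1 must always hold...
        long v2 = test2;
        long v1 = test;

        // ...and a "Boom" here can only come from a reordered or stale
        // read. (On a 32-bit runtime these plain long reads can also
        // tear, which is a separate confound.)
        if (v2 > v1)
            Console.WriteLine("Boom : {0}, {1}", v1, v2);
    }
}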
 
