When is "volatile" used instead of "lock" ?


Peter Ritchie

Sorry for the repost; this seems to only appear in the Microsoft web-site
newsgroup front-end:
Jon Skeet said:
It would mean that if it weren't for the other clause which talks about
not moving references past volatile operations. I don't see that clause
12.6.4 can overrule other clauses. For instance, it wouldn't allow you
to use a "technology" which didn't bother to call methods which were
known not to include volatile operations (but might output to the
screen, for instance). Taken entirely in isolation, the start of 12.6.4
sounds like you can do *anything* so long as you don't interfere with
volatile operations and exceptions. So any operations which just change
the values of non-volatile variables can be optimised away completely,
right? No, of course not - that would make a mockery of the whole
framework.

Jon, it took me a while; but here's an example where lock doesn't work and
you *must* use volatile to get the correct behaviour:
// compile: csc -optimize test.cs
using System;
using System.Threading;

internal class Tester {
    public /*volatile*/ bool continueRunning = true;

    public void ThreadEntry() {
        int count = 0;
        Object locker = new Object();
        lock (locker) {
            while (continueRunning) {
                count++;
            }
        }
    }
}

public class Program {
    static void Main() {
        Tester tester = new Tester();
        Thread thread = new Thread(new ThreadStart(tester.ThreadEntry));
        thread.Name = "Job";
        thread.Start();

        Thread.Sleep(2000);

        tester.continueRunning = false;
    }
}

... uncomment the volatile on continueRunning and it runs as expected,
terminating after 2 seconds.
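For a picture of why the unmodified version hangs: a minimal sketch, assuming the JIT hoists the non-volatile field read out of the loop (the hoist is written out as a hypothetical C# local; a real JIT would use a register, so this illustrates the permitted transformation, not actual generated code):

using System;

internal class HoistedTester {
    public bool continueRunning = true;

    public void ThreadEntry() {
        int count = 0;
        Object locker = new Object();
        lock (locker) {
            bool cached = continueRunning; // field read once, before the loop
            while (cached) {               // the stale local is re-tested
                count++;                   // forever, so another thread's
            }                              // write is never observed
        }
    }
}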
 

Jon Skeet [C# MVP]

Jon Skeet said:
It appears I was wrong about the language spec not mentioning the
memory model.

Section 17.4.3 talks about volatile fields and release/acquire
semantics, and mentions the lock statement, but *doesn't* have the same
bit as the CLI spec in terms of specifying that acquiring a lock
performs an implicit volatile read and releasing a lock performs an
implicit volatile write.

I'll mail the C# team to see if this can be fixed.

Word back from the C# team:

1) They'll look at making it clearer
2) The spec says that lock is exactly equivalent to calling
Monitor.Enter/Exit, so the CLI rules apply

I'm uncomfortable with the second part, as it's got too much of a tie
between the two specs, but it certainly indicates the intention that
the C# compiler should respect the CLI model.
 

Ben Voigt [C++ MVP]

Peter Ritchie said:
Sorry for the repost; this seems to only appear in the Microsoft web-site
newsgroup front-end:


Jon, it took me a while; but here's an example where lock doesn't work and
you *must* use volatile to get the correct behaviour:
// compile: csc -optimize test.cs
using System;
using System.Threading;

internal class Tester {
    public /*volatile*/ bool continueRunning = true;

    public void ThreadEntry() {
        int count = 0;
        Object locker = new Object();
        lock (locker) {
            while (continueRunning) {
                count++;
            }
        }
    }
}

public class Program {
    static void Main() {
        Tester tester = new Tester();
        Thread thread = new Thread(new ThreadStart(tester.ThreadEntry));
        thread.Name = "Job";
        thread.Start();

        Thread.Sleep(2000);

        tester.continueRunning = false;
    }
}

... uncomment the volatile on continueRunning and it runs as expected,
terminating after 2 seconds.

But the lock isn't actually entered or exited in the region of interest.
There would have to be a lock inside the loop.
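A minimal sketch of Ben's suggestion, assuming the lock is taken on each pass so that Enter/Exit (and their implicit volatile read/write) occur inside the region of interest; note that under Jon's protocol the writer would also have to take the same lock:

using System;

internal class PerIterationTester {
    public bool continueRunning = true;

    public void ThreadEntry() {
        int count = 0;
        Object locker = new Object();
        bool keepGoing = true;
        while (keepGoing) {
            lock (locker) {                  // Enter/Exit now happen every
                keepGoing = continueRunning; // iteration; the read cannot be
            }                                // hoisted above the acquire
            count++;
        }
    }
}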
 

Jon Skeet [C# MVP]

Peter Ritchie said:
Jon, it took me a while; but here's an example where lock doesn't work
and you *must* use volatile to get the correct behaviour:

Nah - I'll fix it just using an extra lock.

public class Program {
    static void Main() {
        Tester tester = new Tester();
        Thread thread = new Thread(new ThreadStart(tester.ThreadEntry));
        thread.Name = "Job";
        thread.Start();

        Thread.Sleep(2000);

        tester.continueRunning = false;
    }
}

... uncomment the volatile on continueRunning and it runs as
expected, terminating after 2 seconds.

But you've violated my conditions for correctness:

<quote>
The situation I've been talking about is where a particular variable is
only referenced *inside* lock blocks, and where all the lock blocks
which refer to that variable are all locking against the same
reference.
</quote>

In particular I said:

<quote>
Now I totally agree that *if* you start accessing the variable from
outside a lock block, all bets are off - but so long as you keep
everything within locked sections of code, all locked with the same
lock, you're fine.
</quote>

but by setting tester.continueRunning outside the lock, you've gone
into the "all bets are off" territory.

Now, we don't want to start trying to acquire the lock that you've
already got: for one thing, the object reference is never available
outside ThreadEntry(), and for a second thing we appear to want to hold
that lock for a long time - we'd end up in a deadlock if the looping
thread held the lock and the thread which was trying to stop the loop
had to wait for the loop to finish before it could acquire the lock, if
you see what I mean.

So, we introduce a new lock to go round every access to
continueRunning. For ease of coding, we'll encapsulate continueRunning
in a property access, so the complete code becomes:

using System;
using System.Threading;

internal class Tester {
    bool continueRunning = true;

    object continueRunningLock = new object();

    public bool ContinueRunning {
        get {
            lock (continueRunningLock) {
                return continueRunning;
            }
        }
        set {
            lock (continueRunningLock) {
                continueRunning = value;
            }
        }
    }

    public void ThreadEntry() {
        int count = 0;
        Object locker = new Object();
        lock (locker) {
            while (ContinueRunning) {
                count++;
            }
        }
    }
}

public class Program {
    static void Main() {
        Tester tester = new Tester();
        Thread thread = new Thread(new ThreadStart(tester.ThreadEntry));
        thread.Name = "Job";
        thread.Start();

        Thread.Sleep(2000);

        tester.ContinueRunning = false;
    }
}

It now works without any volatile variables (but plenty of volatile
reads and writes!).
 

Peter Ritchie [C#MVP]

Delayed response, I've been away from the newsreader that seems to reliably
post to this thread.

The sample clearly shows that putting a member between Enter/Exit does not
guarantee those members won't have visible side-effects optimized away in
other threads. That was one of the points I made a few days ago; and the
impression I got from you was that you believe compile-time reorders on
members between Enter/Exit could occur. If that's not what you meant to
imply, then we don't disagree on that point and the sample is pointless and
we can ignore that rat hole and "fixing" it is irrelevant.

As I said, I get from the spec. that only the locked object is the volatile
read/write--anything between Enter/Exit is not a volatile read/write. So,
no cross-thread volatility guarantees apply and it falls back to "single
thread of execution" implications and that "there appears to be a hole in
the C# spec", regardless of how correct it *should* be to make anything
possibly cross-thread visible between Enter/Exit volatile, or what a
*particular implementation* is doing. Without clear and unambiguous
documentation, showing a snippet of code does not prove a concept is sound;
only that that snippet of code works in the limited circumstances in which
it is run. (otherwise, I could "prove" cross-thread WinForm data access
"works"). Whereas, it's clear what the "volatile" keyword means.

Besides, I'm not comfortable with compromising an application's vertical
scalability by potentially causing all but one processor to wait while a
member is accessed simply because of volatility concerns. I believe it's
safer to separate dealing with volatility and synchronization; but by your
interpretation the best case is they're equally as safe.

-- Peter
 

Jon Skeet [C# MVP]

Peter Ritchie said:
Delayed response, I've been away from the newsreader that seems to reliably
post to this thread.

The sample clearly shows that putting a member between Enter/Exit does not
guarantee those members won't have visible side-effects optimized away in
other threads.

Other threads that *also* use Enter/Exit (with the same reference)
though? (That *didn't* happen in the broken sample you gave.)
That was one of the points I made a few days ago; and the
impression I got from you was that you believe compile-time reorders on
members between Enter/Exit could occur. If that's not what you meant to
imply, then we don't disagree on that point and the sample is pointless and
we can ignore that rat hole and "fixing" it is irrelevant.

Within Enter/Exit, if there are no other volatile operations involved,
they can indeed be reordered. However, if all other uses of the shared
data involves locking against the same reference, then those
reorderings *won't* be visible in other threads. Consider this sequence
of operations:

A
B
C
D

If A acquires a lock and D releases it, then any thread acquiring the
same lock will only be able to see the results of B and C *after* D has
occurred, so it doesn't matter to that thread whether B occurs before
or after C.
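A concrete sketch of that A/B/C/D sequence, with invented names; both fields are only touched under the same lock, so the relative order of B and C is invisible to the other thread:

using System;

class Account {
    readonly object gate = new object();
    int balance;  // written at B
    int txCount;  // written at C

    public void Deposit(int amount) {
        lock (gate) {           // A: acquire = implicit volatile read
            balance += amount;  // B
            txCount++;          // C (B and C may be freely reordered)
        }                       // D: release = implicit volatile write
    }

    public void Report() {
        lock (gate) {
            // Only reachable after D, so the effects of both B and C are
            // visible here whatever order they actually happened in.
            Console.WriteLine("{0} transactions, balance {1}", txCount, balance);
        }
    }
}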
As I said, I get from the spec. that only the locked object is the volatile
read/write--anything between Enter/Exit is not a volatile read/write.

Indeed - and I'd never claimed anything else.
So, no cross-thread volatility guarantees apply and it falls back to "single
thread of execution" implications

No, the same cross-thread volatility guarantees as always apply - a
read can't be reordered to before a volatile read - in this case the
implicit volatile read involved in acquiring a lock.
and that "there appears to be a hole in
the C# spec", regardless of how correct it *should* be to make anything
possibly cross-thread visible between Enter/Exit volatile, or what a
*particular implementation* is doing.

No, there's no need to prohibit reorders within a lock.
Without clear and unambiguous
documentation, showing a snippet of code does not prove a concept is sound;
only that that snippet of code works in the limited circumstances in which
it is run. (otherwise, I could "prove" cross-thread WinForm data access
"works"). Whereas, it's clear what the "volatile" keyword means.

Well, I don't think I've seen anyone else read the spec in the way you
do, and I've seen *lots* of people (including threading experts) use
locks to achieve thread safety in the way that I do.

Is that as good as having a spec which is so unambiguous that there's
only one possible reading? Certainly not. However, with the spec the
way it is, I'll choose to read it in the way that:

1) Is supported by all current runtimes
2) Seems to me to be the way that it's read by experts in the field
3) Allows me to not have volatile performance penalties for code which
doesn't need to share data
4) Allows me to safely share doubles and other types which can't have
the volatile modifier applied to them
5) Allows me to have a simpler mental model for most multi-threaded
development
Besides, I'm not comfortable with compromising an application's vertical
scalability by potentially causing all but one processor to wait while a
member is accessed simply because of volatility concerns.

Whereas I'm not comfortable with putting *potential* performance
considerations above the relative simplicity of having fewer rules when
it comes to implementing thread-safe code.

You already have to know about locking in order to deal with the common
situation where you're operating on more than one piece of data and
don't want to have race conditions. For 99% of the time, I just need to
consider the rule of "don't access shared data outside a lock".

How do you cope with sharing double values in a thread-safe way if you
don't believe locks are enough, by the way? Or do you believe it's
impossible?

Note that from a performance point of view, the "lock when you need
to" case means that I can use a type which *wasn't* implemented using
volatile variables from a type which requires thread-safety. If you
were right, all types should make *everything* volatile just in case
you ever need to use it in a multi-threaded context. At that point
anything *not* needing to share the data has to pay a performance
penalty.
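A minimal sketch of the doubles point, with invented names: csc rejects "volatile double" (and "volatile long"), so a lock-guarded property is the usual way to share a 64-bit value safely:

class Sensor {
    readonly object gate = new object();
    double reading;  // "volatile double" would be a compile-time error

    public double Reading {
        get { lock (gate) { return reading; } }
        set { lock (gate) { reading = value; } }
    }
}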
I believe it's safer to separate dealing with volatility and
synchronization; but by your interpretation the best case is they're
equally as safe.

They're equally as safe until you change

number = 1;

to

number++;

at which point volatility isn't good enough, but locking is. In at
least one post in this thread you accidentally gave an example using
"number++" as if it were safe to do so. How sure are you that you
haven't done the same thing in real code?
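A minimal sketch of that hazard, with invented names: volatile makes each read and each write individually visible, but ++ is still a separate read, add, and write, so concurrent increments can be lost; a lock serializes the whole read-modify-write:

class Counter {
    volatile int number;
    int total;
    readonly object gate = new object();

    public void RacyIncrement() {
        number++;  // volatile read, add, volatile write: two threads can
                   // read the same value and one increment is then lost
    }

    public void SafeIncrement() {
        lock (gate) {
            total++;  // the read-modify-write cannot be interleaved
        }
        // Interlocked.Increment(ref total) would be a lock-free equivalent.
    }
}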
 

Peter Ritchie [C#MVP]

Jon Skeet said:
Well, I don't think I've seen anyone else read the spec in the way you
do, and I've seen *lots* of people (including threading experts) use
locks to achieve thread safety in the way that I do.

It must be me then.

I've been trying to get to your assertion of no JIT optimizations of members
between Enter/Exit because of "volatility guarantees" in the CLI spec. and
the only thing in the spec. that could have done it for me was if everything
between Enter/Exit were considered volatile operations, which is probably why
I erroneously got that you were implying that.

If only visible side-effects can't be reordered, and "...only volatile
operations constitute visible side-effects...", and what's between
Enter/Exit isn't implicitly volatile (then 12.6.4 para 1 wouldn't apply) or
explicitly volatile (and 12.6.7 and 12.6.5 point 3 wouldn't apply), I can't
get to no JIT re-ordering of member fields whose effects would leak out of
an Enter/Exit block.

Ignoring for a moment that 12.6.7 is explicitly talking about the volatile.
prefix and IL between Enter/Exit is not considered volatile and therefore
not generated with this prefix, I can't get to "a read can't be reordered
before a volatile read" from 12.6.7. para 2, with respect to JIT
optimizations. This is mostly due to my reading that "...occur prior to any
references to memory..." applies only to flushing writes to RAM and not JIT
optimizations, because of the consequences. If it doesn't just apply to
flushing writes to RAM and is interpreted as "...any references to *any*
memory..." and not "...any references to *the* memory..." (referring to the
memory used in the volatile operation) then given:

volatile int value;

int Calculate()
{
    int i;
    i = 0xA;
    value = i;
    i = 3;
    return i;
}

Calculate() could not be JIT optimized to:

int Calculate()
{
    value = 0xA;
    return 3;
}

...because reading it as "...any references to *any* memory..." means local
assignments before and after "value = 0xA" (being a volatile operation) are
technically references to *any* memory. It would also suggest that the
introduction of an Enter/Exit would cause any reference to any memory to be
unoptimized because in the general case the JIT can't know what will and
won't be called before or after that Enter/Exit. If 12.6.7. para 2 applies
only to flushing of current writes to RAM and volatility of only memory
used in the volatile operation, 12.6.7 para 2 doesn't apply to the
reads/writes within Enter/Exit and there are no reordering guarantees of
anything not volatile by another means between Enter/Exit. This falls back
to single thread of execution guarantees or overriding that with the
"volatile" keyword.

Plus, added details, like Enter/Exit volatility applying only if a
publicly visible object is used as the lock object, just aren't in the spec.

I'm not disputing what a particular implementation is doing or whether what
it is doing is safer or not. I'm also not disputing the flushing of writes to
RAM (and never was, I was always separating JIT optimizations).

Anyway, you haven't pointed to anything in the spec. to clarify these
ambiguities other than suggest potential problems if it's not interpreted
the same way as you; problems I'm not disputing. I never said writing MT
code was easy. I don't think the framework is doing what it's doing because
of those clauses, so it's somewhat moot (so, I haven't responded to your
other points, but will if you're interested) unless you want to target
different platforms.
 

Jon Skeet [C# MVP]

Peter Ritchie said:
It must be me then.

I've been trying to get to your assertion of no JIT optimizations of members
between Enter/Exit because of "volatility guarantees" in the CLI spec. and
the only thing in the spec. that could have done it for me was if everything
between Enter/Exit were considered volatile operations, which is probably why
I erroneously got that you were implying that.

It only makes sense to think that I was implying that if I had actually
made an assertion of "no JIT optimizations of members between
Enter/Exit".
If only visible side-effects can't be reordered, and "...only volatile
operations constitute visible side-effects...", and what's between
Enter/Exit isn't implicitly volatile (then 12.6.4 para 1 wouldn't apply) or
explicitly volatile (and 12.6.7 and 12.6.5 point 3 wouldn't apply), I can't
get to no JIT re-ordering of member fields whose effects would leak out of
an Enter/Exit block.

I strongly believe the "visible side-effects" clause (12.6.4 IIRC - I
don't have time to check the spec right now) is a red herring, for
reasons I've pointed out before. It's not meant to contradict 12.6.7.
Ignoring for a moment that 12.6.7 is explicitly talking about the volatile.

No, 12.6.7 *does* talk about the volatile prefix, but not *only* about
the volatile prefix. It's talking about volatile *operations*.
prefix and IL between Enter/Exit is not considered volatile and therefore
not generated with this prefix, I can't get to "a read can't be reordered
before a volatile read" from 12.6.7. para 2, with respect to JIT
optimizations. This is mostly due to my reading that "...occur prior to any
references to memory..." applies only to flushing writes to RAM and not JIT
optimizations, because of the consequences.

No - the spec doesn't (and shouldn't) talk about the differences
between JIT optimizations and CPU optimizations. It just makes
assertions about what the overall effect of a program can be.
If it doesn't just apply to
flushing writes to RAM and is interpreted as "...any references to *any*
memory..." and not "...any references to *the* memory..." (referring to the
memory used in the volatile operation) then given:

volatile int value;

int Calculate()
{
    int i;
    i = 0xA;
    value = i;
    i = 3;
    return i;
}

Calculate() could not be JIT optimized to:

int Calculate()
{
    value = 0xA;
    return 3;
}

As I've said before, I believe the spec *should* talk about stack
references separately from heap references. There may be some bit of
the spec I've missed, but I'll acknowledge that I can't point to it
right now.
...because reading it as "...any references to *any* memory..." means local
assignments before and after "value = 0xA" (being a volatile operation) are
technically references to *any* memory. It would also suggest that the
introduction of an Enter/Exit would cause any reference to any memory to be
unoptimized because in the general case the JIT can't know what will and
won't be called before or after that Enter/Exit. If 12.6.7. para 2 applies
only to flushing of current writes to RAM and volatility of only memory
used in the volatile operation, 12.6.7 para 2 doesn't apply to the
reads/writes within Enter/Exit and there are no reordering guarantees of
anything not volatile by another means between Enter/Exit. This falls back
to single thread of execution guarantees or overriding that with the
"volatile" keyword.

Plus, added details, like Enter/Exit volatility applying only if a
publicly visible object is used as the lock object, just aren't in the spec.

I've never said that it only applies if it's a publicly visible object.
The problem is that if you use a reference which no other thread can
lock on, then you can't apply the rest of my reasoning (namely that no
other thread will be reading/writing the value while you hold the
lock).
I'm not disputing what a particular implementation is doing or whether what
it is doing is safer or not. I'm also not disputing the flushing of writes to
RAM (and never was, I was always separating JIT optimizations).

Anyway, you haven't pointed to anything in the spec. to clarify these
ambiguities other than suggest potential problems if it's not interpreted
the same way as you; problems I'm not disputing. I never said writing MT
code was easy. I don't think the framework is doing what it's doing because
of those clauses, so it's somewhat moot (so, I haven't responded to your
other points, but will if you're interested) unless you want to target
different platforms.

I think it *is* an important academic question though. If my style of
threading is unsafe, then I suspect the vast majority of supposedly
thread-safe code in the world (including stuff in the framework) is
also unsafe. It also means that no program can ever use doubles in a
theoretically thread-safe fashion.

One thing it may be worth considering is what the authors of the spec
*intended*. If everyone agrees on that, then at least the spec can
hopefully be improved in the future to reflect it. In fact, I think
there's already a problem with the spec which I haven't brought up
before because I didn't want to muddy the waters. Code like this:

int memberVariable;
...

int a;
lock (someReference)
{
    a = memberVariable;
}
Console.WriteLine(a);

could, by my understanding of the memory model, be reordered to:

int a;
lock (someReference)
{
}
a = memberVariable;
Console.WriteLine(a);

The reason for this is that the read is being moved later than the
volatile write rather than earlier than the volatile read - and that's
not forbidden at all. This would muck things up significantly (in
particular, you couldn't change two variables in a way which was
guaranteed to appear to be atomic).
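A minimal sketch of what that would break, with invented names: if the reads could be delayed to after the Exit, a reader following the locking protocol could still observe a half-applied pair of writes:

using System;

class PairDemo {
    static readonly object gate = new object();
    static int x, y;

    static void Writer() {
        lock (gate) { x = 1; y = 1; }  // intended to appear atomic
    }

    static void Reader() {
        int a, b;
        lock (gate) {
            a = x;  // under the reordering described above, these reads
            b = y;  // could drift to after the lock is released...
        }
        Console.WriteLine("{0} {1}", a, b);  // ...allowing e.g. a=1, b=0
    }
}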
 

Peter Ritchie [C#MVP]

It only makes sense to think that I was implying that if I had actually
made an assertion of "no JIT optimizations of members between
Enter/Exit".
Agreed, you never explicitly said that.

So, you're not asserting there can be no JIT optimizations of members
between Enter/Exit that aren't explicitly involved in a volatile operation,
and you're agreeing that:

int memberVariable;
//...

int a;
lock (someReference) {
    a = memberVariable;
}

Console.WriteLine(a);

could, by my understanding of the memory model, be reordered to:

int a;
lock (someReference) {
}

a = memberVariable;
Console.WriteLine(a);

...which is what I was trying to get at originally with "the lock statement
surrounding access to a member doesn't stop the compiler from having
optimized use of a member by caching it to a register"...yes, the compiler
*could* assume that all members within the lock statement block are likely
accessible by multiple threads (implicit volatile); but that's not its
intention and it's certainly not documented as doing that". To which, your
response was quoting 335 12.6.5's "Acquiring a lock
(System.Threading.Monitor.Enter or entering a synchronized method) shall
implicitly perform a volatile read operation, and releasing a lock
(System.Threading.Monitor.Exit or leaving a synchronized method) shall
implicitly perform a volatile write operation." In fairness, by "lock
statement surrounding access to a member" I was intending a member within
the block (i.e. lock(...){member=something;}) not the reference being
locked; so I can see how the conversation side-tracked (I don't dispute the
reference sent to Monitor.Enter/Monitor.Exit is considered volatile, but
you'd think the IL generated for "lock" would have instructions with a
volatile prefix). This made me incorrectly think you were implying all
member access between Monitor.Enter and Monitor.Exit was volatile and not
subject to JIT optimizations. But, I think we agree we were talking about
different things.

A variation on your example would be

int a;
lock (someReference) {
    memberVariable = a;
}
Console.WriteLine(memberVariable);

being optimized to

int a;
lock (someReference) {
}
memberVariable = a;
Console.WriteLine(memberVariable);

...which shouldn't occur if memberVariable were declared with "volatile".
One thing it may be worth considering is what the authors of the spec
*intended*. If everyone agrees on that, then at least the spec can
hopefully be improved in the future to reflect it.

Well, it may be moot at this point what was intended in the spec. I doubt
the .NET JIT can change what it's currently doing should "what's intended"
be different than what was implemented. But, I do agree. I think it's
vital to have a coherent, unambiguous, quantifiable and qualifiable spec so
"compliance" means something and developers *can* develop truly platform
independent code (*can* because they'll always be able to incorrectly
write code that works on only one platform), where platform is as granular
as OS/architecture combination.
 

Jon Skeet [C# MVP]

Peter Ritchie said:
Agreed, you never explicitly said that.

So, you're not asserting there can be no JIT optimizations of members
between Enter/Exit that aren't explicitly involved in a volatile operation,
and you're agreeing that:


...which is what I was trying to get at originally with "the lock statement
surrounding access to a member doesn't stop the compiler from having
optimized use of a member by caching it to a register"

Well, that's not caching (which would be an *earlier* read) - it's
delaying. However, I've just reread the spec, and it doesn't just say
that *writes* can't be moved past a volatile write - it says that *no*
memory references can move to later than a volatile write or earlier
than a volatile read.

In other words, I withdraw the concerns I expressed in the previous
post. :)
...yes, the compiler
*could* assume that all members within the lock statement block are likely
accessible by multiple threads (implicit volatile); but that's not its
intention and it's certainly not documented as doing that". To which, your
response was quoting 335 12.6.5's "Acquiring a lock
(System.Threading.Monitor.Enter or entering a synchronized method) shall
implicitly perform a volatile read operation, and releasing a lock
(System.Threading.Monitor.Exit or leaving a synchronized method) shall
implicitly perform a volatile write operation." In fairness, by "lock
statement surrounding access to a member" I was intending a member within
the block (i.e. lock(...){member=something;})

Yes, member access within the lock is what I was talking about too.
not the reference being
locked; so I can see how the conversation side-tracked (I don't dispute the
reference sent to Monitor.Enter/Monitor.Exit is considered volatile, but
you'd think the IL generated for "lock" would have instructions with a
volatile prefix). This made me incorrectly think you were implying all
member access between Monitor.Enter and Monitor.Exit was volatile and not
subject to JIT optimizations. But, I think we agree we were talking about
different things.

Hmm... not sure given what you've written later on.
A variation on your example would be

int a;
lock (someReference) {
    memberVariable = a;
}
Console.WriteLine(memberVariable);

being optimized to

int a;
lock (someReference) {
}
memberVariable = a;
Console.WriteLine(memberVariable);

...which shouldn't occur if memberVariable were declared with "volatile".

It can't occur within the spec either though. That's moving one write
(the one to memberVariable) to *after* the lock is released. That's
prohibited.

I *thought* that the following optimisation could occur - but having
rechecked the spec, I'm happy that it can't.

int a = ...;
memberVariable = a;
lock (someReference)
{
}
Well, it may be moot at this point what was intended in the spec. I doubt
the .NET JIT can change what it's currently doing should "what's intended"
be different than what was implemented. But, I do agree. I think it's
vital to have a coherent, unambiguous, quantifiable and qualifiable spec so
"compliance" means something and developers *can* develop truly platform
independent code (*can* because they'll always be able to incorrectly
write code that works on only one platform), where platform is as granular
as OS/architecture combination.

I called for such an unambiguous spec (in reference to a *really,
really* odd reading of it) a while ago. The responses from Joe Duffy
and Joel Pobar are enlightening about the .NET 2.0 memory model (as
opposed to the ECMA spec model):

http://msmvps.com/blogs/jon.skeet/archive/2006/11/26/the-cli-memory-model-and-specific-specifications.aspx

In particular, there's a reference to an article by Vance Morrison
(http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/)

which includes the following when describing the ECMA spec:

<quote>
1. Reads and writes cannot move before a volatile read.
2. Reads and writes cannot move after a volatile write.
</quote>

It also has this just before the ECMA description:

<quote>
This is one more place where the locking protocol really adds value.
The protocol ensures that every access to thread-shared, read/write
memory only occurs when holding the associated lock. When a thread
exits the lock, the third rule ensures that any writes made while the
lock was held are visible to all processors. Before the memory is
accessed by another thread, the reading thread will enter a lock and
the second rule ensures that the reads happen logically after the lock
was taken. While the lock is held, no other thread will access the
memory protected by the lock, so the first rule ensures that during
that time the program will behave as a sequential program.

The result is that programs that follow the locking protocol have the
same behavior on any memory model having these three rules. This is an
extremely valuable property. It is hard enough to write correct
concurrent programs without having to think about the ways the compiler
or memory system can rearrange reads and writes. Programmers who follow
the locking protocol don't need to think about any of this. Once you
deviate from the locking protocol, however, you must specify and
consider what transformations hardware or a compiler might do to reads
and writes.
</quote>

Interestingly the three rules he refers to only talk about *reads*
moving before entering a lock and *writes* moving after exiting a lock
- in other words, the slightly looser behaviour I was worried about!
However, it makes it pretty clear that using this "locking protocol" is
and always has been intended to be safe. (Given the earlier discussion
of caching, I'm not sure that reads being delayed and writes being
advanced is even considered as a possibility.)
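Those rules annotated against a lock block (a sketch with invented names; the comments restate the quoted acquire/release rules):

using System;

class Protocol {
    static readonly object gate = new object();
    static int shared1, shared2;

    static void Touch(int a) {
        lock (gate) {      // Enter = volatile read: no later read or write
                           // may be moved above this point
            shared1 = a;
            int b = shared2;
            Console.WriteLine(b);
        }                  // Exit = volatile write: no earlier read or write
                           // may be moved below this point
    }
}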
 

Peter Ritchie [C#MVP]

Jon Skeet said:
Well, that's not caching (which would be an *earlier* read) - it's
delaying.

Maybe a poor choice of an overloaded term, but I did say "compiler from having
optimized use of a member by caching it to a register".
However, I've just reread the spec, and it doesn't just say
that *writes* can't be moved past a volatile write - it says that *no*
memory references can move to later than a volatile write or earlier
than a volatile read.

...with regard to "acquire semantics" and "release semantics". So 12.6.7
makes sense with regard only to processor caching. Everything I've read
discusses "acquire semantics" and "release semantics" only in the context of
processor caching, not compiler (JIT or otherwise) optimizations.

Joe Duffy's blog on broken double-checked locking [1] describes "ensuring
writes have 'release' semantics on IA-64, via the st.rel instruction. A
single st.rel x guarantees that any other loads and stores leading up to its
execution (in the physical instruction stream) must have appeared to have
occurred to each logical processor at least by the time x's new value
becomes visible to another logical processor." Clearly a CPU instruction
cannot have an effect on what the JIT does and does not optimize. Joe's
mention of the physical instruction stream only deals with the processor's
caching of writes in relation to that stream.

http://msdn2.microsoft.com/EN-US/library/aa490209.aspx
Discusses the acquire semantics of specific Win32 functions, nothing to do
with compiler optimizations and no existing native compiler I know of
changes its optimization behaviour in the presence of those Interlocked
functions.

http://msdn2.microsoft.com/en-us/library/ms686355.aspx
Details that prior to VC 2003 "volatile" had no acquire/release semantics
and only dealt with compiler optimizations.

If you take 12.6.7 para 2's "after any memory references" to mean anything
other than processor caching, as soon as you introduce Monitor.Enter or
Monitor.Exit (or volatile) you can't optimize anything, even locals. That's
clearly not the intention of that paragraph. That paragraph says almost the
same thing as Joe's blog with respect to st.rel and ld.acq, which only
applies to processor caching. That paragraph is the only place 335 doesn't
associate the volatile. prefix when it talks about what volatile reads and
writes do. There's the almost casual mention of Enter and Exit being an
implicit volatile read and write respectively; but that also makes perfect
sense if you're only discussing processor caching.

Relating to origins, the C++ volatile keyword in VC++ never dealt with
acquire or release semantics until VC++ 2005. Prior to that, for the past
30+ years it has been used only to tell the compiler not to optimize that
identifier.

It's not a huge stretch to think of 335:12.6 only in the context of
processor caching. All native Windows synchronization primitives deal with
"acquire semantics" and "release semantics" in the context of processor
caching only; a native compiler simply can't know what it should and should
not optimize based on essentially unrelated function calls. MT code (including
the framework) is written with these issues in mind.

It can't occur within the spec either though. That's moving one write
(the one to memberVariable) to *after* the lock is released. That's
prohibited.

Only if you make the leap that "acquire semantics" and "release semantics"
don't refer to anything other than processor caching. The spec makes
complete sense if all it's talking about is processor caching with regard to
acquire/release semantics. The processor does reorder memory accesses, and
similar to Joe's blog, in relation to the instruction stream. The mere
mention of the CIL instruction sequence in 335 doesn't imply that
I called for such an unambiguous spec (in reference to a *really,
really* odd reading of it) a while ago. The responses from Joe Duffy
and Joel Pobar are enlightening about the .NET 2.0 memory model (as
opposed to the ECMA spec model):

I would agree and would join you on a renewed call...

[1]
http://www.bluebytesoftware.com/blog/PermaLink,guid,543d89ad-8d57-4a51-b7c9-a821e3992bf6.aspx
 

Jon Skeet [C# MVP]

Peter Ritchie said:
Maybe a poor choice of an overloaded term, but I did say "compiler from having
optimized use of a member by caching it to a register".
True.


...with regard to "acquire semantics" and "release semantics". So 12.6.7
makes sense with regard only to processor caching. Everything I've read
discusses "acquire semantics" and "release semantics" only in the context of
processor caching, not compiler (JIT or otherwise) optimizations.

Once again, there's nothing in the spec which distinguishes the two.
Yes, a lot of pages describing the details go into more detail about
different CPU architectures, but please say where in the spec it says
"You can't rely on any of this as the overall semantics of the program,
because the JIT and the CPU are separate."

The JIT needs to take account of what CPU it's running on in order to
make sure that the overall semantics are correct, that's all.
http://msdn2.microsoft.com/EN-US/library/aa490209.aspx
Discusses the acquire semantics of specific Win32 functions, nothing to do
with compiler optimizations and no existing native compiler I know of
changes its optimization behaviour in the presence of those Interlocked
functions.

http://msdn2.microsoft.com/en-us/library/ms686355.aspx
Details that prior to VC 2003 "volatile" had no acquire/release semantics
and only dealt with compiler optimizations.

You shouldn't try to reason about the CLR term "volatile" with
reference to what it means outside the CLI, in my view.
If you take 12.6.7 para 2's "after any memory references" to mean anything
other than processor caching, as soon as you introduce Monitor.Enter or
Monitor.Exit (or volatile) you can't optimize anything, even locals. That's
clearly not the intention of that paragraph. That paragraph says almost the
same thing as Joe's blog with respect to st.rel and ld.acq, which only
applies to processor caching. That paragraph is the only place 335 doesn't
associate the volatile. prefix when it talks about what volatile reads and
writes do. There's the almost casual mention of Enter and Exit being an
implicit volatile read and write respectively; but that also makes perfect
sense if you're only discussing processor caching.

But if you don't read the spec as overall semantics, it's entirely
useless.
Relating to origins, the C++ volatile keyword in VC++ never dealt with
acquire or release semantics until VC++ 2005. Prior to that, for the past
30+ years it has been used only to tell the compiler not to optimize that
identifier.

Yup - but again, that doesn't alter what the spec says.
It's not a huge stretch to think of 335:12.6 only in the context of
processor caching.

That's where we disagree. I believe that if the spec says that X will
happen and Y doesn't happen, then if I can show Y happening and X not
happening on a particular implementation, then that implementation is
*broken* - it doesn't conform with the spec.
Only if you make the leap that "acquire semantics" and "release semantics"
don't refer to anything other than processor caching.

No - I'm saying they refer to the overall effect - that you should view
the spec as an absolute in terms of what's allowed to occur and what's
not, regardless of how that's achieved.
The spec makes complete sense if all it's talking about is processor
caching with regard to acquire/release semantics. The processor does
reorder memory accesses, and similar to Joe's blog, in relation to
the instruction stream. The mere mention of the CIL instruction
sequence in 335 doesn't imply that

If any CLI implementation allows the processor to reorder instructions
in a way which means that 12.6.7 can't be relied upon for visible
effects, that implementation is broken.

Did you read the section of Vance's page about the locking protocol, by
the way?
 

Peter Ritchie [C#MVP]

Once again, there's nothing in the spec which distinguishes the two.
Yes, a lot of pages describing the details go into more detail about
different CPU architectures, but please say where in the spec it says
"You can't rely on any of this as the overall semantics of the program,
because the JIT and the CPU are separate."

Lack of a clause does not make the opposite a spec. requirement. And why
shouldn't it address the two separately? But, the spec *does* distinguish
the two. The whole 12.6.4 section is about optimization and it only
describes single thread of execution guarantees, and makes no mention of
"semantics" ("acquire" or "release"). Everything else you quote is outside
of 12.6.4 and is always in the context of "acquire semantics" or "release
semantics" (ergo in the context of processor caching). The other sections in
12.6 relevant to our conversation deal with locking (and its relationship to
volatile operations) and how volatile operations affect the processor's
cache. 12.6.4 doesn't come out and say the "JIT compiler" but it does say
"the CLI", which can't mean the C#-to-IL compiler and without a JIT we can't
get from IL to native instructions.
The JIT needs to take account of what CPU it's running on in order to
make sure that the overall semantics are correct, that's all.

Agreed, but irrelevant. I'm discussing what is documented as "compliancy"
requirements for that JIT. I agree it needs to be in the spec. and am
willing to view "the CLI" as "The JIT" for the purposes of what is required
of the JIT.
You shouldn't try to reason about the CLR term "volatile" with
reference to what it means outside the CLI, in my view.

Yes, not to the CLR/CLI. But with regard to C#, reusing the keyword
"volatile" in C# pulls that baggage in. If "volatile" in C# wasn't intended
to be the same as "volatile" in C++ it should have been named differently.
If the goal was to ease compilation of C++ code, then they've done
programmers a hell of a disservice. It also goes to MT issues outside of
.NET 2.0 and that the issues of compiler optimizations exist and can be
dealt with (although with much more difficulty) in languages that support
volatility. Some languages don't, yes, and they fail miserably at MT; but
that's a separate issue. And I'll admit it's anecdotal to my point; just as
your external references are.

It also brings in the issue that use of "volatile" for variables despite
always being synchronized is commonplace and shows separation of volatility
and synchronization, and shouldn't be considered abhorrent in .NET 2.0.
Other than your lock protocol of
only-access-that-member-within-a-lock-block-locked-on-the-same-object,
volatility must be addressed through the "volatile" keyword (or always use
Thread.Volatile* or Threading.Interlocked.*, which I don't recommend unless
"volatile" doesn't apply) in .NET 2.0.
If you take 12.6.7 para 2's "after any memory references" to mean anything
other than processor caching, as soon as you introduce Monitor.Enter or
Monitor.Exit (or volatile) you can't optimize anything, even locals. That's
clearly not the intention of that paragraph. That paragraph says almost the
same thing as Joe's blog with respect to st.rel and ld.acq, which only
applies to processor caching. That paragraph is the only place 335 doesn't
associate the volatile. prefix when it talks about what volatile reads and
writes do. There's the almost casual mention of Enter and Exit being an
implicit volatile read and write respectively; but that also makes perfect
sense if you're only discussing processor caching.
But if you don't read the spec as overall semantics, it's entirely
useless.

If you do, most of the statements in 12.6 are contradictory and nonsensical
and no more useful. You can't seriously tell me that reference to *any*
unrelated memory is guaranteed to execute before a virtual operation?
Clearly that's discussing processor caching, not optimization restrictions,
and it only makes sense if so. e.g.

void Method() {
    this.memberNumber = 10;
    lock (this.syncObject)
    {
        String temp = this.memberString;
        this.memberString = "Frank";
        //...
    }
}

Who cares if the processor instruction to assign 10 to memberNumber is
executed before or after the lock? Dealing strictly with processor caching
in 12.6.7 para 2, of course if the processor cache is flushed at a volatile
operation it "...is guaranteed to occur prior to any references to memory
that occur after the [operation's] instruction in the CIL instruction
sequence." and any cached writes to this.memberString are flushed before
its value is assigned to "temp". Notice there's nothing about when an
instruction is executed in 12.6.7 para 2.
Yup - but again, that doesn't alter what the spec says.

Agreed, and neither does any other external reference.
That's where we disagree. I believe that if the spec says that X will
happen and Y doesn't happen, then if I can show Y happening and X not
happening on a particular implementation, then that implementation is
*broken* - it doesn't conform with the spec.

I agree we disagree (I over-generalized with "12.6"; I should have said
12.6.4-12.6.8 inclusive, the rest of 12.6 doesn't apply to what we're
talking about), but you haven't referenced anything in the spec that
definitively backs up your opinion. Whereas I believe I've clearly shown
that 12.6 is only clear and unambiguous if you separate optimizations
from processor caching (and define "acquire semantics" and "release
semantics" as applying only to processor caching) and say that 12.6.4 is the
only section affecting JIT optimizations.

I agree there are pitfalls to MT programming with my view of the spec, but
no more than any other framework. Your view doesn't eliminate pitfalls to
MT programming in .NET 2.0. My view simply means you must consider
volatility separately from synchronization and declare members accessed
by multiple threads with "volatile" or with "Volatile{Read|Write}", just
like in most other languages with optimizing compilers.
No - I'm saying they refer to the overall effect - that you should view
the spec as an absolute in terms of what's allowed to occur and what's
not, regardless of how that's achieved.

And I disagree 335 is making that implication and that it can be viewed as
absolute with your interpretation; otherwise you have to interpret 12.6.7
para 2 as not really meaning *all* references to memory, with regard to CIL
instructions, from "any references to memory", for example.
The spec makes complete sense if all it's talking about is processor
caching with regard to acquire/release semantics. The processor does
reorder memory accesses, and similar to Joe's blog, in relation to
the instruction stream. The mere mention of the CIL instruction
sequence in 335 doesn't imply that

12.6.7 makes no mention of not reordering instructions and is completely
unambiguous if it's viewed only in the context of processor caching.
Did you read the section of Vance's page about the locking protocol, by
the way?

Yes, I have, and it's a wonderful read on the Microsoft .NET 2.0
implementation. If that were in the spec we wouldn't be having this
conversation; not that I believe Vance's article is normatively worded.
Your assertions are about guarantees in 335. If you have to refer to
external documents then you're basically agreeing with me that 335 isn't
clear with respect to JIT optimizations (or there aren't cross-threading
restrictions on JIT optimizations and your locking protocol requires the
addition of "volatile" or "Thread.Volatile{Read|Write}").

Even Joe Duffy describes Vance's article as a "reference on the 2.0 memory
model, as implemented". If all you're writing for are Windows platforms,
you can be reasonably comfortable that your locking protocol is backed up by
Vance's description; but you can't say there are spec guarantees for it.



 

Jon Skeet [C# MVP]

Peter Ritchie said:
Lack of a clause does not make the opposite a spec. requirement.

But there's already a clause defining the behaviour - 12.6.7. Why are
you assuming it only applies to *part* of the behaviour rather than the
overall system behaviour?
And why shouldn't it address the two separately?

It *could* do, but I don't see that it does.
But, the spec *does* distinguish the two

Where? In particular, where *exactly* does it state that clause 12.6.7
only applies to the JIT, or only applies to the CPU? Surely unless
there is something to explicitly abdicate responsibility from one area,
clauses should apply to overall system behaviour - otherwise the spec
has no value.
The whole 12.6.4 section is about optimization and it only
describes single thread of execution guarantees, and makes no mention of
"semantics" ("acquire" or "release"). Everything else you quote is outside
of 12.6.4 and is always in the context of "acquire semantics" or "release
semantics" (ergo in the context of processor caching). The other sections in
12.6 relevant to our conversation deal with locking (and its relationship to
volatile operations) and how volatile operations affect the processor's
cache. 12.6.4 doesn't come out and say the "JIT compiler" but it does say
"the CLI", which can't mean the C#-to-IL compiler and without a JIT we can't
get from IL to native instructions.

But the CLI's system behaviour is surely defined by the combination of
the JIT *and* the CPU. Put it this way: on a CPU which didn't provide
any atomicity guarantees itself, I wouldn't expect the CLR to say "Oh,
never mind - the clause on atomic reads and writes (12.6.6) just
doesn't apply". The implementation would have to work round the issue
somehow, to make the overall system behaviour comply with the spec.
Agreed, but irrelevant.

How can it be irrelevant?
I'm discussing what is documented as "compliancy"
requirements for that JIT. I agree it needs to be in the spec. and am
willing to view "the CLI" as "The JIT" for the purposes of what is required
of the JIT.

*You're* willing to view it that way - but *I'm* only willing to view
it as overall system behaviour.

In other words, suppose there were two CPU architectures which were
identical except for the guarantees that their own memory models made.
I'm saying that a CLI implementation whose JIT generated the same code
for both architectures could well be compliant on one architecture and
not compliant on the other. Do you disagree with that? If not, how can
the CPU be irrelevant?
Yes, not to the CLR/CLI. But with regard to C#, reusing the keyword
"volatile" in C# pulls that baggage in. If "volatile" in C# wasn't intended
to be the same as "volatile" in C++ it should have been named differently.

It's *not* the same in C# as it is in C++. It's simply not - and I
don't see any reason to believe they *are* the same.

As another example of this, consider "char" - does that mean the same
in C# and in C++? Certainly not - and any reading of the specification
which assumed they were the same would be incorrect.
If the goal was to ease compilation of C++ code, then they've done
programmers a hell of a disservice. It also goes to MT issues outside of
.NET 2.0 and that the issues of compiler optimizations exist and can be
dealt with (although with much more difficulty) in languages that support
volatility. Some languages don't, yes, and they fail miserably at MT; but
that's a separate issue. And I'll admit it's anecdotal to my point; just as
your external references are.

Which external references do you think I'm relying on, other than how
other people (including the people who really should know!) read the
spec we're discussing?
It also brings in the issue that use of "volatile" for variables despite
always being synchronized is commonplace and shows separation of volatility
and synchronization, and shouldn't be considered abhorrent in .NET 2.0.
Other than your lock protocol of
only-access-that-member-within-a-lock-block-locked-on-the-same-object,
volatility must be addressed through the "volatile" keyword (or always use
Thread.Volatile* or Threading.Interlocked.*, which I don't recommend unless
"volatile" doesn't apply) in .NET 2.0.

Other than the lock protocol, I agree - but I don't see any reason to
abandon the lock protocol, nor do I believe you've presented any
evidence to suggest it isn't guaranteed to work according to the spec.
If you do, most of the statements in 12.6 are contradictory and nonsensical
and no more useful. You can't seriously tell me that reference to *any*
unrelated memory is guaranteed to execute before a virtual operation?

Define "virtual operation". If you mean something like "invocation of a
virtual member" then I'd certainly expect any writes occurring in the
IL stream before the invocation to be guaranteed to occur before the
operation, and any reads after the invocation to be guaranteed to occur
after the operation. After all, that's what the spec says.
Clearly that's discussing processor caching, not optimization restrictions,
and it only makes sense if so.

Not sure what you mean here, but I suspect I disagree...
e.g.

void Method() {
    this.memberNumber = 10;
    lock (this.syncObject)
    {
        String temp = this.memberString;
        this.memberString = "Frank";
        //...
    }
}

Who cares if the processor instruction to assign 10 to memberNumber is
executed before or after the lock?

If it's executed after the lock is released, then another thread which
acquires the lock could see the write to memberString but not the write
to memberNumber, contrary to the spec.
Dealing strictly with processor caching
in 12.6.7 para 2, of course if the processor cache is flushed at a volatile
operation it "...is guaranteed to occur prior to any references to memory
that occur after the [operation's] instruction in the CIL instruction
sequence." and any cached writes to this.memberString are flushed before
its value is assigned to "temp". Notice there's nothing about when an
instruction is executed in 12.6.7 para 2.

I don't see your point, I'm afraid. As far as I can see, 12.6.7 is
still guaranteeing that the locking protocol will work.
Agreed, and neither does any other external reference.

I don't believe any external reference is really needed for the spec to
guarantee locking. However, I believe it's helpful to see how other
people interpret the same spec, but *not* helpful to bring in specs for
different platforms.
I agree we disagree (I over-generalized with "12.6"; I should have said
12.6.4-12.6.8 inclusive, the rest of 12.6 doesn't apply to what we're
talking about), but you haven't referenced anything in the spec that
definitively backs up your opinion.

So you'd want to see another clause that says, "By the way, we really
did mean clause 12.6.7, just in case you were wondering"? I don't see
why. Clause 12.6.7 guarantees that in an instruction sequence of:

Volatile read
Memory access
Memory access
Memory access
Volatile write

the three memory accesses can be reordered between the volatile
operations without violation of 12.6.7 (subject to other clauses) but
none of the memory accesses can effectively occur before the volatile
read or after the volatile write.

That, along with "acquiring a lock counts as a volatile read" and the
equivalent for releasing a lock, is all that's required for the locking
protocol to be guaranteed to work.
Whereas I believe I've clearly shown
that 12.6 is only clear and unambiguous if you separate optimizations
from processor caching (and define "acquire semantics" and "release
semantics" as applying only to processor caching) and say that 12.6.4 is the
only section affecting JIT optimizations.

I see no evidence to suggest that the whole section shouldn't be
applied to the system as a whole. Without that, the spec is worthless.
I agree there are pitfalls to MT programming with my view of the spec, but
no more than any other framework. Your view doesn't eliminate pitfalls to
MT programming in .NET 2.0. My view simply means you must consider
volatility separately from synchronization and declare members accessed
by multiple threads with "volatile" or with "Volatile{Read|Write}", just
like in most other languages with optimizing compilers.

My view certainly doesn't eliminate pitfalls in MT programming - but it
has a relatively straightforward protocol (the locking protocol) which
is guaranteed (IMO) to work with *all* data types.

You still haven't explained (as far as I can remember) how you deal
with variables which can't be declared volatile, such as variables of
type "double". Do you believe that the spec provides no way of coping
with such data in a thread-safe manner?
And I disagree 335 is making that implication and that it can be viewed as
absolute with your interpretation; otherwise you have to interpret 12.6.7
para 2 as not really meaning *all* references to memory, with regard to CIL
instructions, from "any references to memory", for example.
Why?


12.6.7 makes no mention of not reordering instructions and is completely
unambiguous if it's viewed only in the context of processor caching.

It's unambiguous when viewed in the context of the memory model itself,
without regard to the details of CPUs.
Yes, I have, and it's a wonderful read on the Microsoft .NET 2.0
implementation.

Are we reading the same page? This is the one I was referring to:

http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/

The sections about the locking protocol include the following:

<quote>
Clearly, the ability to arbitrarily move memory accesses anywhere at
all would lead to chaos, so all practical memory models have the
following three fundamental rules
[snip rules]
</quote>

In what way can that be said to only apply to the .NET 2.0 memory
model? Likewise, here's another part of the page:

<quote>
This model is very efficient, but requires that programs follow the
locking protocol or explicitly mark volatile accesses when using low-
lock techniques.
</quote>

That quote is taken directly from the section entitled: "A Relaxed
Model: ECMA". Again, how can that be deemed to apply only to the .NET
2.0 memory model?

Note the "or" in the quote, which clearly implies that using the
locking protocol is a viable alternative to explicitly marking volatile
accesses.
If that were in the spec we wouldn't be having this
conversation, not that I believe Vance's article is normatively worded.

The rules that the locking protocol relies on *are* in the spec.
Your assertions are about guarantees in 335. If you have to refer to
external documents then you're basically agreeing with me that 335 isn't
clear with respect to JIT optimizations (or there aren't cross-threading
restrictions on JIT optimizations and your locking protocol requires the
addition of "volatile" or "Thread.Volatile{Read|Write}").

My only reason for referring to the external documents was to say,
"Hey, here are people who should really understand the ECMA spec, given
that they were either closely involved in writing it, or know the
people who actually wrote it."

In other words, they're in a very good position to understand what the
spec is really saying.
Even Joe Duffy describes Vance's article as a "reference on the 2.0 memory
model, as implemented".

It's that as well, certainly - but the section on the locking protocol
comes before any of the .NET 2.0-specific parts.
If all you're writing for are Windows platforms,
you can be reasonably comfortable that your locking protocol is backed up by
Vance's description; but you can't say there are spec guarantees for it.

I *can* say there are spec guarantees for it, because as far as I can
see 12.6 provides all the rules that are required.
 
P

Peter Ritchie [C#MVP]

But there's already a clause defining the behaviour - 12.6.7. Why are
you assuming it only applies to *part* of the behaviour rather than the
overall system behaviour?

I don't agree, because "acquire semantics" and "release semantics" are
consistently used elsewhere in reference to processor caching, not compiler
optimizations. Have you looked at these?:
http://msdn2.microsoft.com/en-us/library/aa490209.aspx
http://msdn2.microsoft.com/en-us/library/ms686355.aspx

How is any compiler supposed to guarantee "acquire semantics" and "release
semantics" for those functions by your definition of "acquire semantics" and
"release semantics"?

Also, http://www.intel.com/design/itanium/downloads/25142901.pdf documents
"acquire semantics" and "release semantics" as processor instruction
completers. If "acquire semantics" and "release semantics" are processor
instruction completers, how can a compiler possibly guarantee them without
disassembling every function in every call graph it could possibly reach?
Or "acquire semantics" from the CLI spec as others have interpreted it:
http://softwarecommunity.intel.com/isn/Community/en-US/forums/thread/30221378.aspx
But the CLI's system behaviour is surely defined by the combination of
the JIT *and* the CPU. Put it this way: on a CPU which didn't provide
any atomicity guarantees itself, I wouldn't expect the CLR to say "Oh,
never mind - the clause on atomic reads and writes (12.6.6) just
doesn't apply". The implementation would have to work round the issue
somehow, to make the overall system behaviour comply with the spec.

The CLI spec must take processor caching into account; other language specs
have not, and those languages have threading issues that make it much harder
to write thread-safe code. I'm not following the "Oh, never
mind..." part; the spec does accommodate CPUs that don't provide
atomicity guarantees: "An atomic write of a "small data item" (an item no
larger than the native word size) is required to do an atomic
read/modify/write on hardware that does not support direct writes to small
data items", although it is in a non-normative section--which scares me.

How can it be irrelevant?
What it "needs" to do and what the spec says are two different things. I
would suggest the JIT "needs" to optimize; but the spec only says it is
"free" to. After 30+ years of optimizing compilers the Framework would not
go far without optimizing...

I meant your comment was irrelevant to our discussion; I'm not disputing
that the spec takes the CPU into account--it clearly does. It takes the
CPU into account by specifying that its cache be flushed before/after
volatile operations. How does flushing the processor cache before and after
volatile operations allow the same processor instructions on two
architectures to not be compliant?
It's *not* the same in C# as it is in C++. It's simply not - and I
don't see any reason to believe they *are* the same.
As another example of this, consider "char" - does that mean the same
in C# and in C++? Certainly not - and any reading of the specification
which assumed they were the same would be incorrect.
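
To make the analogy concrete (a trivial sketch):

internal class CharSize {
    static void Main() {
        // C#'s char is a 16-bit UTF-16 code unit, so this prints 2;
        // C++'s char is a byte type and sizeof(char) is defined to be 1.
        System.Console.WriteLine(sizeof(char));
    }
}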

Agreed, I'm only trying to point out that what I'm saying isn't new,
ground-breaking, or far-fetched.
Which external references do you think I'm relying on, other than how
other people (including the people who really should know!) read the
spec we're discussing?

What Vance, Joe, and Chris have written outside of the spec is not part of
the spec. Besides, Chris Brumme does say "they screwed up when we specified
the ECMA memory model", which is Section 12, Partition I of 335. In Chris's
blog entry he does contradict 335 by saying "non-volatile loads can be
freely reordered" when discussing what the memory model *should* do in
comparison to the spec (which contradicts Vance).
Other than the lock protocol, I agree - but I don't see any reason to
abandon the lock protocol, nor do I believe you've presented any
evidence to suggest it isn't guaranteed to work according to the spec.

I agree that it appears the current implementation of .NET 2.0 does attempt
to make that lock protocol safe, but it's not doing so because of any
volatility guarantees. And I'm not saying abandon the lock protocol, just
consider volatility separately from synchronization, as sketched below.
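
A minimal sketch of what that separation looks like (the names are mine):
the lock is trusted only for mutual exclusion, while the volatile modifier
is what's trusted for visibility:

using System.Threading;

internal class SharedState {
    private readonly object padlock = new object();
    private volatile int sharedValue; // volatility handled here, not by the lock

    public void Update(int newValue) {
        lock (padlock) {              // synchronization: mutual exclusion only
            sharedValue = newValue;
        }
    }

    public int Read() {
        lock (padlock) {
            return sharedValue;
        }
    }
}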
Not sure what you mean here, but I suspect I disagree...

I'm reiterating that if you read 12.6.5, 12.6.6, and 12.6.7 (and 12.6.4 for
that matter) defining acquire/release semantics as affecting only processor
caching, they all make sense. If you don't, 12.6.7 para 2 is *way* too
restrictive.
If it's executed after the lock is released, then another thread which
acquires the lock could see the write to memberString but not the write
to memberNumber, contrary to the spec.

You're missing my point. You're taking 12.6.7 to mean what is convenient to
mean (as in a way that backs up what you've been told the CLI does) and
ignoring the consequences. You can assume that "acquire semantics" and
"release semantics" don't just apply to processor caching and get to your
guarantee; but if you do, that means it's unduly affecting unrelated
"memory". Another example:

volatile int volatileMember;
int Method()
{
    int value = 10;
    this.volatileMember = 5;
    return value;
}

"int value = 10" is a CIL instruction. If "acquire semantics" dealt with
CIL instructions *in addition to* processor caching and the volatility
guarantees you say exist, the above could not be optimized to:

int Method()
{
    this.volatileMember = 5;
    return 10;
}

...what would be the point of blocking that optimization, other than to say
programmers are physically incapable of writing thread-safe code?
I don't believe any external reference is really needed for the spec to
guarantee locking. However, I believe it's helpful to see how other
people interpret the same spec, but *not* helpful to bring in specs for
different platforms.

Without formally defining what "acquire semantics" and "release semantics"
mean contextually, we can only go on usage in other specs, and some of those
actually make the distinction between "happens before" and "is sequenced
before" with respect to visibility of reads/writes and reordering of
instructions.
So you'd want to see another clause that says, "By the way, we really
did mean clause 12.6.7, just in case you were wondering"? I don't see
why. Clause 12.6.7 guarantees that in an instruction sequence of:

No, I would like to see a formal definition for "acquire semantics" and
"release semantics" that's consistent with usage inside and outside of
Microsoft, and if they do only apply to processor caching then re-write
12.6.7 to agree with what Chris, Joe, and Vance have said the .NET 2.0
implementation does. Or, fix it again...
I see no evidence to suggest that the whole section shouldn't be
applied to the system as a whole. Without that, the spec is worthless.

No, it's not worthless; it just means a programmer must treat shared
writes/reads as explicitly volatile when they're not implicitly volatile. By
saying the spec is worthless with that view, you're also suggesting any spec
that takes the same stance is worthless.
You still haven't explained (as far as I can remember) how you deal
with variables which can't be declared volatile, such as variables of
type "double". Do you believe that the spec provides no way of coping
with such data in a thread-safe manner?
I thought I did: Thread.Volatile{Read|Write}, the Interlocked methods, and
synchronization (Monitor.Enter/Exit), plus Thread.MemoryBarrier (although I
didn't mention MemoryBarrier before).
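
For example, a sketch (names are mine) of handling a double, which can't
take the volatile modifier--noting that visibility and atomicity are
separate concerns, so tearing still needs a lock or Interlocked:

using System.Threading;

internal class Measurement {
    private double lastReading; // double cannot be declared volatile in C#

    public void Publish(double value) {
        // Volatile semantics without the modifier; this does not by itself
        // make the 64-bit write atomic on 32-bit hardware.
        Thread.VolatileWrite(ref lastReading, value);
    }

    public double Consume() {
        return Thread.VolatileRead(ref lastReading);
    }
}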


Okay, you don't *have* to, but it's unduly restrictive.
In what way can that be said to only apply to the .NET 2.0 memory
model? Likewise, here's another part of the page:
The spec neither contains "Reads cannot move before entering a lock" nor
"Writes cannot move after exiting a lock". You're reading the spec as
"Reads cannot move before a volatile operation" and "Writes cannot move
after a volatile operation"--which is *not* what the .NET JIT is currently
doing. It's doing what Vance has described in
http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/, which is
different from what is described in the spec; as he describes it, "Reads and
writes cannot move before a volatile read" and "Reads and writes cannot
move after a volatile write". The .NET 2.0 JIT is clearly not following
335 in that respect.

Anyway, as I said, it's somewhat moot as Vance has described the JIT as
doing something different from the ECMA memory model; therefore it's not
following the spec, good or bad. Besides, I think we've sufficiently broken
Microsoft's online newsgroup reader with regard to this thread.
 
J

Jon Skeet [C# MVP]

Besides, I think we've sufficiently broken
Microsoft's online newsgroup reader with regard to this thread.

Agreed. I can address any of the points in your most recent post if you
really wish me to, but I suspect we're not going to convince each
other. However, we might at least be able to agree what we
agree/disagree on, which could serve as a useful summary of the thread
for anyone looking at it in the future. Here's what I believe - please
correct as appropriate :) (It's at a pretty high level to try to steer
clear of controversy.)


Things we agree on:

1) ECMA 335 is a weaker memory model than the .NET 2.0 model
2) ECMA 335 should be clearer
3) Variables declared as volatile have useful semantics, i.e. a change
made in one thread will be immediately visible in all other threads
4) If you use the locking protocol (as described by Vance) you don't
need to make variables volatile when running the .NET 2.0 CLR


Beliefs I hold:
1) Point 4 is also valid for the ECMA spec, i.e. ECMA 335 guarantees
that the locking protocol "works"
2) The ECMA spec should be deemed to govern overall visible system
behaviour, rather than only specifying what the JIT can do vs what the
CPU can do, unless it explicitly states that


Beliefs I believe you hold:
1) The locking protocol isn't guaranteed to work for all ECMA 335
compliant CLRs
2) There are places where the ECMA spec is implicitly specifying only the
behaviour of the CPU or of the JIT compiler, rather than overall visible
system behaviour
 
P

Peter Ritchie [C#MVP]

I had started a short list of points as well, which follows. But, with
regard to what you think we agree on: 1) I agree that the memory model that
Vance describes in "Understand the Impact of Low-Lock Techniques in
Multithreaded Apps"[1] is stronger than ECMA 335; 2) Agreed; 3) Agreed; 4)
don't agree. And I agree with all the beliefs you believe I hold.

I'm viewing the "locking protocol" as Monitor.Enter/Monitor.Exit pairs on
the same lock object protecting a single region of memory (as described in
Vance's precursor [2] to the above article), and as dealing only with
CLI-verifiable details.

Here are my positions (which overlap with your post):
1. "Acquire semantics" and "release semantics" have to do with processor
cachings, not compiler optimizations,
2. Sections 12.6.5 and 12.6.7 describe the requirements of processor
write-cache flushing with volatile operations,
3. Section 12.6.4 describes the scope of JIT optimizations,
4. Nothing in the spec guarantees a non-volatile read or write can't be
optimized from within a Monitor.Enter/Monitor.Exit pair to before or after
the pair,
5. Volatility and synchronization need to be considered separately and dealt
with separately.

What I see as your assertions (which likely add detail to your post):
1. "Acquire semantics" and "release semantics" affect the scope of JIT
optimizations,
2. 12.6.7 para 2 means the JIT cannot emit generated instructions before
the instructions generated for a volatile operation when, in the source CIL
sequence, they come after the volatile operation, and cannot emit them after
the instructions generated for a volatile operation when, in the source CIL
sequence, they come before it.

Now, the reason I've detailed these points, and why I don't agree that
Monitor.Enter/Monitor.Exit pairs on the same object for the same region of
memory are a reliable means of dealing with volatility, is that, after
recently completing some detailed research, if you don't agree with my
positions 1-4 then I can prove .NET 2.0 is not ECMA 335 compliant, and that
the reason lock(obj){ReadOrWrite(member);} *may* work now is because of
exception guarantees and not volatility guarantees.

What you view as the guarantee from 12.6.7 para 2 is not upheld for any lone
volatile operation in .NET 2.0 (i.e. memory references in the CIL sequence
are reordered from before a volatile operation to after) and what you think
is ensuring that guarantee are actually exception guarantees (i.e. you can't
stop non-volatile memory references from being re-sequenced by the JIT
without a try/{catch|finally}).

So, my concern with not using "volatile" with "lock", despite locking the
same object for the same regions of memory, is that you're relying on the
side-effect of exception guarantees, which are documented in 335 as
relaxable and whose concerns don't necessarily overlap with those of
volatility.
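
To spell out what's actually being relied on (a sketch; the spec, as noted
earlier in this thread, makes the two forms exactly equivalent):

using System.Threading;

internal class Expansion {
    private readonly object obj = new object();
    private int member;

    public void Write() {
        // lock (obj) { member = 42; } expands to:
        Monitor.Enter(obj);
        try {
            member = 42; // on my reading it's the try/finally (exception
                         // guarantees), not any volatility guarantee, that
                         // keeps this store between Enter and Exit
        } finally {
            Monitor.Exit(obj);
        }
    }
}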

The .NET 2.0 rules that Vance outlines in [1] are also only true, with
regard to observable .NET 2.0 behaviour, if taken in the context of
processor write-cache flushing and not JIT optimizations. And I can show
examples of this with every volatile operation defined in 335.

[1] http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
[2] http://msdn.microsoft.com/msdnmag/issues/05/08/Concurrency/
 
J

Jon Skeet [C# MVP]

Now, the reason I've detailed these points, and why I don't agree that
Monitor.Enter/Monitor.Exit pairs on the same object for the same region of
memory are a reliable means of dealing with volatility, is that, after
recently completing some detailed research, if you don't agree with my
positions 1-4 then I can prove .NET 2.0 is not ECMA 335 compliant, and that
the reason lock(obj){ReadOrWrite(member);} *may* work now is because of
exception guarantees and not volatility guarantees.

What you view as the guarantee from 12.6.7 para 2 is not upheld for any lone
volatile operation in .NET 2.0 (i.e. memory references in the CIL sequence
are reordered from before a volatile operation to after) and what you think
is ensuring that guarantee are actually exception guarantees (i.e. you can't
stop non-volatile memory references from being re-sequenced by the JIT
without a try/{catch|finally}).

The violations that you're seeing (well, they're violations of the way
I read the spec, anyway) - do they occur when reading and writing heap
memory "around" a volatile member, or only when reading and writing
stack values? In other words, if you've got:

volatileMember = 10;
nonVolatileMember = 20;

have you seen this being JITted to

nonVolatileMember = 20;
volatileMember = 10;

?

Or is it:

int stackVariable;
volatileMember = 10;
stackVariable = 20;

being JITted to:

int stackVariable = 20;
volatileMember = 10;

?

The latter doesn't break the locking protocol, but the former certainly
would.
 
B

Barry Kelly

Peter said:
Here are my positions (which overlap with your post):
1. "Acquire semantics" and "release semantics" have to do with processor
caching, not compiler optimizations,

I don't think this is entirely true, for at least one important reason:
the CPU is an implementation detail, and what's important is that the
programmer's intent when the code is written is respected no matter how
many layers of software or hardware the code gets transmitted and
transmuted through--where "the programmer", from the CLI's perspective, is
the compiler writer. If this weren't the case, acquire and release
guarantees wouldn't have any weight, since they could be discarded by the
optimizer through its manipulations.

-- Barry
 
