When is "volatile" used instead of "lock"?

P

Peter Ritchie [C#MVP]

Barry Kelly said:
I don't think this is entirely true, for at least one important reason:
the CPU is an implementation detail, and what's important is that the
programmer's intent when the code is written is respected no matter how
many layers of software or hardware through which the code gets
transmitted and transmuted, where the programmer from the CLI's
perspective is the compiler writer. If this wasn't the case, acquire and
release guarantees wouldn't have any weight, since they could be
discarded by the optimizer through its manipulations.

I agree that there needs to be a specification for what can and cannot be
optimized. But, I don't think "acquire semantics" and "release semantics"
are sufficient. They're not defined within 335. Outside of 335, all
references that I've read deal strictly with flushing of the processor's
write cache and are not in a context relative to any compiler. If a
processor were truly an implementation detail there would be no need for
MemoryBarrier, VolatileRead, etc. There are details of some processors
whose consequences leak out into our programming models and must be
compensated for, good or bad. 335 compensates for this by reusing the
acquire/release semantics terms and mandates they be dealt with for all
volatile operations. The new C++ spec. finally tries to compensate for
these issues as well. I believe the new Java memory model is compensating
for them too. Historically these languages did leave the processor as an
implementation detail: a black box; but that has meant it's very
complicated to write thread-safe code. Look at
http://www.devarticles.com/c/a/Cplu...ouble-Check-Lock-Pattern-Isnt-100-ThreadSafe/,
or http://www.javaworld.com/jw-02-2001/jw-0209-double.html?page=1. It's
impossible to currently write a thread-safe double-check lock pattern with
strictly ISO C++ code, you have to inject some sort of platform-specific
code to do it.

As far as I know, 335 is the only ratified standard that does attempt to
deal with processor write caching, and that's great. That's a huge step
towards being able to write truly thread-safe code; but programmers still
need to use the language syntax given to them to explicitly say what is and
isn't a thread-shared variable. With C++/CLI, ISO C++, and Java (prior to
5.0) this was done with the "volatile" keyword, which tells the compiler to
emit processor instructions that access RAM in the sequence detailed in the
source code, relative to all other volatile operations, the current block's
interface requirements, and a single thread of execution.
 
P

Peter Ritchie [C#MVP]

Okay, so we know .NET 2.0 is violating 335, i.e. it's not CLI-compliant,
with respect to the memory model. Where else is it violating the 335 memory
model? We don't know.

Not big news that .NET 2.0 isn't CLI-compliant; many have argued the 335
memory model (although stricter than most language-specification and
framework memory models) is broken, and Vance details how they've tried to
fix it in .NET 2.0. In the general case, maybe it's true that the ECMA
model is broken; from many programmers' points of view, certainly not:
they can still write thread-safe code.

So, where does that leave designers and programmers with respect to
developing something for a .NET platform that we can be certain is
thread-safe? Kinda up that creek without a compliant paddle. We could fall
back to our language specifications and reference material on MSDN and
assume they're all wrong when contradicted by Vance's .NET 2.0 memory model
article. But, the problem is even Vance's article is contradictory to what
the .NET 2.0 JIT is doing. Even then, I'd argue that Vance's article is
anecdotal and has been contradicted by others of equal stature. Look at
Vance's assertion: "...so all practical memory models have the following
three fundamental rules: 1. The behaviour of the thread when run in
isolation is not changed. Typically, this means that a read or a write from
a given thread to a given location cannot pass a write from the same thread
to the same location. 2. Reads cannot move before entering a lock. 3. Writes
cannot move after exiting a lock." The only time 2 & 3 hold true, even in
.NET 2.0, is in the context of processor write-cache flushing (e.g. the x86
and IA64 memory models; they can't deal with re-sequencing of instructions
because it's outside their control).

Sure, we can say he "meant" something more detailed (like heap memory, not
stack memory); but how do we get that detail out to the development
community? It's impossible.

But, the JIT isn't following the rules that he's detailed unless they only
have to do with processor write-caching.

Also, why should it distinguish between stack and heap memory? Why should
stack memory be disregarded with respect to these rules? What if I were
writing a synchronous method that called an asynchronous method that
accepted a memory reference? If I pass it a reference to stack memory,
there's nothing I can do to tell the compiler that variable should be
treated as volatile by the JIT. For example:
public int Method( )
{
    double value1 = 3.1415;
    int value2 = 42;
    IAsyncResult result = BeginBackgroundOperation(ref value1, ref value2);

    // Sit in a loop waiting for up to 250ms at a time
    // doing something with the double value...
    do
    {
        value2 = 5;
        // doubles aren't atomic, we need to use
        // VolatileRead to read the "latest written" value and
        // because BeginBackgroundOperation uses
        // Thread.VolatileWrite(ref double).
        double temp = Thread.VolatileRead(ref value1);
        Thread.Sleep(value2);
        // ...
    } while (!result.AsyncWaitHandle.WaitOne(250, false));
    return 1;
}

If the rules only applied to heap memory then that *could* be optimized to
public int Method( )
{
    double value1 = 3.1415;
    int value2 = 42;
    IAsyncResult result = BeginBackgroundOperation(ref value1, ref value2);

    do
    {
        double temp = Thread.VolatileRead(ref value1);
        Thread.Sleep(5);
    } while (!result.AsyncWaitHandle.WaitOne(250, false));
    return 1;
}

I use Thread.Sleep not because it makes sense in this example, only
because it makes the disassembly easier to read for passing a value to a
method. Something like Console.WriteLine would make more sense, but that
means the possibility of injecting some standard-output initialization
code that complicates the disassembly.

Good design? Probably not; if I had to review that code I'd likely send it
back for re-write. But if those rules apply to JIT optimization as well as
processor write-caching, then both the only published description of the
.NET 2.0 memory model and the ECMA memory model say it's thread-safe with
regard to value1.

Jon Skeet said:
Peter Ritchie [C#MVP] <[email protected]> wrote:
The violations that you're seeing (well, they're violations of the way
I read the spec, anyway) - do they occur when reading and writing heap
memory "around" a volatile member, or only when reading and writing
stack value?
<snip>
 
J

Jon Skeet [C# MVP]

Also, why should it distinguish between stack and heap memory? Why should
stack memory be disregarded with respect to these rules? What if I were
writing an synchronous method that called an asynchronous method that
accepted a memory reference? If I pass it a reference to stack memory,
there's nothing I can do to tell the compiler that variable should be
treated as volatile by the JIT. For example:
public int Method( )
{
double value1 = 3.1415;
int value2 = 42;
IAsyncResult result = BeginBackgroundOperation(ref value1, ref value2);

Do you have an example of this sort of asynchronous behaviour which
actually works? Delegate.BeginInvoke has different behaviour for
ref/out parameters than for value parameters - you fetch them
separately when you call EndInvoke. In other words it doesn't update
the values you provide in the call to BeginInvoke. That's not to say
that other framework calls *don't* work in the way you describe - but
I've never seen one. Do you have an example?

The problem is - what would you expect to happen if Method() just
returned at this point, thus popping the stack frame? It would be
*hugely* unsafe to let another thread start arbitrarily writing to
stack data for a different thread. I don't *believe* it's possible to
do in safe C# code, although it could be possible with unsafe code.

I think it's fundamentally a bad idea for one thread to access memory
in another thread's stack (except in very controlled environment such
as a debugger) - I think it's reasonable to put that (explicitly, mind)
outside the restrictions of the locking protocol.
// Sit in a loop waiting for up to 250ms at a time
// doing something with the double value...
do
{
value2 = 5;
// doubles aren't atomic, we need to use
// VolatileRead to read the "latest written" value and
// because BeginBackgroundOperation uses
// Thread.VolatileWrite(ref double).

Even if doubles were treated atomically, that wouldn't mean you'd still
get the "latest written" value. It just means you'd either get the new
value or the old one, not a "half way house".
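
That distinction can be sketched for a 64-bit value (the AtomicCell helper
is my own hypothetical illustration, not code from the thread): Interlocked
gives an untorn, old-or-new read, but atomicity alone says nothing about
freshness or ordering.

```csharp
using System;
using System.Threading;

// Hypothetical sketch: atomicity vs. freshness for a 64-bit field.
// Interlocked.Read/Exchange guarantee the value is never "torn" (you
// see either the old 8 bytes or the new 8 bytes, never a mix), which a
// plain long read/write on 32-bit hardware does not. They do NOT, by
// themselves, promise you see the globally most recent write; that is
// an ordering/visibility property, not an atomicity property.
public static class AtomicCell
{
    private static long _value;

    public static void Write(long v)
    {
        Interlocked.Exchange(ref _value, v);  // atomic 64-bit store
    }

    public static long Read()
    {
        return Interlocked.Read(ref _value);  // atomic 64-bit load: old or new, never half of each
    }
}
```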
 
P

Peter Ritchie [C#MVP]

I was thinking more in the context of a PInvoke call; but I did mix in some
asynchronous call patterns in there... But, with PInvoke it's going to
marshal those pointers and the background thread wouldn't be directly
updating the stack variable's value either. Although not relevant, it
doesn't cause an access violation if the loop doesn't end while the
background thread is running (didn't try breaking the loop and exiting the
method before the thread completed; so, I don't know if it causes a
violation - my guess is it wouldn't until the next GC...).

I haven't tried it; but my guess is you'd have to use an unsafe method in C#
to get the address of a stack variable to a background thread. Not really
worth doing because of the obvious problems; but the JIT doesn't know that.

Oddly, the JIT doesn't do any optimizations on the integer as long as I
pass it by ref to another method. It's almost as if the JIT thinks the
address could have been passed to another thread (I can't think of any
other reason why it shouldn't perform that optimization). But, far from
proof of anything...

You're more comfortable with saying that .NET 2.0 is violating both the spec
and Vance's .NET 2.0 memory model article (as written) than to say they deal
with processor write-cache flushing? 335 does have that over other language
specs., which goes a long way towards being able to reliably write
platform-independent thread-safe code. Something you can't say for some
other languages yet.

I've gone through Vance's article [1] again and it does say the double-check
lock pattern works without making the instance member volatile; while Joe
Duffy says on IA64 no instructions are generated with the acq or rel
completer without declaring a member "volatile" [2]... To save you scouring
Vance's article, it's the last sentence of paragraph 3 of the section Technique
4: Lazy Initialization. "In the .NET Framework 2.0 model, the code works
without the volatile declarations.", in reference to Figure 7 which is:
public class LazyInitClass {
    private static LazyInitClass myValue = null;
    private static Object locker = new Object();

    public static LazyInitClass GetValue() {
        if (myValue == null) {
            lock(locker) {
                if (myValue == null) {
                    myValue = new LazyInitClass();
                }
            }
        }
        return myValue;
    }
};
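
For comparison, the portable variant being argued about - Figure 7 with
the field itself declared volatile - would look something like the sketch
below (the keyword is my addition; this is not code from Vance's article):

```csharp
using System;

// Sketch of Figure 7 with "volatile" added, which is what the ECMA-335
// model (as opposed to the .NET 2.0 implementation) requires for the
// double-check to be portable: the read of myValue outside the lock
// gets acquire semantics, and the write inside it gets release semantics.
public class LazyInitClass
{
    private static volatile LazyInitClass myValue = null;
    private static object locker = new object();

    public static LazyInitClass GetValue()
    {
        if (myValue == null)                    // volatile read: acquire
        {
            lock (locker)
            {
                if (myValue == null)
                {
                    myValue = new LazyInitClass();  // volatile write: release
                }
            }
        }
        return myValue;
    }
}
```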

With a per-method JIT it's not possible to optimize member fields. If they
ever improved the optimizations, I think we'd see more examples. i.e. if the
JIT knew it was JITting the last method in a class it could go back and
re-optimize other methods (Java does this?), or a single-method class that
updated a private member field could conceivably optimize use of that field.
Not to mention the possibilities with NGEN. The fact that it doesn't,
coupled with the required behaviour of try blocks, I think you're only
getting the appearance of those guarantees. Simply adding "volatile" to
fields shared amongst more than one thread despite following the locking
protocol is best case not going to impact performance on x86 and worst case,
as you've put it with making the double-check locking pattern faster, not a
significant performance difference. Your concern with also declaring
fields used with the locking protocol (that aren't implicitly volatile) is
only performance? You don't think declaring such fields volatile makes it
not thread-safe?
Even if doubles were treated atomically, that wouldn't mean you'd still
get the "latest written" value. It just means you'd either get the new
value or the old one, not a "half way house".

I wasn't implying that VolatileRead(ref double) makes the access atomic.
I was using VolatileRead or VolatileWrite to always access a particular
field. VolatileRead(ref double) is documented as "obtains the very latest
written to a memory location by any processor", as able "to synchronize
access to a field that can be updated by another thread, or by hardware",
and as something that "provides effective synchronization for a field".
Locking is usually a better idea; but my example shows that a memory access
can move from before a volatile operation in the CIL instruction sequence
to after that operation in the processor's instruction sequence, and how to
synchronize access to a non-atomic type without Monitor.Enter/Exit.
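
The pairing described can be sketched as follows (SharedDouble and its
members are hypothetical names; the point is only that every access to the
shared double goes through the Volatile* APIs, not that this replaces
locking):

```csharp
using System;
using System.Threading;

// Sketch: consistently pairing Thread.VolatileWrite with
// Thread.VolatileRead on a shared double. In the .NET 2.0
// implementation each call implies a full memory barrier, so a reader
// sees the writer's store; it does NOT make the 8-byte access itself
// atomic.
public static class SharedDouble
{
    private static double _value;

    public static void Publish(double v)
    {
        Thread.VolatileWrite(ref _value, v);    // store with barrier
    }

    public static double Consume()
    {
        return Thread.VolatileRead(ref _value); // load with barrier
    }
}
```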

I'm just not comfortable with the lack of consistency, the contradictions
with viewing "acquire semantics" and "release semantics" as having anything
to do with anything other than processor write-cache flushing. Not to
mention the lack of spec for the .NET 2.0 memory model.

[1] http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
[2]
http://www.bluebytesoftware.com/blog/PermaLink,guid,543d89ad-8d57-4a51-b7c9-a821e3992bf6.aspx
 
W

Willy Denoyette [MVP]

Peter Ritchie said:
I was thinking more in the context of a PInvoke call; but I did mix in some
asynchronous call patterns in there... But, with PInvoke it's going to
marshal those pointers and the background thread wouldn't be directly
updating the stack variable's value either. Although not relevant, it
doesn't cause an access violation if the loop doesn't end while the
background thread is running (didn't try breaking the loop and exiting the
method before the thread completed; so, I don't know if it causes a
violation, my guess it wouldn't until the next GC...).

I haven't tried it; but my guess is you'd have to use an unsafe method in
C# to get the address of a stack variable to a background thread. Not
really worth doing because of the obvious problems; but the JIT doesn't
know that.

Oddly, the JIT doesn't do any optimizations on the integer as long as I
pass them by ref to another method. It's almost as if the JIT thinks the
address could have been passed to another thread (I can't think of any
other reason why it should't before that optimization). But, far from
proof of anything...

You're more comfortable with saying that .NET 2.0 is violating both the
spec and Vance's .NET 2.0 memory model article (as written) than to say
they deal with processor write-cache flushing? 335 does have that over
other language specs., which goes a long way towards being able to
reliably write platform-independent thread-safe code. Something you can't
say for some other languages yet.

I've gone through Vance's article [1] again and it does say the
double-check lock pattern works without making the instance member
volatile; while Joe Duffy says on IA64 no instructions are generated with
the acq or rel completer without declaring a member "volatile" [2]...

What Joe says is this:
"The 2.0 memory model does not use ld.acq’s unless you are accessing
volatile data (marked w/ the volatile modifier keyword or accessed via the
Thread.VolatileRead API)."
which means that reads have acquire semantics only when interlocked (reads
of "volatile" fields or volatile read operations), while since v2.0 all
writes have release semantics; that is, all stores use the st.rel
instruction on IA64, irrespective of the current HW platform.

To save you scouring
Vance's article, it's the last sentence paragraph 3 of the section
Technique 4: Lazy Initialization. "In the .NET Framework 2.0 model, the
code works without the volatile declarations.", in reference to Figure 7
which is:
public class LazyInitClass {
    private static LazyInitClass myValue = null;
    private static Object locker = new Object();

    public static LazyInitClass GetValue() {
        if (myValue == null) {
            lock(locker) {
                if (myValue == null) {
                    myValue = new LazyInitClass();
                }
            }
        }
        return myValue;
    }
};

The above sample works on IA64 as-is; what doesn't work, and what Joe is
talking about, is a "slight variant" of the above. Marking the fields
volatile in Joe's sample doesn't work either; you must use explicit
barriers.



Willy.
 
P

Peter Ritchie [C#MVP]

Willy Denoyette said:
"The 2.0 memory model does not use ld.acq’s unless you are accessing
volatile data (marked w/ the volatile modifier keyword or accessed via the
Thread.VolatileRead API)."
which means that reads have acquire semantics when interlocked (reads of
"volatile" fields or volatile read operations), all writes have release
semantics since v 2.0, that means that all stores have release semantics
(st.rel instruction on IA64), irrespective the current HW platform.
Right, changing the double-checked lock pattern to use an additional boolean
won't work (and is no longer the double-check lock pattern) because it's not
guarding the complex (non-atomic) invariant (the bool AND the instance) so
it must change to the single-check lock pattern and take the performance
hit. I wasn't saying the non-double-check lock pattern worked, only that
Joe has detailed that ld.acq is not emitted unless the field is declared
volatile.

In the "working" double-check lock the first read of "myValue" (comparing to
null) is not volatile if "myValue" is not volatile. There's no acquire
semantics until the lock statement. The write to "myValue" within the lock
block is also not volatile, but it's guarded. So, immediately after the
write to "myValue" any non-volatile access to "myValue" by another processor
may not see that write as it may have been cached. The end of the lock
block flushes the write-cache; and if that write was cached it is now
flushed. And if there's any code after that write and before the end of the
lock block (that isn't a volatile operation), you increase the likelihood
of executing code after the write in the processor's instruction sequence
without "seeing" it. Declaring "myValue" volatile causes the comparison to
null to see any cached writes, as well as the write to "myValue" in the
lock block to be visible to all other processors.

Above sample works on IA64 as is
Do you have IA64 disassembly of Vance's sample code to show that
"if(myValue==null)" emits an instruction with the "acq" completer contrary
to Joe's "The 2.0 memory model does not use ld.acq’s unless you are
accessing volatile data..." statement?
 
B

Barry Kelly

Peter said:
Also, why should it distinguish between stack and heap memory? Why should
stack memory be disregarded with respect to these rules?

Because the stack is associated with a single thread of execution, and
references to stack locations cannot be communicated between threads
using verifiable code.
What if I were
writing an synchronous method that called an asynchronous method that
accepted a memory reference?

For standard .NET asynchronous method pattern methods, these variables
are copied in and out. A reference isn't kept.

-- Barry
 
P

Peter Ritchie [C#MVP]

Yes, from a design standpoint it's a really bad idea. But the JIT has to
support unverifiable code and cannot make the decision what is and isn't bad
code and has to assume whatever the language-to-IL compiler gave it is valid
code. So, it stands to reason that it still has to consider stack
variables. Regardless, the .NET 2.0 memory model description and the CLI
spec both don't distinguish and just detail "any memory" or just "memory".

C# and .NET move the gun much farther away from your foot; but if you aim
really carefully you can still shoot it.
 
J

Jon Skeet [C# MVP]

Peter Ritchie said:
Yes, from a design standpoint it's really bad idea. But the JIT has to
support unverifiable code and cannot make the decision what is and isn't bad
code and has to assume whatever the language-to-IL compiler gave it is valid
code.

I think it's entirely reasonable for the memory model to allow some
optimisations which will only break code which is doing fundamentally
dangerous things in the first place.

In particular, if you're using unsafe code you're *likely* to be
calling into native code - at which point you've got a whole other
memory model to worry about to start with. You've got to worry about
when the native code will publish writes, as well as when the managed
code will read them.
So, it stands to reason that it still has to consider stack
variables. Regardless, the .NET 2.0 memory model description and the CLI
spec both don't distinguish and just detail "any memory" or just "memory".

Indeed - and I believe they should do so.
C# and .NET move the gun much farther away from your foot; but if you aim
really carefully you can still shoot it.

True - but I see no reason why they should help to keep you thread-safe
while you shoot yourself.
 
P

Peter Ritchie [C#MVP]

Jon Skeet said:
I think it's entirely reasonable for the memory model to allow some
optimisations which will only break code which is doing fundamentally
dangerous things in the first place.

If there were ambiguities there or concessions for "undefined behaviour" I
would agree. But, I think it opens a whole level of complexity that
cross-platform-framework designers just aren't willing to deal with. You're
delving into a prescriptive realm that borders on opinion that would take
ages to get a committee to agree upon. It's pretty easy to argue that any
multithreading is fundamentally dangerous.

It's fine to say that type of code is fundamentally unsafe in various ways;
but the teams writing the optimizers have to take that into account, along
with specified behaviour, in a way so that it can be compatible across
platforms, architectures, and implementing organizations.
 
W

Willy Denoyette [MVP]

Peter Ritchie said:
Right, changing the double-checked lock pattern to use an additional
boolean won't work (and is no longer the double-check lock pattern)
because it's not guarding the complex (non-atomic) invariant (the bool AND
the instance) so it must change to the single-check lock pattern and take
the performance hit. I wasn't saying the non-double-check lock pattern
worked, only that Joe has detailed that ld.acq is not emitted unless the
field is declared volatile.

In the "working" double-check lock the first read of "myValue" (comparing
to null) is not volatile if "myValue" is not volatile. There's no acquire
semantics until the lock statement. The write to "myValue" within the
lock block is also not volatile, but it's guarded. So, immediately after
the write to "myValue" any non-volatile access to "myValue" by another
processor may not see that write as it may have been cached. The end of
the lock block flushes the write-cache; and if that write was cached it is
now flushed. And if there's any code after that write and before the end
of the lock block (that isn't a volatile operation), you increase the
liklihood of executing code after the write in the processors' instruction
sequence without "seeing" it. Declaring "myValue" volatile causes the
comparision of null to see any cached writes as well as the write to
"myValue" in the lock block to be visible to all other processors.

Do you have IA64 disassembly of Vance's sample code to show that
"if(myValue==null)" emits an instruction with the "acq" completer contrary
to Joe's "The 2.0 memory model does not use ld.acq’s unless you are
accessing volatile data..." statement?


Yes I do, and it's not emitting a ld.acq on IA64; the memory read has no
acquire semantics on X86/X64 either, but this is not relevant in the case
of Vance's sample (fig. 7). What makes this work in v2.0 (as explained in
Vance's article you are referring to) is the strong write ordering in
v2.0's memory model, irrespective of the underlying HW platform.

Willy.
 
J

Jon Skeet [C# MVP]

Peter Ritchie said:
If there were ambiguities there or concessions for "undefined behaviour" I
would agree. But, I think opens a whole level of complexity that
cross-platform-framework designers just aren't willing to deal with. You're
delving into a prescriptive realm that borders on opinion that would take
ages to get a committee to agree upon. It's pretty easy to argue that any
multithreading is fundamentally dangerous.

There's "risk of getting the wrong result" and there's "risk of trying
to access data in a different stack frame from what you expected" -
they're pretty different, IMO.
It's fine to say that type of code is fundamentally unsafe in various ways;
but the teams writing the optimizers have to take that into account, along
with specified behaviour, in a way so that it can be compatible across
platforms, architectures, and implementing organizations.

There's plenty of room for unspecified behaviour - isn't that what
you've been arguing all along, in fact, that ECMA 335 doesn't specify
that the locking protocol works? (I still believe it does :)

Why would it not be reasonable to specify lots of things in detail,
including specifying that the system (compiler+CPU) is free to optimise
stack operations (within certain well-specified boundaries, of course).

I'm not saying that's what the spec currently *does* say - I'm saying
that's what I think it *ought* to say.
 
P

Peter Ritchie [C#MVP]

I'd be interested in the disassembly if you can post it, it's going to take
me a while to get access to an IA64...

I should have asked this at the same time: does the "myValue = new
LazyInitClass();" code cause the JIT to emit an instruction with a "rel"
completer? Without that, one processor could have executed that instruction
but not yet called Monitor.Exit while another processor is executing the
read of "myValue".
 
J

Jon Skeet [C# MVP]

<snip>

Is anyone still interested in me replying to this post? I've left it
for a while because it'll take a fair amount of replying, but I won't
bother if no-one's interested any more :)
 
A

Arne Vajhøj

Jon said:
Is anyone still interested in me replying to this post? I've left it
for a while because it'll take a fair amount of replying, but I won't
bother if no-one's interested any more :)

I consider it very likely that someone will find it useful
within the next 5-10 years.

Besides the original poster and participants, then there
are always those that find the thread via Google later.

Arne
 
J

Jon Skeet [C# MVP]

Arne Vajhøj said:
I consider it very likely that someone will find it useful
within the next 5-10 years.

Besides the original poster and participants, then there
are always those that find the thread via Google later.

Okay - I'll try to reply on Sunday/Monday then. I'm away until then.
 
B

Brian Gideon

I consider it very likely that someone will find it useful
within the next 5-10 years.

Besides the original poster and participants, then there
are always those that find the thread via Google later.

Arne

I second that. I haven't been participating in this thread, but I
have been watching it with interest.
 
J

Jon Skeet [C# MVP]

Peter Ritchie said:
I was thinking more in the context of a PInvoke call; but I did mix in some
asynchronous call patterns in there... But, with PInvoke it's going to
marshal those pointers and the background thread wouldn't be directly
updating the stack variable's value either. Although not relevant, it
doesn't cause an access violation if the loop doesn't end while the
background thread is running (didn't try breaking the loop and exiting the
method before the thread completed; so, I don't know if it causes a
violation, my guess it wouldn't until the next GC...).

I haven't tried it; but my guess is you'd have to use an unsafe method in C#
to get the address of a stack variable to a background thread. Not really
worth doing because of the obvious problems; but the JIT doesn't know that.

The JIT doesn't know that it's worth doing, but I think it's reasonable
for the specification to not make any attempt to make code like that
particularly easy to write in a thread-safe manner.
Oddly, the JIT doesn't do any optimizations on the integer as long as I pass
them by ref to another method. It's almost as if the JIT thinks the address
could have been passed to another thread (I can't think of any other reason
why it should't before that optimization). But, far from proof of
anything...

It's possible that that's the easiest way of making sure that
Interlocked.Increment etc work from the point of view of that stack
frame. It could be that there are other corner cases we haven't thought
about which just make it easier to fix by prohibiting such
optimisations than working out whether or not they apply.
You're more comfortable with saying that .NET 2.0 is violating both the spec
and Vance's .NET 2.0 memory model article (as written) than to say they deal
with processor write-cache flushing? 335 does have that over other language
specs., which goes a long way towards being able to reliably write
platform-independent thread-safe code. Something you can't say for some
other languages yet.

I'm suggesting that the memory model (in terms of moving access)
doesn't apply to stack memory in a way that either *should* be
specified or *is* specified in a place that neither of us have noticed.
I believe it makes sense to allow more memory access reordering on the
stack than on the heap.

12.6.1 states: "By "memory store" we mean the regular process memory
that the CLI operates within. Conceptually, this store
is simply an array of bytes. The index into this array is the address
of a data object. The CLI accesses data objects in the memory store via
the ldind.* and stind.* instructions."

That's about the only thing I can see which describes which bits of
memory are being talked about.
I've gone through Vance's article [1] again and it does say the double-check
lock pattern works without making the instance member volatile; while Joe
Duffy says on IA64 no instructions are generated with the acq or rel
completer without declaring a member "volatile" [2]... To save you scouring
Vance's article, it's the last sentence paragraph 3 of the section Technique
4: Lazy Initialization. "In the .NET Framework 2.0 model, the code works
without the volatile declarations.", in reference to Figure 7 which is:
public class LazyInitClass {
    private static LazyInitClass myValue = null;
    private static Object locker = new Object();

    public static LazyInitClass GetValue() {
        if (myValue == null) {
            lock(locker) {
                if (myValue == null) {
                    myValue = new LazyInitClass();
                }
            }
        }
        return myValue;
    }
};

One of the interesting parts of the .NET 2.0 model is that all writes
during a constructor are made visible before the new reference itself
is made visible. Maybe that's something to do with the apparent
contradiction?
With a per-method JIT it's not possible to optimize member fields. If they
ever improved the optimizations, I think we'd see more examples. i.e. if the
JIT knew it was JITting the last method in a class it could go back and
re-optimize other methods (Java does this?)

Not sure about that particular one, but it does some very clever stuff.
For instance, a virtual method can still be inlined so long as
nothing's been loaded which overrides it - when the first overriding
class is loaded, the inlining optimisation is undone. Scary.
or a single-method class that
updated a private member field could conceivably optimize use of that field.
Not to mention the possibilities with NGEN. The fact that it doesn't,
coupled with the required behaviour of try blocks, means I think you're only
getting the appearance of those guarantees. Simply adding "volatile" to
fields shared amongst more than one thread, despite following the locking
protocol, is best case not going to impact performance on x86 and worst case,
as you've put it with making the double-check locking pattern faster, not a
significant performance difference. Your concern with also declaring
fields used with the locking protocol (that aren't implicitly volatile) is
only performance? You don't think declaring such fields volatile makes it
not thread-safe?

Using "volatile" *instead* of the locking protocol makes it easier to
make mistakes, IMO - things like incrementing a value.
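The incrementing mistake Jon mentions can be made concrete. A minimal sketch (the class and field names are invented for illustration): marking a counter volatile does not make `counter++` atomic, because the increment still compiles to a separate read, add, and write, and two threads can interleave between them and lose updates. An atomic increment (Interlocked.Increment, or the locking protocol) is what actually makes it safe:

```csharp
using System;
using System.Threading;

class VolatileCounterDemo
{
    // volatile guarantees visibility and ordering, NOT atomicity.
    static volatile int counter;
    static int safeCounter;

    public static (int racy, int safe) Run()
    {
        counter = 0;
        safeCounter = 0;

        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100_000; j++)
                {
                    counter++;                              // read-modify-write: racy despite volatile
                    Interlocked.Increment(ref safeCounter); // atomic increment: safe
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return (counter, safeCounter);
    }

    static void Main()
    {
        var (racy, safe) = Run();
        // safe is always 400000; racy can come up short because two
        // threads may read the same value before either writes back.
        Console.WriteLine($"volatile: {racy}, interlocked: {safe}");
    }
}
```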

Using "volatile" *as well as* the locking protocol could easily lead
the reader into thinking that it's required - which then begs the
question of what they're meant to do when they can't make the field
volatile (doubles, fields in classes they don't have access to etc).
For instance, if I have a class like this:

class Dummy
{
    int counter;
    double average;
    object dataLock;
}

(and appropriate methods, of course)

then it looks pretty odd to me to make "counter" volatile when
"average" isn't (and can't be). It looks like some safety is being
added when in my view it's not, if you're also using the locking
protocol. It means that in *some* cases you can get away without using
the locking protocol - but I rarely try to write lock-free multi-
threaded code, as I like to keep things as simple as possible.
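For reference, one hypothetical fleshing-out of the "appropriate methods" for the Dummy class above under the locking protocol (the method bodies are invented; only the fields come from the example). Every access to counter and average goes through dataLock, so Monitor.Enter/Exit's acquire/release guarantees make both fields safe without volatile, including the double, which couldn't be declared volatile anyway:

```csharp
using System;

class Dummy
{
    int counter;
    double average;
    readonly object dataLock = new object();

    // Records a new sample; all shared state is touched under the lock.
    public void Add(double sample)
    {
        lock (dataLock)
        {
            counter++;
            // incremental mean: avg' = avg + (x - avg) / n
            average += (sample - average) / counter;
        }
    }

    // Reads take the same lock, which provides the acquire semantics
    // that volatile would otherwise be (mistakenly) added for.
    public int Count
    {
        get { lock (dataLock) { return counter; } }
    }

    public double Average
    {
        get { lock (dataLock) { return average; } }
    }
}
```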
I wasn't implying that VolatileRead(ref double) makes the access atomic.

Why did you bring up atomicity then? This code seems pretty misleading
to me:

// doubles aren't atomic, we need to use
// VolatileRead to read the "latest written" value and
// because BeginBackgroundOperation uses
// Thread.VolatileWrite(ref double).
double temp = Thread.VolatileRead(ref value1);

I don't think it's too much of a stretch to infer that there's some
connection between the first clause of the sentence and the second.
Without the first part, I agree with it completely - but it would still
be true even if doubles *were* atomic, and value2 should also be
read/written using VolatileRead/VolatileWrite if it's expected to be
changed by different threads.
Using VolatileRead or VolatileWrite to always access a particular field
VolatileRead(ref double) is documented as "obtains the very latest value
written to a memory location by any processor" and able "to synchronize access
to a field that can be updated by another thread, or by hardware" and
"provides effective synchronization for a field". Locking is usually a better
idea; but my example shows that a memory access can move from before a
volatile operation in the CIL instruction sequence to after that operation in
the processor's instruction sequence, and that you can synchronize access to a
non-atomic type with Monitor.Enter/Exit.
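To illustrate the pairing under discussion (field names invented): Thread.VolatileWrite(ref double) and Thread.VolatileRead(ref double) give release/acquire ordering for a double field, but ordering is all they give; whether the 64-bit access itself can tear is a separate question, which is part of why the locking protocol is usually the safer choice for doubles. A sketch of the publish/consume pattern (note Thread.VolatileWrite has no bool overload, so an int flag stands in):

```csharp
using System;
using System.Threading;

class VolatileDoubleDemo
{
    static double value1;
    static int ready; // 0 = not yet published, 1 = published

    public static double Run()
    {
        var writer = new Thread(() =>
        {
            Thread.VolatileWrite(ref value1, 42.0); // write the payload first
            Thread.VolatileWrite(ref ready, 1);     // release: publish the flag last
        });
        writer.Start();

        while (Thread.VolatileRead(ref ready) == 0) // acquire: spin until published
            Thread.Sleep(0);

        // The volatile pair guarantees ordering: once 'ready' reads 1,
        // the write to value1 is visible. Atomicity of the 64-bit read
        // is a separate issue (it could tear on a 32-bit runtime if
        // writes were still racing, which they are not here).
        double temp = Thread.VolatileRead(ref value1);
        writer.Join();
        return temp;
    }

    static void Main() => Console.WriteLine(Run());
}
```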

Your example shows that it can happen with stack access, yes. I think
I've covered my beliefs on that matter elsewhere :)
I'm just not comfortable with the lack of consistency, and with the
contradictions in viewing "acquire semantics" and "release semantics" as
having anything to do with anything other than processor write-cache flushing.

Basically we're both suggesting that part of the spec only applies to a
certain situation: I believe that it only applies to heap access, and
you believe that it only applies to processor write-cache flushing.

If I'm right, it means the memory model is doing what I think it should
do: abstracting away the hardware specifics, to specify how the system
as a whole should observably behave. To my mind, that's what layered
specifications are all about. We agree that the spec needs
clarification: if the clarification required needs to break the veneer
of the memory model to show the processor cache, I'll be disappointed.
Not to mention the lack of spec for the .NET 2.0 memory model.

We certainly concur on this matter.
 

Ben Voigt [C++ MVP]

One of the interesting parts of the .NET 2.0 model is that all writes
during a constructor are made visible before the new reference itself
is made visible. Maybe that's something to do with the apparent
contradiction?

I'm pretty sure this is not true in .NET. The "this" keyword is available
in the constructor. Only Spec# complains if you try to use it, because the
invariants may not yet hold.
 

Jon Skeet [C# MVP]

Ben Voigt said:
I'm pretty sure this is not true in .NET. The "this" keyword is available
in the constructor. Only Spec# complains if you try to use it, because the
invariants may not yet hold.

I think we may be talking at cross-purposes. I'm talking about a
situation which is theoretically possible in the CLI model, but not in
the .NET 2.0 model:

public class Foo
{
    public int Bar; // Yeah, but with a property etc

    public Foo(int b)
    {
        Bar = b;
    }
}


Thread 1:                          Thread 2:
sharedVar = new Foo(10);
                                   Console.WriteLine(sharedVar.Bar);


In theory (with the ECMA model), sharedVar's new value can be visible
to thread 2 before the write to sharedVar.Bar - meaning that thread 2
could end up printing "0".

That has been tightened up in the .NET 2.0 memory model.
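For completeness, under the weaker ECMA model the usual fix for the example above is to make the shared reference itself volatile, so the publishing store carries release semantics and the constructor's write to Bar cannot be observed after the reference becomes visible. A sketch (the Publisher wrapper and Run method are invented scaffolding around the thread layout shown above):

```csharp
using System;
using System.Threading;

public class Foo
{
    public int Bar;
    public Foo(int b) { Bar = b; }
}

public static class Publisher
{
    // volatile gives the publishing store release semantics and the
    // reading load acquire semantics, so Bar is fully written before
    // the reference is visible, even under the ECMA model.
    static volatile Foo sharedVar;

    public static int Run()
    {
        int observed = 0;
        var reader = new Thread(() =>
        {
            Foo local;
            while ((local = sharedVar) == null) // acquire read: spin until published
                Thread.Sleep(0);
            observed = local.Bar;               // guaranteed 10, never 0
        });
        reader.Start();

        sharedVar = new Foo(10);                // release write
        reader.Join();
        return observed;
    }

    static void Main() => Console.WriteLine(Run());
}
```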
 
