Peter Ritchie said:
I was thinking more in the context of a PInvoke call; but I did mix some
asynchronous call patterns in there... But with PInvoke it's going to
marshal those pointers, and the background thread wouldn't be directly
updating the stack variable's value either. Although it's not really
relevant, it doesn't cause an access violation as long as the loop doesn't
end while the background thread is running (I didn't try breaking out of
the loop and exiting the method before the thread completed, so I don't
know whether that causes a violation; my guess is it wouldn't until the
next GC...).
I haven't tried it, but my guess is you'd have to use unsafe code in C#
to get the address of a stack variable over to a background thread. It's not
really worth doing because of the obvious problems; but the JIT doesn't know that.
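Something like this, perhaps (just a sketch - the names are made up, and it
needs to be compiled with /unsafe):

using System;
using System.Threading;

class StackAddressDemo
{
    unsafe static void Worker(object boxedAddress)
    {
        // Recover the pointer; dereferencing it is only valid while
        // Main's stack frame is still alive.
        int* p = (int*)((IntPtr)boxedAddress).ToPointer();
        *p = 42;
    }

    unsafe static void Main()
    {
        int local = 0;
        Thread t = new Thread(Worker);
        t.Start(new IntPtr(&local)); // hand a stack address to another thread
        t.Join();                    // keep the frame alive until the write lands
        Console.WriteLine(local);    // 42, but only because of the Join above
    }
}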
The JIT doesn't know that it's worth doing, but I think it's reasonable
for the specification to not make any attempt to make code like that
particularly easy to write in a thread-safe manner.
Oddly, the JIT doesn't do any optimizations on an integer as long as I pass
it by ref to another method. It's almost as if the JIT assumes the address
could have been passed to another thread (I can't think of any other reason
why it shouldn't perform that optimization). But that's far from proof of
anything...
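Roughly the experiment I mean (a sketch - you'd have to inspect the
generated assembly of a release build to see the difference):

using System;

class JitRefDemo
{
    static void Touch(ref int x) { } // merely taking the address seems to matter

    static void Main()
    {
        int i = 0;
        Touch(ref i); // comment this out and the JIT is free to enregister 'i'

        int sum = 0;
        for (int n = 0; n < 1000000; n++)
        {
            sum += i; // with the ref escape above, 'i' appears to be re-read
                      // from its stack slot on each iteration
        }
        Console.WriteLine(sum);
    }
}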
It's possible that that's the easiest way of making sure that
Interlocked.Increment etc work from the point of view of that stack
frame. It could be that there are other corner cases we haven't thought
about which are simply easier to deal with by prohibiting such
optimisations than by working out whether or not they apply.
You're more comfortable saying that .NET 2.0 violates both the spec and
Vance's .NET 2.0 memory model article (as written) than saying they deal
with processor write-cache flushing? ECMA-335 does have that over other
language specs, which goes a long way towards being able to reliably write
platform-independent thread-safe code - something you can't say for some
other languages yet.
I'm suggesting that the memory model (in terms of moving accesses)
doesn't apply to stack memory in a way that either *should* be
specified or *is* specified in a place that neither of us has noticed.
I believe it makes sense to allow more memory access reordering on the
stack than on the heap.
12.6.1 states: "By "memory store" we mean the regular process memory
that the CLI operates within. Conceptually, this store
is simply an array of bytes. The index into this array is the address
of a data object. The CLI accesses data objects in the memory store via
the ldind.* and stind.* instructions."
That's about the only thing I can see which describes which bits of
memory are being talked about.
I've gone through Vance's article [1] again and it does say the double-checked
locking pattern works without making the instance member volatile, while Joe
Duffy says that on IA64 no instructions are generated with the acq or rel
completer unless a member is declared "volatile" [2]... To save you scouring
Vance's article, it's the last sentence of the third paragraph of the section
"Technique 4: Lazy Initialization": "In the .NET Framework 2.0 model, the code
works without the volatile declarations.", in reference to Figure 7, which is:
public class LazyInitClass {
    private static LazyInitClass myValue = null;
    private static Object locker = new Object();

    public static LazyInitClass GetValue() {
        if (myValue == null) {
            lock(locker) {
                if (myValue == null) {
                    myValue = new LazyInitClass();
                }
            }
        }
        return myValue;
    }
};
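For contrast, the version that should be safe even under the weaker ECMA
model - per Duffy's IA64 point - just adds volatile (a sketch):

public class LazyInitClass {
    // volatile gives the store release semantics and the load acquire
    // semantics (e.g. st.rel/ld.acq on IA64), so the pattern no longer
    // relies on the stronger .NET 2.0 model
    private static volatile LazyInitClass myValue = null;
    private static Object locker = new Object();

    public static LazyInitClass GetValue() {
        if (myValue == null) {
            lock(locker) {
                if (myValue == null) {
                    myValue = new LazyInitClass();
                }
            }
        }
        return myValue;
    }
};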
One of the interesting parts of the .NET 2.0 model is that all writes
during a constructor are made visible before the new reference itself
is made visible. Maybe that's something to do with the apparent
contradiction?
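To illustrate what I mean (a sketch - Publisher is made up):

class Publisher
{
    static Publisher instance;
    int answer;

    Publisher()
    {
        answer = 42; // under the .NET 2.0 model, this write is visible...
    }

    static void Publish()
    {
        instance = new Publisher(); // ...before this one, so a reader that
                                    // sees a non-null instance also sees
                                    // answer == 42 - exactly what Figure 7
                                    // needs without volatile
    }
}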
With a per-method JIT it's not possible to do whole-program optimization of
member fields. If they ever improved the optimizations, I think we'd see more
examples - e.g. if the JIT knew it was JITting the last method in a class, it
could go back and re-optimize the other methods (Java does this?),
Not sure about that particular one, but it does some very clever stuff.
For instance, a virtual method can still be inlined so long as
nothing's been loaded which overrides it - when the first overriding
class is loaded, the inlining optimisation is undone. Scary.
Or a single-method class that updates a private member field could
conceivably have its use of that field optimized.
Not to mention the possibilities with NGEN. The fact that it doesn't,
coupled with the required behaviour of try blocks, makes me think you're only
getting the appearance of those guarantees. Simply adding "volatile" to
fields shared amongst more than one thread, despite following the locking
protocol, is at best not going to impact performance on x86 and at worst,
as you've put it with making the double-checked locking pattern faster, not a
significant performance difference. Is your concern with also declaring
fields used with the locking protocol (that aren't implicitly volatile)
volatile only about performance? You don't think declaring such fields
volatile makes the code not thread-safe?
Using "volatile" *instead* of the locking protocol makes it easier to
make mistakes, IMO - things like incrementing a value.
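The sort of mistake I mean (a sketch):

class VolatileCounter
{
    volatile int count;

    public void Increment()
    {
        // Looks safe because the field is volatile, but ++ is a separate
        // read and write: two threads can both read the same value and
        // both store count + 1, losing an increment. You need a lock
        // (or Interlocked.Increment) here, and then volatile buys nothing.
        count++;
    }
}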
Using "volatile" *as well as* the locking protocol could easily lead
the reader into thinking that it's required - which then begs the
question of what they're meant to do when they can't make the field
volatile (doubles, fields in classes they don't have access to etc).
For instance, if I have a class like this:
class Dummy
{
    int counter;
    double average;
    object dataLock;
}
(and appropriate methods, of course)
then it looks pretty odd to me to make "counter" volatile when
"average" isn't (and can't be). It looks like some safety is being
added when in my view it's not, if you're also using the locking
protocol. It means that in *some* cases you can get away without using
the locking protocol - but I rarely try to write lock-free multi-
threaded code, as I like to keep things as simple as possible.
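With the locking protocol, the "appropriate methods" would look something
like this (a sketch - Add and Average are invented for illustration):

class Dummy
{
    int counter;
    double average;
    object dataLock = new object();

    public void Add(double sample)
    {
        lock (dataLock)
        {
            counter++;
            // a running mean; "volatile" couldn't protect this double,
            // but the lock gives both fields the same visibility guarantees
            average += (sample - average) / counter;
        }
    }

    public double Average
    {
        get { lock (dataLock) { return average; } }
    }
}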
I wasn't implying that VolatileRead(ref double) makes the access atomic.
Why did you bring up atomicity then? This code seems pretty misleading
to me:
// doubles aren't atomic, we need to use
// VolatileRead to read the "latest written" value and
// because BeginBackgroundOperation uses
// Thread.VolatileWrite(ref double).
double temp = Thread.VolatileRead(ref value1);
I don't think it's too much of a stretch to infer that there's some
connection between the first clause of the sentence and the second.
Without the first part, I agree with it completely - but it would still
be true even if doubles *were* atomic, and value2 should also be
read/written using VolatileRead/VolatileWrite if it's expected to be
changed by different threads.
Using VolatileRead or VolatileWrite to always access a particular field
VolatileRead(ref double) is documented as obtaining "the very latest written
to a memory location by any processor", as being able "to synchronize access
to a field that can be updated by another thread, or by hardware", and as
something that "provides effective synchronization for a field". Locking is
usually a better idea; but my example shows that a memory access can move
from before a volatile operation in the CIL instruction sequence to after
that operation in the processor's instruction sequence, and that you can
synchronize access to a non-atomic type with Monitor.Enter/Exit.
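i.e. the documented usage is along these lines (a sketch; value1 is the
field from my earlier example):

using System.Threading;

class Shared
{
    double value1;

    public void Publish(double v)
    {
        // release-style write: earlier writes can't be moved past it...
        Thread.VolatileWrite(ref value1, v);
    }

    public double Consume()
    {
        // ...and an acquire-style read: later reads can't be moved before
        // it. Neither call makes the 8-byte access itself atomic on a
        // 32-bit platform; that's what the lock-based version buys you.
        return Thread.VolatileRead(ref value1);
    }
}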
Your example shows that it can happen with stack access, yes. I think
I've covered my beliefs on that matter elsewhere.
I'm just not comfortable with the lack of consistency, and with the
contradictions that arise from viewing "acquire semantics" and "release
semantics" as having anything to do with anything other than processor
write-cache flushing.
Basically we're both suggesting that part of the spec only applies to a
certain situation: I believe that it only applies to heap access, and
you believe that it only applies to processor write-cache flushing.
If I'm right, it means the memory model is doing what I think it should
do: abstracting away the hardware specifics, to specify how the system
as a whole should observably behave. To my mind, that's what layered
specifications are all about. We agree that the spec needs
clarification: if the clarification required needs to break the veneer
of the memory model to show the processor cache, I'll be disappointed.
Not to mention the lack of spec for the .NET 2.0 memory model.
We certainly concur on this matter.