Correct usage of the volatile keyword

Zuma

Assuming I have a class like the following:

public class Family
{
    private *volatile* Person _child;

    public Person Child
    {
        get { return _child; }
        set { _child = value; }
    }
}

An instance of this class is shared by multiple threads in a multiprocessor
system, and the Child property can be read or written by any of them. In such
a case, should I declare the _child variable as volatile because of cache
coherency? I'm asking because the source code (viewed in .NET Reflector) of
"thread safe" classes in the .NET Framework does not use the volatile keyword
on publicly readable/writable member variables. Most of the documents I've
found on the Internet talk about synchronization, memory barriers and
instruction reordering when explaining the volatile keyword, but none of them
discusses cache coherency in detail.
 
Jon Skeet [C# MVP]

Zuma said:
Assuming I have a class like the following:

public class Family
{
    private *volatile* Person _child;

    public Person Child
    {
        get { return _child; }
        set { _child = value; }
    }
}

An instance of this class is shared by multiple threads in a multiprocessor
system, and the Child property can be read or written by any of them. In such
a case, should I declare the _child variable as volatile because of cache
coherency? I'm asking because the source code (viewed in .NET Reflector) of
"thread safe" classes in the .NET Framework does not use the volatile keyword
on publicly readable/writable member variables. Most of the documents I've
found on the Internet talk about synchronization, memory barriers and
instruction reordering when explaining the volatile keyword, but none of them
discusses cache coherency in detail.

Very few classes in the framework are marked as being thread-safe -
could you give an example of the kind of class you're talking about?

Usually it's best to let the caller do appropriate locking (which gives
the effect of volatility, when done correctly) - after all, when the
Person reference has been returned, they'll need to manipulate that in
a thread-safe manner too.
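
For what it's worth, caller-side locking might look something like this
(a sketch only - the familyLock object and the UpdateChild method are my
illustrations, not from this thread):

private static readonly object familyLock = new object();

void UpdateChild(Family family, Person newChild)
{
    lock (familyLock)        // Monitor.Enter has acquire semantics
    {
        family.Child = newChild;
        // manipulate the Person while still holding the lock,
        // so no other thread sees it half-updated
    }                        // Monitor.Exit has release semantics
}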
 
Ben Voigt [C++ MVP]

Jon Skeet said:
Very few classes in the framework are marked as being thread-safe -
could you give an example of the kind of class you're talking about?

Usually it's best to let the caller do appropriate locking (which gives
the effect of volatility, when done correctly) - after all, when the
Person reference has been returned, they'll need to manipulate that in
a thread-safe manner too.

Volatile access in .NET is a joke, because it isn't supported for
parameters.

Most of the various optimizations that volatile helps with aren't allowed in
.NET anyway; just use the lock statement or the synchronization classes
(Monitor, Event, etc.).
 
Jon Skeet [C# MVP]

Ben Voigt said:
Volatile access in .NET is a joke, because it isn't supported for
parameters.

I think that's putting it a bit strongly. Could it be more useful? Yes.
Is it useful as it stands? I believe so.
Most of the various optimizations that volatile helps with aren't allowed in
.NET anyway; just use the lock statement or the synchronization classes
(Monitor, Event, etc.).

True - I tend to use locks (as discussed seemingly unendingly before)
but being able to use volatile for a single flag is occasionally handy.
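
For instance, the single-flag case might look like this (a minimal sketch;
the Worker class and its names are illustrative):

public class Worker
{
    // volatile: every read of _stopRequested goes back to memory,
    // so the loop below will see a write made by another thread
    private volatile bool _stopRequested;

    public void RequestStop() { _stopRequested = true; }

    public void Run()
    {
        while (!_stopRequested)
        {
            // do one unit of work
        }
    }
}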
 
Zuma

Thanks for your replies!

Another example first:

public class Test
{
    private static Test _singleton = new Test();

    public static Test Singleton
    {
        get { return _singleton; }
    }
}

This is just the classic singleton pattern in the .NET Framework. There is
no requirement for a lock or any other synchronization primitive. However,
*theoretically*, if you use it from multiple threads on a processor
architecture with a weak cache coherency model, some of the threads might
get null as the singleton.

The processor might fetch the memory page containing the _singleton variable
into its cache before _singleton is initialized. (This could happen because
of a completely unrelated read operation nearby in memory.) Thereafter, when
the thread calls the getter of the Singleton property, the variable will be
served from the cache, which still holds null.

So, in my understanding, if you plan to support architectures with weak cache
coherency rules, every variable used by more than one thread (regardless of
its synchronization needs) must be read and written with volatile semantics.

I *must* be wrong in my assumption above, because none of the classes I've
seen in the Framework uses volatile fields this way. However, I cannot find
any reasonable explanation for it. Every article refers only to memory
models and synchronization issues.
 
Jon Skeet [C# MVP]

Another example first:

public class Test
{
    private static Test _singleton = new Test();

    public static Test Singleton
    {
        get { return _singleton; }
    }
}

This is just the classic singleton pattern in the .NET Framework. There is
no requirement for a lock or any other synchronization primitive. However,
*theoretically*, if you use it from multiple threads on a processor
architecture with a weak cache coherency model, some of the threads might
get null as the singleton.

I don't believe it ever will. Reasoning about the memory model is very
tricky (as we've seen before) and I haven't seen anything explicitly
about this in the CLI spec, but I've asked about it before now and
received the answer that it really isn't a problem. There already has
to be some synchronization in order to verify whether or not the type
has been initialized - I suspect it drops naturally out of that.

So, in my understanding, if you plan to support architectures with weak cache
coherency rules, every variable used by more than one thread (regardless of
its synchronization needs) must be read and written with volatile semantics.

I *must* be wrong in my assumption above, because none of the classes I've
seen in the Framework uses volatile fields this way. However, I cannot find
any reasonable explanation for it. Every article refers only to memory
models and synchronization issues.

Volatility *is* a memory model issue - but most of the time you solve it
using locking rather than the volatile keyword.

.NET 2.0 has more guarantees than the CLI memory model, although
they're not exactly well documented. The closest I've come is an MSDN
article:
http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/
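
To illustrate "solving it with locking" (my sketch, reusing the Family class
from the original question), the property could be written as:

private readonly object _lock = new object();
private Person _child;

public Person Child
{
    // locking both accessors on the same monitor gives the
    // visibility guarantees a volatile field would provide
    get { lock (_lock) { return _child; } }
    set { lock (_lock) { _child = value; } }
}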

Jon
 
Zuma

Hi again,

Thanks a lot for your answer. I think my mistake was considering the
volatile keyword mainly as a cache coherency solution. The MSDN
documentation and the C# specification differ in their explanations of it.
The MSDN documentation states only:

"The volatile keyword indicates that a field might be modified by multiple
concurrently executing threads. Fields that are declared volatile are not
subject to compiler optimizations that assume access by a single thread.
This ensures that the most up-to-date value is present in the field at all
times."

However, the specification mentions nothing about this. It only describes
the instruction reordering constraints on volatile reads/writes. So there
are two different descriptions. From the articles I've read, I came to the
conclusion that the MSDN documentation is misleading. In a multithreaded
environment, from the programmer's perspective there is always cache
coherency, regardless of the underlying cache levels. The variables you read
always contain the most up-to-date value written by any other thread. The
purpose of the volatile keyword is only to create a memory fence, usually to
build lock-free synchronization blocks.

Am I correct? :)
 
Jon Skeet [C# MVP]

Thanks a lot for your answer. I think my mistake was considering the
volatile keyword mainly as a cache coherency solution. The MSDN
documentation and the C# specification differ in their explanations of it.
The MSDN documentation states only:

"The volatile keyword indicates that a field might be modified by multiple
concurrently executing threads. Fields that are declared volatile are not
subject to compiler optimizations that assume access by a single thread.
This ensures that the most up-to-date value is present in the field at all
times."

However, the specification mentions nothing about this. It only describes
the instruction reordering constraints on volatile reads/writes. So there
are two different descriptions.

It's not necessarily talking about CPU instruction reordering so much
as logical *memory model* instruction reordering.

They're different descriptions of the same thing.

From the articles I've read, I came to the conclusion that the MSDN
documentation is misleading. In a multithreaded environment, from the
programmer's perspective there is always cache coherency, regardless of the
underlying cache levels. The variables you read always contain the most
up-to-date value written by any other thread. The purpose of the volatile
keyword is only to create a memory fence, usually to build lock-free
synchronization blocks.

No, that's not right. The two descriptions are of the same thing - the
result of all the memory ordering prohibitions described in the spec
is the observable behaviour described by MSDN.

The thing with your example is that the type initializer guarantees
ensure that no thread will see the value of the singleton variable
before the type initializer has completed. It's very poorly documented
in that regard, unfortunately.
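
As an aside (my addition, not something stated above): giving the type an
explicit static constructor removes the beforefieldinit flag, which pins
down exactly when the type initializer is allowed to run:

public class Test
{
    private static Test _singleton = new Test();

    // An explicit static constructor marks the type as not
    // "beforefieldinit", so initialization happens on first use
    static Test() { }

    public static Test Singleton
    {
        get { return _singleton; }
    }
}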

Jon
 
Zuma

Hi again,

So, when a variable is prefetched into a cache line prior to the actual
read, theoretically this is also a reordering (like storing the value in a
temp variable first), and a volatile read prohibits this condition (because
of its acquire semantics) and forces a read from main memory. So the memory
model rules apply to the C# compiler, the JIT compiler *and the hardware*.
Is this correct?

If yes, I will have another related question :)
 
Jon Skeet [C# MVP]

So, when a variable is prefetched into a cache line prior to the actual
read, theoretically this is also a reordering (like storing the value in a
temp variable first), and a volatile read prohibits this condition (because
of its acquire semantics) and forces a read from main memory. So the memory
model rules apply to the C# compiler, the JIT compiler *and the hardware*.
Is this correct?

Absolutely - in my reading of the spec, at least :)
I believe it applies to the observable behaviour of the system - a CLI
has to take account of the hardware it's running on in order to meet
the spec guarantees.

Jon
 
Willy Denoyette [MVP]

Jon Skeet said:
Absolutely - in my reading of the spec, at least :)
I believe it applies to the observable behaviour of the system - a CLI
has to take account of the hardware it's running on in order to meet
the spec guarantees.

Jon


That's correct, Jon. The JIT is responsible for maintaining correct
semantics on a given target processor by emitting the necessary instructions
for that processor, including processor-specific memory ordering operations
like load-acquire, fences, etc.

Willy.
 
Zuma

Hi again!

After reading the x86 and Itanium manuals I think I've completely grasped
the concept. Now I have a question, just out of curiosity.

public class Person
{
    private String _name;

    public Person(String name) { _name = name; }

    public String Name
    {
        get { return _name; }
    }
}

static void Main()
{
    Person p = new Person("Zuma");

    new Thread(Thread2).Start(p);
}

static void Thread2(Object o)
{
    **Thread.MemoryBarrier();**

    Person p = (Person) o;

    Console.WriteLine(p.Name);
}

On an x86 or x64 system this code will always work without the MemoryBarrier
call. However, on a different multicore architecture (Itanium, for example)
with weak cache coherency, the second thread must first invalidate the
processor cache, because the memory locations belonging to the Person object
might have been prefetched (in which case the read would return null for the
_name variable). MemoryBarrier causes a full fence, which has the effect of
flushing and invalidating the processor cache. What do you think about this
assumption?

And I realized that the VolatileRead/Write methods in the x86 version of the
Framework are just plain reads and assignments with an extra call to the
MemoryBarrier function (.NET Reflector). :)
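
From memory, the decompiled shape is roughly this (a paraphrase of the
Reflector output, so treat it as a sketch rather than the exact source):

public static int VolatileRead(ref int address)
{
    int value = address;      // plain read...
    MemoryBarrier();          // ...followed by a full fence
    return value;
}

public static void VolatileWrite(ref int address, int value)
{
    MemoryBarrier();          // full fence first...
    address = value;          // ...then a plain write
}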
 
Jon Skeet [C# MVP]

Zuma said:
After reading the x86 and Itanium manuals I think I've completely grasped
the concept. Now I have a question, just out of curiosity.

public class Person
{
    private String _name;

    public Person(String name) { _name = name; }

    public String Name
    {
        get { return _name; }
    }
}

static void Main()
{
    Person p = new Person("Zuma");

    new Thread(Thread2).Start(p);
}

static void Thread2(Object o)
{
    **Thread.MemoryBarrier();**

    Person p = (Person) o;

    Console.WriteLine(p.Name);
}

On an x86 or x64 system this code will always work without the MemoryBarrier
call.

Well, in practice it always will - but it's worth bearing in mind that
the memory model doesn't just apply to CPU memory model reordering. It
*also* applies to the constraints the JIT has to run under.

I don't believe the JIT would ever reorder this so that it can't work,
but *if* somehow the call to create and run a thread didn't involve a
"release" operation, then *in theory* a JIT could reorder them.

As I say, I don't think it's an issue in this case - but there are
other situations where you need to make things volatile (or use a lock)
due to JIT issues rather than CPU caches. For instance, this kind of
code:

bool keepGoing;

...

void Foo()
{
    while (keepGoing)
    {
        ...
    }
}

is flawed because the JIT could enregister the value of keepGoing and
never read a new value written by a different thread.
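
The simple fix for that case is to mark the flag volatile, so the JIT
can't cache it in a register:

volatile bool keepGoing;   // forces a fresh read on every loop test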

However, on a different multicore architecture (Itanium, for example)
with weak cache coherency, the second thread must first invalidate the
processor cache, because the memory locations belonging to the Person object
might have been prefetched.

I don't believe it's a problem for new threads - I don't believe the
read can be moved earlier than thread creation - but the reordering of
the write is a theoretical (though not practical) problem I believe.
 
Willy Denoyette [MVP]

Zuma said:
Hi again!

After reading the x86 and Itanium manuals I think I've completely grasped
the concept. Now I have a question, just out of curiosity.

public class Person
{
    private String _name;

    public Person(String name) { _name = name; }

    public String Name
    {
        get { return _name; }
    }
}

static void Main()
{
    Person p = new Person("Zuma");

    new Thread(Thread2).Start(p);
}

static void Thread2(Object o)
{
    **Thread.MemoryBarrier();**

    Person p = (Person) o;

    Console.WriteLine(p.Name);
}

On an x86 or x64 system this code will always work without the MemoryBarrier
call. However, on a different multicore architecture (Itanium, for example)
with weak cache coherency, the second thread must first invalidate the
processor cache, because the memory locations belonging to the Person object
might have been prefetched (in which case the read would return null for the
_name variable). MemoryBarrier causes a full fence, which has the effect of
flushing and invalidating the processor cache. What do you think about this
assumption?

And I realized that the VolatileRead/Write methods in the x86 version of
the Framework are just plain reads and assignments with an extra call to
the MemoryBarrier function (.NET Reflector). :)

The Whidbey memory model (Framework v2) targets both IA-32 and IA-64. This
memory model assumes that every shared write (ordinary as well as
interlocked) becomes globally visible to all other processors
simultaneously. This is implicitly true because all writes have release
semantics on IA-32 and x64 CPUs; on IA-64 it's just a matter of emitting a
st.rel (store release) instruction for every write, to perform each
processor's stores in order and to make them visible in the same order to
other processors (that is, the execution environment is Processor
Consistent).

The above means that the JIT has to emit a st.rel for _name = name; when run
on IA-64 (64-bit managed code), so the other thread will actually see p.Name
pointing to the string.

Add to that that thread creation implies a full barrier (fence), so in this
particular case it's not required to include a MemoryBarrier in the thread
procedure. Note that the CLR contains other services that implicitly raise
memory barriers - think of Monitor.Enter, Monitor.Exit, ThreadPool services,
I/O services, and so on. So I think that for all but the extreme cases, you
can live without thinking about memory barriers in managed code, even when
compiled for IA-64.
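
A related sketch (my illustration; Willy didn't post code): an interlocked
write also carries full-fence semantics, so publishing a shared reference
can be done like this:

private static Person _shared;

static void Publish(Person p)
{
    // Interlocked.Exchange writes with full-fence semantics,
    // so the new reference is visible to readers on any CPU
    Interlocked.Exchange(ref _shared, p);
}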

Willy.

 
Zuma

Hi Jon!

I greatly appreciate your help. Thank you very much for your support. I have
a much more solid understanding of all this memory model stuff now.
 
Zuma

Thanks for the explanation, Willy!

Willy Denoyette said:
The Whidbey memory model (Framework v2) targets both IA-32 and IA-64. This
memory model assumes that every shared write (ordinary as well as
interlocked) becomes globally visible to all other processors
simultaneously. This is implicitly true because all writes have release
semantics on IA-32 and x64 CPUs; on IA-64 it's just a matter of emitting a
st.rel (store release) instruction for every write, to perform each
processor's stores in order and to make them visible in the same order to
other processors (that is, the execution environment is Processor
Consistent).

The above means that the JIT has to emit a st.rel for _name = name; when run
on IA-64 (64-bit managed code), so the other thread will actually see p.Name
pointing to the string.

Add to that that thread creation implies a full barrier (fence), so in this
particular case it's not required to include a MemoryBarrier in the thread
procedure. Note that the CLR contains other services that implicitly raise
memory barriers - think of Monitor.Enter, Monitor.Exit, ThreadPool services,
I/O services, and so on. So I think that for all but the extreme cases, you
can live without thinking about memory barriers in managed code, even when
compiled for IA-64.

Willy.

 
