Memory barrier note

William Stacey [MVP] · Jun 10, 2004

I think the same CPU instruction is used to flush the cache (not one for
reads and another for writes). If there are both, please advise or provide a link with details.
Cheers.

Jon Skeet [C# MVP] · Jun 10, 2004

William Stacey said:
I think the same CPU instruction is used to flush the cache (not one for
reads and another for writes). If there are both, please advise or provide a link with details.

It may well be a single CPU instruction for x86, which has a fairly
strong memory model. That's no guarantee about what will happen
elsewhere though.

William Stacey [MVP] · Jun 10, 2004

I posted this to Brad Abrams' blog and Chris Brumme's blog. Posting here to get more
eyes on it.

Does this spin version work? Why or why not? Cheers!

using System.Threading;

public sealed class Singleton
{
    private static int spinLock = 0;        // 0 = lock not owned, 1 = owned.
    private static Singleton value = null;
    private Singleton() {}

    public static Singleton Value()
    {
        // Get spin lock: atomically store 1; if the old value was already 1,
        // another thread owns the lock, so yield the time slice and retry.
        while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
            Thread.Sleep(0);

        // Do we have any mbarrier issues?
        if ( value == null )
            value = new Singleton();

        // Release the spin lock.
        Interlocked.Exchange(ref spinLock, 0);
        return value;
    }
}

This would help answer a few related questions for me on how Interlocked
works with mem barriers and cache, etc. TIA -- William

Scott Allen · Jun 10, 2004

Hi William:

As Jon points out, it really is all about ordering. The lock can only
ensure a consistent view of the memory if everyone follows the
protocol: acquire lock, work with shared memory, release lock.

The problem with double-checked locking is that only one thread will
ever follow the protocol; everyone else cheats and tries to look at
shared memory without acquiring the same lock. Because of this we have
to strictly control the ordering of the memory operations inside
the lock. Other threads will be peeking at our work while it is
still in progress.

We can use a memory barrier to force a strong ordering: all memory
writes will be seen by an external observer to happen in the same
order as we programmed them. That's really what it's all about.
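
To make the ordering point concrete, here is a minimal sketch of double-checked locking in C# with the barriers spelled out. It assumes .NET Framework 4.5 or later for Volatile.Read (that class did not exist in 2004), and DclSingleton and Payload are hypothetical names. The thread doing the work puts a full Thread.MemoryBarrier() between the constructor's writes and the store that publishes the reference; the threads that "peek" without the lock read the reference with acquire semantics, so seeing a non-null reference should imply seeing the constructed fields.

```csharp
using System.Threading;

// Hypothetical example type: double-checked locking with the ordering made explicit.
public sealed class DclSingleton
{
    private static readonly object syncRoot = new object();
    private static DclSingleton instance;   // read WITHOUT the lock on the fast path

    public int Payload { get; }             // stands in for "the work" other threads peek at

    private DclSingleton() { Payload = 42; }

    public static DclSingleton Instance
    {
        get
        {
            // Fast path: acquire read, so if we see the reference we also see
            // the writes the constructor made before it was published.
            DclSingleton local = Volatile.Read(ref instance);
            if (local != null)
                return local;

            lock (syncRoot)
            {
                local = instance;           // under the lock a plain read is enough
                if (local == null)
                {
                    local = new DclSingleton();
                    // Full fence: the constructor's writes cannot be reordered
                    // after the store that makes the reference visible.
                    Thread.MemoryBarrier();
                    instance = local;
                }
                return local;
            }
        }
    }
}
```

Declaring the field volatile is another common way to get the same acquire-on-read, release-on-write behavior; the explicit calls just make each barrier visible.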

Jon Skeet [C# MVP] · Jun 10, 2004

William Stacey said:
Does this spin version work? Why or why not? Cheers!

I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in.
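
For comparison, this is a sketch of the plain "lock every time" version Jon mentions (LockedSingleton is a hypothetical name). It relies on the usual CLI guarantee that entering a lock has acquire semantics and leaving it has release semantics, which is the same job the two Interlocked.Exchange calls do in the spin version:

```csharp
// Hypothetical example type: take the lock on every call, no double check.
public sealed class LockedSingleton
{
    private static readonly object syncRoot = new object();
    private static LockedSingleton value;   // only ever touched inside the lock

    private LockedSingleton() {}

    public static LockedSingleton Value()
    {
        // Entering the lock has acquire semantics and leaving it has release
        // semantics, so every caller sees a fully constructed instance.
        lock (syncRoot)
        {
            if (value == null)
                value = new LockedSingleton();
            return value;
        }
    }
}
```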

Scott Allen · Jun 10, 2004

On P4s and above there are:

MFENCE (full memory barrier)
LFENCE (read (load) barrier)
SFENCE (write (store) barrier)

Note that none of these instructions "flush the cache" of the
processor they execute on or of any other processor. They enforce the
ordering of memory operations (loads, stores, or both). It's up to the
cache coherency protocols in Intel systems to ensure consistency;
barriers and locks don't mean "cache flush".
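
These x86 instructions are not exposed directly to C#. As a rough managed analogue, assuming .NET Framework 4.5 or later, Thread.MemoryBarrier() is a full fence, while Volatile.Write and Volatile.Read give one-directional release and acquire ordering; which machine instructions (if any) the JIT emits for them on a given CPU is up to the runtime. The Mailbox type below is a hypothetical example that publishes a value through a flag, once with a release write and once with a full fence:

```csharp
using System.Threading;

// Hypothetical example type: hand one int from a writer thread to a reader thread.
public sealed class Mailbox
{
    private int data;
    private bool ready;

    // Release ordering: the store to data cannot move after the flag store.
    public void Publish(int value)
    {
        data = value;
        Volatile.Write(ref ready, true);
    }

    // Full fence: no load or store may be reordered across the barrier.
    public void PublishWithFullFence(int value)
    {
        data = value;
        Thread.MemoryBarrier();
        ready = true;
    }

    // Acquire ordering: the load of data cannot move before the flag load.
    public bool TryRead(out int value)
    {
        if (Volatile.Read(ref ready))
        {
            value = data;
            return true;
        }
        value = 0;
        return false;
    }
}
```

As Scott says, none of this flushes anything; it only constrains the order in which the two stores (and the two loads) become visible.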

See:
http://developer.intel.com/design/pentium4/manuals/253666.htm
and
http://developer.intel.com/design/pentium4/manuals/253667.htm

Scott Allen · Jun 10, 2004

On IA-32 architectures I'm pretty sure this would be using cmpxchg8b
with a lock prefix for MP machines. This instruction provides an
atomic read/compare/store operation and acts as a full memory
barrier. A lock(syncRoot) would boil down to the same instruction.
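
The same atomic read/compare/store is available from C# as Interlocked.CompareExchange. Below is a minimal sketch of lazy initialization that takes no lock at all, assuming .NET Framework 4.5 or later for Volatile.Read; CasSingleton and ConstructedCount are hypothetical names. Each racing thread builds a candidate, and the compare-exchange lets exactly one of them be published:

```csharp
using System.Threading;

// Hypothetical example type: lock-free lazy initialization with compare-and-swap.
public sealed class CasSingleton
{
    private static CasSingleton instance;
    private static int constructed;          // how many candidates were built (illustration only)

    private CasSingleton() { Interlocked.Increment(ref constructed); }

    public static int ConstructedCount => Volatile.Read(ref constructed);

    public static CasSingleton Value()
    {
        // Fast path: acquire read, no interlocked instruction once published.
        CasSingleton current = Volatile.Read(ref instance);
        if (current != null)
            return current;

        // Slow path: build a candidate, then atomically store it only if the
        // field is still null. CompareExchange returns the field's previous value.
        var candidate = new CasSingleton();
        CasSingleton previous = Interlocked.CompareExchange(ref instance, candidate, null);
        return previous ?? candidate;        // a thread that lost the race drops its candidate
    }
}
```

The trade-off is that under contention more than one instance may be constructed and thrown away, so this pattern only suits types whose constructors are cheap and have no side effects that matter.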

William Stacey [MVP] · Jun 10, 2004

Thanks Scott. That helps.

William Stacey [MVP] · Jun 10, 2004

I *suspect* it will work if Interlocked.Exchange performs a full
bidirectional memory barrier (which it sounds like it does).

Thanks Jon. That is what I hoped was going on. Otherwise I would be more
confused.

I suspect it performs no better than using a lock every time, but I guess
that wasn't what you were interested in.

Other than the fact that this is non-blocking after the first creation of
the singleton, and compare-exchange is faster than taking out a lock before
every test. Not that I would normally do this, but it helps in understanding
some different threading problems. Cheers!

William Stacey [MVP] · Jun 10, 2004

Thanks Scott. Glad I posted this. Have you written any papers on this?

William Stacey [MVP] · Jun 10, 2004

Also, I take it (assuming my singleton example) that I would not have any
issue with instance fields inside the singleton that were created during
construction, say a reference field pointing to another object? This
interlocked "fence" should protect everything between the fence start and
the fence end (assuming no other lazy initialization is going on inside the first class)?

Scott Allen · Jun 10, 2004

No, I'm afraid not, but I'm sure you can find some if you dig around.
There has to be someone left still slinging code in assembly; I gave
it up about 7 years ago.

One reason I remember the cmpxchg8b instruction so well is that it
was the instruction involved in the dreaded Pentium F00F bug, which let
user-mode code lock up the CPU:

http://www.google.com/search?hl=en&lr=&ie=UTF-8&q=cmpxchg8b+bug
