MemoryBarrier vs volatile vs lock

Guest

Hi All,

I'm trying to get my head around synchronization.

The documentation seems to say that declaring a field volatile gives you a
memory barrier.

Great! But when I do a little performance testing, Thread.MemoryBarrier is
about 40 times slower than accessing a volatile int. Also, I don't see any
significant difference between the time to increment a volatile int and a
non-volatile int.

Could the difference all be attributable to the overhead of a function call?

When I read the documentation on Thread.VolatileWrite, it says that all other
processors are guaranteed to see your write immediately.

But when I use reflector to decompile VolatileWrite I get:

public static void VolatileWrite(ref int address, int value)
{
    Thread.MemoryBarrier();
    address = value;
}

Now, if the MemoryBarrier comes before the write, couldn't the following
happen? Processor 1 executes the MemoryBarrier, processor 2 reads the value
(through its cache), and then processor 1 writes to address. The next read of
address on processor 2 then gets its "cached" value from before the new write.

(Boy it's as hard to write about this stuff as it is to read).
processor1: Thread.MemoryBarrier
processor2: read address getting old_value (old_value cached)
processor1: address = new_value
processor2: read address still returning old_value.

VolatileWrite, the way it is written, seems to guarantee that any write before
it will get to main memory before this write, but it makes no guarantees about
this write itself.
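
The ordering guarantee described above (earlier writes become visible no later than the volatile write) is the part that a publish-then-flag pattern relies on. Here is a minimal sketch of that pattern using a volatile field; the names PublishDemo, _payload and _ready are made up for illustration and are not from the framework. It says nothing about how quickly the reader notices the flag, only that once it does, the payload written before the flag is visible too.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; all names here are illustrative assumptions.
public static class PublishDemo
{
    private static int _payload;          // ordinary field, written before the flag
    private static volatile bool _ready;  // volatile flag: write is a release, read is an acquire

    public static int Run()
    {
        _payload = 0;
        _ready = false;
        int observed = -1;

        Thread reader = new Thread(() =>
        {
            // The volatile read cannot be hoisted out of the loop,
            // so the reader keeps re-reading until it sees the flag.
            while (!_ready)
            {
                Thread.SpinWait(1);
            }
            // The flag was seen, so the earlier payload write is visible as well.
            observed = _payload;
        });
        reader.Start();

        _payload = 42;   // earlier, ordinary write
        _ready = true;   // volatile write: not reordered ahead of the payload write

        reader.Join();
        return observed;
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

The reader may spin for a while before it sees the flag; the pattern only promises that it never sees the flag without the payload.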

Thanks
 
Guest

My results from running the following code:

baseline = 00:00:00.0318235
volatile_int = 00:00:00.0320384
Memory Barriers = 00:00:01.1449614
VolatileMethods = 00:00:02.3170651
locked = 00:00:00.4858723
Interlocked = 00:00:00.0907154

Code:

using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.Threading;

namespace ConsoleApplication1
{
    class Program
    {
        private object _syncobj = new Object();
        private int m_a;            // plain field
        private volatile int m_v;   // volatile field

        public void Test()
        {
            Stopwatch sw = new Stopwatch();

            // Baseline: increment a non-volatile field.
            m_a = 0;
            sw.Start();
            for (int i = 0; i < 10000000; ++i)
                m_a++;

            Console.WriteLine("baseline = " + sw.Elapsed);

            // Increment a volatile field.
            sw.Reset();
            m_v = 0;
            sw.Start();
            for (int i = 0; i < 10000000; ++i)
                m_v++;

            Console.WriteLine("volatile_int = " + sw.Elapsed);

            // Increment followed by a single explicit full barrier.
            sw.Reset();
            m_v = 0;
            sw.Start();
            for (int i = 0; i < 10000000; ++i)
            {
                // Thread.MemoryBarrier();
                m_v++;
                Thread.MemoryBarrier();
            }

            Console.WriteLine("Memory Barriers = " + sw.Elapsed);

            // Increment through Thread.VolatileRead / Thread.VolatileWrite.
            sw.Reset();
            m_a = 0;
            sw.Start();
            for (int i = 0; i < 10000000; ++i)
                Thread.VolatileWrite(ref m_a, Thread.VolatileRead(ref m_a) + 1);

            Console.WriteLine("VolatileMethods = " + sw.Elapsed);

            // Increment inside an (uncontended) lock.
            sw.Reset();
            m_a = 0;
            sw.Start();

            for (int i = 0; i < 10000000; ++i)
            {
                lock (_syncobj)
                {
                    m_a++;
                }
            }

            Console.WriteLine("locked = " + sw.Elapsed);

            // Atomic increment.
            sw.Reset();
            m_a = 0;
            sw.Start();

            for (int i = 0; i < 10000000; ++i)
            {
                Interlocked.Increment(ref m_a);
            }

            Console.WriteLine("Interlocked = " + sw.Elapsed);

            sw.Reset();
        }

        static void Main(string[] args)
        {
            Program p = new Program();
            p.Test();
        }
    }
}
 
Guest

DanGo said:
My Results of running following code:

baseline = 00:00:00.0318235
volatile_int = 00:00:00.0320384
Memory Barriers = 00:00:01.1449614
VolatileMethods = 00:00:02.3170651
locked = 00:00:00.4858723
Interlocked = 00:00:00.0907154

Note that using a lock is considerably faster than using a memory barrier.

Doesn't a lock imply a memory barrier (a read barrier at the top and a
write barrier at the bottom)? The lock is 4-5 times faster than a pair
of memory barriers and twice as fast as a single barrier, so we could use
lock in place of calls to MemoryBarrier.

If a volatile field really gives us a barrier, it is much faster still. In
addition, it also gives the compiler hints on how to treat the field.
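
One thing worth keeping in mind when comparing these timings: a lock or an Interlocked call gives atomicity as well as ordering, while a bare barrier or a volatile field does not make m_v++ atomic. A rough sketch of the two approaches that stay correct when two threads increment the same counter follows; the class and member names (CounterDemo, Run, Locked, Atomic) are invented for this example.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; names are illustrative assumptions.
public static class CounterDemo
{
    private static readonly object Sync = new object();
    private static int _locked;   // only touched while holding Sync
    private static int _atomic;   // only touched through Interlocked

    public static int Locked { get { lock (Sync) { return _locked; } } }
    public static int Atomic { get { return Interlocked.CompareExchange(ref _atomic, 0, 0); } }

    public static void Run(int iterations)
    {
        lock (Sync) { _locked = 0; }
        Interlocked.Exchange(ref _atomic, 0);

        ThreadStart work = () =>
        {
            for (int i = 0; i < iterations; ++i)
            {
                // Mutual exclusion: the read-modify-write cannot interleave.
                lock (Sync) { _locked++; }
                // Single atomic read-modify-write, no lock needed.
                Interlocked.Increment(ref _atomic);
            }
        };

        Thread a = new Thread(work);
        Thread b = new Thread(work);
        a.Start();
        b.Start();
        a.Join();
        b.Join();
    }

    public static void Main()
    {
        Run(100000);
        Console.WriteLine("locked = " + Locked + ", interlocked = " + Atomic);
    }
}
```

Both counters end at exactly twice the iteration count. The same loop with only a volatile field or a MemoryBarrier after the increment can lose updates under contention.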
 
Daniel O'Connell [C# MVP]

DanGo said:
Note that using a lock is considerably faster than using a memory barrier.

Are you using a multiple processor machine? I ran your test on my machine
and got these results:

baseline = 00:00:00.0191971
volatile_int = 00:00:00.0233645
Memory Barriers = 00:00:00.9672369
VolatileMethods = 00:00:01.9528108
locked = 00:00:01.7800082
Interlocked = 00:00:00.7257996

Here, a memory barrier is considerably faster than using a lock. I don't know
the synchronization structure well, but it's possible that lock doesn't
actually use a memory barrier if there is only one processor.
 
Jon Skeet [C# MVP]

Daniel O'Connell said:
Memory Barrier is considerably faster than using a lock. I don't know the
synchronization structure well, but it's possible that lock doesn't actually
use a memory barrier if there is only one processor.

It should do - otherwise you can still have problems, if (for instance)
another thread has a variable value cached in a register.
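
As a small illustration of that point, here is a sketch in which the flag is deliberately not volatile and is only ever read and written inside a lock. The names (LockedFlagDemo, _stop) are assumptions made up for the example. Because entering and leaving the lock act as barriers, the worker has to re-read the field each time round rather than keep a stale copy in a register, even on a single processor.

```csharp
using System;
using System.Threading;

// Hypothetical demo class; names are illustrative assumptions.
public static class LockedFlagDemo
{
    private static readonly object Sync = new object();
    private static bool _stop;   // deliberately NOT volatile; guarded by Sync

    // Returns true if the worker noticed the flag and exited in time.
    public static bool Run()
    {
        lock (Sync) { _stop = false; }

        Thread worker = new Thread(() =>
        {
            while (true)
            {
                // Acquiring the lock forces a fresh read of _stop.
                lock (Sync)
                {
                    if (_stop) return;
                }
                Thread.Sleep(0);   // give other threads a chance to take the lock
            }
        });
        worker.Start();

        Thread.Sleep(50);
        lock (Sync) { _stop = true; }   // releasing the lock publishes the write

        return worker.Join(5000);
    }

    public static void Main()
    {
        Console.WriteLine(Run() ? "worker stopped" : "worker still running");
    }
}
```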
 
Daniel O'Connell [C# MVP]

Jon Skeet said:
It should do - otherwise you can still have problems, if (for instance)
another thread has a variable value cached in a register.

Good point. Perhaps it uses a more limited memory barrier: no processor cache
flush, just a register flush to the processor cache?

That's mostly conjecture; I have no clue how the performance of sync
primitives works, other than that what his machine does is not the same as
what mine does.
 
Jon Skeet [C# MVP]

Daniel O'Connell said:
Good point, perhaps a more limited memory barrier, no processor cache flush,
just a register flush to processor cache?

Possibly - but then couldn't MemoryBarrier do the same thing?

Daniel O'Connell said:
That's mostly conjecture; I have no clue how the performance of sync
primitives works, other than that what his machine does is not the same as
what mine does.

Right. For the record, here are the results on my single processor
Pentium-M laptop:

baseline = 00:00:00.0388890
volatile_int = 00:00:00.0393999
Memory Barriers = 00:00:00.2775134
VolatileMethods = 00:00:00.6350966
locked = 00:00:00.3517676
Interlocked = 00:00:00.1730569

Interesting to see that your baseline and volatile int are better than
mine, but all the rest of mine are better than yours. Odd.
 
Daniel O'Connell [C# MVP]

Jon Skeet said:
Possibly - but then couldn't MemoryBarrier do the same thing?

Very possibly, but I have no clue whether it does or not. Performance gets to
be a sticky thing when you scale out in processors.

Jon Skeet said:
Right. For the record, here are the results on my single processor
Pentium-M laptop:

baseline = 00:00:00.0388890
volatile_int = 00:00:00.0393999
Memory Barriers = 00:00:00.2775134
VolatileMethods = 00:00:00.6350966
locked = 00:00:00.3517676
Interlocked = 00:00:00.1730569

Interesting to see that your baseline and volatile int are better than
mine, but all the rest of mine are better than yours. Odd.

Might be raw speed, might be due to multiple processors, might be because
yours is a laptop and mine is not, might be luck. This is a dual 2.2 GHz with
hyperthreading enabled.
 
Guest

Here is an updated version of the source:

This version uses baseline as a normalization for the other tests:

using System;
using System.Diagnostics;
using System.Threading;

class Program
{
    private object _syncobj = new Object();
    private int m_a;            // plain field
    private volatile int m_v;   // volatile field

    public void Test()
    {
        Stopwatch sw = new Stopwatch();

        // Baseline: increment a non-volatile field. Every later result is
        // reported as a multiple of this time.
        m_a = 0;
        sw.Start();
        for (int i = 0; i < 10000000; ++i)
            m_a++;
        sw.Stop();
        Console.WriteLine("baseline = " + ": 1.0");
        float baselineNormalization = sw.ElapsedTicks;

        // Increment a volatile field.
        sw.Reset();
        m_v = 0;
        sw.Start();
        for (int i = 0; i < 10000000; ++i)
            m_v++;

        sw.Stop();
        Console.WriteLine("volatile_int = " + ": " +
            (sw.ElapsedTicks / baselineNormalization).ToString());

        // Increment followed by a single explicit full barrier.
        sw.Reset();
        m_v = 0;
        sw.Start();
        for (int i = 0; i < 10000000; ++i)
        {
            // Thread.MemoryBarrier();
            m_v++;
            Thread.MemoryBarrier();
        }

        sw.Stop();
        Console.WriteLine("Memory Barriers = " + ": " +
            (sw.ElapsedTicks / baselineNormalization).ToString());

        // Increment through Thread.VolatileRead / Thread.VolatileWrite.
        sw.Reset();
        m_a = 0;
        sw.Start();
        for (int i = 0; i < 10000000; ++i)
            Thread.VolatileWrite(ref m_a, Thread.VolatileRead(ref m_a) + 1);

        sw.Stop();
        Console.WriteLine("VolatileMethods = " + ": " +
            (sw.ElapsedTicks / baselineNormalization).ToString());

        // Increment inside an (uncontended) lock.
        sw.Reset();
        m_a = 0;
        sw.Start();

        for (int i = 0; i < 10000000; ++i)
        {
            lock (_syncobj)
            {
                m_a++;
            }
        }

        sw.Stop();
        Console.WriteLine("locked = " + ": " +
            (sw.ElapsedTicks / baselineNormalization).ToString());

        // Atomic increment.
        sw.Reset();
        m_a = 0;
        sw.Start();

        for (int i = 0; i < 10000000; ++i)
        {
            Interlocked.Increment(ref m_a);
        }

        sw.Stop();
        Console.WriteLine("Interlocked = " + ": " +
            (sw.ElapsedTicks / baselineNormalization).ToString());

        sw.Reset();
    }

    static void Main(string[] args)
    {
        Program p = new Program();
        p.Test();
    }
}
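
The Start/Stop/Reset/normalize sequence repeated for each test above could be factored into a small helper. This is only a sketch of one way to do it; Bench, Measure and Normalize are hypothetical names, not framework APIs. Note that running each body through a delegate adds call overhead to every measurement, the baseline included, so the ratios it prints will not match the inline loops above.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper; names and structure are assumptions for illustration.
public static class Bench
{
    private static int _counter;

    // Runs body 'iterations' times and returns the elapsed Stopwatch ticks.
    public static long Measure(int iterations, Action body)
    {
        body();   // one warm-up call so JIT compilation is not timed
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; ++i)
        {
            body();
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Expresses a measurement as a multiple of the baseline measurement.
    public static double Normalize(long ticks, long baselineTicks)
    {
        return (double)ticks / baselineTicks;
    }

    public static void Main()
    {
        const int N = 10000000;
        long baseline = Measure(N, () => { _counter++; });
        long interlocked = Measure(N, () => { Interlocked.Increment(ref _counter); });

        Console.WriteLine("baseline = : 1.0");
        Console.WriteLine("Interlocked = : " + Normalize(interlocked, baseline));
    }
}
```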
 
Daniel O'Connell [C# MVP]

DanGo said:
Here is an updated version of the source:

This version uses baseline as a normalization for the other tests:

Still seeing quite a bit of difference. What did you get?

baseline = : 1.0
volatile_int = : 1.007561
Memory Barriers = : 53.18861
VolatileMethods = : 105.0428
locked = : 97.86585
Interlocked = : 39.32712
 
Guest

Normalized results:

baseline = : 1.0
volatile_int = : 0.8990427
Memory Barriers = : 30.83289
VolatileMethods = : 59.72657
locked = : 12.79946
Interlocked = : 2.360716


Notes:
1) A volatile field performs about the same as a non-volatile one.
2) Interlocked has great performance.
3) lock is faster than a memory barrier.
4) VolatileRead and VolatileWrite suck. (technically speaking)


Normalizing Daniel's results:
baseline = : 1.0
volatile_int = : 1.2
Memory Barriers = : 50.38
VolatileMethods = : 101.72
locked = : 92.7
Interlocked = : 37.8

5) The volatile field still looks great.
6) Interlocked is a lot slower than on my machine.
7) MemoryBarrier and the Volatile methods are roughly 1.6 to 1.7 times as slow as on mine.
8) lock is about 7 times slower than on mine, and slower than a memory barrier.


Normalizing Jon Skeet's results:
baseline = : 1.0
volatile_int = : 1.013137391
Memory Barriers = : 7.136038468
VolatileMethods = : 16.33100877
locked = : 9.045426727
Interlocked = : 4.450021857

9) OK, volatile is still great.
10) Interlocked performance is much better than Daniel's.
11) lock is only marginally slower than MemoryBarrier (not faster, as on my
laptop).
12) VolatileRead/VolatileWrite are slower, but not nearly as slow as Daniel
and I are seeing.

I'm surprised to see such variety.
I'm using Beta 2, with no multiprocessor and no hyperthreading.

What are each of you using?
 
Guest

So what about VolatileWrite?

Does this look like a correct implementation:
public static void VolatileWrite(ref int address, int value)
{
    Thread.MemoryBarrier();
    address = value;
}
 
Daniel O'Connell [C# MVP]

DanGo said:
So what about VolatileWrite?

Does this look like a correct implementation:
public static void VolatileWrite(ref int address, int value)
{
    Thread.MemoryBarrier();
    address = value;
}

It looks all right to me. It is a change from the 1.x framework, which used
an internal call for Thread.VolatileWrite(), but it should still result in a
safe operation.
 
Daniel O'Connell [C# MVP]

Daniel O'Connell said:
Beta 2, multiprocessor with hyperthreading

Just to extend: it's a dual 2.2 GHz. Old motherboard though, still uses good
ole RDRAM.
 
