Is *all* thread cached data flushed at MemoryBarrier

Obviously wrapping a critical section around access to some set of
shared state variables flushes any cached data, etc., so that the threads
involved don't see a stale copy. What I was wondering is *what*
exactly gets flushed. Does the compiler somehow determine the data
that is accessible from that thread, and flush just that set? (Seems
unlikely to me.) Is it all data cached in registers, etc.? Or am I
overthinking this, and is it instead more along the lines that a memory
barrier just invalidates pages of memory, such that when another
thread goes to access that memory it first checks whether that page
needs to be refetched from main memory?

Thanks for any insights,
Tom
 
Hi,

I do not clearly understand what your question is. MemoryBarrier (according
to MSDN) is only significant on Itanium processors; I am not sure .NET
has even been ported to the Itanium, to be honest.

My suggestion is to look at what the equivalent is in the unmanaged
world.
 
When a lock is taken (or Monitor.Enter/Exit is called), an implicit read and
write memory barrier is performed to ensure that the current thread
does not see a "stale" value (one that was sitting in a
cache/register/etc.). This is the reason (for example) that you cannot
simply loop on a plain boolean, waiting for it to be changed by
another thread. The "watching" thread is likely to continue looping
after the boolean has changed value, because it is seeing a stale value.
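
For illustration, here is a minimal sketch of the lock-based version of
that pattern (the class and member names are made up for the example):

    using System.Threading;

    class StopFlag
    {
        private bool stopRequested;              // shared flag; deliberately not volatile
        private readonly object padlock = new object();

        public void RequestStop()
        {
            lock (padlock)                       // Monitor.Enter: implicit barrier on entry
            {
                stopRequested = true;
            }                                    // Monitor.Exit: implicit barrier on exit
        }

        public void WorkLoop()
        {
            while (true)
            {
                lock (padlock)                   // the barrier forces a fresh read of the
                {                                // flag on each iteration, so no stale value
                    if (stopRequested)
                        return;
                }
                // ... do one unit of work ...
            }
        }
    }

Because the flag is only ever touched inside the lock, the watching thread
is guaranteed to eventually see the update.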


My question is: when this memory barrier is performed, what is the set
of data that gets flushed, or invalidated (forcing a read-through),
or written through, or whatever?

Tom
 
It's defined by the hardware architecture. In the case of x86, the amount
of memory flushed is 0, because x86 processors have strong cache coherency
guarantees. In other architectures it will be different, but in all cases,
following a memory barrier, all writes issued before the barrier will be
visible to all CPUs. Whether that's done by cache invalidation, updating
other caches, etc., is defined by the hardware architecture and generally
not visible to the programmer.
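
As a hedged illustration of that "writes before the barrier become visible"
guarantee, here is a minimal publish/consume sketch using Thread.MemoryBarrier
(the field names are made up for the example):

    using System.Threading;

    class Publisher
    {
        private int payload;    // the data being handed off
        private bool ready;     // the publication flag

        public void Publish(int value)
        {
            payload = value;
            Thread.MemoryBarrier();  // the write to payload cannot move past this point
            ready = true;
        }

        public bool TryConsume(out int value)
        {
            bool isReady = ready;
            Thread.MemoryBarrier();  // the read of payload cannot move before this point
            value = isReady ? payload : 0;
            return isReady;
        }
    }

If the consumer sees ready == true, the barriers ensure it also sees the
payload that was written before the flag.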

-cd
 
OK.

So my example of watching a boolean is only unsafe on x86 if
instruction (re)ordering is an issue, not because multiple threads
will see different values for that variable.

That wasn't completely clear to me before.

Thanks!
Tom
 
Tom,

I have to ask, why are you using MemoryBarrier instead of the lock
statement?
 
To be honest, the original question was for informational purposes
only.

I'm one of those people who always wants to know the "why", not just the
"how".

Any place where I am forced to say to myself "I know if I do this it
will work, but I don't really completely know *why* it does" is a place
where I start buying books, downloading articles, and hitting Google.

On this topic, I've found dedicated books on advanced concurrency to be
thin at best in the .NET world. Java, on the other hand, which has a less
feature-rich set of concurrency options, has a number of excellent
texts available. If anyone can recommend a few highly-detailed books
on the topic (NOT books with just a chapter or two on the topic),
please let me know!

Tom

 
Tom,

While I can't really recommend any FULL books on the topic, I can tell
you that for the most part, you will want to use the lock statement (which
in turn is really a call to Monitor.Enter/Monitor.Exit) over MemoryBarrier.
Monitor.Enter/Monitor.Exit is specified in the spec as having to work, and
you should always be able to depend on that.
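
For what it's worth, here is a sketch of that equivalence (this is roughly
what the compiler emits for a lock statement; the exact expansion has varied
between compiler versions):

    using System.Threading;

    class LockExpansion
    {
        private readonly object gate = new object();
        private int counter;

        public void WithLockStatement()
        {
            lock (gate)
            {
                counter++;
            }
        }

        // Roughly equivalent hand-written form:
        public void WithMonitor()
        {
            Monitor.Enter(gate);       // acquire: implicit read barrier
            try
            {
                counter++;
            }
            finally
            {
                Monitor.Exit(gate);    // release: implicit write barrier
            }
        }
    }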


--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

 
Tom,

No, I do not believe it is safe. And even if it technically were I
certainly wouldn't bank on it because you may later port the code to
another framework version or hardware platform.

Maybe I'm wrong, but as I understand it the x86 memory model only
guarantees that writes cannot move with respect to other writes; it
doesn't make any guarantees about reads. So it seems to me that your
example is unsafe. But I bet you'd have a hard time reproducing the
issue in reality. You'd almost certainly need an SMP system to
see it.
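
The classic case where that matters is a store followed by a load of a
different location, as in a Dekker-style handshake. A hedged sketch
(illustrative names, not from the thread):

    using System.Threading;

    class StoreLoadExample
    {
        private int flagA;
        private int flagB;

        // Called by thread 1
        public bool TryEnterA()
        {
            flagA = 1;
            Thread.MemoryBarrier();  // without a full barrier here, the read of
                                     // flagB below may effectively move ahead of
                                     // the store to flagA
            return flagB == 0;
        }

        // Called by thread 2
        public bool TryEnterB()
        {
            flagB = 1;
            Thread.MemoryBarrier();
            return flagA == 0;
        }
    }

Without the barriers, both threads can read 0 and both "enter", even on x86.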

Here are some excellent links regarding memory barriers and the .NET
Framework.

<http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx>
<http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375>
<http://www.yoda.arachsys.com/csharp/threads/volatility.shtml>
<http://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/>

Brian
 
So my example of watching a boolean is only unsafe on x86 if
instruction (re)ordering is an issue, not because multiple threads
will see different values for that variable.

That wasn't completely clear to me before.

Reordering *is* an issue, however. Memory barriers are about preventing
*effective* reordering, whether that's done by the JIT or due to caches
etc.
 
Brian Gideon said:
Maybe I'm wrong, but as I understand it the x86 memory model only
guarantees that writes cannot move with respect to other writes; it
doesn't make any guarantees about reads. So it seems to me that your
example is unsafe. But I bet you'd have a hard time reproducing the
issue in reality. You'd almost certainly need an SMP system to
see it.

I thought that, but it's very easy to see "effective" memory read moves
- where a value is basically only read once instead of being reread
each time through a loop:

using System;
using System.Threading;

public class Test
{
    static volatile bool stop;

    static void Main()
    {
        ThreadStart job = new ThreadStart(ThreadJob);
        Thread thread = new Thread(job);
        thread.Start();

        // Let the thread start running
        Thread.Sleep(500);

        // Now tell it to stop counting
        stop = true;

        thread.Join();
    }

    static void ThreadJob()
    {
        int count = 0;
        while (!stop)
        {
            count++;
        }
    }
}

That stops half a second after you start it. Take the "volatile" bit
out, and it'll run forever (at least it does on my single processor P4,
when compiled with optimisation enabled).
 
Jon said:
That stops half a second after you start it. Take the "volatile" bit
out, and it'll run forever (at least it does on my single processor
P4, when compiled with optimisation enabled).

But this is just due to code hoisting by the JIT and has nothing to do with
the memory model at the CLR or CPU level. The volatile modifier inhibits
the hoisting of the read out of the loop, so the thread stops like you'd
expect. Without volatile, the read is hoisted and the variable only read
once, since the compiler can easily prove that nothing in the loop affects
the value of the variable.
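
To make the hoisting concrete, here is an illustrative sketch of what the
optimised loop effectively becomes (not actual JIT output, just the
transformation expressed in C#):

    class HoistedLoop
    {
        static bool stop;            // not volatile, so the read may be hoisted

        static void ThreadJob()
        {
            int count = 0;

            // Effectively what "while (!stop) count++;" can become when
            // nothing in the loop body can change 'stop':
            bool localStop = stop;   // the field is read once, into a register
            while (!localStop)       // the loop never re-reads the field
            {
                count++;
            }
        }
    }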

-cd
 
Carl Daniel [VC++ MVP] said:

But this is just due to code hoisting by the JIT and has nothing to do with
the memory model at the CLR or CPU level. The volatile modifier inhibits
the hoisting of the read out of the loop, so the thread stops like you'd
expect. Without volatile, the read is hoisted and the variable only read
once, since the compiler can easily prove that nothing in the loop affects
the value of the variable.

But it's the memory model which specifies what the JIT can do. That's
what I'm saying - regardless of CPU architecture, the JIT can do
optimisations which change the "apparent" read time of a variable. The
optimisations it's able to do are controlled by the memory model at the
CLR level.
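
As a hedged variation on Jon's example: instead of marking the field
volatile, the read can be made explicit with Thread.VolatileRead, which also
inhibits the hoisting. (Thread.VolatileRead has no bool overload, so this
sketch uses an int flag.)

    using System;
    using System.Threading;

    public class VolatileReadTest
    {
        static int stop;    // 0 = keep running, 1 = stop

        static void Main()
        {
            Thread thread = new Thread(new ThreadStart(ThreadJob));
            thread.Start();

            // Let the thread start running
            Thread.Sleep(500);

            // Now tell it to stop counting
            Thread.VolatileWrite(ref stop, 1);

            thread.Join();
        }

        static void ThreadJob()
        {
            int count = 0;
            // VolatileRead forces a fresh read on every iteration, so the
            // JIT cannot hoist it out of the loop
            while (Thread.VolatileRead(ref stop) == 0)
            {
                count++;
            }
        }
    }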
 
First off, thanks to everyone contributing to the thread...this is why
I post here!

I have never used MemoryBarrier in any code other than tests for my own
education; as I said, the original question was more theoretical.

By the way, I have seen caching/reordering (it's often hard to
effectively tell which) in common environments.

This is kind of what I was investigating. While I am well aware of
"how" to prevent it (and I of course do so), I wanted to know more
about what was going on under the covers.

And the fact that a simple question on the underlying behavior of a
memory barrier has blossomed into this debate on behavior only
underlines what I was saying before. There seems to be nothing
authoritative out there on this topic. If there is room for debate,
then there is room for error and misunderstanding.

As an example of what I'd like to see: I do a lot of P/Invoke and COM
interop, and the text ".NET and COM - The Complete Interoperability
Guide" is my idea of a great book on that topic. I can only hope such
a volume is created in regards to concurrency on the .NET / Windows
platform.

Thanks again,
Tom
 
Jon said:
That stops half a second after you start it. Take the "volatile" bit
out, and it'll run forever (at least it does on my single processor P4,
when compiled with optimisation enabled).


Which framework version were you using? I tried it with 1.1 and 2.0 on
my dual core laptop and I could only see it run forever with 2.0. I
guess 2.0 is more aggressive in its optimizations. At the very least
this proves that those who naively rely on it being safe in 1.1 will
get burned when they port their code to 2.0.
 
Brian said:
Which framework version were you using? I tried it with 1.1 and 2.0 on
my dual core laptop and I could only see it run forever with 2.0. I
guess 2.0 is more aggressive in its optimizations. At the very least
this proves that those who naively rely on it being safe in 1.1 will
get burned when they port their code to 2.0.

I only tried it with 2.0 yesterday, but I think I've tried similar
programs with 1.1 before. I wouldn't like to swear to it though...

Jon
 
Tom,

One thing I should point out, which Jon already alluded to, is that we
code against the CLR memory model. The hardware memory model is mostly
irrelevant from a .NET developer's perspective because the CLR sits on
top of it. So your example is certainly unsafe because the CLR
specification says it is. We shouldn't be too concerned with the
differences between x86, AMD64, IA64, etc. architectures. That's the
job of the CLR. But, I do share your interest in learning exactly what
is going on behind the scenes.

Brian
 
This little sample shows reordering (or some type of caching) on 1.1:


using System;
using System.Threading;

class ConcurrencyTest
{
    /// <summary>
    /// The main entry point for the application.
    /// </summary>
    [STAThread]
    static void Main(string[] args)
    {
        ConcurrencyTest test = new ConcurrencyTest();
        test.Start();
    }

    private uint m_First;
    private uint m_Second;
    private Thread m_Incrementor;
    private Thread m_Inspector;

    public void Start()
    {
        Console.WriteLine("Test running");

        m_Incrementor = new Thread(new ThreadStart(Increment));
        m_Inspector = new Thread(new ThreadStart(CheckValues));
        m_Incrementor.Start();
        m_Inspector.Start();
    }

    private void Increment()
    {
        while (true)
        {
            m_First++;
            m_Second++;
        }
    }

    private void CheckValues()
    {
        while (true)
        {
            uint first = m_First;
            uint second = m_Second;

            if (first < second)
            {
                Console.WriteLine("First is {0} and Second is {1}", first, second);
                Thread.Sleep(1000);
            }
        }
    }
}
 
This little sample shows reordering (or some type of caching) on 1.1:

I don't think so. It would be masked by the race condition between the
reads of m_First and m_Second. m_First could be read and m_Second
incremented several times before it is eventually read.
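
If one wanted a version of the test where first < second could only come
from genuine reordering, a hedged sketch would be to read the fields in the
opposite order of the writes (uint wrap-around ignored for simplicity):

    using System;
    using System.Threading;

    class ReorderTest
    {
        static uint m_First;
        static uint m_Second;

        static void Main()
        {
            new Thread(new ThreadStart(Increment)).Start();
            new Thread(new ThreadStart(CheckValues)).Start();
        }

        static void Increment()
        {
            while (true)
            {
                m_First++;    // always written first...
                m_Second++;   // ...then m_Second
            }
        }

        static void CheckValues()
        {
            while (true)
            {
                // Read in the opposite order of the writes. m_First only
                // grows and is read *after* m_Second, so first >= second
                // must hold unless a read or write was actually reordered.
                uint second = m_Second;
                Thread.MemoryBarrier();  // keep the two reads from swapping
                uint first = m_First;

                if (first < second)
                {
                    Console.WriteLine("Reordered: first={0} second={1}", first, second);
                    Thread.Sleep(1000);
                }
            }
        }
    }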
 