Jon Harrop
Ben said:Right. Because the Gen0/Gen1/Gen2 heaps... aren't. They are really
stacks.
Sure, but bumping a pointer means reading it, incrementing it and writing
it. To make that atomic you must wrap it in a lock, right?
From the (incomplete) code posted above I see that you call into unmanaged
code (pinnedArray_Set_To_Zero_WithIntelIPPSetValUsingSSE). This involves a
transition from managed to unmanaged code, which takes time unless the
"unmanaged code security" walk is disabled, and the pinning takes time too.
Also, you didn't consider the time needed for a thread switch to zero the
memory from another thread.
Zeroing the memory as done by the CLR's memory manager needs no
transition from managed to unmanaged code, and there is no pinning
involved either. Allocating 64 KB using VirtualAlloc is extremely fast;
there is absolutely no reason to burn a separate thread for this, as a thread
switch may actually be much slower than the VirtualAlloc call itself.
Jon said:Would you consider Interlocked.Increment to be taking out a lock?
Jon Harrop said:I assume that is a pointer bump wrapped in a lock, yes.
Atmapuri said:Hi!
Security was disabled on that call.
Not when pinned outside of the timed loop.
Zeroing by VirtualAlloc is not faster than the fastest zeroing you can
do, especially since VirtualAlloc cannot make use of
SSE2/SSE3, as it has to run code that runs on all CPUs. The
zeroing I timed is close to the fastest zeroing you can do "ever".
Posting the code is not a problem, but I was hoping that you would take
the suggestion as well meant and try to do something about it
rather than denying that an issue exists.
Jon said:The point of the Interlocked class is to avoid requiring a lock - at
least a user-mode lock. I don't know the details of the implementation,
but I believe it uses "native" (i.e. CPU level) atomic operations.
They're likely to be significantly faster than CLR-level locking.
Yes, there will still be overhead compared with incrementing in a non-
threadsafe way - but it's not at the same level as most people mean
when they talk about acquiring a lock.
Ben said:No. See the x86 instruction set, which provides atomic bus operations.
They still use the term "LOCK prefix",
but it is not a lock in the software sense of the word, it uses bus
arbitration instead.
Jon Harrop said:Indeed, I just benchmarked it here and the lock used by the Interlocked.Add
member is ~5x slower than direct access but still 2x faster than using
Monitor directly from F# code.
This raises another interesting question: if only a single thread is bumping
pointers in a GC heap and the GC stays out until the minor heap is
exhausted then the GC could let the user thread know that the lock isn't
required and make the pointer bump 5x faster. Looks like the current CLR
implementation doesn't do that but it would give a huge performance boost
to many heavily-allocating programs...
Jon said:Potentially, yes. It would be interesting to have 4 threads on a 4
proc box, each allocating from its own GC heap. So long as each
thread stays on its own processor, you could do wonderful things.
It would certainly make an interesting research project.
Jon Harrop said:Even if the user thread is moved to a different CPU there is still no
contention for its heap.
This approach is already commonplace in other systems.
Jon Harrop said:Ben Voigt [C++ MVP] wrote:"object allocation" is nothing more than bumping a pointer in an
atomic operation.
Ben said:Right. Because the Gen0/Gen1/Gen2 heaps... aren't. They are really
stacks.
Jon Harrop said:Sure, but bumping a pointer means reading it, incrementing it and
writing it. To make that atomic you must wrap it in a lock, right?
Ben said:No. See the x86 instruction set, which provides atomic bus operations.
Jon Harrop said:That's a lock...
Ben said:They still use the term "LOCK prefix",
Jon Harrop said:See, they even call it a lock.
Whether or not you regard this inherent mutual exclusion as a lock, it is
the source of .NET's slow allocation.
I'm actually less concerned about the current 5x performance hit and more
concerned about how that performance hit will change in the future, as the
memory gap continues to expand.
Jon said:True - although it would make it slower in terms of caching etc.
I suspect the difficulty is knowing when to apply it. It's no doubt
highly suitable for some situations, but probably awful for others.
Willy said:What 5x performance hit are you talking about, what are you comparing, and
how did you measure? The number of threads and processors involved in this
exercise is also relevant.
Jon Harrop said:Time taken to increment a counter 100M times using each of four different
methods:
. Direct access
. Interlocked.Add
. Monitor.Enter and Exit
. F#'s higher-order "lock" function
I can post the code if you like.