unsafe loop not as fast as id like

colin · Oct 16, 2008

Hi, I have a binary sorted tree wich I find my app is spending most of its time in
so I decided to try speed it up a bit ...

I tested converting it from classes to an indexed array of structs so I could use
unsafe pointers, however although it is definatly faster it only manages
about 90 million iterations per second, wich doesnt seem a lot
on my AMD64 3200+ cpu

I have played around with the data size, and made sure ive kept
it within the cpu cache, as it goes a lot slower if I increase
the data size above about 1Mb.

this is just a test, so the tree simply has nodes of 3 ints,
one for the value, and a high and low index.

using randomised data and no tree balancing its slower than the dictionary class,
but with balancing should improve marginaly.
however dictionary class requires a key and value,
and doesnt allow duplicate entries so needs 2 operations and handling for duplicates
so I assumed there was some room for improvment.

public int Find(int hash)
{
int index = 0;
searchItemCnt++;
unsafe
{
fixed (IndexedBinaryTreeNode* nodeArrayPtr = &NodeArray[0])
{
while (true)
{
IndexedBinaryTreeNode* node = nodeArrayPtr + index;
searchNodeCnt++;
if (hash < node->intValue)
{
if (node->_LowerNode < 0)
return -1;
index = node->_LowerNode;
}
else if (hash > node->intValue)
{
if (node->_HigherNode < 0)
return -1;
index = node->_HigherNode;
}
else
{
return index;
}
}
}
}
}

I thought unsafe code would go faster ..

is there any other options I should know about ?
ive turned off overflow check and run it in release from
outside the debugger.

other than that I may consider using a lower level language,
although I imagine there may well already be such implementations available.
or am I expecting too much?

thanks
Colin

colin · Oct 16, 2008

[...]

Click to expand...

90 million iterations per second sounds pretty fast to me. That's only a little over 10 nanoseconds per iteration. (I assume by
"iteration", you mean "calls to Find()").

no, the number of iterations i was refering to is the number of times the
loop is gone round. not the number of function calls.
theres a bout 3 million Finds per second.
this is about 30 searches per find becuase its unbalanced,
whereas idealy it would be 20.

thats about 35 instruction cycles to do a simple compare and
advance.

A binary tree is never going to be as fast, on average, as a hash table
(e.g. Dictionary). So if that's your point of comparison, you can stop now. Use a Dictionary if speed is your utmost priority
and you don't need to order (sort) your data.

dictionary does little better at 5 million finds per second.
that would be barely faster than my tree if it was balanced.

how is a hash table faster than a binary tree ?
im only using an int32 wich is the same size as a hash key.

either way I need to speed this significantly.

use of unsafe is no concern as other parts of code not only
use the unsafe keyword but are actualy more unsafe in nature.

thanks
Colin.

colin · Oct 16, 2008

Peter Duniho said:
[...]
dictionary does little better at 5 million finds per second.
that would be barely faster than my tree if it was balanced.

Click to expand...

If and when you've tested with a balanced tree, then you can say that. Until then, your data shows a hash table to be nearly
twice as fast for the data as the binary tree.

thanks, ive tested with a balanced tree but with the classes rather than
structures, I was simply going by the number of iterations through the loop.
the balance algorithm is quite complicated and fair bit of work to
convert it to use indexed structs, but balancing it is going to make 30% better by my estimate.
so dictionary would stil be faster.

The algorithmic complexity is constant order with respect to the collection size. That is, O(1). A binary tree is logarithmic
for searches: O(log n).

Actual time cost will depend somewhat on implementation and the size of the data. But a binary tree is only going to be
equivalent to a hash table in speed for relatively small data sets.

the amount, size and nature of data is quite varied.
typically is spending a lot of time on huge lists of 3d vectors,
trying to find vectors wich are close together to decide if they
realy should be the same point or not.

ive used 3D Octal trees and reduced the amount i need to process in one list
but theres still lots of other lists, some sorted some not,
some I cant use such 3D spacial trees.

The size of the key has very little to do with the performance characteristics.

I tried c++ loop wich worked out at 13 instructions per iteration,
but the managed overhead killed the performance so it was marginaly less,
I may have to pass down the whole array of data to manipulate to get a performance boost.

one way would be to use the 32 bit data as an index but the memory needed would be inconvenient.

the 32 bit data is often actualy a hash code, one problem I had was that
the managed directx vector has hash code size of only 8 bits,
this crippled the performance of the dictionary class.
I now use a 32 bit crc type of hash code.

the large data sets of 100k+ points are the problem
and need a reasonable hash code size.

However ive looked up hash and binary tree comparison
and found some interesting ideas, such as a skip list.

However I didnt understand skip list exactly,
but im already using octal trees for 3d vector data
and I extended this idea to a BaseN tree.

using a base N=16 tree so each node has an array of 16 sub nodes etc..
this cuts down the maximum searches needed to a maximum of 8
and the memory needed isnt to much, the performance is quite remarkable,
its twices as fast as dictionary for insert, and 7 times
faster than dictionary for retrieval.

ive tried a few diferent bases and compared using structs and/or classes
and the results vary a fair bit, its just a question of trying it on real live data.
idealy i can get it fast enough not using unsafe keyword.

thanks again
Colin

Dictionary<K,V> and Case Insensitive key compare?	2	Aug 7, 2004
Problem Setting the Time Zone with SetTimeZoneInformation	1	Mar 30, 2006
any preformance tips ?	3	Apr 6, 2007
C# compiler fails to optimize for loop same as foreach	18	Jan 5, 2005
CSharp Coding Standards	18	Mar 29, 2007
Got it working! First post from inside Damn Small Linux! Need morehelp pls	32	Jun 8, 2010
Outlook problem, cache mode, outlook0.ost, windows desktop search	0	Oct 31, 2008
Porting C++ to C++(.Net) Problem	1	Apr 20, 2004

unsafe loop not as fast as id like

colin

colin

colin

Ask a Question

Similar Threads