Dictionary and std::multimap

Trecius

Hello, Newsgroupians:

I've a question regarding dictionaries. I have an array of elements that I
created, and I'm trying to sort those elements into various sections. Here's
the template of my data type.

struct DT
{
    int nSize;
    ...
}

I'm trying to create a dictionary, which resembles a jagged array, but I
can't figure out how to do this. I'm trying to sort it according to the
nSize.

EXA: Here's a jagged-array of the structure. nSize is the key, which is
before the colon.

1: Car1, Car2, Car3
3: Car4
8: Car5, Car6

From the example, I have three sizes -- but this, of course can vary. How
can I create a dictionary that would allow me to mimic the std::multimap<>?
I've been trying Dictionary<int, List<DT>>, but I can't figure out how to add
or access the dictionary. One thing I would like to do is determine the
number of keys in the dictionary, which I can do by dictionary.Keys.Count, but
then how can I get a count of all the values associated with a specific key?

Thank you, all.


Trecius
 
Trecius

Thank you, Mr. Duniho. That is exactly what I'm looking for. Thank you again.


Trecius

Peter Duniho said:
[...]
From the example, I have three sizes -- but this, of course can vary.
How
can I create a dictionary that would allow me to mimic the
std::multimap<>?

I don't know anything about std::multimap.
I've been trying Dictionary<int, List<DT>>, but I can't figure out how
to add
or access the dictionary.

To add:

Assuming:

Dictionary<int, List<DT>> dict;
DT dtAdd;

Then:

if (dict.ContainsKey(dtAdd.nSize))
{
dict[dtAdd.nSize].Add(dtAdd);
}
else
{
List<DT> list = new List<DT>();

list.Add(dtAdd);
dict.Add(dtAdd.nSize, list);
}

I'm not really sure what you mean by "access", but the above code
includes an example of how to actually access an element in the
Dictionary.
One thing I would like to do is determine the
number of keys in the dictionary, which I can do by
dictionary.Keys.Count, but
then how can I get a count of all the values associated with a specific
key?

For example, where "nSize" is a specific key:

dict[nSize].Count;
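Put together into a compilable sketch (string stands in for your DT
type, and the Add helper just wraps the code above — the names are mine,
not framework ones):

```csharp
using System;
using System.Collections.Generic;

class MultimapDemo
{
    static void Main()
    {
        // nSize is the key; the List holds every value with that size.
        var dict = new Dictionary<int, List<string>>();

        Add(dict, 1, "Car1"); Add(dict, 1, "Car2"); Add(dict, 1, "Car3");
        Add(dict, 3, "Car4");
        Add(dict, 8, "Car5"); Add(dict, 8, "Car6");

        Console.WriteLine(dict.Keys.Count);  // number of distinct keys: 3
        Console.WriteLine(dict[1].Count);    // values under key 1: 3
    }

    public static void Add(Dictionary<int, List<string>> dict,
                           int key, string value)
    {
        if (dict.ContainsKey(key))
        {
            dict[key].Add(value);
        }
        else
        {
            var list = new List<string>();
            list.Add(value);
            dict.Add(key, list);
        }
    }
}
```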

Hope that helps.

Pete
 
Ben Voigt [C++ MVP]

Peter said:
[...]
From the example, I have three sizes -- but this, of course can vary.
How
can I create a dictionary that would allow me to mimic the
std::multimap<>?

I don't know anything about std::multimap.
I've been trying Dictionary<int, List<DT>>, but I can't figure out
how to add
or access the dictionary.

To add:

Assuming:

Dictionary<int, List<DT>> dict;
DT dtAdd;

Then:

if (dict.ContainsKey(dtAdd.nSize))
{
dict[dtAdd.nSize].Add(dtAdd);
}
else
{
List<DT> list = new List<DT>();

list.Add(dtAdd);
dict.Add(dtAdd.nSize, list);
}

A little less redundant code:

List<DT> multi;

if (!dict.TryGetValue(dtAdd.nSize, out multi))
dict.Add(dtAdd.nSize, multi = new List<DT>());

multi.Add(dtAdd);
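In full, with the assumed declarations spelled out (DT reduced to a
minimal stand-in struct), that compiles as:

```csharp
using System;
using System.Collections.Generic;

// Stand-in for the poster's data type.
struct DT { public int nSize; }

class Demo
{
    static void Main()
    {
        var dict = new Dictionary<int, List<DT>>();
        var dtAdd = new DT { nSize = 3 };

        // One TryGetValue, one shared Add-to-list path.
        List<DT> multi;
        if (!dict.TryGetValue(dtAdd.nSize, out multi))
            dict.Add(dtAdd.nSize, multi = new List<DT>());

        multi.Add(dtAdd);

        Console.WriteLine(dict[3].Count);  // 1
    }
}
```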
I'm not really sure what you mean by "access", but the above code
includes an example of how to actually access an element in the
Dictionary.
One thing I would like to do is determine the
number of keys in the dictionary, which I can do by
dictionary.Keys.Count, but
then how can I get a count of all the values associated with a
specific key?

For example, where "nSize" is a specific key:

dict[nSize].Count;

Hope that helps.

Pete
 
Ben Voigt [C++ MVP]

Not that this would normally matter, but...in a previous test I found
that TryGetValue() was slower than calling ContainsKey() followed by an
access of the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me. But that's what I found.

Because of that, I tend to use ContainsKey() rather than TryGetValue().
The version of the above code with TryGetValue() only removes a single
call to Add(), at the expense of being slightly harder to read. Even if
it performed better, the reduced obviousness of the code would IMHO
argue in favor of the alternative, but being harder to read _and_
performing more poorly is a pretty negative combination, even if the
differences are slight.

Ok, you could use Contains and then either add a new element or retrieve the
existing one. You could even make a class implementing IDictionary that
auto-instantiates missing members and hides the details of how it does so.
YMMV. People who are really more C++ programmers than C# programmers
have a tendency to love highly-compressed code. There's a sort of
"let's see how much functionality we can squeeze into a single line"
mentality going on. :) Personally, I don't mind my code to be
spread out a bit, especially when it's easier to see what's going on.

Well, I wasn't actually trying to make the code denser.

I was trying to first establish an invariant (dict contains an element for
dtAdd.nSize).

Once the invariant is established you no longer need two versions of the
Add-to-list code that need to be maintained (ok, now it's a single function
call but it might not stay that way).
 
Ben Voigt [C++ MVP]

Not that this would normally matter, but...in a previous test I found
that TryGetValue() was slower than calling ContainsKey() followed by an
access of the specific element.

I never did figure out why that was, and it seems counter-intuitive to
me. But that's what I found.

I'd like to see that benchmark... because disassembling the code there is no
way that can be true, unless TryGetValue causes a performance bug in the
JIT.

Also Jon has a very relevant blog entry here:
http://msmvps.com/blogs/jon.skeet/a...ple-extension-method-but-a-beautiful-one.aspx
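I can't vouch for the exact code behind that link, but the usual shape
of such an extension method is something like this (GetOrCreate is a
hypothetical name, not a framework member):

```csharp
using System;
using System.Collections.Generic;

static class DictionaryExtensions
{
    // Fetch the value for a key, creating it on first use.
    // Requires the value type to have a parameterless constructor.
    public static TValue GetOrCreate<TKey, TValue>(
        this IDictionary<TKey, TValue> dict, TKey key)
        where TValue : new()
    {
        TValue value;
        if (!dict.TryGetValue(key, out value))
        {
            value = new TValue();
            dict.Add(key, value);
        }
        return value;
    }
}

class Demo
{
    static void Main()
    {
        var dict = new Dictionary<int, List<string>>();

        // The add-to-multimap operation collapses to one line.
        dict.GetOrCreate(1).Add("Car1");
        dict.GetOrCreate(1).Add("Car2");

        Console.WriteLine(dict[1].Count);  // 2
    }
}
```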
 
Marc Gravell

Shame I didn't spot this thread earlier... But!

.NET 3.5 introduces ILookup<TKey,TValue>, which is comparable to a
multimap; the default implementation (Lookup<TKey,TValue>) is immutable,
but it is trivial to write a mutable implementation; in fact, one is in
MiscUtil - EditableLookup<TKey,TValue>:

http://www.pobox.com/~skeet/csharp/miscutil/

And you can use it in 2.0 as well (the MiscUtil build stubs out the
missing interfaces in the 2.0 configuration).
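A quick sketch of that route using the thread's example data — LINQ's
ToLookup builds the (immutable) Lookup in one call:

```csharp
using System;
using System.Linq;

class LookupDemo
{
    static void Main()
    {
        var cars = new[]
        {
            new { Name = "Car1", Size = 1 }, new { Name = "Car2", Size = 1 },
            new { Name = "Car3", Size = 1 }, new { Name = "Car4", Size = 3 },
            new { Name = "Car5", Size = 8 }, new { Name = "Car6", Size = 8 },
        };

        // Group by size; each key maps to a sequence of names.
        var bySize = cars.ToLookup(c => c.Size, c => c.Name);

        Console.WriteLine(bySize.Count);      // distinct keys: 3
        Console.WriteLine(bySize[1].Count()); // values under key 1: 3
    }
}
```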

Marc
 
Willy Denoyette [MVP]

Hmmm... as expected, there is no clear winner.

Compiled from the command line with "Visual C# 2008 Compiler version
3.5.21022.8" and /o.
Run on an AMD Phenom 9500 quad-core 2.2GHz.

Iterations: 5000
Data elements: 5000
Data range: 65536
Tests to run: 1, 1, 1, 1, 1, 2, 2, 2, 2, 2

_HistDict2: 00:00:13.2856168
_HistDict2: 00:00:13.2247140
_HistDict2: 00:00:13.2682774
_HistDict2: 00:00:13.2376446
_HistDict2: 00:00:13.2230366
_HistDict3: 00:00:13.3116609
_HistDict3: 00:00:13.2944652
_HistDict3: 00:00:13.2405608
_HistDict3: 00:00:13.2495046
_HistDict3: 00:00:13.2582542

Validating results...done.

Willy.

Nothing would make me happier than to have another pair of eyes look at
the benchmark and explain my observations (if only to show that I did
something wrong in timing the code).

I don't have handy access to the code right now, but I'll post when I do.
Fair warning: I don't have time to clean it up and it was written in
pursuit of a different question. (Here's the context:
http://groups.google.com/group/micr...5a45ca37ee0/4fc79d93deffc692#4fc79d93deffc692)

Okay Ben...have at it. :)

I've copied the basic test program below. In order to keep things brief,
I've deleted most of the tests that actually existed in the original
program, leaving just the two relevant ones.

On my computer, using a command line of "TestHistogramBrief.exe 5000 5000
-1 1 1 1 1 1 2 2 2 2 2" to run each test five times, I get the following
output (_HistDict2 uses ContainsKey(), _HistDict3 uses TryGetValue()):

_HistDict2: 00:00:14.9067082
_HistDict2: 00:00:14.1235553
_HistDict2: 00:00:13.6588232
_HistDict2: 00:00:14.0332028
_HistDict2: 00:00:13.6914594
_HistDict3: 00:00:15.3459849
_HistDict3: 00:00:14.7904829
_HistDict3: 00:00:15.2897546
_HistDict3: 00:00:14.9534329
_HistDict3: 00:00:15.4209609

Tossing out the high and low values, averaging the remaining three for
each, I get 13.949406 for the Contains() version, and 15.196391 for the
TryGetValue() version.

The difference is relatively slight (just over 8% faster for Contains()),
and as you can see the actual times vary a bit. But for me, it looks like
a reasonably reliable result, and sometimes an 8% difference is important.

And no, I don't understand the difference. If you (or anyone else) can
shed some light, I'm all ears. Even if all that it turns out to be is
some mistake in my implementation.

Pete


using System;
using System.Collections.Generic;
using System.Text;
using System.Diagnostics;
using System.IO;

namespace TestHistogram
{
class Program
{
static int imaxRange = 65536;
const int ctestMax = 2;

struct HistPair
{
public int value;
public int count;

public HistPair(int v, int c)
{
value = v;
count = c;
}
}

class TestScenario
{
private readonly int _citer;
private readonly int _cvalues;
private readonly int _imaxRange;

private readonly int[] _rgiValues;
private readonly Stopwatch _sw = new Stopwatch();

public TestScenario(int citer, int cvalues, int imaxRange)
{
_citer = citer;
_cvalues = cvalues;
_imaxRange = imaxRange;

Random rnd = new Random(0);

_rgiValues = new int[_cvalues];
for (int ivalue = 0; ivalue < _cvalues; ivalue++)
{
_rgiValues[ivalue] = rnd.Next(_imaxRange);
}
}

public HistPair[] RghpRunTest(int itest)
{
HistPair[] rghpRet = null;

switch (itest)
{
case 1:
rghpRet = _RghpRunHptest(_HistDict2);
break;
case 2:
rghpRet = _RghpRunHptest(_HistDict3);
break;
default:
Console.WriteLine("invalid test selected ("
+ itest.ToString() + ")");
break;
}

return rghpRet;
}

private delegate HistPair[] HptestDelegate(int[] rgi);

private HistPair[] _RghpRunHptest(HptestDelegate hptd)
{
HistPair[] rghpRet = null;

GC.Collect();
GC.WaitForPendingFinalizers();
_sw.Reset();
_sw.Start();
for (int iiter = 0; iiter < _citer; iiter++)
{
rghpRet = hptd(_rgiValues);
}
_sw.Stop();
Console.WriteLine(hptd.Method.Name + ": "
+ _sw.Elapsed.ToString());

return rghpRet;
}

// HistDict1, but using ContainsKey() to check for invalid keys
private HistPair[] _HistDict2(int[] rgiValues)
{
Dictionary<int, int> dict = new Dictionary<int, int>();

foreach (int value in rgiValues)
{
if (dict.ContainsKey(value))
{
dict[value] = dict[value] + 1;
}
else
{
dict[value] = 1;
}
}

int ihp;
HistPair[] rghpRet = new HistPair[dict.Count];

ihp = 0;
foreach (KeyValuePair<int, int> kvp in dict)
{
rghpRet[ihp++] = new HistPair(kvp.Key, kvp.Value);
}

Array.Sort(rghpRet, delegate(HistPair hpA, HistPair hpB)
{
return hpB.count.CompareTo(hpA.count);
});

return rghpRet;
}

// HistDict1, but using TryGetValue to check for invalid keys
// and retrieve value at the same time
private HistPair[] _HistDict3(int[] rgiValues)
{
Dictionary<int, int> dict = new Dictionary<int, int>();

foreach (int value in rgiValues)
{
int count;

if (dict.TryGetValue(value, out count))
{
dict[value] = count + 1;
}
else
{
dict[value] = 1;
}
}

int ihp;
HistPair[] rghpRet = new HistPair[dict.Count];

ihp = 0;
foreach (KeyValuePair<int, int> kvp in dict)
{
rghpRet[ihp++] = new HistPair(kvp.Key, kvp.Value);
}

Array.Sort(rghpRet, delegate(HistPair hpA, HistPair hpB)
{
return hpB.count.CompareTo(hpA.count);
});

return rghpRet;
}
}

static void Main(string[] args)
{
int citer = 10000;
int cvalues = 10000;
List<int> rgitest = new List<int>();

switch (args.Length)
{
default:
goto case 4;
case 4:
for (int iarg = 3; iarg < args.Length; iarg++)
{
int itest;

if (!int.TryParse(args[iarg], out itest))
{
rgitest = null;
break;
}

rgitest.Add(itest);
}
goto case 3;
case 3:
ArgInt(args[2], ref imaxRange);
goto case 2;
case 2:
ArgInt(args[1], ref cvalues);
goto case 1;
case 1:
ArgInt(args[0], ref citer);
goto case 0;
case 0:
break;
}

Console.Error.WriteLine("Iterations: " + citer.ToString());
Console.Error.WriteLine("Data elements: "
+ cvalues.ToString());
Console.Error.WriteLine("Data range: " + imaxRange.ToString());
Console.Error.Write("Tests to run: ");
if (rgitest != null)
{
bool fFirst = true;

foreach (int itest in rgitest)
{
if (!fFirst)
{
Console.Error.Write(", ");
}
fFirst = false;
Console.Error.Write(itest.ToString());
}
}
else
{
rgitest = new List<int>(ctestMax);

for (int itest = 1; itest <= ctestMax; itest++)
{
rgitest.Add(itest);
}

Console.Error.Write("all tests (1-" + ctestMax.ToString()
+ ")");
}
Console.Error.WriteLine(Environment.NewLine);

TestScenario tests = new TestScenario(citer, cvalues,
imaxRange);
List<HistPair[]> rgrghpResults = new List<HistPair[]>();

foreach (int itest in rgitest)
{
HistPair[] rghpT = tests.RghpRunTest(itest);

if (rghpT != null)
{
rgrghpResults.Add(rghpT);
}
}

ValidateResults(rgrghpResults);

if (Debugger.IsAttached)
{
Console.Error.WriteLine("Press Enter/Return to exit");
Console.ReadLine();
}
}

static void ArgInt(string strArg, ref int iarg)
{
int iT;

if (int.TryParse(strArg, out iT) && iT > 0)
{
iarg = iT;
}
}

static void ValidateResults(List<HistPair[]> rgrghp)
{
Console.Error.WriteLine();
Console.Error.Write("Validating results...");

// First check each array individually for correctness
foreach (HistPair[] rghp in rgrghp)
{
int countPrev = rghp[0].count;

for (int ihp = 1; ihp < rghp.Length; ihp++)
{
if (countPrev < rghp[ihp].count)
{
Console.Error.WriteLine("array order error! index "
+ ihp.ToString() + " has a count ("
+ rghp[ihp].count.ToString()
+ ") greater than previous element ("
+ countPrev.ToString() + ")");

break;
}

countPrev = rghp[ihp].count;
}

// Reorder the array in a consistent way for later comparison
Array.Sort(rghp, delegate(HistPair hpA, HistPair hpB)
{
if (hpA.count == hpB.count)
{
return hpB.value.CompareTo(hpA.value);
}

return hpB.count.CompareTo(hpA.count);
});
}

// Then compare each array to each other
for (int irghp = 0; irghp < rgrghp.Count; irghp++)
{
int irghpNext = (irghp + 1) % rgrghp.Count;
HistPair[] rghp = rgrghp[irghp],
rghpNext = rgrghp[irghpNext];

if (rghp.Length != rghpNext.Length)
{
Console.Error.WriteLine("array length mismatch! array #"
+ irghp.ToString() + " has " + rghp.Length.ToString()
+ " elements, array #" + irghpNext.ToString()
+ " has " + rghpNext.Length.ToString() + " elements");

continue;
}

for (int ihp = 0; ihp < rghp.Length; ihp++)
{
if (rghp[ihp].value != rghpNext[ihp].value)
{
Console.Error.WriteLine("element value mismatch! index "
+ ihp.ToString() + ", array #" + irghp.ToString()
+ ": " + rghp[ihp].value + ", array #"
+ irghpNext.ToString() + ": " + rghpNext[ihp].value);

break;
}
if (rghp[ihp].count != rghpNext[ihp].count)
{
Console.Error.WriteLine("element count mismatch! index "
+ ihp.ToString() + ", array #" + irghp.ToString()
+ ": " + rghp[ihp].count + ", array #"
+ irghpNext.ToString() + ": " + rghpNext[ihp].count);

break;
}
}
}

Console.Error.WriteLine("done.");
}
}
}
 
Willy Denoyette [MVP]

Peter Duniho said:
Well, actually my expectation isn't that there's no clear winner, but
rather that the TryGetValue() version is consistently faster.

What I don't understand is why, at least on my computer (apparently not
yours), Contains() is consistently slower. These tests are very
repeatable, and while the variance is definitely higher on my computer
than yours, the slowest time for Contains() is only slower than one test
of TryGetValue(), and frankly that's only because it's the very first test
(which in my tests has consistently always been very slow...slow enough
that I think it'd actually be more fair to run a couple of throw-away
tests and then run five of each throwing out the high and low).

Speaking of that slow first test, I'll point out that even on your
computer, if you toss out the first run, Contains() is faster than
TryGetValue() for all but one other test. I'm not really convinced that
your tests show "no clear winner". I'd be more willing to accept that
statement if you presented a run that includes a couple of throw-away
tests and still showed a random distribution across both tests. Given the
apparently closer results on your computer, doing that is much more
important than it would be for my tests.

But in any case, I'm still without an explanation as to why there should
be such a clear difference on my computer.

Pete


When I say there is no clear winner, I mean a *Clear* winner between both
_HistDict2 and _HistDict3, I don't consider a difference of ~1% a clear
winner.

Your sample benchmark does not really measure the performance of ContainsKey
or TryGetValue, you are measuring a lot more.

So, let's only consider what happens in :

if (dict.ContainsKey(value))
..
if (dict.TryGetValue(value, out count))
...
and forget about the remainder of the methods.

When you look at the JITted code you'll see that "ContainsKey" directly
calls into "FindEntry", while "TryGetValue" prepares a call stack before
it calls "FindEntry"; after the return from "FindEntry", TryGetValue
stores the returned value (if the key is found) or a default value (if
not) in the caller's stack before returning.
So basically both execute the same code (the FindEntry function), except
that TryGetValue takes a 15-instruction overhead when the key is not
found (which is most often the case in your benchmark) or a
16-instruction overhead when the key is found.

Both foreach loops take:
loop with ContainsKey: 159/174
loop with TryGetValue: 174/190
instructions to complete (not found/found).
This represents a difference of 9% in number of instructions (but much less
in execution time (<5%)), in favor of "ContainsKey".
Obviously, the difference reflected by the benchmark is much less (< 2%), as
the foreach loop takes only 30% of the total execution time.

Here are the results of some new runs, now after the program was compiled
without /o (yes this runs faster than an optimized build).

Iterations: 5000
Data elements: 5000
Data range: 65536
Tests to run: 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2

_HistDict2: 13080 msec.
_HistDict2: 13080 msec.
_HistDict2: 13169 msec.
_HistDict2: 13060 msec.
_HistDict2: 13173 msec.
_HistDict2: 13068 msec.
_HistDict3: 13275 msec.
_HistDict3: 13167 msec.
_HistDict3: 13233 msec.
_HistDict3: 13181 msec.
_HistDict3: 13268 msec.
_HistDict3: 13175 msec.

Validating results...done.

Note that I changed your line here:

Console.WriteLine(hptd.Method.Name + ": " + _sw.ElapsedMilliseconds + "
msec.");

Willy.
 
Ben Voigt [C++ MVP]

So, let's only consider what happens in :
if (dict.ContainsKey(value))
..
if (dict.TryGetValue(value, out count))
...
and forget about the remainder of the methods.

When you look at the JITTED code you'll see that "ContainsKey"
directly calls into "FindEntry", while "TryGetValue" prepares a call
stack before it calls "FindEntry", after the return from "FindEntry",
TryGetValue stores the returned value (if Key is found) or a default
value (if no Key found) in the callers stack before returning.
So basically both are executing the same code (the FindEntry
function), except that TryGetValue takes a 15 instruction overhead in
case the key not found (which is most often the case in your
benchmark) or an overhead of 16 instruction when the key was found.

Victim of the 32 byte threshold for inlining then. TryGetValue is a very
simple function and *ought* to be inlined, but because it is over 32 bytes
of IL, it isn't.

Like I surmised earlier... JIT performance bug.
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).

Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.
 
Willy Denoyette [MVP]

Ben Voigt said:
Victim of the 32 byte threshold for inlining then. TryGetValue is a very
simple function and *ought* to be inlined, but because it is over 32 bytes
of IL, it isn't.

The overhead is not (entirely) call overhead; TryGetValue returns, as an
out argument, the value of the K/V pair or a default value of the
value's type. The call overhead is only 5 instructions on X86, and on
X64 there is no overhead at all.

Like I surmised earlier... JIT performance bug.
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).

Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.

What do you mean with second call to FindEntry? there is only one call to
FindEntry per loop.
 
Willy Denoyette [MVP]

Oops, Hit the send key too soon ;-)

Willy.

Ben Voigt said:
So, let's only consider what happens in :

if (dict.ContainsKey(value))
..
if (dict.TryGetValue(value, out count))
...
and forget about the remainder of the methods.

When you look at the JITTED code you'll see that "ContainsKey"
directly calls into "FindEntry", while "TryGetValue" prepares a call
stack before it calls "FindEntry", after the return from "FindEntry",
TryGetValue stores the returned value (if Key is found) or a default
value (if no Key found) in the callers stack before returning.
So basically both are executing the same code (the FindEntry
function), except that TryGetValue takes a 15 instruction overhead in
case the key not found (which is most often the case in your
benchmark) or an overhead of 16 instruction when the key was found.

Victim of the 32 byte threshold for inlining then. TryGetValue is a very
simple function and *ought* to be inlined, but because it is over 32 bytes
of IL, it isn't.

Like I surmised earlier... JIT performance bug.
Both foreach loops take:
loop with ContainsKey:
159/174
loop with TryGetValue:
174/190 instructions to complete (not found/found).

Is this including the second call to FindEntry from the ContainsKey/found
variant? Seems like the call to operator[] should have added a lot more
than 15 instructions in that case.

These are the loops I'm talking about.

foreach (int value in rgiValues)
{
if (dict.ContainsKey(value))
{
dict[value] = dict[value] + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 159 instructions to complete when Key is not found, 174 when
found.
Here ContainsKey directly calls FindEntry...

foreach (int value in rgiValues)
{
int count;
if (dict.TryGetValue(value, out count))
{
dict[value] = count + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 174 instructions when the key is not found, 190 when found.
Here FindEntry is called from inside TryGetValue.
Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats its purpose.

Willy.
 
Jon Skeet [C# MVP]

Willy Denoyette said:
These are the loops I'm talking about.

foreach (int value in rgiValues)
{
if (dict.ContainsKey(value))
{
dict[value] = dict[value] + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 159 instructions to complete when Key is not found, 174 when
found.
Here ContainsKey directly calls FindEntry...

Is that taking into account the indexer also calling FindEntry (on the
"get") though?

That's the odd part for me - ContainsKey followed by an indexer get
needs to look through the hashtable twice, whereas TryGetValue should
only need to look through it once.

Are you saying that the overhead for TryGetValue is the same as the
overhead of effectively calling FindEntry an extra time?
foreach (int value in rgiValues)
{
int count;
if (dict.TryGetValue(value, out count))
{
dict[value] = count + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 174 instructions when the key is not found, 190 when found.
Here FindEntry is called from inside TryGetValue.
Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats its purpose.

Yes it does - it uses "count + 1" instead of "dict[value] + 1", and
uses the actual return value (the boolean) to determine whether or not
it was found.

Which part of the result do you believe isn't being used?
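One way to make the double lookup visible (a sketch; CountingComparer
is invented for the demo) is to count GetHashCode calls, since
Dictionary hashes the key once per FindEntry:

```csharp
using System;
using System.Collections.Generic;

// Counts hash computations so each FindEntry call shows up once.
class CountingComparer : IEqualityComparer<int>
{
    public int HashCalls;
    public bool Equals(int x, int y) { return x == y; }
    public int GetHashCode(int x) { HashCalls++; return x; }
}

class Demo
{
    static void Main()
    {
        var cmp = new CountingComparer();
        var dict = new Dictionary<int, int>(cmp) { { 42, 1 } };

        // ContainsKey + indexer get: two FindEntry calls.
        cmp.HashCalls = 0;
        int v = 0;
        if (dict.ContainsKey(42)) v = dict[42];
        Console.WriteLine(cmp.HashCalls + " lookups, value " + v);  // 2

        // TryGetValue: one FindEntry call.
        cmp.HashCalls = 0;
        int w;
        dict.TryGetValue(42, out w);
        Console.WriteLine(cmp.HashCalls + " lookup, value " + w);   // 1
    }
}
```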
 
Willy Denoyette [MVP]

Jon Skeet said:
Willy Denoyette said:
These are the loops I'm talking about.

foreach (int value in rgiValues)
{
if (dict.ContainsKey(value))
{
dict[value] = dict[value] + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 159 instructions to complete when Key is not found, 174 when
found.
Here ContainsKey directly calls FindEntry...

Is that taking into account the indexer also calling FindEntry (on the
"get") though?

Oh I see what Ben was talking about. No, I did not (but agreed I should
have), nor did I consider the possible intermediate GC run and the fact
that the Dictionary (the backing store) is dynamically extended during
the run. The Dictionary starts with the default size, but gets extended
several times during the run; each extension costs ~2000 cycles. The
backing store finally gets so big that it ends up on the LOH.
That's the odd part for me - ContainsKey followed by an indexer get
needs to look through the hashtable twice, whereas TryGetValue should
only need to look through it once.

It only has to look through the hashtable twice if the key was found.
Are you saying that the overhead for TryGetValue is the same as the
overhead of effectively calling FindEntry an extra time?
No I'm not. I'm only discussing the results of the benchmark, run with the
same command-line arguments as posted by Peter.
I'm only comparing both methods when the key is not found (which is quite
common with 64K possible random values for 5000 entries), in which case
ContainsKey is the winner. Cases where the key is found are a lot more
complex to compare, TryGetValue is the clear winner when the majority of
keys are found in the dictionary, as illustrated here:

Iterations: 5000
Data elements: 5000
Data range: 100
Tests to run: 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2

_HistDict2: 2936 msec.
_HistDict2: 2915 msec.
_HistDict2: 2913 msec.
_HistDict2: 2912 msec.
_HistDict2: 2906 msec.
_HistDict2: 2924 msec.
_HistDict3: 2206 msec.
_HistDict3: 2204 msec.
_HistDict3: 2199 msec.
_HistDict3: 2200 msec.
_HistDict3: 2194 msec.
_HistDict3: 2204 msec.


Here the number of random keys is limited to 100, and TryGetValue is
~25% faster due to the extra FindEntry in ContainsKey.
It's obvious why this is much faster than the test with 64K keys: only a
few extensions of the backing store have to be done in this case.

foreach (int value in rgiValues)
{
int count;
if (dict.TryGetValue(value, out count))
{
dict[value] = count + 1;
}
else
{
dict[value] = 1;
}
}

Above takes 174 instructions when key is found, 190 when not found. Here
FindEntry is called from inside TryGetValue .
Note that this code does not take advantage of the value returned by
TryGetValue, which somewhat defeats it's purpose.

Yes it does - it uses "count + 1" instead of "dict[value] + 1", and
uses the actual return value (the boolean) to determine whether or not
it was found.

True, but only in the case where the key is found.
Note that the above code could have been written like:
...
dict.TryGetValue(value, out count);
dict[value] = count + 1;
...
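As a complete fragment, that collapsed form of the histogram loop looks
like this — it leans on TryGetValue setting count to default(int), i.e.
0, when the key is missing:

```csharp
using System;
using System.Collections.Generic;

class HistogramDemo
{
    static void Main()
    {
        int[] rgiValues = { 5, 3, 5, 8, 5, 3 };
        var dict = new Dictionary<int, int>();

        foreach (int value in rgiValues)
        {
            int count;
            // Missing key => count is left at default(int) == 0,
            // so no if/else is needed at all.
            dict.TryGetValue(value, out count);
            dict[value] = count + 1;
        }

        Console.WriteLine(dict[5]);  // 3
        Console.WriteLine(dict[3]);  // 2
    }
}
```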


Willy.
 
Willy Denoyette [MVP]

Peter Duniho said:
[...]
Here the number of random keys is limited to 100, and TryGetValue is
~25% faster due to the extra FindEntry in ContainsKey.
It's obvious why this is much faster than the test with 64K keys: only
a few extensions of the backing store have to be done in this case.

Okay, so I take it that the bottom line here is that there are at least
two issues at work:

TryGetValue() _is_ in fact "inferior" in the sense that it can't be
inlined and so results in more overhead for the operation when the key
isn't found. (This isn't to say it's actually an inferior approach...just
that there's something about it that could be considered "inferior").

The other issue is, of course, that the particular input I provided
exposes the worst-case scenario in which the key is usually not found. In
other words, even if TryGetValue() can be considered to be "inferior" in
some specific way, that is only going to be apparent in some specific
scenario.

The part I still don't understand: why TryGetValue() should be _faster_
when the key is found, given that you wrote that both forms of the whole
loop take 174 instructions in that case. Shouldn't the two approaches be
equivalent in that case, rather than TryGetValue() showing a marked
improvement? Is my lack of comprehension due to a naïve understanding of
what "174 instructions" means? That is, that the instruction count isn't
a direct measure of the time the .NET code will take to execute?

I admit, I don't really know when you wrote about "instructions" what
exactly you're talking about. Is that IL instructions? Or the final x86
code? Either way, I can easily see how the count of instructions does not
necessarily tell you precisely how fast the code will execute.

In the end, I think I am most satisfied that, once again, it's been shown
that performance-tuning this sort of thing is fraught with difficulty, if
not entirely pointless. :)

Pete



Ok, let me show you the (raw) call graphs for both foreach loops (taken from
a CPU instrumentation analyzer).

1) ContainsKey - Key found case:

caller (the method) callee
Start
|
3
--------------------------->FindEntry
74
<---------------------------
9
--------------------------->get_Item
93 (includes a call to FindEntry)
<----------------------------
7
--------------------------->Insert
85
<----------------------------
19
|
----> return to Start



2) TryGetValue - Key found case:

Start
|
6
--------------------------->TryGetValue (calls FindEntry)
94
<---------------------------
12
--------------------------->Insert
85
<----------------------------
17
|
----> return to Start

3) ContainsKey - Key NOT found case:

caller (the method) callee
Start
|
3
--------------------------->FindEntry
41
<---------------------------
10
--------------------------->Insert
88
<----------------------------
17
|
----> return to Start


4) TryGetValue - Key NOT found case:

Start
|
6
--------------------------->TryGetValue (calls FindEntry)
56
<---------------------------
10
--------------------------->Insert
88
<----------------------------
17
|
----> return to Start
Note:
- the figures are X86 instruction counts
- instruction counts in a callee include the counts of its sub-calls;
the actual call graph is deeper than shown here.

This gives us for:
ContainsKey
Key found: 290 instructions
Key not found: 159 instructions
TryGetValue
Key found: 214 instructions
Key not found: 177 instructions

Notice that ContainsKey is somewhat faster when the key does not exist,
but considerably slower when the key exists.

Some more remarks:
Beware that the instruction count isn't a direct measure of the time taken
to execute a loop, it can only be used as a good estimate if a major portion
of the loop executes the same call graph. Modern processors have become so
fast, that the real bottleneck is the memory system (cache speed, sizes and
hierarchy). Running the same benchmark on a 20% faster CPU (clock speed)
won't yield a 20% performance gain, while running on a slower CPU with a
"better" memory system might give you better results.
The performance of these kinds of benchmarks is now constrained by the
memory system (they are memory bound); things like memory bandwidth,
throughput and latency have long mattered more than CPU clock rates.

Willy.
 
