Generic Dictionary performance?

  • Thread starter: Rune B

Rune B

Hi Group

I was considering using a Generic Dictionary<> as a value container inside
my business objects, in order to keep track of fields changed or
added and so on.
- But how expensive is it to instantiate/use Generic Dictionaries in large
numbers (let's just say hundreds of thousands), in terms of memory use and performance?

Any practical experiences out there?


--------------
Simplified example:


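using System.Collections.Generic;

public class SomeItem
{
    public SomeItem()
    {
    }

    private Dictionary<string, object> _values = new Dictionary<string, object>();

    // Returns the stored value if it is a string, otherwise null (also null if
    // the field was never set). The indexer returns object, so a plain
    // "return _values[fieldname];" would not compile in a method returning string.
    private string GetString(string fieldname)
    {
        object value;
        if (_values.TryGetValue(fieldname, out value))
            return value as string;
        return null;
    }

    // The indexer setter adds a missing key or overwrites an existing one,
    // so no ContainsKey check is needed.
    private void SetValue(string fieldname, object value)
    {
        _values[fieldname] = value;
    }

    public string Key
    {
        get { return this.GetString("Key"); }
        set { this.SetValue("Key", value); }
    }

    public string ItemName
    {
        get { return this.GetString("ItemName"); }
        set { this.SetValue("ItemName", value); }
    }

    // ...and a number of further properties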
}
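The question mentions keeping track of changed or added fields, which the simplified SomeItem above does not yet do. As a minimal sketch (TrackedValues and its members are hypothetical names, not part of any library), the same Dictionary<string, object> can be paired with a HashSet<string> that records which fields received a new value since the last AcceptChanges() call:

```csharp
using System.Collections.Generic;

// Hypothetical helper: a field store that also remembers which fields were
// set to a new value since the last AcceptChanges() call.
public class TrackedValues
{
    private readonly Dictionary<string, object> _values = new Dictionary<string, object>();
    private readonly HashSet<string> _changed = new HashSet<string>();

    // Returns the stored value, or null if the field was never set.
    public object Get(string field)
    {
        object value;
        return _values.TryGetValue(field, out value) ? value : null;
    }

    // Stores the value and marks the field as changed, but only when the
    // field is new or its value differs from the stored one.
    public void Set(string field, object value)
    {
        object old;
        if (!_values.TryGetValue(field, out old) || !Equals(old, value))
        {
            _values[field] = value;
            _changed.Add(field);
        }
    }

    public bool IsChanged(string field)
    {
        return _changed.Contains(field);
    }

    public int ChangedCount
    {
        get { return _changed.Count; }
    }

    // Clears the change markers, e.g. after the object has been saved.
    public void AcceptChanges()
    {
        _changed.Clear();
    }
}
```

A SomeItem property setter could call Set(...) on such an object instead of writing to _values directly. Treating "changed" as "not Equals to the stored value" is an assumption; a DataRow-like design might keep the original values instead.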
 
- But how expensive is it to instantiate/use Generic Dictionaries in large
numbers (let's just say hundreds of thousands), in terms of memory use and performance?

The real question here is not "how expensive is it to instantiate/use
Generic Dictionaries" but "what is the most optimal solution to provide the
functionality I require?" A simple string array might be the solution,
depending on your actual business requirements, or anything up to and
including a custom class. What you have to do is compare the specific needs
of your app to determine a minimum set of necessary features, and then ask
which tool can provide that minimum set of features. At that point, it
doesn't matter "how expensive" it is to use that tool; it only matters that
it is the most efficient tool for the job.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer

Presuming that God is "only an idea" -
Ideas exist.
Therefore, God exists.

 
Kevin Spencer said:
The real question here is not "how expensive is it to instantiate/use
Generic Dictionaries" but "what is the most optimal solution to provide
the functionality I require?" A simple string array might be the solution,
depending on your actual business requirements, or anything up to and
including a custom class. What you have to do is compare the specific
needs of your app to determine a minimum set of necessary features, and
then ask which tool can provide that minimum set of features. At
that point, it doesn't matter "how expensive" it is to use that tool; it
only matters that it is the most efficient tool for the job.

Yeah, I see your point - the functionality of a Dictionary<string, object>
is exactly what is needed here (somewhat similar to a DataRow without
versions and schema).

I just got concerned about performance when I looked at the Dictionary with
Reflector. At first glance it seemed like a pretty complex structure, with
a lot of boxing and internal stuff going on... so I've considered solving it
with a simpler model, which would require a lot more code outside the "item"
class - but why bother if it actually performed very well...

R
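On the boxing concern, a sketch assuming the Dictionary<string, object> layout from the question (BoxingDemo is a hypothetical name). The generic Dictionary<TKey, TValue> itself does not need to box its keys or values; boxing happens here only because TValue is object, so a value-type field (int, DateTime, decimal and so on) is boxed when stored and must be unboxed with its exact type when read. String fields are references and are never boxed.

```csharp
using System;
using System.Collections.Generic;

// Sketch: where boxing does and does not happen with Dictionary<string, object>.
public static class BoxingDemo
{
    public static int RoundTrip()
    {
        var values = new Dictionary<string, object>();

        // string is a reference type: stored as-is, no boxing.
        values["ItemName"] = "Widget";

        // int is a value type: converting it to object allocates a box.
        values["Quantity"] = 42;

        // Reading it back needs an unboxing cast to the exact type.
        int quantity = (int)values["Quantity"];
        return quantity;
    }

    public static bool WrongUnboxThrows()
    {
        object boxed = 42;
        try
        {
            // Unboxing must use the boxed type (int), so casting to long throws.
            long wrong = (long)boxed;
            return wrong < 0; // not reached
        }
        catch (InvalidCastException)
        {
            return true;
        }
    }
}
```

If most fields are strings, the boxing cost is mostly irrelevant; it only matters for value-type fields that are read and written very often.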
 
Kevin Spencer said:
The real question here is not "how expensive is it to instantiate/use
Generic Dictionaries" but "what is the most optimal solution to provide the
functionality I require?"

I would say it's not even that. I'd ask: "What is the simplest, most
readable and maintainable solution which gives me the performance and
functionality I require?"

The "most optimal" solution is often far from the simplest one, but
often the simplest one performs well *enough*.

Of course, this post only makes sense when interpreting "most optimal"
in terms of performance - if you meant it in a more holistic "best in
many different ways" sense, then we probably already agree :)
 
I just got concerned about performance when I looked at the Dictionary with
Reflector. At first glance it seemed like a pretty complex structure, with
a lot of boxing and internal stuff going on... so I've considered solving it
with a simpler model, which would require a lot more code outside the "item"
class - but why bother if it actually performed very well...

That's *nearly* exactly the right attitude. However, instead of "very
well" I believe you ought to be testing whether it performs "well
enough". I haven't personally run into any situations where the
performance of any built-in map classes has been too slow for my
purposes - but I haven't written the exact app you're writing.

If you're really worried, you should run tests with data which is as
close to "real" data as possible, and measure whether or not Dictionary
performs well enough. Personally, however, I wouldn't worry until I had
actually found that it's the bottleneck in the application.

See http://www.interact-sw.co.uk/iangblog/2005/11/09/profiling for an
excellent essay on this sort of thing.
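A minimal sketch of "measure whether it performs well enough", assuming the time budget in milliseconds comes from your own requirements (FastEnough is a hypothetical name, and the item counts and field names passed in are placeholders). It uses System.Diagnostics.Stopwatch to time creating and then reading dictionary-backed items:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Sketch: time dictionary-backed items against a budget taken from the
// requirements, rather than against some "ideal" alternative.
public static class FastEnough
{
    // Creates itemCount dictionaries, sets every field, reads every field back,
    // and returns the elapsed time in milliseconds (allocation included).
    public static long TimeDictionaryAccess(int itemCount, string[] fields)
    {
        var items = new List<Dictionary<string, object>>(itemCount);
        var sw = Stopwatch.StartNew();

        for (int i = 0; i < itemCount; i++)
        {
            var values = new Dictionary<string, object>();
            foreach (string f in fields)
                values[f] = f + i; // one string value per field
            items.Add(values);
        }

        int hits = 0;
        foreach (var values in items)
        {
            foreach (string f in fields)
            {
                object v;
                if (values.TryGetValue(f, out v))
                    hits++;
            }
        }

        sw.Stop();
        if (hits != itemCount * fields.Length)
            throw new InvalidOperationException("Unexpected missing field.");
        return sw.ElapsedMilliseconds;
    }

    // "Fast enough" is relative to the requirement, not to other solutions.
    public static bool IsFastEnough(long elapsedMs, long budgetMs)
    {
        return elapsedMs <= budgetMs;
    }
}
```

A single run is noisy; in practice you would repeat it several times in a release build, with realistic field counts and values, before drawing conclusions.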
 
Rune said:
Hi Group

I was considering using a Generic Dictionary<> as a value container inside
my business objects, in order to keep track of fields changed or
added and so on.
- But how expensive is it to instantiate/use Generic Dictionaries in large
numbers (let's just say hundreds of thousands), in terms of memory use and performance?

A great way to find out is to write a small test-program that does just
that :)
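Along those lines, a rough test program for the memory side of the question might look like this. It is a sketch under stated assumptions: GC.GetTotalMemory(true) only gives an approximate heap size, the stored values are null so that only the dictionaries' own overhead (plus one array slot per item) is counted, and the key strings are literals, so they are interned and shared rather than counted per item. DictionaryMemoryTest is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

// Sketch: rough per-instance memory cost of many small dictionaries.
public static class DictionaryMemoryTest
{
    public static long BytesPerItem(int itemCount, string[] fields)
    {
        // Force a collection first so the baseline is as stable as possible.
        long before = GC.GetTotalMemory(true);

        var items = new Dictionary<string, object>[itemCount];
        for (int i = 0; i < itemCount; i++)
        {
            var values = new Dictionary<string, object>();
            foreach (string f in fields)
                values[f] = null; // keys only; no value payload is measured
            items[i] = values;
        }

        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(items); // keep the dictionaries reachable until measured
        return (after - before) / itemCount;
    }
}
```

Calling, say, DictionaryMemoryTest.BytesPerItem(100000, new[] { "Key", "ItemName" }) and comparing the result with the same test against a class that uses plain fields gives a like-for-like estimate.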
Any practical experiences out there?

I haven't instantiated that many Dictionaries, so not here ;)
Simplified example:
private Dictionary<string, object> _values = new Dictionary<string,
object>();
[usage of _values as storage for members]

What's wrong with simply declaring the fields:

public string Key;
...

or with properties, if you prefer that:

protected string key;
public string Key { get { return key; } set { key = value; } }

Are you writing generic functionality in terms of the members in the
class? That could probably be written using reflection instead, for
example by requiring your business objects to implement something like:

interface IBusinessFieldView
{ IDictionary<string,FieldInfo> Fields { get; } }

or with properties if you prefer that:

interface IBusinessPropertyView
{ IDictionary<string,PropertyInfo> Properties { get; } }

The implementation of Fields/Properties could be stored statically in each
class, and you could fairly easily write helper functions that would make
the implementation look something like:

class Foo: IBusinessPropertyView {
protected string key;
public string Key { get { ...; } set { ...; } }
protected string itemName;
public string ItemName { get { ...; } set { ...; } }
public static IDictionary<string,PropertyInfo> Properties =
Util.FindProperties(typeof(Foo), "Key", "ItemName", ...);
IDictionary<string,PropertyInfo> IBusinessPropertyView.Properties
{ get { return Foo.Properties; } }
}

and generic usage (for any IBusinessPropertyView) somewhat like:

IBusinessPropertyView foo = ...;
foreach ( string member in foo.Properties.Keys )
Console.WriteLine("{0} = {1}",
member,
foo.Properties[member].GetValue(foo));
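The helper is only named above, not shown. One possible implementation, as a hypothetical sketch (Util.FindProperties is not a framework API), looks up each named public property with Type.GetProperty and throws if one is missing. Because Foo calls it from a static field initializer, a misspelled name fails as soon as Foo is first used, surfacing as a TypeInitializationException.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public interface IBusinessPropertyView
{
    IDictionary<string, PropertyInfo> Properties { get; }
}

// Hypothetical helper: resolves named public properties, failing loudly on typos.
public static class Util
{
    public static IDictionary<string, PropertyInfo> FindProperties(Type type, params string[] names)
    {
        var result = new Dictionary<string, PropertyInfo>();
        foreach (string name in names)
        {
            PropertyInfo p = type.GetProperty(name);
            if (p == null)
                throw new ArgumentException("No public property '" + name + "' on " + type.Name);
            result.Add(name, p);
        }
        return result;
    }
}

public class Foo : IBusinessPropertyView
{
    protected string key;
    public string Key { get { return key; } set { key = value; } }
    protected string itemName;
    public string ItemName { get { return itemName; } set { itemName = value; } }

    // Built once per class and shared by all instances.
    public static readonly IDictionary<string, PropertyInfo> Properties =
        Util.FindProperties(typeof(Foo), "Key", "ItemName");

    // Explicit implementation; inside it, "Properties" is the static field.
    IDictionary<string, PropertyInfo> IBusinessPropertyView.Properties
    {
        get { return Properties; }
    }
}
```

The generic loop then works unchanged, e.g. view.Properties[member].GetValue(foo, null), using the two-argument GetValue overload that exists in every .NET version.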
 
Jon Skeet said:
That's *nearly* exactly the right attitude. However, instead of "very
well" I believe you ought to be testing whether it performs "well
enough". I haven't personally run into any situations where the
performance of any built-in map classes has been too slow for my
purposes - but I haven't written the exact app you're writing.

I guess you haven't been using Typed DataSets much (for large quantities of
data).
Convenient but slow, and in my case it seemed to me it was absolutely
overkill.
The next convenient approach was the Dictionary, and it seemed to be just the
thing.

If you're really worried, you should run tests with data which is as
close to "real" data as possible, and measure whether or not Dictionary
performs well enough. Personally, however, I wouldn't worry until I had
actually found that it's the bottleneck in the application.

Thanks for your advice, and of course there will be testing...

See http://www.interact-sw.co.uk/iangblog/2005/11/09/profiling for an
excellent essay on this sort of thing.

Absolutely, this one is bookmarked.

Regards, Rune
 
Helge Jensen said:
Rune said:
Hi Group

I was considering using a Generic Dictionary<> as a value container inside
my business objects, in order to keep track of fields changed or
added and so on.
- But how expensive is it to instantiate/use Generic Dictionaries in large
numbers (let's just say hundreds of thousands), in terms of memory use and performance?

A great way to find out is to write a small test-program that does just
that :)
Any practical experiences out there?

I haven't instantiated that many Dictionaries, so not here ;)
Simplified example:
private Dictionary<string, object> _values = new Dictionary<string,
object>();
[usage of _values as storage for members]

What's wrong with simply declaring the fields:

public string Key;
...

or with properties, if you prefer that:

protected string key;
public string Key { get { return key; } set { key = value; } }

Are you writing generic functionality in terms of the members in the
class? That could probably be written using reflection instead, for
example by requiring your business objects to implement something like:

interface IBusinessFieldView
{ IDictionary<string,FieldInfo> Fields { get; } }

or with properties if you prefer that:

interface IBusinessPropertyView
{ IDictionary<string,PropertyInfo> Properties { get; } }

The implementation of Fields/Properties could be stored statically in each
class, and you could fairly easily write helper functions that would make
the implementation look something like:

class Foo: IBusinessPropertyView {
protected string key;
public string Key { get { ...; } set { ...; } }
protected string itemName;
public string ItemName { get { ...; } set { ...; } }
public static IDictionary<string,PropertyInfo> Properties =
Util.FindProperties(typeof(Foo), "Key", "ItemName", ...);
IDictionary<string,PropertyInfo> IBusinessPropertyView.Properties
{ get { return Foo.Properties; } }
}

and generic usage (for any IBusinessPropertyView) somewhat like:

IBusinessPropertyView foo = ...;
foreach ( string member in foo.Properties.Keys )
Console.WriteLine("{0} = {1}",
member,
foo.Properties[member].GetValue(foo));

--
Helge Jensen
-=> Sebastian cover-music: http://ungdomshus.nu <=-
 
Are you writing generic functionality in terms of the members in the
class? That could probably be written using reflection instead, for
example by requiring your business objects to implement something like:

<nice example>

Yeah, that's pretty much the point, and at some point I was going for the
reflection approach.
But what worries me is that you'll then have to maintain a large number of
strings:
Util.FindProperties(typeof(Foo), "Key", "ItemName", ...);
- Every time you decide to add/rename/remove another Property...

The idea isn't bad though, I'll include it in the testbench.

Rune
 
Rune B said:
I guess you haven't been using Typed DataSets much (for large quantities of
data).

Well, I wouldn't really class DataSets as "map classes" - they happen
to have that too, but there's so much else in there that I'm not
surprised there's a performance hit.
Convenient but slow, and in my case it seemed to me it was absolutely
overkill. The next convenient approach was the Dictionary, and it seemed
to be just the thing.
Right.


Absolutely, this one is bookmarked.

Ian's rather good, isn't he? :)
 
Jon Skeet said:
Ian's rather good, isn't he? :)

Yeah, this really freshened my view of things.
And I love it when it shines through that the guy really knows what he is
talking about.

Thanks again. R-)
 
Rune said:
<nice example>

Thanks, it works rather well too (at least for me :)

It has the additional advantage that a class/object can participate in
any number of these protocols.
But what worries me is that you'll then have to maintain a large number of
strings:


- Every time you decide to add/rename/remove another Property...

The idea isn't bad though, I'll include it in the testbench.

Doesn't the dictionary approach require even more code and naming
properties via strings? And it's prone to misspellings.

Util.FindProperties will throw if a property isn't found.

I think of "my" approach as declarative:

public static IDictionary<string,PropertyInfo> Properties =
Util.FindProperties(typeof(Foo), "Key", "ItemName", ...);

effectively *declares* that the business-properties of Foo are "Key",
"ItemName", .... Any other state is not part of that view. I really
haven't found a shorter, more convenient or less error-prone way to
indicate this (yet ;)
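On the misspelling concern raised above: in C# 6 and later (well after this thread), the nameof operator yields a member's name as a compiler-checked string, so a typo fails to compile and renaming the property with a refactoring tool also updates the reference. A minimal sketch with a hypothetical Bar class:

```csharp
using System.Reflection;

// Hypothetical class: the business-property list is declared with nameof,
// so the compiler verifies every name (requires C# 6 or later).
public class Bar
{
    public string Key { get; set; }
    public string ItemName { get; set; }

    // Resolved once per class; a misspelled name here is a compile error.
    public static readonly PropertyInfo[] BusinessProperties =
    {
        typeof(Bar).GetProperty(nameof(Bar.Key)),
        typeof(Bar).GetProperty(nameof(Bar.ItemName)),
    };
}
```

The declarative style of listing the business properties in one place is kept; only the source of the name strings changes.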
 
Hi Jon,
Of course, this post only makes sense when interpreting "most optimal"
in terms of performance - if you meant it in a more holistic "best in
many different ways" sense, then we probably already agree :)

Yes, that is what I meant. There are many considerations to take into
account, and I didn't want to go into gory detail. My main point was that,
rather than asking quantitatively about a single possible solution, using a
term like "how expensive is it?", one should look at the problem from the
point of view of the problem, rather than one solution, and identify the
best possible solution to the problem. In addition, "how expensive" is a
relative quantitative term. How expensive something is, is always relative
to how expensive something else is. Without considering other solutions, the
question becomes meaningless.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
 
Kevin said:
Yes, that is what I meant. There are many considerations to take into
account, and I didn't want to go into gory detail. My main point was that,
rather than asking quantitatively about a single possible solution, using a
term like "how expensive is it?", one should look at the problem from the
point of view of the problem, rather than one solution, and identify the
best possible solution to the problem.
Right.

In addition, "how expensive" is a
relative quantitative term. How expensive something is, is always relative
to how expensive something else is. Without considering other solutions, the
question becomes meaningless.

Hmm... in performance terms, I'd disagree. I don't need to know how
fast an ArrayList is compared with List<T> in order to discover whether
either of them is "fast enough". An answer to the "how expensive"
question of "not too expensive" is adequate in many situations, IMO.
Maybe I'm just splitting hairs again though :)

Jon
 
Hi Jon,
Hmm... in performance terms, I'd disagree. I don't need to know how
fast an ArrayList is compared with List<T> in order to discover whether
either of them is "fast enough". An answer to the "how expensive"
question of "not too expensive" is adequate in many situations, IMO.
Maybe I'm just splitting hairs again though :)

All quantitative terms are relative. It is impossible to express a quantity
without a relative quantity to compare it to. For example, if I were to ask
"How expensive is this watch?" you could answer "3 dollars" or you could
answer "3 pounds" (sorry, I don't know how to create the "Pound" character
with this editor!). Of course, that means that the expense of the watch is
equivalent to the value of 3 dollars or 3 pounds. However, this tells you
nothing about the value of a dollar or a pound. You would have to state that
as relative to something else, such as gold, silver, or man-hours of work.
If I ask you "How expensive is this watch?" and you answer simply "3",
without specifying something that it can be related to, the answer is meaningless.

So, the real question is not "how expensive" something is, but how expensive
it is relative to something else. You might say that solution A is more
expensive than solution B, but less expensive than solution C. However,
without comparing the relative merits of other solutions, the question
becomes meaningless.

And of course, this is all related to putting the proverbial cart before the
horse. When one is building a house, and needs a tool to perform a certain
task, one does not ask oneself "How does this drill perform?" but
instead, "What is the best tool for this particular task?" It's all a matter
of requirements. One may not be looking for a drill at all. Or, if one is, it
may not be the particular drill one is asking about, but a different
drill altogether. IOW, it is the nature of the problem which drives the
solution.

Therefore, a better answer would be obtained, not by asking about a
particular drill, but by describing the problem in detail, and asking for
recommendations for a solution, and reasons for the recommendations. One
might suggest that one is considering a particular drill but, rather than
limiting the discussion to that drill, would be looking for recommendations
that might perform better than that particular drill for the problem. For
example, one could say "I am doing thus and such with this piece of wood,
and have been considering the use of this particular drill. Can someone
confirm this as the best solution, or offer a better one, as well as telling
me what their justification would be for that decision?"

Does that make sense?

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
 
Kevin said:
All quantitative terms are relative.

Yes - but not necessarily relative to another actual solution.

So, the real question is not "how expensive" something is, but how expensive
it is relative to something else. You might say that solution A is more
expensive than solution B, but less expensive than solution C. However,
without comparing the relative merits of other solutions, the question
becomes meaningless.

Not in my view. For instance, if I were looking at whether to use an
existing solution to a problem or whether to write my own, I could ask
the question "How expensive is the existing one?" and find out whether
or not it's too expensive for my needs, before working out whether I
even *could* write a faster one myself.

In other words "Fast enough" is an answer to the "how expensive"
question which is relative to the requirements, not relative to any
other solutions.

I'm initially thinking only in terms of performance, as that's what was
originally asked about. Arguably the same could be said of other
things: "How readable is this code?" "Readable enough - I can
understand it well enough at a glance to be convinced it's not going to
become a problem."
And of course, this is all related to putting the proverbial cart before the
horse. When one is building a house, and needs a tool to perform a certain
task, one does not ask oneself "How does this drill perform?" but
instead, "What is the best tool for this particular task?" It's all a matter
of requirements. One may not be looking for a drill at all. Or, if one is, it
may not be the particular drill one is asking about, but a different
drill altogether. IOW, it is the nature of the problem which drives the
solution.

Exactly. That's where I was disagreeing (perhaps badly!) with your
statement that:
"Without considering other solutions, the question becomes
meaningless."

It's quite possible to work out whether a given solution is acceptable
without considering any others.

I suspect we're on the same page, but neither of us is expressing
ourselves clearly enough.

Jon
 
Kevin said:
"How expensive is this watch?" You could answer "3 dollars" or you could
answer "3 pounds" (sorry, I don't know how to create the "Pound" character
with this editor!). Of course, that means that the expense of the watch is
equivalent to the value of 3 dollars or 3 pounds

I'm sorry to possibly paraphrase -- but it seems like the best way to
shed light on the discussion.

Jon is just saying that an explicit comparison isn't always required: once
the cost drops below a certain threshold, it doesn't matter anymore.
Does that make sense?

Well yes, in a way. It's just that in programming it's not always about
the "best" solution -- it's about getting any working solution with
least possible effort to the developer (possibly amortized over time :).

The analogy here would possibly be: You've got an employee (CPU) -- the
only thing to deliver before tomorrow is a hole in the wall. The
employee can make the hole with a spoon in 1 year, a hammer in 1 hour or
a power drill in 1 minute. Now clearly the spoon is out -- but it really
doesn't matter whether you set him to work with the hammer or the power drill.
What does matter, however, is whether the employee already knows how to use a
hammer or a power drill.
 
Hi Jon,
I suspect we're on the same page, but neither of us is expressing
ourselves clearly enough.

I suspect it as well. I believe we're using the same terminology to talk
about 2 different aspects of the issue.

For example:
It's quite possible to work out whether a given solution is acceptable
without considering any others.

I can certainly understand where you're coming from with this. Development
is a multi-faceted task, and involves various limitations, which engender
various compromises. While it might be "the best of all possible worlds" to
examine all alternatives, the art of development dictates that one
prioritize one's tasks, and abandon the less-important (but desirable)
possible tasks in order to meet deadlines, keep within budget, etc.

Still, considering the following statement:
Not in my view. For instance, if I were looking at whether to use an
existing solution to a problem or whether to write my own, I could ask
the question "How expensive is the existing one?" and find out whether
or not it's too expensive for my needs, before working out whether I
even *could* write a faster one myself.

In other words "Fast enough" is an answer to the "how expensive"
question which is relative to the requirements, not relative to any
other solutions.

I must mention that the possibility of writing a solution for one's self is
not the only possible alternative. Another existing solution could be a
better fit, and take the same amount of the developer's time. Any developer
who insists on writing his/her own custom classes for everything is not
likely to accomplish much of anything.

Again, it's a matter of balancing the requirements against the possible
tasks and the available resources.

I may be a perfectionist, but I'm also a realist!

I do find it helpful, however, to examine my thoughts about a given problem,
and make them less specific, breaking them down into more generalized "inner
terminology" (as it were). This process often leads to solutions which might
not as easily present themselves when considering only the specific
requirements of the present task. And I believe that the way we think about
problems is directly related to our ability to solve them well.

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
 
Hi Helge,
The analogy here would possibly be: You've got an employee (CPU) -- the
only thing to deliver before tomorrow is a hole in the wall. The
employee can make the hole with a spoon in 1 year, a hammer in 1 hour or
a power drill in 1 minute. Now clearly the spoon is out -- but it really
doesn't matter whether you set him to work with the hammer or the power drill.
What does matter, however, is whether the employee already knows how to use a
hammer or a power drill.

The only problem with this idea is exemplified in what happened to Microsoft
when they got involved in enough software products. Various products used
very similar functionality, but each one had a proprietary solution. Over
time, this created a great deal of trouble for the company. In recent years,
they have been working on using shared functionality on a much greater
scale, and OOP lends itself very well to this sort of development.

By thinking about the components of a solution in a less specific, more
abstract way, and taking a bit more time to consider the eventual
consequences of each choice, you should be able to save yourself a lot of
trouble over the long haul.

Still, one does have to live in the real world, and limitations are what
they are. There is always a "fish or cut bait" point in the process, and we
all have to decide when to take that step as well!

--
HTH,

Kevin Spencer
Microsoft MVP
.Net Developer
 