Discovery: dictionaries load slow unless you have the rightkey/value pair in the right format

R

RayLopez99

Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC. But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast: aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this? I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

RL
 
F

Family Tree Mike

RayLopez99 said:
Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC. But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast: aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this? I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

RL


I think it is hard to tell why this might be without an understanding of the
Rectangle and Point definitions. Standard structures and classes that I
know with those names don't have any common constructors that would allow
(i, j) as arguments interchangably.
 
R

RayLopez99

I think it is hard to tell why this might be without an understanding of the
Rectangle and Point definitions.  Standard structures and classes that I
know with those names don't have any common constructors that would allow
(i, j) as arguments interchangably.

You don't deal with Forms do you FTM? Rectangle and Point are Forms
classes, and take x,y (x being the horizontal axis from the top left
corner of your screen, y the vertical). Here i,j are also loop int
variables (two nested for loops).

But if this does not ring a bell, it might be also that I've
simplified it from the actual code, without IMO losing anything from
the actual code. If Jon Skeet was still around he would complain
"post the entire code!" LOL. RIP Jon Skeet, wherever you are.

RL
 
F

Family Tree Mike

You don't deal with Forms do you FTM? Rectangle and Point are Forms
classes, and take x,y (x being the horizontal axis from the top left
corner of your screen, y the vertical). Here i,j are also loop int
variables (two nested for loops).

But if this does not ring a bell, it might be also that I've
simplified it from the actual code, without IMO losing anything from
the actual code. If Jon Skeet was still around he would complain
"post the entire code!" LOL. RIP Jon Skeet, wherever you are.

RL


Then, what constructor of a Rectangle (presumably System.Drawing.Rectangle)
takes an x and y as you just defined as the only parameters?
 
G

Göran Andersson

RayLopez99 said:
You don't deal with Forms do you FTM? Rectangle and Point are Forms
classes,

There is a Rectangle class, but it doesn't have any constructor that
takes any parameters.

There is no Point class, only the Point structure in the System.Drawing
namespace.
and take x,y (x being the horizontal axis from the top left
corner of your screen, y the vertical). Here i,j are also loop int
variables (two nested for loops).

But if this does not ring a bell, it might be also that I've
simplified it from the actual code, without IMO losing anything from
the actual code. If Jon Skeet was still around he would complain
"post the entire code!" LOL.

Well, something is lost. Having a rectangle with only coordinates and no
size doesn't even make sense.
RIP Jon Skeet, wherever you are.

That's what you say about someone who died. That's now what you meant,
right?
 
R

RayLopez99

Well, something is lost. Having a rectangle with only coordinates and no
size doesn't even make sense.

No, the size and location are the x,y, and (not shown) I have
an .Offset extension method that moves the rectangle around (Offsets
it from the original location, everytime the loop loops).

More general question: have you had problems with Dictionary keys?
(selecting them?) I recall reading something about how keys can be
hashed to speed up stuff, so I assume that they are O(n) something or
another when creating...maybe this is an example.

That's what you say about someone who died. That's now what you meant,
right?

Yes. But I hope he's alive. I was only speaking metaphorically, like
he's dead in terms of posting here.

RL
 
G

Göran Andersson

RayLopez99 said:
Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC. But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast: aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this? I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

RL

I found a Point structure in the WINdowsBase assembly, I suppose that's
the one that you are using?

That Point structure implements GetHashCode and Equals, so it should be
usable as a dictionary key.

The Rectangle class has no implementation for GetHashCode or Equals that
uses the coordinates of the rectangle, so that's not suitable as a
dictionary key.

I don't know the exact reason for your performance problem, and it's
hard to tell with so little of the code, but it definitely has nothing
to do with creating a new Point value. The value is copied when the Add
method is called anyway, so if it's copied from a newly created value or
a value that existed before makes no difference at all.
 
G

Göran Andersson

RayLopez99 said:
No, the size and location are the x,y, and (not shown) I have
an .Offset extension method that moves the rectangle around (Offsets
it from the original location, everytime the loop loops).

So the code that you showed isn't the code that you use, and you don't
create new Rectangle object but change existing ones?
More general question: have you had problems with Dictionary keys?
(selecting them?) I recall reading something about how keys can be
hashed to speed up stuff,

The dictionary always uses hash codes.

If the key has a GetHashCode method that gives too little variation in
the hash codes, the Dictioary doesn't work properly, and it gets as slow
as using a regular list.
 
R

RayLopez99

I found a Point structure in the WINdowsBase assembly, I suppose that's
the one that you are using?

That Point structure implements GetHashCode and Equals, so it should be
usable as a dictionary key.

The Rectangle class has no implementation for GetHashCode or Equals that
uses the coordinates of the rectangle, so that's not suitable as a
dictionary key.

Goran--you are onto something very important. It's late now and I
have to go sleep. Below is what I was going to write before I saw
this post.

Any help appreciated.

Here is the important thing (I would say "key" or "point" but it's
confusing): apparently using the Rectangle as a key is fast--if you
are getting a "Point" as the value. But going the other way (using
the "Point" as the key, to get a Rectangle as a value) slows down the
algorithm (Dictionary) tremendously. I am not switching my terms:
what I say is true--going from Rectangle (as key) and Point (as value)
is fast, see the below, but the slow part is going the other way.

Can you figure it out?

Here is the code below.

Another thing: if you cannot figure out how to fix this, then the next
step is this: how to find a Rectangle (a key) given a point (that is
the value in a dictionary, see the below). Why do I ask? Because I
know the points (x,y) that are the logical coordinates (I have a 1000
by 1000 chessboard, so the middle of the chessboard is 500,500), but I
want to find what rectangle is associated with any logical coordinate,
like say 498,501 (also there is a zoom factor, as per Bob Powell's
matrix transform, so depending on the zoom the rectangles change) I am
using a grid that is a two -dimensional array of rectangles (an idea
from an old book--it works pretty well). But an old post by Jon Skeet
said that you cannot find a key from a value--if I could, I could just
use the first table (RectpointDict below)

Thank you.

RL

Question:
why is my dictionary so slow?

I have a 1 million cell collection of Rectangles to Points, stored in
RectpointDict, a dictionary. It takes no time at all to store
Rectangles as the key, and Points as the value. There is a one-to-one
correspondence between Rectangles and Points.

Yet, when I run the below (since I want to store Points as the key),
to create a new dictionary called PointRectangleDict, I get a long
wait (about 30 seconds) on a fast Core 2 Duo with lots of RAM. Any
ideas?

RL

PointRectangleDict = new Dictionary<Point, Rectangle>();

foreach (KeyValuePair<Rectangle, Point> kvp in RectpointDict)


{

PointRectangleDict.Add(kvp.Value, kvp.Key); //makes one dictionary's
key the other's value


}

//http://www.phase9studios.com/2008/01/08/DictionaryVSHashTable.aspx
 
G

Göran Andersson

RayLopez99 said:
Goran--you are onto something very important. It's late now and I
have to go sleep. Below is what I was going to write before I saw
this post.

Any help appreciated.

Here is the important thing (I would say "key" or "point" but it's
confusing): apparently using the Rectangle as a key is fast--if you
are getting a "Point" as the value. But going the other way (using
the "Point" as the key, to get a Rectangle as a value) slows down the
algorithm (Dictionary) tremendously. I am not switching my terms:
what I say is true--going from Rectangle (as key) and Point (as value)
is fast, see the below, but the slow part is going the other way.

Can you figure it out?

Here is the code below.

Another thing: if you cannot figure out how to fix this, then the next
step is this: how to find a Rectangle (a key) given a point (that is
the value in a dictionary, see the below). Why do I ask? Because I
know the points (x,y) that are the logical coordinates (I have a 1000
by 1000 chessboard, so the middle of the chessboard is 500,500), but I
want to find what rectangle is associated with any logical coordinate,
like say 498,501 (also there is a zoom factor, as per Bob Powell's
matrix transform, so depending on the zoom the rectangles change) I am
using a grid that is a two -dimensional array of rectangles (an idea
from an old book--it works pretty well). But an old post by Jon Skeet
said that you cannot find a key from a value--if I could, I could just
use the first table (RectpointDict below)

Thank you.

RL

Question:
why is my dictionary so slow?

I have a 1 million cell collection of Rectangles to Points, stored in
RectpointDict, a dictionary. It takes no time at all to store
Rectangles as the key, and Points as the value. There is a one-to-one
correspondence between Rectangles and Points.

Yet, when I run the below (since I want to store Points as the key),
to create a new dictionary called PointRectangleDict, I get a long
wait (about 30 seconds) on a fast Core 2 Duo with lots of RAM. Any
ideas?

RL

PointRectangleDict = new Dictionary<Point, Rectangle>();

foreach (KeyValuePair<Rectangle, Point> kvp in RectpointDict)


{

PointRectangleDict.Add(kvp.Value, kvp.Key); //makes one dictionary's
key the other's value


}

//http://www.phase9studios.com/2008/01/08/DictionaryVSHashTable.aspx

It's most likely the coordinates that you use for the points that makes
it slow. If the x coordinate is the same as the y coordinate, all points
get the same hash code and the dictionary gets very slow.

I tried adding items to a dictionary with a Point as key, and using new
Point(i, i) I can add about 10000 items in a second. Using new Point(i,
0) I can add about 10000000 items in a second.

The GetHashCode method for the Point structures calculates the hash code
by doing an xor between the x and y coordinates. This means that if x
and y are equal, the hash code becomes zero.

If the x and y coordinates are the same in all your points, you should
create an EqualityComparer to use with the dictionary so that you can
implement the GetHashCode method differently.

But then again, if the x and y coordinates in your points are the same,
it's rather pointless (!) to use a Point as key...
 
I

imuaplease

Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new  Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC.  But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast:  aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this?  I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

RL

The title makes me imagine much, I have to tell you
I still believe noop matter what one says, it is just a freedom of
speech the result is unchanged
 
I

imuaplease

It's most likely the coordinates that you use for the points that makes
it slow. If the x coordinate is the same as the y coordinate, all points
get the same hash code and the dictionary gets very slow.

I tried adding items to a dictionary with a Point as key, and using new
Point(i, i) I can add about 10000 items in a second. Using new Point(i,
0) I can add about 10000000 items in a second.

The GetHashCode method for the Point structures calculates the hash code
by doing an xor between the x and y coordinates. This means that if x
and y are equal, the hash code becomes zero.

If the x and y coordinates are the same in all your points, you should
create an EqualityComparer to use with the dictionary so that you can
implement the GetHashCode method differently.

But then again, if the x and y coordinates in your points are the same,
it's rather pointless (!) to use a Point as key...
There is nothing like making an operation to change one's gender, no
propsal, no go-to <<<<< I have in mind that!
 
A

Anthony Jones

RayLopez99 said:
Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC. But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast: aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this? I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

I've read through this thread so far and what you are actually doing is as
clear as mud.

Please state exactly which namespace Point and Rectangle are from and
exactly what are the data types of i and j in the code above. Add some more
code to the sample if you feel it would make things more clear.
 
F

Frans Bouma [C# MVP]

RayLopez99 said:
Recently I found that dictionaries cannot be used easily if a certain
format is not observed for the key rather than the value.

I had this dictionary:

Dictionary<Point, Rectangle> aDictionary = new Dictionary<Point,
Rectangle>();

then later, inside of a massive (million iteration) loop, I had this:

aRectangle = new Rectangle (i,j);

aDictionary.Add(new point(i,j), aRectangle);

After several hundred iterations, the program slowed down dramatically
on a fast PC. But, strangely, if I switched the value (here a
Rectangle type) and the key (here a Point), it worked fine (fast).

That is, this worked fast: aDictionary.Add(aRectangle, new point
(i,j)); //order reversed, and key is not created 'inline' or 'on the
fly'

Anybody observe this? I think it has to do with the fact a 'new' key
is being created everytime in the loop, rather than a 'new' value.
For some reason, because of the algorithm used for key pairs, keys
cannot easily be created 'behind the scenes'.

Or so it seems (to my mind's eye)

Which point/rectangle are you using, from System.Drawing?
These two use a hashcode which is based on both x and y (and for
rectangle also with and height) using XOR, which shouldn't lead to much
duplicates. Every duplicate will make things slower.

FB


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
R

RayLopez99

        Which point/rectangle are you using, from System.Drawing?
        These two use a hashcode which is based on both x and y (and for
rectangle also with and height) using XOR, which shouldn't lead to much
duplicates. Every duplicate will make things slower.

                FB

Thanks FB--you are onto something. What I observed is for real, not a
coding phantom resulting from something else, IMO. I think because
the key and value are so close, there is not a clear distinction
between the two and thus it's slowing down the algorithm when using
Point to find Rectangle, rather than vice versa, since Rectangle is
constructed from Point. And yes they are part of System.Drawing.

PS--the good news, and it's funny, is that five minutes after I posted
this reply to Goran this morning, it struck me: the solution for this
particular problem of mine (has nothing to do with this post,
theoretically), is trivial--I have a 2D array of rectangles I call
"Board", having signature: Rectangle [,] Board. It turns out that the
i,j elements of the board are also identical for my Point (I am using
a solution offered by Jones in his Windows MFC book, on how to
construct a game board). Thus, for this particular program, here is
the solution:

public Rectangle ReturnBoardRectangle (Point P)
{
int x = P.X;
int y = P.Y;
return Board[x, y]; //done!
}

LOL, I can't believe I overlooked this. No need for a second
dictionary after all.

But, aside from this workarond, it was still educational for me to
post here about this theoretical problem.

RL
 
R

RayLopez99

Please state exactly which namespace Point and Rectangle are from and
exactly what are the data types of i and j in the code above.  Add somemore
code to the sample if you feel it would make things more clear.

Thanks for your help. I figured a workaround as per my other reply
here, but the theoretical problem remains: when chosing key and
value, you must have the value different sufficiently from the key so
that the 'behind the scenes' hashing/algorithm for the dictionary is
efficient. If not (if, as apparently here, the key is derived from
the value, thus the value cannot easily be used as the key in a
reverse dictionary, if you follow the drift) your dictionary creation
is painfully slow.

RL
 
R

RayLopez99

Goran you are a genius. Genius is sometimes defined as the ability to
see a problem and solve it based on limited information. You, unlike
some other posters in this thread (who were very helpful and very
smart to be sure), saw the solution based on the limited info I
provided. Comments below.

It's most likely the coordinates that you use for the points that makes
it slow. If the x coordinate is the same as the y coordinate, all points
get the same hash code and the dictionary gets very slow.

Yes, this is likely the case. For example, I am traversing an array
defining a game board, that can be as high as 3000 x 3000. X,Y
(defined from the upper left corner of your screen), can be, for
example (0,2999), (1,2999), (2,2999), etc (in this case you are
traversing the last row of the board, horizontally, and moving left to
right across the bottom of the screen).
I tried adding items to a dictionary with a Point as key, and using new
Point(i, i) I can add about 10000 items in a second. Using new Point(i,
0) I can add about 10000000 items in a second.

The GetHashCode method for the Point structures calculates the hash code
by doing an xor between the x and y coordinates. This means that if x
and y are equal, the hash code becomes zero.

Yes, again true. You have observed it seems the asymmetry I have
spoken about: using one element of the pair as key is faster than
using the other element of the pair.
If the x and y coordinates are the same in all your points, you should
create an EqualityComparer to use with the dictionary so that you can
implement the GetHashCode method differently.

I would love, for my future reference, to learn what you mean by this
(in my mind's eye I know what you are saying, but I need a concrete
example). If you have a link or easy example please post, but I don't
want to waste your time, so please don't worry about it and go out of
your way--it's not important now. By the way, you have helped me a
lot in the past and now, in fact, I name (for keyword searching) all
interesting discoveries that you have pointed out before (such as
using a predicate logic boolean search function/method) as "GORAN" in
my notes, so I can find them easier!
But then again, if the x and y coordinates in your points are the same,
it's rather pointless (!) to use a Point as key...

Aha! This is incredible that you deduced this, and very true. Below
is what I found right after I logged off this morning, as I just
posted to another here.

RL

PS--the good news, and it's funny, is that five minutes after I posted
this reply to Goran this morning, it struck me: the solution for this
particular problem of mine (has nothing to do with this post,
theoretically), is trivial--I have a 2D array of rectangles I call
"Board", having signature: Rectangle [,] Board. It turns out that the
i,j elements of the board are also identical for my Point (I am using
a solution offered by Jones in his Windows MFC book, on how to
construct a game board). Thus, for this particular program, here is
the solution:

public Rectangle ReturnBoardRectangle (Point P)
{
int x = P.X;
int y = P.Y;
return Board[x, y]; //done!
}

LOL, I can't believe I overlooked this. No need for a second
dictionary after all.

But, aside from this workarond, it was still educational for me to
post here about this theoretical problem.

RL
 
J

Jeff Johnson

You don't deal with Forms do you FTM? Rectangle and Point are Forms
classes

No, they are structures, not classes, and they are in System.Drawing. Don't
use "class" as a catch-all. If you're trying to be generic, say "type."
 
C

Chris Dunaway

public Rectangle ReturnBoardRectangle (Point P)
{
int x = P.X;
int y = P.Y;
return Board[x, y]; //done!
}

Firstly, no need to create the extra variables, just call

return Board[P.X, P.Y];

Secondly, why bother creating this method at all? You could just
reference the board array directly:

i.e. instead of this:

Rectangle r = ReturnBoardRectangle(somePoint);

why not just do this:

Rectangle r = Board[somePoint.X, somePoint.Y];

and eliminate the method? I suppose if your ReturnBoardRectangle does
more than just access the array, then it might be useful.

Chris
 
R

RayLopez99

why not just do this:

         Rectangle r = Board[somePoint.X, somePoint.Y];

and eliminate the method?  I suppose if your ReturnBoardRectangle does
more than just access the array, then it might be useful.

Chris

Good point, thanks Chris. But in fact, I need to access Board from
outside the class it's in, because it's private. But I'll make it a
property and use your suggestion...wait, properties cannot use input
variables...so I'll just shorten the method as per your suggestion.

RL
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top