fastest searchable datastructure?

Michel Posseth [MCP]

IMHO

The Hashtable will not work for the TS, as the object size limit in .NET is 2 GB;
the hashtable approach will blow up with an out-of-memory exception with the
amounts the TS wants to store.

So the only approach that will probably work is the database approach also
proposed in this thread.

Michel
 
TheSilverHammer

Dictionaries are close to O(1), which is as fast as you can get.

However, as others have pointed out, 10^6 items take a lot of space, never mind 10^9.

If you are looking at 32-bit pointers (references), you are using 4 MB just
to hold the references, 8 MB at 64-bit. Exactly how big are your
objects? At 10^6 you can say you will need one MEGABYTE of RAM per BYTE of
size for each of your objects, i.e. an object whose size is 20 bytes will
cost you 20 MB of RAM to hold 10^6 of them.
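
For illustration, a minimal back-of-the-envelope sketch of that arithmetic in C# (the reference size and the 20-byte object size are just assumed example figures, and per-object CLR overhead is ignored):

    using System;

    class MemoryEstimate
    {
        static void Main()
        {
            long count = 1000000L;      // 10^6 instances
            int referenceSize = 4;      // bytes per reference in a 32-bit process (8 on 64-bit)
            int objectSize = 20;        // assumed payload size per object, in bytes

            long referenceBytes = count * referenceSize;   // ~4 MB for the references alone
            long objectBytes = count * objectSize;         // ~20 MB for the objects themselves

            Console.WriteLine("References: {0:N0} bytes", referenceBytes);
            Console.WriteLine("Objects:    {0:N0} bytes", objectBytes);
        }
    }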

I would seriously look at your data architecture and see if you can lower
the number of instances you need to hold at once. The only application I
can think of that would need such a huge quantity of data might be a game
terrain database. In those cases you really do not need all of the terrain
loaded at once and can section it off into blocks. In the cases where you do
need all that data available at once, you end up shrinking the map size to
something reasonable or implementing a lot of compression.
 
Jon Skeet [C# MVP]

On Jan 18, 3:04 pm, TheSilverHammer wrote:

I would seriously look at your data architecture and see if you can lower
the number of instances you need to hold at once. The only application I
can think of that would need such a huge quantity of data might be a game
terrain database. In those cases you really do not need all of the terrain
loaded at once and can section it off into blocks. In the cases where you do
need all that data available at once, you end up shrinking the map size to
something reasonable or implementing a lot of compression.

I've got a datastructure of 100 million records in memory. I probably
shouldn't say what it's for, but it basically ends up being a 10,000
squared lookup, with a single byte result.

Doing this in memory (for the low, low cost of 100M of memory) gave me
a ~100,000% performance improvement over a database lookup :)
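
For readers wondering what such a structure can look like, here is a minimal sketch in C# of a 10,000 x 10,000 single-byte lookup backed by one flat array; the class and method names are illustrative, not Jon's actual code:

    // One flat array of 10,000 * 10,000 = 100,000,000 bytes (~100 MB).
    class ByteLookup
    {
        const int Size = 10000;
        readonly byte[] table = new byte[Size * Size];

        public byte Get(int x, int y)
        {
            return table[x * Size + y];   // offset = row * width + column
        }

        public void Set(int x, int y, byte value)
        {
            table[x * Size + y] = value;
        }
    }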

Jon
 
Bill McCarthy

Jon Skeet said:
I've got a datastructure of 100 million records in memory. I probably
shouldn't say what it's for, but it basically ends up being a 10,000
squared lookup, with a single byte result.

Doing this in memory (for the low, low cost of 100M of memory) gave me
a ~100,000% performance improvement over a database lookup :)

That of course would not be a searchable datastructure, more like an indexed
array where the location is calculated by offset ;) If it were a key per
item, then each item would require at least a 4-byte key, making the in-memory
size at least 500 MB.
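
A quick sketch of that figure (assumed layouts for illustration only: one byte per cell for the offset-indexed array versus a minimal 4-byte key plus 1-byte value per entry, ignoring container overhead):

    using System;

    class SizeComparison
    {
        static void Main()
        {
            long entries = 100000000L;                                        // 100 million records
            long flatArrayBytes = entries * sizeof(byte);                     // offset-indexed array: ~100 MB
            long keyedMinimumBytes = entries * (sizeof(int) + sizeof(byte));  // key per item: ~500 MB minimum

            Console.WriteLine("Flat array:      {0:N0} bytes", flatArrayBytes);
            Console.WriteLine("Keyed (minimum): {0:N0} bytes", keyedMinimumBytes);
        }
    }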
 
Alex Clark

I know it's not always good netiquette to pry into others' projects in these
newsgroups, but is anyone else really interested to know what this actual
application is for...?
 
Pieter

Oh, it's good to have a healthy curiosity ;-)

It's a Reinforcement Learning project: Artificial Intelligence.
I'm making an agent that can play a board game using RL. To do this, it must
learn from what it did:
- the agent ends up in different states
- every state has different actions
- the agent must choose between these actions
- at the end he gets his reward (win or lose); this reward is given to the
actions he chose.
By learning (exploration) and exploitation he must maximize his wins :)

The whole datastructure is needed to put my states in: I must be able to
check very quickly whether he has already been in that state :)
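
For illustration, a minimal sketch of such a state lookup in C# (the names and the 64-bit key are assumptions; it presumes the board state can be encoded into a single long):

    using System.Collections.Generic;

    class StateTable
    {
        // The value could be a visit count, an estimated action value, etc.
        readonly Dictionary<long, double> values = new Dictionary<long, double>();

        // Has the agent already been in this state?
        public bool HasSeen(long stateKey)
        {
            return values.ContainsKey(stateKey);
        }

        // Fold a reward back into the value stored for this state.
        public void Update(long stateKey, double reward)
        {
            double current;
            values.TryGetValue(stateKey, out current);   // current stays 0.0 if unseen
            values[stateKey] = current + reward;
        }
    }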

It has to be finished by Monday evening ;-)
 
