Performance of large single Hashtable vs. several Hashtables

A

Anders Borum

Hello!

I have a list of singleton classes (model managers) that store objects
internally using hashtables. Each of these classes use a single hashtable to
store e.g. users, pages, elements and so on (I have complete control of the
objects stored). I am currently using a set of abstract cache classes that
my model managers subclass.

The hashtable in a model managers could, potentially, store more than 25.000
objects (let's imagine 50.000 for sake of the discussion). The hashtables
are accessed very frequently (they store objects that are cloned and handed
to the client context services).

I am tinkering about creating a central cache, but have yet to figure out
when (or if) a hashtable breaks down, or when it becomes a bad decision to
go this way.

Obviously, using a central cache does implicit ask for synchronization
(which I also use in abstract cache classes), but sharing the cache could
sacrifice additional speed.

For testing purposes, I tried populating a hashtable with 650.000 instances
(ate up around 75MB of RAM), and it worked all right. I haven't done any
tests on how the size of the hashtable actually affects lookup times (I
belive to have read that the lookup time is almost constant, regardless of
the size).

The caching mechanism in ASP.NET is built around a hashtable. I haven't
found any recommendations for number of items stored here, so I be on the
right track.

Any comments and ideas are very welcome!
 
J

Justin Rogers

The only performance issues I've ever found with the Hashtable were in
pre-V1, when the Terrarium was using over 1 million records. They have
a precomputed primes list up to that point, and when we hit the threshold
and had to compute a new prime, we ran into some problems. So basically
growing the Hashtable at very large record counts could pose a problem
for you.
 
S

Sherif ElMetainy

Hello

The performance will depend on the data types you use as keys for the hash
table. If you use many types as keys for a single hashtable, it is more
likely that GetHashCode of one type can be similar to that of another type,
although they are not equal. Therefore you may have a lot of collisions in
the hash table that cal affect performance.


Sorry for my bad english
Best regards,
Sherif
 
A

Anders Borum

Hello!
The only performance issues I've ever found with the Hashtable were in
pre-V1, when the Terrarium was using over 1 million records. They have
a precomputed primes list up to that point, and when we hit the threshold
and had to compute a new prime, we ran into some problems. So basically
growing the Hashtable at very large record counts could pose a problem
for you.

Thanks for getting back to me on this subject. As always it's interesting
information you guys pass back.

I don't think I'll ever reach 1 million entried in a single hashtable, and
the memory requirements for such large amounts of objects are definately
quite high (I have been looking at expiring items from the cache, similar to
the ASP.NET cache model - and it works quite well, but I need to design the
architecture around worst case scenarios).

Any idea why the ASP.NET cache was not made a seperate part of the
framework? It seems like many people have had to create their own caches,
quite similar to the one in ASP.NET.

Microsoft have provided the cache application block, but I was more looking
for a .NET framework core class feature. Perhaps this will be provided with
the next .NET release.

I will give the central cache some more thoughts, but the fact alone that
there could be concurrent access to the central hashtable (when two or more
modelmanagers try to access the share cache at the same time), there will be
a quite high performance penaulty, compared to providing each model manager
with its own cache context (i.e hashtable).

Caching is king!

Btw. sorry for replying so late - I have been playing a golf tournament :)
 
A

Anders Borum

Hello!
The performance will depend on the data types you use as keys for the hash
table. If you use many types as keys for a single hashtable, it is more
likely that GetHashCode of one type can be similar to that of another type,
although they are not equal. Therefore you may have a lot of collisions in
the hash table that cal affect performance.

I will be storing objects of the same superclass called "CmsObject", and
index the objects in the hashtable using an integer currently, but will soon
switch to guids as that allows me to create the identifiers directly within
my model managers.

I am, btw., not overriding the virtual GetHashCode() method in my
superclass.

Thanks in advance.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top