Need quick lookup like Hashtable, but don't need to store value

illegal.prime · Aug 12, 2006

Hey all, I seem to recall stumbling across a class that is exactly the
same as Hashtable, except lacking the Value portion. I need something
that works just like the Hashtable, but that doesn't store a value (in
addition to the key).

Essentially, I will just be storing string keys and will want to do a
lookup in my container to see if they exist in it. Hashtable would
achieve exactly this, but it would also expect a secondary Value
argument when I add the string keys to it. Now obviously I can still
use a Hashtable and just use null for Value, but I would rather not do
that.

Thoughts?

Novice

Markus Stoeger · Aug 12, 2006

Hey all, I seem to recall stumbling across a class that is exactly the
same as Hashtable, except lacking the Value portion. I need something
that works just like the Hashtable, but that doesn't store a value (in
addition to the key).

Essentially, I will just be storing string keys and will want to do a
lookup in my container to see if they exist in it. Hashtable would
achieve exactly this, but it would also expect a secondary Value
argument when I add the string keys to it. Now obviously I can still
use a Hashtable and just use null for Value, but I would rather not do
that.

Thoughts?

You can use an ArrayList and lookup values using the BinarySearch
method. I'm not on C# 2.0 yet, but a List<T> probably has the same method.

hth,
Max

Carl Daniel [VC++ MVP] · Aug 12, 2006

Hey all, I seem to recall stumbling across a class that is exactly the
same as Hashtable, except lacking the Value portion. I need something
that works just like the Hashtable, but that doesn't store a value (in
addition to the key).

Essentially, I will just be storing string keys and will want to do a
lookup in my container to see if they exist in it. Hashtable would
achieve exactly this, but it would also expect a secondary Value
argument when I add the string keys to it. Now obviously I can still
use a Hashtable and just use null for Value, but I would rather not do
that.

You want one of the standard collection classes that's inexplicably missing
from the BCL - a Set (or Bag). You can build it yourself (e.g. using a
sorted List<string>), or just use Hashtable and ignore the value.

Here's a simple Set<T> based on a sorted list that I use - you can embelish
it further to cover whatever operations you need, but this should be enough
to get you started.

public class Set<T> where T: IComparable
{
private List<T> m_list = new List<T>();

public IEnumerable<T> Items
{
get
{
foreach (T t in m_list)
yield return t;
}
}

public void Clear()
{
m_list.Clear();
}

public bool Add(T t)
{
int i = m_list.BinarySearch(t);
if (i < 0)
{
m_list.Insert(~i, t);
return true;
}

return false;
}

public bool Contains(T t)
{
int i = m_list.BinarySearch(t);
return i >= 0;
}
}

-cd

Bruce Wood · Aug 15, 2006

Hey all, I seem to recall stumbling across a class that is exactly the
same as Hashtable, except lacking the Value portion. I need something
that works just like the Hashtable, but that doesn't store a value (in
addition to the key).

Essentially, I will just be storing string keys and will want to do a
lookup in my container to see if they exist in it. Hashtable would
achieve exactly this, but it would also expect a secondary Value
argument when I add the string keys to it. Now obviously I can still
use a Hashtable and just use null for Value, but I would rather not do
that.

Nope. There is no such structure.

Just store the key in both the Key and Value portions. It won't take up
any more space.

Using an ArrayList and binary search is slower than a Hashtable for
lookups, and much, much slower for inserts. Hashtable should be your
preferred solution.

Carl Daniel [VC++ MVP] · Aug 16, 2006

Bruce said:
Nope. There is no such structure.

Just store the key in both the Key and Value portions. It won't take
up any more space.

Using an ArrayList and binary search is slower than a Hashtable for
lookups, and much, much slower for inserts. Hashtable should be your
preferred solution.

That depends a great deal on the number of items, how they're distributed,
and how they're compared. It's quite commonn for a sorted list to be more
efficient than a hashtable for <100 items, and string comparison is quite a
lot faster than hashing for typical mixtures of strings (because the
comparison loop will typically bail out after only a few characters, while
the hash always runs to completion). The only way to really know is to
measure both approaches in your application - there's simply no hard & fast
rule that's always right.

-cd

illegal.prime · Aug 16, 2006

This is a fundamental concept in Computer Science. Searching a sorted
list using binary search is O(log n)
The above log is base 2.

The complexity to get an entry from a hashtable that doesn't have
collisions is O(1).

In other words, I can't think of any reason to use a sorted list over a
hashtable - regardless of the language (C# or otherwise).

Novice

Carl Daniel [VC++ MVP] · Aug 16, 2006

This is a fundamental concept in Computer Science. Searching a sorted
list using binary search is O(log n)
The above log is base 2.

The complexity to get an entry from a hashtable that doesn't have
collisions is O(1).

In other words, I can't think of any reason to use a sorted list over a
hashtable - regardless of the language (C# or otherwise).

Big-O notation tells you the relative performance in terms of number of
operations when the number of operations is large (approaches infinity).

Unfortunately, it's easy to be lead astray by worst-case theoretical
performance in real-world cases. There are a couple of factors that play
heavily into the equation:

1. The size of your collections.
2. The cost of the individual operations.

In the case of sorted array versus hashtable, you have to consider the cost
of comparison to the cost of hashing. The relative cost of those operations
is a function of the size and content of the items being compared and
hashed.

For example, imagine you have a container with 100, 100-character strings.

For a hashed container, determining if a given 100-character string is in
that container will require touching all 100 characters of the target string
in order to compute its hash. The hash is then used as an index into a
table of "buckets" to find the item. If the collection contains items with
the same bucket number (hash collisions), the hashtable generally
degenerates to a linear search, doing full-value comparisons of the target
string to each item in the bucket. If there are no hash collisions, then
this cost can be ignored, and the cost of a lookup is roughly constant and
roughly equal to the cost of calculating that one hash value (100 character
accesses).

For a sorted array, finding a string using a binary search will take at most
7 comparisons, with each comparison taking between 2 and 200 character
accesses, depending on how different the target string is from the ones
already in the container. Clearly, in the worst case (nearly identical
strings differing only at high index values), the hashtable will vastly
outperform the sorted array, but for many real-world applications, the
string comparison will drop out after only a handful of comparisons, and the
cost of 7 string comparisons will actually be less than the cost of
calculating a single hash value.

Bottom line: you have to know your content and how it's accessed to know
which is best. Researchers have shown that for small collections, the
sorted list is usually faster even though the hashtable has a better Big-O
figure. The only way to know for sure in your application is to measure.

-cd

Brian Gideon · Aug 16, 2006

This is a fundamental concept in Computer Science. Searching a sorted
list using binary search is O(log n)
The above log is base 2.

The base on the log is irrelevent in Big-Oh notation. Consider the
following.

O(log2(n)) = O(log10(n) / log10(2)) = O(3.32 * log10(n)) = O(log10(n))

However, the base is important when used outside of Big-Oh notation;
like when you're actually trying to determine the number of operations
of a particular algorithm or when you're trying to estimate the break
even point of algorithms with different complexities.

The complexity to get an entry from a hashtable that doesn't have
collisions is O(1).

In other words, I can't think of any reason to use a sorted list over a
hashtable - regardless of the language (C# or otherwise).

Memory and speed could both be reasons for using a sorted list over a
hash table. Hash tables obviously use more memory and in some
scenarios the O(log n) algorithm could be faster than the O(1)
algorithm. The documentation mentions that a ListDictionary is usually
faster than a Hashtable when the number of elements is less than 10.
My past observations have verified that.

cjard · Aug 31, 2006

I can still
use a Hashtable and just use null for Value, but I would rather not do
that.

Why not?

Need quick lookup like Hashtable, but don't need to store value

illegal.prime

Markus Stoeger

Carl Daniel [VC++ MVP]

Bruce Wood

Carl Daniel [VC++ MVP]

illegal.prime

Carl Daniel [VC++ MVP]

Brian Gideon

cjard