Algorithm to find all unique email id strings in an array

sfdev2000

I'm wrestling with the best way to write some C# code to find all of
the unique email id strings I receive within an array. Basically I
want to efficiently eliminate all of the duplicates. Typically there
will be between 1 and a few hundred unique email ids, and in the worst
case there might be 20,000 to 50,000 unique email ids.

My first pass was to use a C# Hashtable. I use the Hashtable to tell
me whether or not I've already seen the email id string in the
array. I don't care about the ordering of the email ids; all I care
about is finding all of the unique email ids.

Does anyone have any suggestions for a better solution?

Are there any gotchas in using a C# Hashtable for this solution?
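
For reference, the first-pass approach described above might look roughly like the sketch below. The class and method names (HashtableDedup, UniqueEmails) are made up for illustration and are not from the original code. It assumes the array contains no null entries, since Hashtable rejects null keys.

```csharp
using System;
using System.Collections;

public static class HashtableDedup
{
    // Hypothetical helper: returns the unique strings from 'emails'.
    // The order of the result is not guaranteed.
    public static string[] UniqueEmails(string[] emails)
    {
        Hashtable seen = new Hashtable();
        foreach (string email in emails)
        {
            // The indexer adds the key if it is missing and overwrites it
            // otherwise, so duplicates collapse without a Contains check.
            // A null email would throw ArgumentNullException here.
            seen[email] = null;
        }

        // Copy the keys (the unique ids) out into a string array.
        string[] result = new string[seen.Count];
        seen.Keys.CopyTo(result, 0);
        return result;
    }
}
```

One thing to be aware of: with the default constructor, string keys are compared case-sensitively, so "Bob@example.com" and "bob@example.com" would count as two different ids.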
 
Jon Skeet [C# MVP]

It would help if you'd say which version of .NET you're using.

In .NET 3.5 I'd just call Distinct() on the array.
In .NET 2.0 I'd use a Dictionary<string,string> using the same value as
the key.
In .NET 1.1 I'd use Hashtable.
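
A minimal sketch of the first two options, assuming the input is a string[] with no null entries (a null key would make the Dictionary version throw). The class and method names here are hypothetical, chosen only for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UniqueEmailIds
{
    // .NET 3.5: Enumerable.Distinct() removes duplicates using the
    // default equality comparer for string.
    public static string[] WithDistinct(string[] emails)
    {
        return emails.Distinct().ToArray();
    }

    // .NET 2.0: a Dictionary<string,string> used as a set, storing each
    // id as both key and value.
    public static string[] WithDictionary(string[] emails)
    {
        Dictionary<string, string> seen = new Dictionary<string, string>();
        foreach (string email in emails)
        {
            // Setting via the indexer adds or overwrites, so a repeated
            // id does not throw the way Add() would.
            seen[email] = email;
        }

        string[] result = new string[seen.Count];
        seen.Keys.CopyTo(result, 0);
        return result;
    }
}
```

Both versions compare strings case-sensitively unless a different comparer is supplied.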
 
Arne Vajhøj

If you're on 3.5, then HashSet is a possibility.

Arne
 
Jon Skeet [C# MVP]

Arne Vajhøj said:
If you're on 3.5, then HashSet is a possibility.

You *could* explicitly use a HashSet - but why go to the work of doing
it yourself when Enumerable.Distinct() does it all for you? :)
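
For comparison, the explicit HashSet<string> version could be sketched as follows. The names are hypothetical, and the case-insensitive variant is only an assumption about what might be wanted for email ids; nothing in the thread says the ids should be matched that way.

```csharp
using System;
using System.Collections.Generic;

public static class HashSetDedup
{
    // Builds a set straight from the array; duplicates are dropped as the
    // constructor enumerates the input.
    public static string[] UniqueEmails(string[] emails)
    {
        HashSet<string> unique = new HashSet<string>(emails);
        string[] result = new string[unique.Count];
        unique.CopyTo(result);
        return result;
    }

    // Assumption for illustration: ids that differ only in case are
    // treated as duplicates by passing an ordinal ignore-case comparer.
    public static string[] UniqueEmailsIgnoreCase(string[] emails)
    {
        HashSet<string> unique =
            new HashSet<string>(emails, StringComparer.OrdinalIgnoreCase);
        string[] result = new string[unique.Count];
        unique.CopyTo(result);
        return result;
    }
}
```

Either way the work is a single pass over the array, so 50,000 ids should not be a problem for any of these approaches.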
 
Arne Vajhøj

Jon said:
You *could* explicitly use a HashSet - but why go to the work of doing
it yourself when Enumerable.Distinct() does it all for you? :)

When you have a hammer, problems tend to look like nails.

:)

Is .Distinct using a HashSet internally?

Arne
 
