algorithm to find all unique email id strings in an array

  • Thread starter: sfdev2000

sfdev2000

I'm wrestling with the best way to create some C# code to find all of
the unique email id strings I receive within an array. Basically I
want to efficiently eliminate all of the duplicates. Typically there
will be between 1 and a few hundred unique email ids, and in the worst
case there might be 20,000 to 50,000 unique email ids.

My first pass was to use a C# hashtable. I use the hashtable to tell
me whether or not I've already seen the email id string in the
array. I don't care about the ordering of the email ids; all I care
about is finding all of the unique email ids.

Does anyone have any suggestion for a better solution?

Are there any gotchas in using a C# hashtable for this solution?
 
It would help if you'd say which version of .NET you're using.

In .NET 3.5 I'd just call Distinct() on the array.
In .NET 2.0 I'd use a Dictionary<string,string> using the same value as
the key.
In .NET 1.1 I'd use Hashtable.
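
For illustration, here is a minimal sketch of those three options side by side. The class and method names (UniqueEmails, WithDistinct, WithDictionary, WithHashtable) are made up for this example, and it assumes the array contains no null entries, since Dictionary and Hashtable both reject null keys. In practice you would only compile the variant that matches your framework version.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class; the names are illustrative only.
public static class UniqueEmails
{
    // .NET 3.5: LINQ's Distinct() drops repeated strings.
    public static string[] WithDistinct(string[] emails)
    {
        return emails.Distinct().ToArray();
    }

    // .NET 2.0: a Dictionary used as a set, with each string stored
    // as both key and value.
    public static string[] WithDictionary(string[] emails)
    {
        Dictionary<string, string> seen = new Dictionary<string, string>();
        foreach (string email in emails)
        {
            seen[email] = email; // re-assigning an existing key is harmless
        }
        string[] result = new string[seen.Count];
        seen.Keys.CopyTo(result, 0);
        return result;
    }

    // .NET 1.1: the non-generic Hashtable, same idea without generics.
    public static string[] WithHashtable(string[] emails)
    {
        Hashtable seen = new Hashtable();
        foreach (string email in emails)
        {
            seen[email] = email;
        }
        string[] result = new string[seen.Count];
        seen.Keys.CopyTo(result, 0); // keys are all strings, so this cast-copy is safe
        return result;
    }
}
```

None of the three should be relied on for a particular output order, which matches the original requirement that ordering does not matter.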
 
If you're on 3.5, then HashSet is a possibility.

Arne
 
Arne Vajhøj said:
If you're on 3.5, then HashSet is a possibility.

You *could* explicitly use a HashSet - but why go to the work of doing
it yourself when Enumerable.Distinct() does it all for you? :)
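
For comparison, a rough sketch of the explicit HashSet version might look like the following. The class name UniqueEmailsHashSet, the method Collect, and the duplicate counter are assumptions added for this example; the counter is only there to show one thing the manual loop gives you that Distinct() does not.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; names are illustrative only.
public static class UniqueEmailsHashSet
{
    // Returns the set of unique ids and reports how many duplicates were skipped.
    public static HashSet<string> Collect(string[] emails, out int duplicates)
    {
        HashSet<string> unique = new HashSet<string>();
        duplicates = 0;
        foreach (string email in emails)
        {
            // Add returns false when the string is already in the set.
            if (!unique.Add(email))
            {
                duplicates++;
            }
        }
        return unique;
    }
}
```

If all you need is the de-duplicated sequence, Distinct() is less code; the explicit set is mainly worth it when you want to keep it around for later lookups or track the duplicates as you go.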
 
Jon said:
You *could* explicitly use a HashSet - but why go to the work of doing
it yourself when Enumerable.Distinct() does it all for you? :)

When you have a hammer, problems tend to look like nails.

:-)

Is Distinct() using a HashSet internally?

Arne
 