Parse a String and get unique values

Raterus · Sep 14, 2004

Hi,

I'm looking for ideas for the most efficient way to accomplish this. I have a string representing names a person goes by.

"John Myers Joe John Myers"

And I need to parse it in such a way that I end up with an array of UNIQUE strings that appear in the original string (in no particular order):

arr(0) = "John"
arr(1) = "Myers"
arr(2) = "Joe"

One way I can think of is to use a Hashtable and add the words as keys, then just iterate through the end result. The results should be unique, but I'm not sure that would be the best method for performance.
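To make the Hashtable idea concrete, here is a minimal sketch of what I have in mind. The names UniqueWords and UniqueWordsSketch are just placeholders, and it assumes the words are separated by single spaces and that the comparison should be case-sensitive.

```vb
Imports System
Imports System.Collections

Module UniqueWordsSketch
    ' Hypothetical helper: returns each distinct word of the source string once.
    Function UniqueWords(ByVal source As String) As String()
        Dim seen As New Hashtable()
        Dim word As String
        For Each word In source.Split(" "c)
            ' Skip empty entries produced by consecutive spaces.
            If word.Length > 0 AndAlso Not seen.ContainsKey(word) Then
                seen.Add(word, Nothing)
            End If
        Next
        ' Copy the keys out; a Hashtable does not preserve insertion order.
        Dim result(seen.Count - 1) As String
        seen.Keys.CopyTo(result, 0)
        Return result
    End Function

    Sub Main()
        Dim name As String
        For Each name In UniqueWords("John Myers Joe John Myers")
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

Each word costs one hash lookup, so the work should grow roughly linearly with the length of the string.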

Thanks for any ideas!
--Michael

Rich · Sep 14, 2004

Here is an example from the VS2003 help files for string
parsing. Note how you can list several delimiters. Your
string needs to contain delimiters for this to work.

Sub strSplit()
    ' Four delimiters here: a space, comma, period, and colon
    Dim delimStr As String = " ,.:"
    Dim delimiter As Char() = delimStr.ToCharArray()
    Dim words As String = "one two,three:four."
    Dim split As String() = Nothing
    Console.WriteLine("The delimiters are -{0}-", delimStr)
    Dim x As Integer
    For x = 1 To 5
        ' The second argument caps the number of substrings returned
        split = words.Split(delimiter, x)
        Console.WriteLine(ControlChars.Cr + "count = {0,2} ..............", x)
        Dim s As String
        For Each s In split
            Console.WriteLine("-{0}-", s)
        Next s
    Next x
End Sub

steve · Sep 14, 2004

something like:

(?=(\b\w+\b)|^)([^\1]*)\1+

could be used in a regex replace on the original string to give you unique
words...replacing w/ $2. from there you could just do a regex split on \b
(word boundary). voilà, there's your unique array.
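
here's a rough sketch of the same replace-then-collect idea. note that it swaps in a different pattern: as far as i can tell, \1 inside a character class is not read as a backreference in .NET regexes, so the pattern above may not behave as intended. the names RegexUniqueSketch and UniqueWordsByRegex are made up for illustration, and matching is assumed to be case-sensitive on a single-line string.

```vb
Imports System
Imports System.Text.RegularExpressions

Module RegexUniqueSketch
    ' Hypothetical helper: keeps only the last occurrence of each word.
    Function UniqueWordsByRegex(ByVal source As String) As String()
        ' Remove every word that occurs again later in the string.
        ' The lookahead (?=.*\b\1\b) checks for a later whole-word repeat.
        Dim deduped As String = Regex.Replace(source, "\b(\w+)\b(?=.*\b\1\b)", "")
        ' Collect the words that are left.
        Dim matches As MatchCollection = Regex.Matches(deduped, "\w+")
        Dim result(matches.Count - 1) As String
        Dim i As Integer
        For i = 0 To matches.Count - 1
            result(i) = matches(i).Value
        Next
        Return result
    End Function

    Sub Main()
        Dim name As String
        For Each name In UniqueWordsByRegex("John Myers Joe John Myers")
            Console.WriteLine(name)
        Next
    End Sub
End Module
```

the lookahead rescans the rest of the string for every word, so on long inputs this is likely slower than the hashtable approach.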

hth,

steve
