Parse a String and get unique values


Raterus

Hi,

I'm looking for ideas for the most efficient way to accomplish this. I have a string representing names a person goes by.

"John Myers Joe John Myers"

And I need to parse it in such a way that I end up with an array of UNIQUE strings that appear in the original string (in no particular order):

arr(0) = "John"
arr(1) = "Myers"
arr(2) = "Joe"

One way I can think of is to use a hashtable, add the words as keys, and then iterate through the keys at the end. The results should be unique, but I'm not sure that would be the best method for performance.

Thanks for any ideas!
--Michael
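A minimal sketch of the Hashtable idea from the question, in VB.NET. It assumes the names are separated by spaces, that matching is case-sensitive, and .NET 2.0 or later for `StringSplitOptions`; the module and function names are hypothetical. Hashtable lookups and inserts are O(1) on average, so a single pass over the words is roughly linear in their number.

```vbnet
Imports System
Imports System.Collections

Module UniqueNamesHashtable

    ' Returns the distinct space-separated words in text, using the words as
    ' Hashtable keys (the approach described in the question).
    Function UniqueWords(ByVal text As String) As String()
        Dim seen As New Hashtable()
        For Each word As String In text.Split(New Char() {" "c}, StringSplitOptions.RemoveEmptyEntries)
            ' ContainsKey avoids the ArgumentException Add throws on a duplicate key.
            If Not seen.ContainsKey(word) Then
                seen.Add(word, Nothing)
            End If
        Next
        ' Copy the keys into a plain array; Hashtable key order is unspecified.
        Dim result(seen.Count - 1) As String
        seen.Keys.CopyTo(result, 0)
        Return result
    End Function

    Sub Main()
        For Each name As String In UniqueWords("John Myers Joe John Myers")
            Console.WriteLine(name)
        Next
    End Sub

End Module
```

Because Hashtable does not preserve insertion order, the three names may print in any order, which fits the "no particular order" requirement.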
 

Rich

Here is an example from the VS2003 help files for string
parsing. Note that you can list several delimiters; your
string does need to contain at least one of them for Split
to break it apart.

Imports System
Imports Microsoft.VisualBasic

Module SplitExample

    Sub strSplit()
        ' 4 delimiters here: space, comma, period, colon
        Dim delimStr As String = " ,.:"
        Dim delimiter As Char() = delimStr.ToCharArray()
        Dim words As String = "one two,three:four."
        Dim split As String() = Nothing
        Console.WriteLine("The delimiters are -{0}-", delimStr)
        Dim x As Integer
        ' Split into at most x substrings; the last one holds the unsplit remainder.
        For x = 1 To 5
            split = words.Split(delimiter, x)
            Console.WriteLine(ControlChars.Cr + "count = {0,2} ..............", x)
            Dim s As String
            For Each s In split
                Console.WriteLine("-{0}-", s)
            Next s
        Next x
    End Sub

    Sub Main()
        strSplit()
    End Sub

End Module
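The help-file example covers only the splitting half. A minimal sketch that applies the same delimiter set to a names string and then removes duplicates, assuming .NET 3.5 or later for LINQ's `Distinct`; `StringSplitOptions.RemoveEmptyEntries` drops the empty strings that a trailing period, or a comma followed by a space, would otherwise leave in the array. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Linq

Module UniqueNamesSplit

    ' Splits on the same delimiter set as the help-file example, drops the
    ' empty entries produced by adjacent or trailing delimiters, and removes
    ' duplicates with LINQ's Distinct.
    Function UniqueWords(ByVal text As String) As String()
        Dim delimiter As Char() = " ,.:".ToCharArray()
        Return text.Split(delimiter, StringSplitOptions.RemoveEmptyEntries).Distinct().ToArray()
    End Function

    Sub Main()
        ' Prints each distinct name once.
        For Each name As String In UniqueWords("John Myers, Joe. John: Myers")
            Console.WriteLine(name)
        Next
    End Sub

End Module
```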


steve

Something like:

(?=(\b\w+\b)|^)([^\1]*)\1+

could be used in a regex replace on the original string to give you unique
words, replacing with $2. From there you could just do a regex split on \b
(word boundary). Voilà, there's your unique array.

hth,

steve
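A character class matches single characters, so `[^\1]` cannot mean "anything except the word captured in group 1", and the pattern above is unlikely to behave as described. A simpler regex route, sketched here under the assumption that a name is a run of word characters (`\w+`) and .NET 3.5 or later for `HashSet(Of String)`, is to enumerate the matches and keep each word the first time it is seen. The module and function names are hypothetical.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module UniqueNamesRegex

    ' Collects every run of word characters (\w+) and keeps the first
    ' occurrence of each, preserving the order in which words appear.
    Function UniqueWords(ByVal text As String) As String()
        Dim seen As New HashSet(Of String)()
        Dim result As New List(Of String)()
        For Each m As Match In Regex.Matches(text, "\w+")
            ' HashSet.Add returns False when the word is already present.
            If seen.Add(m.Value) Then
                result.Add(m.Value)
            End If
        Next
        Return result.ToArray()
    End Function

    Sub Main()
        ' Prints John, Myers, Joe (first-occurrence order).
        For Each name As String In UniqueWords("John Myers Joe John Myers")
            Console.WriteLine(name)
        Next
    End Sub

End Module
```

Like the Hashtable version, this is a single linear pass; the extra List only exists to keep the output in first-seen order.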


