Remove duplicate values from String Arrays

G

Guest

Hello,

I need some help to learn how to remove duplicate values from a string array.
The string for example is "blue yellow yellow red" or "blue yellow red
yellow". What I am doing is... I split the string into a string array or 5
elements. I then create a 2nd array to add only unique values and then join
the 2nd array so that the string is "blue yellow red". My issue is I am not
sure how to add unique values only to the second array.

i am sure there is a simple way to do this but cannot figure it out.

here is the code that i have written...

Sub RemoveDuplicateWords()

Dim array1() As String
Dim array2() As String
Dim tmp1, tmp2, tmp3 As String
Dim i, a, b, c As Integer
Dim str As String
Dim repeat As Boolean

repeat = False

str = "blue yellow yellow red"
'str = "blue yellow red yellow" 'this works

array1 = Split(str, " ")
i = UBound(array1) - LBound(array1)

ReDim array2(0 To i)
c = 0

For a = 0 To i
tmp1 = ""
tmp = array1(a)
'Debug.Print tmp

For b = 0 To i
tmp1 = array2(b)

If tmp = tmp1 Then
repeat = True
Exit For
End If

Next b

If repeat = False Then

'Debug.Print tmp
array2(c) = tmp
c = c + 1

End If
Next a

tmp3 = Join(array2, " ")
Debug.Print tmp3

End Sub
 
G

Guest

i have fixed my issue, i move the boolean value inside the loop

however, if there is a better way to do this please let me know.

thanks
 
J

John Nurick

IMO the simplest way to do this is with a Dictionary object, whose
Exists method makes it easy to tell whether an item already exists.
E.g.:

Function Uniquify(List As String, Delimiter As String) As String
Dim Uniques As Object
Dim Item As Variant
Dim Buffer As String

Set Uniques = CreateObject("Scripting.Dictionary")
For Each Item In Split(List, Delimiter)
If Not Uniques.Exists(Item) Then
Uniques.Add Item, 1
End If
Next
For Each Item In Uniques
Buffer = Buffer & Item & Delimiter
Next
Uniquify = Left(Buffer, Len(Buffer) - Len(Delimiter))
End Function
 
G

Guest

thanks for the response, i will look at using the dictionary object instead.
one question - regarding processing speed, would you say a dictionary object
processes faster than string arrays when comparing elements or removing
duplicates. i am have two large lists of data (3000 rows and 450,000 rows)
where i am looking for word matches between two sets of strings and i tried
using string arrays to do this using a series of loops. i am sure there is a
better way to do this.

appreciate your advice.

bobm
 
J

John Nurick

The only way to find out is to run some systematic tests, which will
probably take longer than the time that one approach may save over the
other in use.

I suspect that the outcome will depend on the number of items in your
"arrays": with few items, the time taken to create a Dictionary object
may outweigh the time saved by using Dictionary.Exists (which AFAIK uses
an index of keys maintained by the Dictionary object) over iterating
n-squared times through your n-element arrays, but with more items I'd
expect the Dictionary approach to have the advantage.

More important considerations IMHO are (a) the amount of coding
required, which favours Dictionary; and (b) the fact that the Dictionary
object is in the Microsoft Scripting Runtime library rather than in core
Access VBA, which is a point against it.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top