unique values in an array

T

th081th081

I have set up an array that holds string values from excel, (option
base 1).

Is there a function that will delete all the duplicate values in the
array and just leave the unique ones, i have many such string array
that i need to do the same thing for.

My code is like this

term() as string (declare array)

Redim term(noItems) (re dim array with the number of items)

Import data to the term array

Call Unique(term) (pass the term array to the unique function and get
back the term array with just the unique items only)

Can any body help

Cheers

Tony
 
B

Bob Phillips

Private Sub Unique(ByRef term As Variant)
Dim tmp As Variant
Dim i As Long
Dim j As Long

ReDim tmp(LBound(term) To UBound(term))
j = LBound(term)
For i = LBound(term) To UBound(term)
If IsError(Application.Match(term(i), tmp, 0)) Then
tmp(j) = term(i)
j = j + 1
End If
Next i
ReDim Preserve tmp(LBound(tmp) To j - 1)
term = tmp

End Sub


--
---
HTH

Bob


(there's no email, no snail mail, but somewhere should be gmail in my addy)
 
A

Alan Beban

Bob said:
Private Sub Unique(ByRef term As Variant)
Dim tmp As Variant
Dim i As Long
Dim j As Long

ReDim tmp(LBound(term) To UBound(term))
j = LBound(term)
For i = LBound(term) To UBound(term)
If IsError(Application.Match(term(i), tmp, 0)) Then
tmp(j) = term(i)
j = j + 1
End If
Next i
ReDim Preserve tmp(LBound(tmp) To j - 1)
term = tmp '<----error message:Variable uses an Automation type not supported in Visual Basic

End Sub
The Op said his array is of type String(). The following doesn't work:

Sub test12()
Dim term() As String
ReDim term(0 To 3)
term(0) = "a"
term(1) = "b"
term(2) = "c"
term(3) = "c"
Unique term
End Sub
 
D

Dana DeLouis

Would something like this work?

Sub TestIt()
Dim V
V = Array("a", "b", "c", "b", "d", "c", "e")
V = Unique(V)
End Sub

Function Unique(V) As Variant
Dim d As Object
Dim Obj

Set d = CreateObject("Scripting.Dictionary")
On Error Resume Next

For Each Obj In V
d.Add Obj, 1
Next Obj
Unique = d.Keys
End Function
 
A

Alan Beban

Dana said:
Would something like this work?

Sub TestIt()
Dim V
V = Array("a", "b", "c", "b", "d", "c", "e")
V = Unique(V)
End Sub

Function Unique(V) As Variant
Dim d As Object
Dim Obj

Set d = CreateObject("Scripting.Dictionary")
On Error Resume Next

For Each Obj In V
d.Add Obj, 1
Next Obj
Unique = d.Keys
End Function
Not if V is of type String() as stated by the OP. E.g., the following
doesn't work:

Sub TestIt()
Dim V() As String
ReDim V(1 To 7)
V(1) = "a"
V(2) = "b"
V(3) = "c"
V(4) = "b"
V(5) = "d"
V(6) = "c"
V(7) = "e"
V = Unique(V) '<---Type mismatch error message
End Sub
 
D

Dana DeLouis

Not if V is of type String() as stated by the OP.

Ok. How about plan B ?

Function Unique(V) As Variant
Dim d As Object
Dim Obj

Set d = CreateObject("Scripting.Dictionary")
On Error Resume Next

For Each Obj In V
d.Add Obj, 1
Next Obj

If TypeName(V) = "String()" Then
Dim s() As String
Dim J As Long

ReDim s(1 To d.Count)
J = 1

For Each Obj In d.keys
s(J) = Obj
J = J + 1
Next Obj
Unique = s
Else
Unique = d.keys
End If
End Function

- - - - - -
HTH
Dana DeLouis
 
R

Rick Rothstein \(MVP - VB\)

If your array will **always** be a String array, this subroutine should do
what you want...

Sub RemoveDuplicates(ArrayIn() As String)
Dim X As Long
Dim Index As Long
Dim Combined As String
Dim TempArray() As String
TempArray = ArrayIn
Index = LBound(ArrayIn)
For X = LBound(ArrayIn) To UBound(ArrayIn)
If InStr(Chr$(1) & Combined & Chr$(1), Chr$(1) & _
ArrayIn(X) & Chr$(1)) = 0 Then
TempArray(Index) = ArrayIn(X)
Combined = Combined & Chr$(1) & ArrayIn(X)
Index = Index + 1
End If
Next
ReDim Preserve TempArray(LBound(ArrayIn) To Index - 1)
ArrayIn = TempArray
End Sub

Rick
 
I

ilia

This works.

Function Unique(V As Variant) As Variant
Dim myItem As Variant
Dim myArray() As String
Dim i As Integer
Dim myCol As New Collection

On Error Resume Next
For i = LBound(V) To UBound(V)
myCol.Add Item:=V(i), Key:=V(i)
Next i
On Error GoTo 0

ReDim myArray(myCol.Count)
i = LBound(myArray)

For Each myItem In myCol
myArray(i) = CStr(myItem)
i = i + 1
Next myItem

Unique = myArray
End Function
 
A

Alan Beban

If the functions in the freely downloadable file at
http://home.pacbell.net/beban are available to your workbook, either of
the following will work:

Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
V = ArrayUniques(V)
End Sub

Sub TestIt2()
Dim V() As String
ReDim V(0 To 0)
Assign Array("a", "b", "c", "b", "d", "c", "e"), V
V = ArrayUniques(V)
End Sub

Alan Beban
 
A

Alan Beban

Alan said:
If the functions in the freely downloadable file at
http://home.pacbell.net/beban are available to your workbook, either of
the following will work:

Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
V = ArrayUniques(V)
End Sub

Sub TestIt2()
Dim V() As String
ReDim V(0 To 0)
Assign Array("a", "b", "c", "b", "d", "c", "e"), V
V = ArrayUniques(V)
End Sub

Alan Beban
I posted the above because it will work for arrays of any built-in type;
e.g.,

Dim V() As Integer
ReDim V(0 to 0)
Assign Array(1,2,3,2), V
V = ArrayUniques(V)

Ditto for Boolean(), Byte(), Long(), etc.

Alan Beban
 
G

Gary Keramidas

here's a concept about what i posted, newArr will contain only unique values

Option Explicit
Sub test()
Dim arr As Variant
Dim newArr As Collection
Set newArr = New Collection

arr = Array("A", "B", "C", "A", "D", "E", "B")
On Error Resume Next
For i = LBound(arr) To UBound(arr)
newArr.Add arr(i), CStr(arr(i))
Next
On Error GoTo 0
End Sub
 
G

Gary Keramidas

sorry, forgot to dim the i variable

Option Explicit
Sub test()
Dim arr As Variant
Dim newArr As Collection
Dim i As Long
Set newArr = New Collection

arr = Array("A", "B", "C", "A", "D", "E", "B")
On Error Resume Next
For i = LBound(arr) To UBound(arr)
newArr.Add arr(i), CStr(arr(i))
Next
On Error GoTo 0
End Sub
 
A

Alan Beban

Gary said:
sorry, forgot to dim the i variable

Option Explicit
Sub test()
Dim arr As Variant
Dim newArr As Collection
Dim i As Long
Set newArr = New Collection

arr = Array("A", "B", "C", "A", "D", "E", "B")
On Error Resume Next
For i = LBound(arr) To UBound(arr)
newArr.Add arr(i), CStr(arr(i))
Next
On Error GoTo 0
End Sub
Since you mention it separately and explicitly, you probably want

Dim i As Byte

I suppose whether you want to use an array or a collection depends on
what you're going to do with it.

Alan Beban
 
H

Harlan Grove

Dana DeLouis said:
Ok. How about plan B ?

Function Unique(V) As Variant ....
If TypeName(V) = "String()" Then
....

Too special-case. Try


Function foo(ByRef a As Variant) As Boolean
'requires reference to Microsoft Scripting Runtime
'if running under Excel 97 or Mac versions of Excel,
'comment next line and uncomment the line after
Dim d As Dictionary
'Dim d As Object
Dim j As Long, k As Long, n As Long

On Error Resume Next
j = UBound(a, 2)

'only process 1D arrays - return TRUE (error) otherwise
If IsObject(a) Or (Not IsArray(a)) Or Err.Number = 0 Then
Err.Clear
foo = True
Exit Function
End If

On Error GoTo 0

'if running under Excel 97 or Mac versions of Excel,
'comment next line and uncomment the line after
Set d = New Dictionary
'Set d = CreateObject("Scripting.Dictionary")

k = LBound(a, 1)
n = UBound(a, 1)

For j = k To n
If Not d.Exists(a(j)) Then d.Add Key:=a(j), Item:=0
Next j

n = d.Count + k - 1

ReDim a(k To n)

For j = k To n
a(j) = d.Keys(j - k)
Next j
End Function


And test it.


Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
Debug.Print Typename(V), LBound(V, 1), UBound(V, 1)
Call foo(V)
Debug.Print Typename(V), LBound(V, 1), UBound(V, 1)
End Sub


Before the call, V is an array from 0 to 6 of strings. After the call,
it's an array from 0 to 4 of strings.
 
H

Harlan Grove

Alan Beban said:
If the functions in the freely downloadable file at
http://home.pacbell.net/bebanare available to your workbook, either
of the following will work:

Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
V = ArrayUniques(V)
End Sub
....

V's dimensions start out as (0 To 6) and end up (1 To 5, 1 To 1), so
let's modify your test script to


Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
GoSub dump
V = ArrayUniques(V) 'foo(V)
GoSub dump
Exit Sub

dump:
Dim k As Long
Debug.Print TypeName(V), LBound(V, 1), UBound(V, 1)
For k = LBound(V) To UBound(V)
Debug.Print V(k)
Next k
Return
End Sub


Run it, and whatcha know, it throws a runtime error at

Debug.Print V(k)

the second time the dump subroutine is called because V is no longer
1D. If you're going to be Oh, so particular about the OP's array being
String() type, I'm going to be just as particular that the OP's array
is very likely 1D and should remain so.

Do I take it that converting 1D VBA arrays into 2D VBA arrays is
intended functionality for ArrayUniques? If so, you might want to
consider mentioning it to the poor unfortunates who try to use your
function library.
 
A

Alan Beban

Harlan said:
Do I take it that converting 1D VBA arrays into 2D VBA arrays is
intended functionality for ArrayUniques? If so, you might want to
consider mentioning it to the poor unfortunates who try to use your
function library.

The functionality is chosen by the user. You and the other poor
unfortunates need only read the description of ArrayUniques in the
description portion of the function library or at the beginning of the
ArrayUniques procedure code. It states quite clearly that the default
output is a single column 1-based vertical array, therefore necessarily
two-dimensional.

The user selects among:

1-based vertical 2-D (of necessity) V = ArrayUniques(V) default case

0-based vertical 2-D (of necessity) V = ArrayUniques(V, , "0vert")

1-based horizontal 1-D V = ArrayUniques(V, , "1horiz")

0-based horizontal 1-D V = ArrayUniques(V, , "0horiz")

The second parameter (Boolean) provides for case matching (the default)
or not

The fourth parameter (Boolean) provides for omitting blanks (the
default) or not.

I'm changing the description to make it explicit that the horizontal
outputs are one-dimensional.

I don't mind you pissing on the library; you have to do that,
particularly when you're not on your meds. But do the group the courtesy
of understanding the specific function you're going to discuss before
you do.

Alan Beban
 
H

Harlan Grove

Alan Beban said:
The functionality is chosen by the user. You and the other poor
unfortunates need only read the description of ArrayUniques in the
description portion of the function library or at the beginning of
the ArrayUniques procedure code. It states quite clearly . . .

If so, *YOU* screwed up your overly simple examples, which I quote:

Sub TestIt()
Dim V() As String
ReDim V(0 To 6)
V(0) = "a"
V(1) = "b"
V(2) = "c"
V(3) = "b"
V(4) = "d"
V(5) = "c"
V(6) = "e"
V = ArrayUniques(V)
End Sub

Sub TestIt2()
Dim V() As String
ReDim V(0 To 0)
Assign Array("a", "b", "c", "b", "d", "c", "e"), V
V = ArrayUniques(V)
End Sub

In both cases you start off with 0-based 1D arrays and convert them
into 1-based 2D arrays. In theory, you understand your own array
functions library better than anyone else, so when you provide
examples of its use, shouldn't you try to make examples of using it as
straightforward as possible?

In this case, that would have meant ArrayUniques calls like so:

V = ArrayUniques(V, , "0horiz")

But if generality is desired, ranges, 1D and 2D arrays accepted, and
returned arrays having the same lower dimension bounds as the passed
arrays, consider


Function adistinct(ByVal a As Variant) As Variant
'returns result array if successful, #REF! if passed multiple
'area range, #NUM! if passed scalar or 3D or higher array,
'#VALUE! would indicate runtime error if called as a udf
'------------------------------------------------------------
'requires reference to Microsoft Scripting Runtime
'if running under Excel 97 or Mac versions of Excel,
'comment next line and uncomment the line after
Dim d As Dictionary
'Dim d As Object
Dim i As Long, j As Long, k As Long, n As Long, x As Variant

'convert single area ranges to arrays; die on multiple area ranges
adistinct = CVErr(xlErrRef)
If TypeOf a Is Range Then _
If a.Areas.Count = 1 Then a = a.Value Else Exit Function

'check for 3rd dim'n bounds
On Error Resume Next
i = -1
i = UBound(a, 3) - LBound(a, 3) 'if 3D or higher, i now >= 0
j = -1
j = UBound(a, 2) - LBound(a, 2) 'if 2D or higher, j now >= 0
On Error GoTo 0

'die on non-Range objects, scalars and 3D or higher arrays
adistinct = CVErr(xlErrNum)
If IsObject(a) Or (Not IsArray(a)) Or i >= 0 Then Exit Function

'if running under Excel 97 or Mac versions of Excel,
'comment next line and uncomment the line after
Set d = New Dictionary
'Set d = CreateObject("Scripting.Dictionary")

'load distinct values into Dictionary object
For Each x In a
If Not d.Exists(x) Then d.Add Key:=x, Item:=0
Next x

'keep original 1st dim'n lower bound (k), but change
'upper bound based on number of distinct values (n)
k = LBound(a, 1)
n = d.Count + k - 1

'reduce a to its distinct values
If j < 0 Then '1D
ReDim a(k To n)

For j = k To n
a(j) = d.Keys(j - k)
Next j

Else '2D
'keep original 2nd dim'n lower bound (i), and make it
'the upper bound as well, so a degenerate 2nd dim'n
i = LBound(a, 2)
ReDim a(k To n, i To i)

For j = k To n
a(j, i) = d.Keys(j - k)
Next j

End If

Set d = Nothing

adistinct = a

End Function


This doesn't do exactly the same thing your ArrayUniques does, but it
does return specific type arrays with the same number of dimensions
and lower dimension bounds as the passed arrays, which I consider more
convenient. And it took just 69 lines including comments and blank
lines with no compound statements compared to ArrayUniques, which
weighs in at 140 lines.

As I said, this function doesn't do exactly the same thing as yours.
With regard to case insensitivity, if an array contained "AA", "Aa",
"aA" and "aa", which should be kept as the distinct value? The first
found? The last found? The most frequently occurring? The one with the
most upper or lower case chars? Determined by collation sequence? If
first or last, should the function iterate through the array row-major
or column-major? To me, it takes more than just one 2-state optional
parameter for this.

As for omitting blanks, there'd be at most one instance of "" in the
result array, and that'd be easy enough to eliminate using a separate
filtering function. The advantage of a separate filtering function is
that it could accept an array of values to remove, possibly including
Empty, error values, etc.

That leaves changing the number of dimensions and array lower bounds,
and for those things I'd prefer to use different functions, and have a
broader choice than just 0 or 1 as lower bounds.
 
A

Alan Beban

Harlan said:
As I said, this function doesn't do exactly the same thing as yours.
With regard to case insensitivity, if an array contained "AA", "Aa",
"aA" and "aa", which should be kept as the distinct value? The first
found? The last found? The most frequently occurring? The one with the
most upper or lower case chars? Determined by collation sequence? If
first or last, should the function iterate through the array row-major
or column-major? To me, it takes more than just one 2-state optional
parameter for this.

Seems that by choosing case insensitivity, the user will have indicated
that it doesn't make any difference which is kept as the distinct value;
that he/she is indicating that for his/her purposes they are all
equivalent.
As for omitting blanks, there'd be at most one instance of "" in the
result array, and that'd be easy enough to eliminate using a separate
filtering function. The advantage of a separate filtering function is
that it could accept an array of values to remove, possibly including
Empty, error values, etc.

Not sure what the thrust of this comment is. The one instance of ""
(which will often result from unwanted blanks in the range/array from
which duplicates are to be eliminated) is eliminated with a single
simple line of code -- If OmitBlanks Then x.Remove ("")

There is no need for the ArrayUniques function to contain the separate
filtering function for any additional filtering that a user might
design. The filtering can readily be accomplished, if desired, with
something like

FilteringFunction(ArrayUniques([ArrayUniques parameters]),
[FilteringFunction parameters])

Alan Beban
 
H

Harlan Grove

Alan Beban said:
Seems that by choosing case insensitivity, the user will have
indicated that it doesn't make any difference which is kept as the
distinct value; that he/she is indicating that for his/her purposes
they are all equivalent.

Fair enough, so first match. An alternative way to handle this would
be to use a different function to change all text to upper, lower or
Proper case before calling the function that would remove duplicates.
. . . The one instance of "" (which will often result from unwanted
blanks in the range/array from which duplicates are to be
eliminated) is eliminated with a single simple line of code -- If
OmitBlanks Then x.Remove ("")

Now that you mention it, this lies in an inefficient block of code.

On Error Resume Next
For Each Elem In arr
x.Add Item:=Elem, Key:=CStr(Elem)
Next
If OmitBlanks Then x.Remove ("")
On Error GoTo 0

There's no need for the enclosing error trapping, which slows things
down. Just test whether each new array value is already loaded into
the dictionary, and only add those that aren't. Similarly, only remove
"" if it exists.

For Each Elem In arr
If not x.Exists(Elem) Then x.Add Item:=0, Key:=Elem
Next
If OmitBlanks And x.Exists("") Then x.Remove ""
There is no need for the ArrayUniques function to contain the
separate filtering function for any additional filtering that a
user might design. The filtering can readily be accomplished, if
desired, with something like

FilteringFunction(ArrayUniques([ArrayUniques parameters]),
[FilteringFunction parameters])

Indeed, but note that your parameter to omit blanks only eliminates
zero-length strings. If the input array were a range, and that range
contained blank cells (in the ISBLANK sense), those cells would be
recorded as Empty rather than "", so using your optional parameter
wouldn't eliminate them. My point is that IF a user would want to
eliminate both "" and Empty, it's more efficient to reduce the array
to the distinct values POSSIBLY INCLUDING "" and Empty, then eliminate
the at most single remaining "" and Empty items.

If all a user would ever want to do would be eliminating "" values,
your approach would be fine. However, if a user would want to
eliminate "" AND Empty values (and/or possibly other values, e.g.,
whatever Excel would receive representing missing values from database
records, which are not always "" or Empty), then the likely need to
call another function would negate the possible usefullness of
eliminating "" in ArrayUniques.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top