Speed up Array Transfer to Dictionary

E

ExcelMonkey

Hi I have a routine which checks an array with duplicate entries and tranfers
only the unique items to a dictionary. The code works fine excpet it is very
slow. Its slow because the array has to compare each of its element to every
other element in the array. If the size of the array is large, the code
slows down dramatically.

The array is called PR2 and the Dictionary is called Dict1.

Note I need all the duplicates as I use them to calculate a frequency of
occurence for each duplicate. This is why I kept the duplicate data in an
array in the first place as a dictionary cannot have duplicate keys.

Set Dict1 = New Dictionary

For t = LBound(PR2) To UBound(PR2)
If Not Dict1.Exists(PR2(t)) Then
For z = LBound(PR2) To UBound(PR2)
If z = t Then
'Do nothing as you are comparing this
'word to itself
ElseIf z <> UBound(PR2) Then
If PR2(t) = PR2(z) Then
'Duplicate flagged. Don't add to
'Dictionary
dupe = dupe + 1
End If
Else
'Only add to dictionary on last
'run as dictionary can not have duplicate
'keys
Dict1.Add PR2(t), dupe
End If
Next
End If
'Reset Duplicate
dupe = 1
Next

Thanks

EM
 
X

xp

Just a thought,

How about you load two arrays. One unique that you will load into the
dictionary and the second to just count the dupes? Or just count the dupes in
the source directly...?

I think that might run faster since there will be fewer comparisons?
 
E

ExcelMonkey

Right but I start out with an array with duplicates. The only way to get a
second array without duplicates is to compare individual entries in the first
array with themselves. Hence the time consumption.

My post was slightly mislabeled as the slow processing is not due to the
transfer of the data from the array to the dictionary but from the comparison
of the data in the array to itslef to find the dupclicates.

Thanks

EM
 
K

ker_01

Have you considered something like the following? <aircode warning> The goal
here is (a) to reduce the number of comparisons, and (b) to avoid a full loop
of comparisons every time you have a known duplicate.

Dim mylist(1 to 100, 1 to 2)

For i = lbound(mylist(1)) to ubound(mylist(1)) '1 to 100
If mylist(i,2) = "" then 'no matches yet
'add a non-dupe count, or add to your dictionary here
For p = (i+1) to ubound(mylist(1))
if mylist(p,1) = mylist(i,2) then mylist(p,2) = "Duplicate"
DupeCount = DupeCount+1
Next
endif
Next
 
D

Dana DeLouis

The only way to get a second array without duplicates is to compare
individual entries in the first array with themselves.
Hence the time consumption.

Hi. If I understand the problem, the Dictionary itself is more
efficient. I believe you want a list of unique entries, along with
their count. Then you will enter this info into your main dictionary.
The main idea is to use the "item" property.

My suggestion would be to use a temporary dictionary to reduce the data,
then enter this into the main dictionary.
Here is a small example that should give you some ideas.

Sub Demo()
'// vba library includes 'Microsoft Scripting Runtime.'
Dim J As Long
Dim Dic As Dictionary
Set Dic = New Dictionary

Dim T(1 To 5) 'An array with some dup data
T(1) = 11
T(2) = 12
T(3) = 11
T(4) = 13
T(5) = 12

For J = LBound(T) To UBound(T)
If Dic.Exists(T(J)) Then
Dic.Item(T(J)) = Dic.Item(T(J)) + 1
Else
Dic.Add T(J), 1
End If
Next J

Debug.Print Dic(11)
Debug.Print Dic(12)
Debug.Print Dic(13)
End Sub

Returns:
2
2
1

= = = =
HTH
Dana DeLouis
 
E

ExcelMonkey

Another way to get around this would be to check to see if the item in the
array already exists in the Dictionary. If it does not exist, Add it with an
item of "1". If it does exist don't add anything and simply update the item
for that specific key. I think this allows me to avoid the comparison within
the array itslelf. See below


Sub DumpArrayToDictionary()
Dim PR2() As String

ReDim PR2(0 To 6)

PR2(0) = "cat"
PR2(1) = "dog"
PR2(2) = "mouse"
PR2(3) = "cat"
PR2(4) = "dog"
PR2(5) = "dog"
PR2(6) = "dog"

dupe = 1

Set Dict1 = New Dictionary

For t = LBound(PR2) To UBound(PR2)
If Not Dict1.Exists(PR2(t)) Then
Dict1.Add PR2(t), dupe
Else
Dict1.Item(PR2(t)) = Dict1.Item(PR2(t)) + 1
End If
Next

Dim cumcount As Double
cumcount = 0
For Each Item In Dict1
Debug.Print Item & " " & Dict1(Item)
cumcount = cumcount + Dict1(Item)
Next
Debug.Print cumcount
End Sub
 
E

ExcelMonkey

Thanks Dana. I posted my result prior to seeing yours. I think they are
pretty much the same.

Thanks

EM
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top