Filtering Column Duplicates

C

Craig Freeman

Good day,

I'm attempting to extract duplicate comma-delimited text strings (cell
character length greater that 255 - limitation of countif) found in a
column and return those results in a separate worksheet and in
separate rows for each duplicate found in the original column.

For example: (keep in mind, this example is less than 255 characters
per cell, but I need this to work for cell over 255 characters)

Sheet1

A1 : Horse,cow,pig,
A2 : Pig,cat,horse
A3 : Cow,dog,cat

Would return:

Sheet2

A1 : Horse
A2 : Cow
A3 : Pig
A4 : Cat
A5 : Dog

Any VBA miracle minds attempting this one, will have my deepest
gratitude.


thanks,
 
T

Tom Ogilvy

Modification of this code from John Walkenback's site. Assumes xl2000 or
later

http://j-walk.com/ss/excel/tips/tip47.htm

Sub RemoveDuplicates()
Dim AllCells As Range, Cell As Range
Dim NoDupes As New Collection
Dim i As Integer, v As Variant

' The items are in A1:A105
Set AllCells = Range("A1:A105")

' The next statement ignores the error caused
' by attempting to add a duplicate key to the collection.
' The duplicate is not added - which is just what we want!
On Error Resume Next
For Each Cell In AllCells
v = Split(cell,",")
for i = lbound(v) to ubound(v)
v(i) = Replace(v(i),".","")
v(i) = Replace(v(i),"?","")
v(i) = Replace(v(i),"!","")
NoDupes.Add v(i), CStr(v(i))
' Note: the 2nd argument (key) for the Add method must be a string
Next i
Next Cell

' Resume normal error handling
On Error GoTo 0
redim v(1 to NoDupes.count, 1 to 1)
' Sort the collection (optional)
For i = 1 To NoDupes.Count - 1
v(i,1) = NoDupes(i)
Next i

With Range("J1").Resize(Nodupes.count,1)
.Value = v
.sort Key1:=Range("J1"), Order1:=xlAscending, _
header:=xlNo
end With
End Sub

Code has not been tested and may contain typos.
 
K

keepITcool

My solution is very similar to Tom's post.

however I've used a Dictionary object (from Scripting Runtime) rather
then a Collection. I find it more versatile and in many cases it
outperforms the collection, certainly when used WITH a reference to the
library and Dim'ed as dictionary



Here goes:
Option Explicit

Const DELIM = ","

Sub Parser()
Dim arr As Variant
Dim itm As Variant
Dim rSrc As Range
Dim rCel As Range
Dim oDic As Object

'We're using an unbound Scripting Dictionary object
'Preferably add reference to Microsoft Scripting Runtime and use
'Dim oDic As Dictionary
'Set oDic = New Dictionary

Set oDic = CreateObject("Scripting.Dictionary")

'Dictionary s/b case INsensitive
oDic.CompareMode = TextCompare

'Set source (Limited to usedrange)
Set rSrc = Worksheets(1).Columns(1)
Set rSrc = Intersect(rSrc, rSrc.Parent.UsedRange)

'Ignore errors (thrown on duplicate entries)
On Error Resume Next

For Each rCel In rSrc.Cells
With rCel
If Not IsNumeric(.Value) Then
arr = Split(.Value, DELIM)
For Each itm In arr
oDic.Add itm, itm
Next
End If
End With
Next

oDic.Remove vbNullString
On Error GoTo 0

With oDic
ReDim arr(1 To .Count, 1 To 1)
Dim n As Long
itm = .Items
For n = 1 To .Count - 1
arr(n, 1) = itm(n - 1)
Next
End With

With Worksheets(2)
.UsedRange.ClearContents
.Cells(1, 1).Resize(oDic.Count, 1).Value = arr
.Cells(1, 1).Sort .Cells(1), xlAscending
End With

End Sub


--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


Craig Freeman wrote :
 
C

Craig Freeman

Hi Tom,

Thanks for your help. Your modified code works well...but there seems
to be a couple of anomalies with its results. The first, is that for
some reason the code seems to ignores the last string (integer) in the
last row of the range...? Not sure why. The second anomaly is a
little harder to explain, so I'll give the scenario and the results in
the following examples:

Example 1:

cow,pig,dog
cow,pig,dog

returns:

cow
pig

Example 2:

cow,pig,dog
cow,pig,dog
dog

returns:

cow
pig

Example 3:

cat,dog
cow,pig,dog
cow,pig,dog

returns:

cat
cow
dog

I could put a few more examples of different combinations and the
results, but I don't want to confuse it too much. God knows I've
been scratching my head to find a commonality between the results.
Any ides..?

Craig Freeman
 
C

Craig Freeman

Hello keepITcool,

Thanks for taking the time to reply to my question.

I'm not sure if it's me, or some problem with the code, but I'm
gettting a 'compile error - variable not defined' with line 25:
'oDic.CompareMode = TextCompare'. Is there something that I need to do
to make this work...possible the something with 'Scripting
Dictionary'..? Any ideas?

thanks again,
Craig Freeman
 
K

keepITcool

Craig.

Yep

I made my code "latebound" but easily convertable to "earlybound",
however I forgot to change the constant to it's corresponding value.

You'll have to change the constant TextCompare to 1,
(or 0 for BinaryCompare => case sensitive).

OR

You'll have better performance with the early bound variant,
by adding the reference, and swapping the commented lines.
for DIM and SET


--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


Craig Freeman wrote :
 
C

Craig Freeman

keepITcool,

Cool...that did the trick.

Interestingly enough, the same anomalies that I commented on with Tom's
modified code, also appeared with your suggestion..? I'd like to
understand why this is..? The workaround seems to be adding an
erroneous row of data at the end of the worksheet.
You'll have better performance with the early bound variant,
by adding the reference, and swapping the commented lines.
for DIM and SET

Sorry, I'm very new to this game and still learning. I've read that
'Late-bound' means that the interface is not known at compile time, and
that 'Early-bound' is known at compile time - and as you stated, this
increases performance. Kinda of like stating a fact before the
question is asked. What I'm not sure of, is the adding the reference
part and which commented lines for DIM and SET to switch. Apologies if
this is obvious.

thanks again,
Craig Freeman
 
K

keepITcool

Craig,

Anomaly is in fact a programming error, causing the last item added to
the dictionary to be discarded in the writeback to the array.

line For n = 1 To .Count - 1
s/b For n = 1 To .Count

Re latebound:
good to see you've done a bit of homework, and understood.
I'll rephrase what I thought obvious, but apparently wasn't...

add reference to Microsoft Scripting Runtime

line dim oDic as Object
s/b dim oDic as Dictionary

line set oDic = CreateObject(etc)
s/b set oDic = new dictionary


ofcourse changing the 0 (back) to TextCompare is not needed,
but may make the code easier to read.

HTH

--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


Craig Freeman wrote :
 
C

Craig Freeman

keepITcool,

That did it...in fact it fixed it for both solutions. You've been a
big help. Thank you again!

Craig Freeman
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top