How to count matching text

G

Guest

Let's say you have a list of hundreds of confirmation numbers, they can be
text, numeric or alpa-numeric, how can I know how many of the entries are
exact duplicates vs how many are different. Example:
HVT88947
HVT88947
HVT88948
HVT88948
9784268
712589
DDKLM
DDKLM
HVT88946
712589
Ok, we have 10 entries above. Four of them are duplicate entries, six of
them are different entries, right? Ok, so lets say you now have thousands of
these entries and you know there are lots of duplicates somewhere in the
list. How can you calculate how many are duplicate entries and how many are
different (or original entries)? And you need to have the calc work so that
as people add more entries the calc always updates the number of duplicates
there are in the growing list?
 
R

R.VENKATARAMAN

try this vba

try(just copy) this vba code and see whether you get what you want
I am not an expert. statements following single quotation are comments.

Public Sub test()
'your data range is A2 to A11
'FIRST SORT THE DATA
Dim cell As Range
Dim i As Integer
i = 1
'NOTE in the next statement it starts from A3 and ends in A12
For Each cell In Range("a3:a12")
cell.Activate
If cell = cell.Offset(-1, 0) Then
i = i + 1
Else
ActiveCell.Offset(-1, 1) = i
i = 1
End If
Next cell
End Sub
 
B

Biff

Hi!

This all depends on what you mean by "exact" duplicate.

HVT88947
Hvt88947

Would those be exact duplicates or is that not an issue?

Here's how you can calc duplictes and uniques.

First, create a dynamic named range for your list. Assume
your list starts in A1.

Goto Insert>Name>Define
In the Name box, enter a name for your list. I'll call it
LIST.
In the Refers To box, enter this formula:

=OFFSET(A1,0,0,COUNTA(A:A),1)

Click Add then OK.

Now, to find the unique values in the list enter this
formula: (assume you enter this formula in B1)

=SUMPRODUCT((List<>"")/COUNTIF(List,List&""))

To find the number of duplicates enter this formula:

=COUNTA(LIST)-B1

All of this assumes that there are no blank cells in the
list!

Biff
 
A

Alan Beban

If the functions in the freely downloadable file at
http://home.pacbell.net/beban are available to your workbook

=COUNTA(B:B)-COUNTA(ArrayUniques(B:B)) if HVT and Hvt are not duplicates;

=COUNTA(B:B)-COUNTA(ArrayUniques(B:B,FALSE)) if they are.

Alan Beban
 
M

Max

Just another option to tinker around with ..
which also extracts both the list of uniques
and the list of duplicates for reference ..

Meanings:
--------------
Uniques = 1st instance of the item in the list,
Duplicates = 2nd, 3rd, etc instances of the item in the list
There's no case sensitivity distinction for uniques,
i.e. HVT = HvT = hvt (all are considered identical, not unique)

Assume the data is in Sheet1, in A2 down
HVT88947
HVT88947
HVT88948
HVT88948
9784268
712589
DDKLM
DDKLM
HVT88946
712589
etc

Put in B2:
=IF(OR(COUNTIF($A$2:A2,A2)>1,A2=""),"",ROW())

Put in C2:
=IF(OR(COUNTIF($A$2:A2,A2)<2,A2=""),"",ROW())

Select B2:C2, fill down to say, C100
(Let's take a small list length of ~100 items)

In Sheet2
-------------

Select A2:A100
(# of rows similar to cols B and C in Sheet1)

Put in the formula bar:

=IF(ISERROR(MATCH(SMALL(Sheet1!B:B,ROW(A1:A100)),Sheet1!B:B,0)),"",INDEX(She
et1!A:A,MATCH(SMALL(Sheet1!B:B,ROW(A1:A100)),Sheet1!B:B,0)))

Array-enter the formula with CTRL+SHIFT+ENTER
instead of just pressing ENTER

The above extracts the list of uniques from Sheet1's col A

Select B2:B100
(# of rows similar to cols B and C in Sheet1)

Put in the formula bar:

=IF(ISERROR(MATCH(SMALL(Sheet1!C:C,ROW(A1:A100)),Sheet1!C:C,0)),"",INDEX(She
et1!A:A,MATCH(SMALL(Sheet1!C:C,ROW(A1:A100)),Sheet1!C:C,0)))

Array-enter the formula with CTRL+SHIFT+ENTER

The above extracts the list of duplicates from Sheet1's col A

Put in A1: =SUMPRODUCT(--(A2:A100<>""))
A1 returns the # of uniques

Put in B1: =COUNTA(Sheet1!A:A)-A1
B1 returns the # of duplicates

For the sample data in A2:A11 in Sheet1
(as per your post,
with A1:B1 assumed empty, no headers)

you'll get in col A:

6 (# of uniques)
HVT88947
HVT88948
9784268
712589
DDKLM
HVT88946

and in col B:

4 (# of duplicates)
HVT88947
HVT88948
DDKLM
71258

Test out the returns above
by adding some new unique & duplicate items
in Sheet1's col A
(you can even leave blank cells in-between entries)

--
The drawback of the above set-up
is the performance hit [calc speed]
as the range involved increases

The formulas could all be adapted to suit
say, a big input range of A2:A5000 in Sheet1
but the response time would suffer quite a bit

One way to mitigate the performance hit would be to
switch the calc mode to "Manual", via:
Tools > Options > Calculation tab > Check "Manual" > OK

And then press F9 to recalc after entries in col A
are done per input session
 
M

Max

Clarification:
Put in B1: =COUNTA(Sheet1!A:A)-A1
B1 returns the # of duplicates

If A1 in Sheet1 contains a col header,

Put instead in B1: =COUNTA(Sheet1!A:A)-A1-1
(subtract "1" for the header)
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top