Count Unique (30000 rows) using criteria - Please help!

R

Robert_L_Ross

In Sheet1 I have:
ColE ColB
1/1/08 00113800
1/1/08 02559000
1/2/08 00113800
1/2/08 02559000
1/2/08 02559000
1/3/08 00113800
1/3/08 02559000

In Sheet2 I Need:
ColA ColB
1/1/08 2
1/2/08 2
1/3/08 2

How can I count unique values on one sheet using criteria from another?
Also, this needs to work on 30,000 rows on Sheet1. I tried a few solutions
from what I found on the newsgroups (using arrays) and it locked up Excel.

Thanks!
 
J

Joel

If you sort sheet1 with Column E as 1st key and Column B as 2nd key then run
code below it will create the output you are looking for.

Sub getsummary()

OldDate = ""
OldRowCount = 1
NewRowCount = 1
Unique = 0
With Sheets("Sheet1")
Do While .Range("E" & OldRowCount) <> ""
CurrentDate = .Range("E" & OldRowCount)
Num = .Range("B" & OldRowCount)
If CurrentDate = OldDate Then
If Num <> OldNum Then
Unique = Unique + 1
OldNum = Num
End If
Else
Unique = 1
OldDate = CurrentDate
OldNum = Num
End If

NewDate = .Range("E" & (OldRowCount + 1))
If CurrentDate <> NewDate Then
With Sheets("Sheet2")
.Range("A" & NewRowCount) = OldDate
.Range("B" & NewRowCount) = Unique
NewRowCount = NewRowCount + 1
End With
End If
OldRowCount = OldRowCount + 1
Loop
End With
 
R

Robert_L_Ross

I don't need the dates on Sheet 2 created, I need it to use the date it finds
on Sheet 2 as the criteria for the count of column B on Sheet 1.

And I need to try to keep it a function, not VB code (our IT group hates us
using VB - they don't support it).
 
T

T. Valko

With 30k rows of data just about any formula using built-in functions will
be slow to calculate. If you'd rather not use a macro are you open to using
an add-in?
 
H

Harlan Grove

Robert_L_Ross said:
In Sheet1 I have:
ColE      ColB
1/1/08   00113800
1/1/08   02559000
1/2/08   00113800
1/2/08   02559000
1/2/08   02559000
1/3/08   00113800
1/3/08   02559000

If your source data above is sorted on col E, then the most efficient
way to handle this uses additional columns of formulas in Sheet1. If
columns X, Y and Z are available, your first row of data is 2 and your
last is 30001, enter the following formulas.

X30002:
=ROW()

Y30002:
=X30002

X2:
=IF(E2<>E1,ROW(),"")

Y2:
=IF(N(X2),IF(N(X3),X3,Y3)-1,IF(N(X3),X3,Y3))

Z2:
=IF(E2<>E1,SUMPRODUCT(1/COUNTIF(INDEX(B:B,X2):INDEX(B:B,Y2),INDEX
(B:B,X2):INDEX(B:B,Y2))),"")

Fill X2:Z2 down into X3:Z30001.
In Sheet2 I Need:
ColA      ColB
1/1/08   2
1/2/08   2
1/3/08   2
....

From your follow-up post it seems col A in Sheet2 is already filled.
If so, the B2 formula for the date in A2 is given by the formula

B2:
=VLOOKUP(A2,Sheet1!E:Z,22,0)

This is the most recalc efficient way to do this I've found. It'll
still be quite slow with 30K records.

If you need to do this a lot, you need to show your IT people this
mess of formulas that you'd need to use and tell them that if they
provided you a decent SQL database (so NOT Access) to use, this could
be done simply as

SELECT ColE, COUNT(DISTINCT ColB) FROM Table GROUP BY ColE;

IOW, if they refuse to give you the best tool for the task, they
should have to support what you have to hack together. They should
welcome a decent VBA alternative to the mess of formulas above, but if
not, make it very clear that THEY get to maintain all these formulas
when you move on to greener pastures.
 
M

mike in texas

I'd suggest a pivot table on sheet 2. Set it up with dates in the rows and
use the "count" of date in the data field.

I did 25,000 records in a few seconds.

Good luck


Mike
 
H

Harlan Grove

mike in texas said:
I'd suggest a pivot table on sheet 2. Set it up with dates in the rows and
use the "count" of date in the data field.
....

Perhaps reading the OP's specs carefully would help.
....

Create a Pivot Table from the data in Sheet1 as you propose and the
result would be

1/1/08 2
1/2/08 3
1/3/08 2

Note the difference: Pivot Table would return 3 for the 1/2/08 date
while the OP wants 2. If you had read the OP's specs carefully (or
even the subject line), you might have noticed the bit about counting
UNIQUE (meaning counting distinct). There are only 2 distinct col B
values corresponding to the col E value 1/2/08 in the OP's original
data. How would you use a Pivot Table to return the DISTINCT count?
 
R

Robert_L_Ross

Harlan,

In an off way it would work if I chose to display the ColB results in the
Pivot Table...so I'd provide the date as the first column in the pivot table,
then the Lender ID (ColB) results...the pivot table would then be able to
provide a count of unique Lender ID results.

The only problem with using pivot tables is that the result I need to use as
a source for another sheet...a Pivot Table will vary wildly in the total
number of lines I get. I'm wondering can you use a Pivot Table as a source
for a VLOOKUP and have the Pivot Table act like a range or a named range? If
it can, would a VLOOKUP accept a named range instead of a 'hard' range?
=vlookup(a1,PivotTable1,2,false)
instead of
=vlookup(a1,b1:b30000,2,false)
 
H

Harlan Grove

Robert_L_Ross said:
In an off way it would work if I chose to display the ColB results in the
Pivot Table...so I'd provide the date as the first column in the pivot table,
then the Lender ID (ColB) results...the pivot table would then be able to
provide a count of unique Lender ID results.
....

So where did the OP ask for counts of each distinct col B value rather
than the count of distinct col B values? Why is a 2-step solution with
a pivot table as the first step OK if the second step is left
unmentioned?

Pivot tables are answering a different question than the one the OP
asked.
 
R

Robert_L_Ross

"Perhaps reading the OP's specs carefully would help."

Actually Harlan, if YOU read the OP closely, you would see that the actual
question was: "How can I count unique values on one sheet using criteria from
another?"

On the OP I didn't specify I needed it done in one step.

"Note the difference: Pivot Table would return 3 for the 1/2/08 date while
the OP wants 2. If you had read the OP's specs carefully (or even the subject
line), you might have noticed the bit about counting UNIQUE (meaning counting
distinct). There are only 2 distinct col B values corresponding to the col E
value 1/2/08 in the OP's original data. How would you use a Pivot Table to
return the DISTINCT count?"

Maybe you don't use Pivot Tables that much, but yes, the VALUE could return
3, but if you include Lender ID as part of the display, then it returns 2 per
lender ID. THAT'S the answer to "How would you use a Pivot Table to return
the DISTINCT count?"

Furthermore, you are claiming that building the Pivot Table is one step and
getting the Unique count is another step. If you had actually walked that
answer through, you would know that you can ask for the unique values to be
returned AS YOU BUILD THE PIVOT TABLE, so it's not 2 steps, it's the same
step.

You are tearing Mike's answer down because you don't agree with it, not
because it wouldn't give me the result I want. Mike's answer does give me
the result I want, but since it's in a Pivot Table it may not work for me.
Before you tear someone's answer down, think it through - just because it's
not your method doesn't mean it's not a valid way of doing something.
 
H

Harlan Grove

Robert_L_Ross said:
. . . if YOU read the OP closely, you would see that the actual
question was: "How can I count unique values on one sheet using criteria from
another?"

The fact that multiple worksheets may be involved is irrelevant for
formula-based solutions. The only difference between source data and
results in the same or different worksheets would be unnecessary vs
necessary, respectively, inclusion of the worksheet name in range
references.

So the original question boils down to how to count unique values,
which I interpreted to mean distinct values.
On the OP I didn't specify I needed it done in one step.

Fair enough. Would you like a solution that takes 100 steps?
Maybe you don't use Pivot Tables that much, but yes, the VALUE could return
3, but if you include Lender ID as part of the display, then it returns 2 per
lender ID.  THAT'S the answer to "How would you use a Pivot Table to return
the DISTINCT count?"

Include Col B as another row variable? You get the following result.

Count of ID
Date ID Total
01/01/2008  00113800 1
 02559000 1
01/01/2008 Total 2
01/02/2008  00113800 1
 02559000 2
01/02/2008 Total 3
01/03/2008  00113800 1
 02559000 1
01/03/2008 Total 2


Where do you get the distinct count without additional formulas
applied to the pivot table? And how complicated would those formulas
be?

Include col B as a column variable, and you get the following result.

Count of ID ID
Date  00113800  02559000
01/01/2008 1 1
01/02/2008 1 2
01/03/2008 1 1

Maybe getting closer. This would only require counting nonblank
columns corresponding to each date, but if there were 30K rows in the
source data, maybe there could be more than 255 distinct col B values,
in which case this could fubar in Excel 2003 and prior.

Perhaps you mean some other layout I'm not figuring out.
Furthermore, you are claiming that building the Pivot Table is one step and
getting the Unique count is another step.  If you had actually walked that
answer through, you would know that you can ask for the unique values to be
returned AS YOU BUILD THE PIVOT TABLE, so it's not 2 steps, it's the same
step.

Leading to 2 questions. First, where do you specify unique/distinct
values in the Pivot Table settings? Second, if this is so
straightforward, why'd you start this thread?
You are tearing Mike's answer down because you don't agree with it, not
because it wouldn't give me the result I want. . . .
....

Mike's response was incomplete. He didn't include the col B values.
That's an immaterial omission?
 
R

Robert_L_Ross

You are obviously not wanting to listen to what I'm trying to say, so I'm not
going to continue to clog the thread up arguing with you.
 
H

Harlan Grove

Robert_L_Ross said:
You are obviously not wanting to listen to what I'm trying to say, so I'm not
going to continue to clog the thread up arguing with you.
....

Translation: you found out there's no direct way to count DISTINCT
text values from a text field using a pivot table, so rather than
admit you've been wrong in your last few follow-ups you'll feign a
high minded disappointment.
 
H

Harlan Grove

Harlan Grove said:
...

Translation: you found out there's no direct way to count DISTINCT
text values from a text field using a pivot table, so rather than
admit you've been wrong in your last few follow-ups you'll feign a
high minded disappointment.

Also, there are a number of people who like to point out when I'm
wrong. Count the number of people other than yourself who are taking
issue with what I've stated in this thread.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top