count unique records

K

kr

I have a list of information. I need to count the data per hour, but remove
dups and only count unique recs for the hour.
This is what I have so far.

A B
12:01 8888889
12:05 8888889
12:55 0009988
1:10 1122334
1:15 5559076
1:20 1122334
1:30 5678901
2:00 2345667
3:00 0987654

This is what I have so far to count the # of calls per hour
Column C
12:00 = sumproduct((A2:A10>=C2)*(A2:A10<C3) = 3
1:00
2:00

This formula is working great to return the # of values in Column B that
fall between the 2 times. However, I need to remove the duplicates in column
B from that hour to get a raw number - not one number counting more than
once. I've been trying to figure out how to use the COUNTIF but not getting it

Thank you!
Vs2007
 
L

Lars-Åke Aspelin

I have a list of information. I need to count the data per hour, but remove
dups and only count unique recs for the hour.
This is what I have so far.

A B
12:01 8888889
12:05 8888889
12:55 0009988
1:10 1122334
1:15 5559076
1:20 1122334
1:30 5678901
2:00 2345667
3:00 0987654

This is what I have so far to count the # of calls per hour
Column C
12:00 = sumproduct((A2:A10>=C2)*(A2:A10<C3) = 3
1:00
2:00

This formula is working great to return the # of values in Column B that
fall between the 2 times. However, I need to remove the duplicates in column
B from that hour to get a raw number - not one number counting more than
once. I've been trying to figure out how to use the COUNTIF but not getting it

Thank you!
Vs2007

Try this formula:

=SUMPRODUCT((A$2:A$10>=C3)*(A$2:A$10<C4)*(1/COUNTIF(B$2:B$10,B$2:B$10)))

Note: This formula will only work for duplicates within the same hour.
If there might be duplicates which are not within the same hour, you
need to specify for which hour it should be counted.

Hope this helps / Lars-Åke
 
K

kr

THANK YOU SOOOOO MUCH!!!!This really helped. I didnt understand why you
divided 1 but the count though? it kept giving me a fraction. I removed the
1 and just divided by the expresssion and it works perfect. PLEASE let me
know if i'm missing something. THANKS AGAIN!
 
L

Lars-Åke Aspelin

=SUMPRODUCT((A$2:A$10>=C2)*(A$2:A$10<C3)/(COUNTIF(B$2:B$10,B$2:B$10)))
and
=SUMPRODUCT((A$2:A$10>=C2)*(A$2:A$10<C3)*(1/COUNTIF(B$2:B$10,B$2:B$10)))

give the same result, but I chose the longer version as it shows
better the thinking behind the formula.
Each record that fulfils the time limits should only be counted with
1/n if there are n records with the same value in column B.
1/n times n will get 1, and that type of record will thus contribute
with 1 to the total result.

Btw, i hade to change 1:10 to 13:10 etc as I use 24 hour clock setting
on my computer. I assume that 1:10 is evaluated as greater than 12:55
on your computer. Otherwise there might be some problem.

Lars-Åke
 
T

T. Valko

Caveat:
=SUMPRODUCT((A$2:A$10>=C3)*(A$2:A$10<C4)*(1/COUNTIF(B$2:B$10,B$2:B$10)))

If there is a duplicate value for a different hour you'll get incorrect
results:

12:01 8888889
12:05 8888889
12:55 0009988
1:10 0009988

Counting for the 12PM hour, the above formula returns 1.5 while the correct
result should be 2.

Assuming the values in column B are numeric values formatted to display
leading zeros. Try this array formula** :

=COUNT(1/FREQUENCY(IF((A2:A10>=C3)*(A2:A10<C4),B2:B10),B2:B10))

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFT
key then hit ENTER.
 
L

Lars-Åke Aspelin

Good point. Without that said, we can not assume that there can't be
duplicate values for different hours.
And a very interesting use of the FREQUENCY function in your formula.
Can we challenge you to come up with a formula that can handle
general, alphanumeric values in column B.

Lars-Åke
 
T

T. Valko

This will handle both text and numbers:

12:01 X
12:05 8888889
12:55 X
1:10 0009988

Array entered** :

=COUNT(1/FREQUENCY(IF((A2:A10>=C3)*(A2:A10<C4),MATCH(B2:B10,B2:B10,0)),ROW(B2:B10)-MIN(ROW(B2:B10))+1))

That does not account for empty cells in column B.

You can reduce MIN(ROW(B2:B10)) to simply ROW(B2) but the longer expression
seems more "userproof".

While using (A2:A10>=C3)*(A2:A10<C4) gives you more flexibility to define a
time range, if you're interested in only the specific hourly counts then you
can use this:

=COUNT(1/FREQUENCY(IF(HOUR(A2:A10)=12,MATCH(B2:B10,B2:B10,0)),ROW(B2:B10)-MIN(ROW(B2:B10))+1))

But then you'd have to possibly account for empty cells in column A as an
empty cell would evaluate as hour 0 or 12AM.

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFT
key then hit ENTER.
 
L

Lars-Åke Aspelin

Very nice.
And I guess you can put in a (B2:B10<>"") factor and make it handle
also empty cells in column B (by ignoring them).

Lars-Åke
 
K

kr

Okay! Its taking me a little while but I will put that back in. When I did
it with 1/n I used the roundup to see if it would give whole numbers ...but
it didnt seem to work so let me try it againg. thanks a billion
 
K

kr

And concerning the clock, yes! it hit me after I went to bed why it wasn't
producing for the other hours. So i knew I needed to go back and change to
'army' time. but this is good confirmation
 
K

kr

okay, i get it! i was thinking the same, because there are dup #s in the
column but of course i AM wanting to count per the hour. so, if 5554444
shows up 4 times - 2 in 12:00 and 2 in 1:00, then i want it to show up 1 for
12:00 and 1 for 1:00. I will try this again with this new. and thanks for
the arry function, i saw it online but was not quite sure how it worked. I'm
new and not new to this! I don't have to work in i everyday but when I need
to do complicated reports... I know where to go for help in understanding the
logic! LOLthanks
 
K

kr

all right, u've got me stumped on this one. now, in column A, the data will
be formatted for time (army time). The data in column b is phone #s, but will
be formatted as text because text and numbers will both be present.

now the last formula is the greatest of all! would this be yoru final
analysis of the best choice to use???
b/c that is exactly what is taking place and probably would end up being the
best. Tahnk you for the great functions. sometimes it makes sense and some
times doesn't. would you mind explaing the "match"and "min" functions or is
there a better website with more in tell. this is really awesome. you're
going to make me shine at work!!!!! But ALL of the accolades is to you guys!
for real!!!!!
 
L

Lars-Åke Aspelin

One more thing.
From the OP it seem like there will be a sequence of times (hours) in
the C column, so it would be nice to make it possible to "copy down"
the formula.
For that we need to throw in some $ signs for the A and B ranges.

=COUNT(1/FREQUENCY(IF((A$2:A$10>=C3)*(A$2:A$10<C4),MATCH(B$2:B$10,B$2:B$10,0)),ROW(B$2:B$10)-MIN(ROW(B$2:B$10))+1))

Lars-Åke
 
T

T. Valko

now the last formula is the greatest of all! would this
be yoru final analysis of the best choice to use???

Yes, because:
The data in column b is phone #s, but will be formatted as text
would you mind explaing the "match"and "min" functions

The FREQUECY function counts numbers that fall into interval bins. Because
the uniques you want to count are TEXT we need to convert these values to
numbers that the FREQUECY function can handle. That's what we use the MATCH
function for.

MATCH returns a numeric value that represents a values *relative* position
within an array. For example:

A10 = X
A11 = Y
A12 = X
A13 = A

=MATCH("Y",A10:A12,0) = 2

The lookup_value "Y" is in the 2nd position within the array A10:A12.

We use MATCH to convert the text values to their *relative* numeric
positions within the array. For example:

A10 = X
A11 = Y
A12 = X
A13 = A

MATCH(A10:A13,A1:A13,0)

Returns this array: {1;2;1;4}

MATCH will "find" the first instance of the lookup_value and return the
position of the first instance. So:

lookup_value A10 X is in position 1 within the the array
lookup_value A11 Y is in position 2 within the the array
lookup_value A12 X is in position 1 within the the array
lookup_value A13 A is in position 4 within the the array

This is how we use the positions of the text entries as the numbers that
FREQUENCY can then use as the data_array:

FREQUENCY({1;2;1;4}.............

Now, we need interval bins that will match the *relative* positions of the
data in A10:A13. In other words, we have a range array that has 4 elements
so we need to generate an array that represents the relative position of
each element: {1;2;3;4}. That's what we use this for:

ROW(A10:A13)-MIN(ROW(A10:A13))+1))

ROW(A10:A13) returns the array {10;11;12;13}

MIN(ROW(A10:A13)) returns 10

So we subtract 10 then add 1 like this:

10-10+1 = 1
11-10+1 = 2
12-10+1 = 3
13-10+1 = 4

Now we have our array of *relative* position numbers that we use as the
interval bins in the FREQUENCY function:

FREQUENCY({1;2;1;4},{1;2;3;4})
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top