Counting unique numbers

J

Juan Marin

Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin
 
T

T. Valko

Try this array formula** :

Data in the range A2:B15
Assuming no empty cells in column B

D2:Dn = 1001, 1003, 1004 etc

Array entered** in E2 and copied down as needed:

=SUM(IF(FREQUENCY(IF(A$2:A$15=D2,B$2:B$15),B$2:B$15),1))

** array formulas need to be entered using the key combination of
CTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFT
key then hit ENTER.
 
R

Ron Rosenfeld

Hello everyone,

I have a large set of data that has in one column, the company id
which repeats on consecutive rows for each year in the dataset. Then
in the next column, there's the ID of each segment within the company
(a number). I need to count for each company code, how many segments
(unique numbers) are there. The data for a small sample looks like
this:

COMP-ID SEG-ID
1001 1
1001 1
1001 1
1003 1
1003 1
1003 1
1003 1
1004 3
1004 4
1004 5
1004 3
1004 4
1004 5
1004 3

in this case I would need to generate a list like this

1001 1
1003 1
1004 3

which means that company 1001 has 1 segment, 1003 has 1 segment and
company 1004 has 3 segments (segments 3, 4 and 5).

I hope anybody can shed some light on how to tackle this one because I
haven't been able to figure something out. Thanks in advance,

Juan Marin

Add a column to your data.

If your has the column labels in A1:B1, and the data starts in A2, then

C1: Unique Segs
C2: =IF(SUMPRODUCT(($A$2:$A2=A2)*($B$2:$B2=B2))>1,0,1)

Fill down as far as needed.

Then Insert/Pivot Table.

COMP-ID to Row area
Unique Segs to Values or Data area

Make sure your "Unique Segs" value field setting is set to do a SUM
--ron
 
J

Juan Marin

Thank you so much Ron and Biff, both solutions work pretty well.
Nevertheless I'm encountering one more difficulty, my data set is
really large (+400,000 rows) so pasting down any of these formulas is
not practical because as I paste them down, they stay fixed to the
first cell of the range. However, I know that the data is ordered, and
that in any case, if I were able to look just 100 rows forward and 100
rows ahead, that would be enough. I've been trying to fix both
formulas with ADDRESS() in such a way that I could vary the initial
and ending rows in such way, but I haven't been able to do so
effectively. Any thoughts? Thank you again.

JM
 
T

T. Valko

my data set is really large (+400,000 rows)

Yeah, that could slow things down when you're counting uniques on that size
range!
I know that the data is ordered

Despite the length of this formula it's much better! It narrows down the
range to only those cells that are of the same ID.

Array entered**

=SUM(IF(FREQUENCY(IF(INDEX(A:A,MATCH(D2,A:A,0)):INDEX(A:A,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)=D2,INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)),INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)),1))

If you can download and install the free Morefunc add-in from:

http://xcell05.free.fr/morefunc/english/index.htm

Alternate download site:

http://www.download.com/Morefunc/3000-2077_4-10423159.html

Then you can use this slightly shorter array entered** version:

=SUM(IF(FREQUENCY(IF(INDEX(A:A,MATCH(D2,A:A,0)):INDEX(A:A,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1)=D2,SETV(INDEX(B:B,MATCH(D2,A:A,0)):INDEX(B:B,MATCH(D2,A:A,0)+COUNTIF(A:A,D2)-1))),GETV()),1))** array formulas need to be entered using the key combination ofCTRL,SHIFT,ENTER (not just ENTER). Hold down both the CTRL key and the SHIFTkey then hit ENTER.--BiffMicrosoft Excel MVP"Juan Marin" <[email protected]> wrote in messageThank you so much Ron and Biff, both solutions work pretty well.> Nevertheless I'm encountering one more difficulty, my data set is> really large (+400,000 rows) so pasting down any of these formulas is> not practical because as I paste them down, they stay fixed to the> first cell of the range. However, I know that the data is ordered, and> that in any case, if I were able to look just 100 rows forward and 100> rows ahead, that would be enough. I've been trying to fix both> formulas with ADDRESS() in such a way that I could vary the initial> and ending rows in such way, but I haven't been able to do so> effectively. Any thoughts? Thank you again.>> JM
 
L

Lori

With this amount of data, perhaps you could use the
remove duplicates Excel 07 command or advanced filter for
unique values and then use a PivotTable or formula?

Otherwise try the Import Data command
(under Data > Import External Data in XL02 menus)

Select Excel files, locate the current file and click Open.
Choose the Data range eg Sheet1 (or a named range) then OK.
Now Choose Edit Query, Command Type: SQL and Command Text:

SELECT [COMP-ID],COUNT(*) AS [Segs] FROM
(SELECT DISTINCT * FROM [Sheet1$]) GROUP BY [COMP-ID]

Click OK, select the output cell and click OK again.

For 65536 rows and around 10000 COMP-IDs this takes only a
second to run in tests, other solutions froze the application.
 
B

Bernd P

Hello Lori,

400,000 records are admittedly asking for a database approach.

I optimized the memory strategy of my UDF Pfreq and its working now
for 65,536 rows with 10,000 different items. Runtime on my dualcore
proc: about 6 seconds.

Regards,
Bernd
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top