Calculate Co-occurrence

J

jwang036

I have a table with the transaction records (currently 5000) of a store:

TranID ItemA ItemB ItemC ...

TranID=Text, one ID per visit
ItemX=1 or 0, if 1 ItemX is purchased, if 0 not purchased

What I want is a Co-occurrence matrix:

ItemA ItemB ItemC ...
ItemA AA AB AC
ItemB BA BB BC
ItemC CA CB CC

AA=all occurence of item A, how many times A appear in each ID
AB=all co-occurence of item A&B, how many times A &B (or B& A) appear at the
same time in each ID
No sequence difference, so AB=BA

I know I may should use Matlab etc. but it drives mad.

Thanks in advance!!!
 
P

Patrick Molloy

I'm sorry. Can you re-phrase keeping im mind that we don't have your workbook
 
J

jwang036

Typical record with column head:

column head: TranID ItemA ItemB ItemC ...
a0001 1 0 1
a0002 0 1 1
a0003 1 1 1

So there are 3 transactions: a0001 bought itemA & B, a0002 bought B&C, a0003
bought A&B&C

I need to calculate a table (Item # x item #, say we only have A,B,C, so the
table will be 3X3). Each cell is the occurence of the item or item
combination. eg: A-A=2, as A appear 2 twice in all records, A-B=1. because
A&B or B&A appear once in all records.

A B C
A 2 1 2
B 1 2 2
C 2 2 3

I have 28 items and thousands of records to run through.
 
C

Charabeuh

hello,
I suppose a record is composed of:
TextID,0,1,1,0.....
0 no ItemA was sold
1 ItemB was sold
1 ItemC was sold
0 no ItemD was sold
and so on....
I supposed that ItemA is always in the second column, ItemB is always in the
third column and so on...

1) I suppose your data begin at row 1 column A
(first row = name of your colum of data)
(second row and next rows ==> your data)

2) I suppose your data end at row 5000 column S

At row 5010 column B, copy the names of your items
(ItemA,ItemB.....) ==> end at column S
At column A from line 5011, copy the names of your items (ItemA,ItemB.....)
(use copy and paste special-transpose)

in cell row 5011, column B, put the formula :
=SUMPRODUCT(--(OFFSET($A$2:$A$5000;0;COUNTA($A$5011:$A5011))=1)*(B$2:B$5000=1))

copy this formula to the other cells of the co occurence table


This will be:
row 5010: blank, ItemA, ItemB, ItemC,,,,,,,
Column A from row 5010: blank, ItemA, ItemB, ItemC,,,,,,,
in the first cell (row 5011, column B):
=SUMPRODUCT(--(OFFSET($A$2:$A$5000;0;COUNTA($A$5011:$A5011))=1)*(B$2:B$5000=1))

Does this help you ?
 
J

jwang036

Brilliant! It works perfectly. Thanks!

BTW, I'm thinking about making a co-occurrence table based on a huge dataset
(may have 100K records). I'll use Access. Any chance to make this work in
Access?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top