Counting patters

  • Thread starter mahadevan.swamy
  • Start date
M

mahadevan.swamy

Hi,

I have 32 columns which consists of 0's, 1's and 2's and i have many
rows that also consists of the same numbers but in a different order.
Now what I want to do is to count how many rows consists of the same
exact pattern of 0, 1 and 2. I tried using CONCATENATE function to
bring all the info from the 32 columns together and used countif
statement to count how many times this pattern is occuring. But this
doesnt seem right. I also wanted to sort these numbers so that I can
bring these patterns together and eliminate the repeated ones. But the
sort function is reading this concatenated number as text. How do i
convert this text string into a number?

A sample data:

11111111111111111111111111111110
11111111011110111111111101111110
11111111101101111111111110111110
00111111112211111111111111221101
11111111111111111111111111111111
11010111211111011211111111101110
11101011121111102111111111011110
11111111011110111111111111111110
11111111101101111111111111111110
00111111111111111111111111111011
00111001211111111121011001111101
00110110121111111112100110111101
11101111011110110121111111111110
11011111101101111012111111111110
11111111001110110101111111111110
11111111001101111010111111111110
00111111111111111111111111110011
00111111111111111111111111110011

The last two rows matches. Therefore, I want it to be counted and
sorted.

Thanks

Swamy









Is there a right solution for this problem
 
D

Dave Peterson

You'll want to keep it as text--and ignore any message from excel about sorting
text as numbers.

Excel can keep track of 15 digits.

If you convert them to text, then the leading 15 digits will look nice. The
last 17 will be 0's.

And your routine sounds like the way I'd do it, too.
 
P

Pete_UK

If you just had 1's and 0's you could treat each row as a binary
number and combine the digits with powers of 2 to form a number. As
you have 2's as well, though, then you would have to treat each row as
a tertiary number (is that the right term? ternary? base-3?) and use
powers of 3 to convert the 32 digits into an equivalent number. A UDF
could do this for you.

However, if you did this for a 32-digit base-3 number you would lose
some precision, as Excel can only handle 15 digits accurately. So, it
might be better to split the number and to deal with, say, 11 digits
at a time (10 digits for the last group), such that you will then end
up with 3 numbers per row in extra columns. You could then sort the
block of data (plus the three extra columns) using the three columns
as keys.

A variation of this is to take 4 digits at a time in your UDF (81
possible values) and to convert this into a single ASCI character and
then to concatenate the 8 characters together - these can then be
sorted.

It is then relatively easy to identify duplicates - either in the same
sheet with a formula like:

=IF(A2=A1,"dupl","unique") entered on row 2 and copied down,

or you might make use of Advanced Filter by copying the derived
numbers/text to a separate sheet and filtering to exclude duplicates.

As for the "right solution", your question implies that there is only
one, but there is usually a variety of ways of achieving something in
Excel - this is just one way.

Hope this helps.

Pete
 
P

Pete_UK

Further to this:

you can do this without a UDF by copying this formula into AG1:

=CHAR(A1*27+B1*9+C1*3+D1+32)&
CHAR(E1*27+F1*9+G1*3+H1+32)&
CHAR(I1*27+J1*9+K1*3+L1+32)&
CHAR(M1*27+N1*9+O1*3+P1+32)&
CHAR(Q1*27+R1*9+S1*3+T1+32)&
CHAR(U1*27+V1*9+W1*3+X1+32)&
CHAR(Y1*27+Z1*9+AA1*3+AB1+32)&
CHAR(AC1*27+AD1*9+AE1*3+AF1+32)

(all one formula - I've split it to avoid awkward line-breaks). This
assumes that your data occupies columns A to AF (32 of them) and
starts on row 1.

This will give you 8 characters (strange-looking combinations), and
you can now sort all 33 columns using AG as the key field.

Once sorted, you can enter this formula in cell AH2:

=IF(AG2=AG1,"duplicate","unique")

and copy down. Filter this column for "duplicate", then highlight all
the rows that are visible and Edit | Delete Row. Select All in the
filter pull-down and delete columns AG and AH to leave you with unique
patterns of your data.

Hope this helps.

Pete
 
G

Guest

Try this:

With your posted numeric table in cells A1:AF18

Then
AG1: =--(A1&B1&C1&D1&E1&F1&G1&H1&I1&J1&K1&L1&M1&N1&O1)
AH1: =--(P1&Q1&R1&S1&T1&U1&V1&W1&X1&Y1&Z1&AA1&AB1&AC1&AD1)
AI1: =--(AE1&AF1)

Copy those formulas down through row 18.

Next
AJ1: =SUMPRODUCT(($AG$1:$AG$18=AG1)*($AH$1:$AH$18=AH1)*($AI$1:$AI$18=AI1))
copy that formuls down through row 18
(That formula counts the occurrences of each pattern.)

Also,
you can sort the patterns numericly
by selecting A1:J18
and sorting on
Col_AG......ascending
Col_AH......ascending
Col_AI......ascending

Is that something you can work with?
***********
Regards,
Ron

XL2002, WinXP
 
M

mahadevan.swamy

Ron,
Thank your for your help. I did exactly as you told me and the number
of pattern occurences is not properly counted. When I see a repeat in
the pattern, it displays 0 for both of them. Moreover, I have about
1000 rows of data and it would be hectic to for me to each and every
single row and delete data from three cells as you described. I would
like to have one column cells to bring the numbers together and the
other column cells to count the occurences. This would be a quick
check. Also this occurences is an important number because I need to
know how many patterns of this type are there.

Pete, You are right about performing binary conversion on the 0's and
1's. I forgot to mention that the number 2 that you see was actually
"X" because I was trying to manipulate the text as a number.
Basically, there are 32 sensors that reads a part and if the part
covers the sensors, it displays a 1, if the part has some holes in it
and uncover the sensors, it displays a 0 and if the part partially
blocks the sensor, it displays an "X". I tried your solution which you
posted in your second post and it does give me weird characters and I
would like to know how many times it is occuring. So i used the
countif statement and i tested it. It returns me a wrong number for
some patterns. You were referring to UDF in your first post. What is
UDF?
Thanks for your help

Swamy
 
G

Guest

There must be a difference in the data you have versus what I copied from
your post. In my testing all duplicates were flagged as such.

If you find and correct the data anomaly,
try this modified formula:
AJ1: =SUMPRODUCT(($AG$1:$AG1=$AG1)*($AH$1:$AH1=AH1)*($AI$1:$AI1=AI1))

When copied down, the formula returns "OK" for the first occurrence of a
pattern and "DUPE" for all duplicates.

Here are my first 18 calculated values:
111,111,111,111,111 111,111,111,111,111 10 OK
111,111,110,111,101 111,111,111,011,111 10 OK
111,111,111,011,011 111,111,111,101,111 10 OK
1,111,111,122,111 111,111,111,112,211 1 OK
111,111,111,111,111 111,111,111,111,111 11 OK
110,101,112,111,110 112,111,111,111,011 10 OK
111,010,111,211,111 21,111,111,110,111 10 OK
111,111,110,111,101 111,111,111,111,111 10 OK
111,111,111,011,011 111,111,111,111,111 10 OK
1,111,111,111,111 111,111,111,111,110 11 OK
1,110,012,111,111 111,210,110,011,111 1 OK
1,101,101,211,111 111,121,001,101,111 1 OK
111,011,110,111,101 101,211,111,111,111 10 OK
110,111,111,011,011 110,121,111,111,111 10 OK
111,111,110,011,101 101,011,111,111,111 10 OK
111,111,110,011,011 110,101,111,111,111 10 OK
1,111,111,111,111 111,111,111,111,100 11 OK
1,111,111,111,111 111,111,111,111,100 11 Dupe


Does that help?
***********
Regards,
Ron

XL2002, WinXP
 
P

Pete_UK

Hi Swamy,

sorry, I missed the bit in your post that said you want to know the
number of duplicates - I concentrated on the part that said you want
to get rid of the duplicates.

If you do have 0, 1 and X as values, then you can copy the data to a
second sheet (or just copy the sheet by CTRL-drag of the sheet tab)
and then work on the copy. Highlight all the data in the 32 columns
and use Find & Replace (CTRL-H) to search for X and replace it with 2
- then my long formula will work for this copy of your data, and you
will still have the original data.

Before sorting, though, fix the values in column AG by highlighting
all the cells with the formula in and clicking <copy>, then Edit |
Paste Special | Values (check) | OK then <Enter>. Then copy these
fixed values back to the original sheet, which will be in the same
order, in column AG. It will also help to give a name to this range of
codes in the first sheet - highlight the codes in the original sheet
and Insert | Name | Define and call it something like "Codes".

Now you can sort the data in the second sheet as I described above,
and by using the formula in column AH you can reduce the list of codes
to uniques. Then in cell AH1 of the second sheet (to overwrite then IF
formula which is already there) you can enter this formula:

=COUNTIF(Codes,AG1)

and copy this formula down.

Now in your second sheet you will have a complete set of unique
patterns (you can apply Find and Replace again on the first 32 columns
to change the 2's back to X's if you wish), and in column AH you will
have a count of the number of each of them in the first sheet.

I think this gives you everything that you asked for.

By the way, UDF stands for User-Defined Function. You can define your
own functions in Excel and then use them in a worksheet like any other
functions, eg =my_function(A1:AF1), but I did point out in my second
post that you could achieve your outcome without using one.

Hope this helps further.

Pete
 
M

mahadevan.swamy

Hi, I would like to send my file so that you can make changes and
present me your solution. I would like to know your e-mail address so
that I can send you my file. Thanks

Swamy
 
P

Pete_UK

Ok. It is:

pashurst then the <at> thingy, followed by auditel.net

change the obvious.

Pete
 
P

Pete_UK

Thanks, Dave - I've thought about it, but my spam filter seems to do
its job.

Just to report back on this:

I applied my method to the file that Swamy sent to me and discovered
that the COUNTIF formula was double-counting as it couldn't
distinguish between upper case and lower case letters in the 8-
character codes, and also it was treating the symbols "?" and "*" as
wildcard characters. Consequently, I changed the formula to:

=COUNT(IF(EXACT(Codes,AI1),1))

which was array-entered, and copied this down. This gave the correct
total (982) compared with the number of records in the original file,
so the method seems to work (for anyone else who's interested).

Pete
 
D

Dave Peterson

Completely off-topic.

But if you don't munge it in the header, you're not buying much by doing it in
the body with this:

pashurst then the <at> thingy, followed by auditel.net
 
P

Pete_UK

I don't mind people emailing me directly, and many do - presumably
Swamy couldn't read my address or didn't know how to email direct (or
maybe he was being polite by asking first). I've noticed in other
posts that a full email address is often disguised like

<[email protected]>

and I didn't know if this is done by the newsgroups - if it is, then
by putting my proper address in the posting, Swamy still wouldn't be
able to see what it was.

Well, that was my logic, anyway !! <bg>

Pete
 
D

Dave Peterson

I getcha.

Google does a nice job trying to protect email addresses. (I could see your
email address in my newsreader with no problems. I just figured everyone used
the same interface as me <bg>.)

(And the OP was using google, too.)

In the words of Emily Litella: Never mind.

Just in case...
http://en.wikipedia.org/wiki/Emily_Litella
 
M

mahadevan.swamy

Well, actually I can find out the email address as Google asks me to
enter some characters to view email address but I had this experience
before as I emailed 2 or 3 people and I got back my email saying that
my message failed to reach the recipient. So I make it a point to ask
people here before I send them an email. Thanks a lot Pete for doing
the great work on the excel file.

Swamy
 
P

Pete_UK

You're welcome - thanks for feeding back.

Pete

Well, actually I can find out the email address as Google asks me to
enter some characters to view email address but I had this experience
before as I emailed 2 or 3 people and I got back my email saying that
my message failed to reach the recipient. So I make it a point to ask
people here before I send them an email. Thanks a lot Pete for doing
the great work on the excel file.

Swamy




...

read more »- Hide quoted text -

- Show quoted text -
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top