Unique entries spread over several columns

M

moily

Hi there

I've posted a few entries similar to this but they keep getting buried and I
don't have a viable solution as yet. Apologies if this reposting has annoyed
anyone and if you can think of another way to handle the issue I'd welcome
the suggestion!

I’d first like to start out by thanking anyone who can help me out with this
issue but that, for many of my own reasons which are very valid, I cannot use
a VBA solution – only a formula based solution.

I have vastly simplified the amount of data for this example – this is
usually over 500 lines for each column and I have about 40 columns, some of
the columns stand alone and there are a few clumps like this one where seven
or three or five columns relate to each other. This means I cannot use any
solutions that amalgamate all columns into one giant one column which is then
searched for unique entries (this would create 2100-2800 lines with
potentially more), I must have one column of unique entries that is acheived
with no helper columns.

In the below example I have four columns of data. Most entries overlap
throughout the columns but I need to find every unique entry in all of these
columns. My results need to be shown in a neat column with no spaces in
between entries. The stand alone columns are simple to obtain the unique
entries but the complication comes when I need to extend this over several
columns. Most of the unique entries are found in the first column but there
can be a few others scattered in the other columns. Due to the amount of
data and the size of the spreadsheet I MUST have this in a very neat one
column solution. I’ve given an example below of the column structure and the
result I need. I’ve also given an example of the formulas I currently use
for the stand alone columns. I would really appreciate it if anyone can
assist in this!!!

Column structure and desired result example:

Column A Column B Column C Column D
Good Very bad Very bad Very good
Very bad Very good Good Good
Ok Ok Bad Bad
Good Good Extremely good Moderate
Good Ok Bad Extremely
bad
Bad Bad Ok Ok


Unique entries:
Good
Very bad
Ok
Good
Bad
Very good
Extremely good
Moderate
Extremely bad


Stand alone column array formulas (using Ctrl+shift+enter)
{=IF(INDEX(X2:X495,MATCH(0,-ISBLANK(X2:X495),0))=".","EXTRA",INDEX(X2:X495,MATCH(0,-ISBLANK(X2:X495),0)))}


{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI2,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI2,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI2,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI2,0)),0)),""))}

{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI3,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI3,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI3,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI3,0)),0)),""))}

{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI4,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI4,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI4,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI4,0)),0)),""))}

{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI5,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI5,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI5,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI5,0)),0)),""))}

{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI6,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI6,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI6,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI6,0)),0)),""))}

{=IF(IF(COUNT(MATCH(X$2:X$495,AI$2:AI7,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI7,0)),0)),"")=".","EXTRA",IF(COUNT(MATCH(X$2:X$495,AI$2:AI7,0))<COUNT(1/(X$2:X$495<>"")),INDEX(X$2:X$495,MATCH(0,(X$2:X$495<>"")-ISNA(MATCH(X$2:X$495,AI$2:AI7,0)),0)),""))}

Thanks a million!
Ann
 
D

Domenic

You say you cannot use VBA. How about using an add-in? If you download
and install the free add-in, Morefunc.xll, it would be relatively simple
and more efficient.

Otherwise, a list of unique values can be returned with the use of
defined names.

First, assuming that A2:F8 contains the data, and that the data contains
text values, define the following...

Insert > Name > Define

Name: Array1

Refers to:

=ROW(INDIRECT("1:"&ROWS(Sheet1!$A$2:$F$8)*COLUMNS(Sheet1!$A$2:$F$8)))

Click Add

Name: Array2

Refers to:

=INT((Array1-1)/COLUMNS(Sheet1!$A$2:$F$8))

Click Add

Name: Array3

Refers to:

=MOD((Array1-1),COLUMNS(Sheet1!$A$2:$F$8))

Click Add

Name: Array4

Refers to:

=T(OFFSET(Sheet1!$A$2:$F$8,Array2,Array3,1,1))

Click Ok

Then try the following...

H2, confirmed with CONTROL+SHIFT+ENTER:

=SUM(IF(FREQUENCY(IF(Array4<>"",MATCH("~"&Array4,Array4&"",0)),Array1),1)
)

I2, confirmed with CONTROL+SHIFT+ENTER, and copied down:

=IF(ROWS(I$2:I2)<=$H$2,INDEX(Array4,SMALL(IF(FREQUENCY(IF(Array4<>"",MATC
H("~"&Array4,Array4&"",0)),Array1),Array1),ROWS(I$2:I2))),"")
 
L

Lori

I'd use a pivot table for this with the multiple consolidation table option
with a blank row above and to the left of the data range, you can then
pivot the value field to a column to get a unique list and frequencies.

Here's an array formula approach, it'll be slow to recalc though.
With data range as B2:E7, enter using CTRL+SHIFT+ENTER the
formulas below and fill down as far as required.

G2:
=INDIRECT(TEXT(MIN(IF(COUNTIF(B$2:E$7,"<"&B$2:E$7)=0,
100000*ROW(B$2:E$7)+COLUMN(B$2:E$7))),"R0C00000"),0)

G3:
=INDIRECT(TEXT(MIN(IF(COUNTIF(B$2:E$7,"<"&B$2:E$7)=COUNTIF(B$2:E$7,
"<="&G2),100000*ROW(B$2:E$7)+COLUMN(B$2:E$7))),"R0C00000"),0)
 
M

moily

Domenic - this is brilliant thank you! I really really appreciate this!!!
Unfortunately, I can't use anything out of the ordinary for excel including
add-ins that would have to be downloaded. The pc prevents that as well as
the long-term requirements of the spreadsheet.

The one downside to it is that it does have a helper cell but since it's
only the one cell it's not to hard to work around! Would you happen to know
which (of yours or Lori's) would use the least amount of memory/be most
efficient?

Cheers,
Ann
 
M

moily

Hi Lori,

This is a brilliant solution too! Again, it's very very appreciated! It
ensures no helper cells/columns and even alphabetises the unique list!

The one downside to it is that it does return a REF! issue for the last
entry and consistently repeats the list over and over so you would have to
know how many unique entries to cut off the loop but that shouldn't be too
difficult to fix as long as a note is made to watch out for it! Would you
happen to know which (of yours or Domenic's) would use the least amount of
memory/be most efficient?

Many many thanks again!
Ann
 
D

Domenic

I'm not sure which of the two solutions is more efficient. However, it
looks like Lori's solution does not allow empty/blank cells.
 
H

Harlan Grove

moily said:
I have vastly simplified the amount of data for this example – this is
usually over 500 lines for each column and I have about 40 columns, some of
the columns stand alone and there are a few clumps like this one where seven
or three or five columns relate to each other.  This means I cannot use any
solutions that amalgamate all columns into one giant one column which is then
searched for unique entries (this would create 2100-2800 lines with
potentially more), I must have one column of unique entries that is acheived
with no helper columns.  
....

Hopefully you can use array formulas.

Here's another alternative that uses NO volatile functions. With the
data in a range named D and the topmost result in cell K3,

K3:
=T(D)

K4 [array formula]:
=IF(SUM(COUNTIF(K$3:K3,D))<COUNTA(D),INDEX(D,
MIN(IF(COUNTIF(K$3:K3,D)=0,ROW(D)*1000+COLUMN(D)))/1000,
MOD(MIN(IF(COUNTIF(K$3:K3,D)=0,ROW(D)*1000+COLUMN(D))),1000)),"")

Fill K4 down until the formula returns "". This returns the distinct
values in original order going across then down. Change the K4 formula
to

K4 [array formula]:
=IF(SUM(COUNTIF(K$3:K3,D))<COUNTA(D),INDEX(D,
MOD(MIN(IF(COUNTIF(K$3:K3,D)=0,ROW(D)+COLUMN(D)*1000)),1000),
MIN(IF(COUNTIF(K$3:K3,D)=0,ROW(D)+COLUMN(D)*1000))/1000),"")

to return the distinct values in original order going down then
across.
. . . Due to the amount of
data and the size of the spreadsheet I MUST have this in a very neat one
column solution. . . .
....

All purely formula solutions are VERY INEFFICIENT. This would be MUCH
SIMPLER and MORE EFFICIENT using multiple cell formulas per each
result. These additional cell formulas could be put in a hidden
worksheet if you must hide them from users at all costs. So what's the
overriding reason why you must have only one cell per result? You want
to minimize file size on disk? Multiple cells per result could use
less disk space if the formulas are much simpler/shorter. You want to
minimize RAM usage when the file is open in memory? The way to do that
is to minimize redundant calculations, and you do that in problems
like by using multiple cells per result. Even given the file storage
and RAM overhead from an additional hidden worksheet, it's likely
you'd achieve smaller file size and less RAM usage if you used
multiple cells per result.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top