Extract Unique Values, Then Extract Again to Remove Suffixes

K

Karl Burrows

I have a list of builders where I want to extract all the unique values.
Using =IF(COUNTIF($A$1:A1,A1)=1,A!,"") works fine to get the first set of
unique values, but I need to extract it a bit further. The builder names
may be Ryan 50, Ryan 60 or just Ryan. I need to combine those values to
come up with the final list of unique builders that just says Ryan. I'm not
sure if I can strip off anything from the end as some are Ryan - Greenbrier.
I thought maybe removing everything to the right of a blank space, but those
entries have 2 blanks.

Any help would be greatly appreciated! Thanks!
 
G

Guest

Try using the filter (data > filter) and then in the drop down list, select
"custom" and enter the custom feature you're looking for eg. contains "ryan".
 
K

Karl Burrows

It can't use a filter, it has to be derived from a formula. It drives other
reporting. In addition, there are other builders, so they would have to be
filtered as well. I almost need something like reverse concatenation.

Try using the filter (data > filter) and then in the drop down list, select
"custom" and enter the custom feature you're looking for eg. contains
"ryan".
 
A

aaron.kempf

what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron
 
K

Karl Burrows

It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home

What I am doing is about impossible in Access, as it is pulling lot data
into a formatted report spread over a 6 year rolling period. I would still
have the same issue in Access as well if I were to query the data for these
values.

Thanks!

what do you mean by reverse concatenation?

ps - you should be storing DATA in a DATABASE and not in excel.

-Aaron
 
G

Guest

I note that there is a space between the unique information eg. "ryan" and
the other data on the end eg "townhomes"....so try this:

1. highlight the column with the data in it
2. Go to DATA > TEXT TO COLUMNS, then make sure "delimited" is selected and
hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then FINISH.

Can you let me know if this works?
 
B

Bob Phillips

Karl,

You could use a helper column to get the sans number values

=SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))
,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))))*1,"")

and then count uniques here. The formula is an array formula, so commit with
Ctrl-Shift-Enter.
 
D

Domenic

Assuming that Column A contains your data, enter the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

=INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$D$3,A1)),0))

....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
and KB Home.

Hope this helps!
 
A

Alan Beban

Where does this assume the OP's data is, and in what row(s) of the
helper column is it to be array entered?

Alan Beban
 
K

Karl Burrows

I have tried text to columns, but it tries to split out the formula instead
of the value in the cell (even if I create a new cell and convert it to
text). Then, there are some builders that have several spaces in the name
(KB Home 60). I tried to use LEFT to pull out to the next blank space, but
that only pulled KB!

I note that there is a space between the unique information eg. "ryan" and
the other data on the end eg "townhomes"....so try this:

1. highlight the column with the data in it
2. Go to DATA > TEXT TO COLUMNS, then make sure "delimited" is selected and
hit NEXT, then make sure that "SPACE" is checked, then hit NEXT, then
FINISH.

Can you let me know if this works?
 
A

Alan Beban

Well, perhaps I'm not being imaginative enough, but it is difficult for
me to conceive of a formula that will be able to treat Ryan Townhomes
like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

Alan Beban
 
K

Karl Burrows

That is really close! It works on about 50% of the builder names.
Examples:

Works on:
Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50

Does not work on:
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

Should convert to:
C.P. Morgan
Lennar Homes
McCar Homes
Mulvaney

Strange how it works on some similar and not others. This is really good!
Thanks!

Assuming that Column A contains your data, enter the following formula
that needs to be confirmed with CONTROL+SHIFT+ENTER in B1 and copy down:

=INDEX($D$1:$D$3,MATCH(TRUE,ISNUMBER(SEARCH($D$1:$D$3,A1)),0))

....where D1:D3 contains the values to extract, such as Ryan, Mulvaney,
and KB Home.

Hope this helps!
 
K

Karl Burrows

That works in some values, but not others. Pulls KB Home as KB, Lennar
Homes as Lennar, Saussy Burbank as Saussy, D.R. Horton as D.R. It does work
on the ones that are only one name builders like Mulvaney, though.

This is a tough one!

Karl,

You could use a helper column to get the sans number values

=SUBSTITUTE(A2,MID(A2,MATCH(FALSE,ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))
,0),10-SUM(1*ISERROR(1*MID(A2,ROW(INDIRECT("1:10")),1))))*1,"")

and then count uniques here. The formula is an array formula, so commit with
Ctrl-Shift-Enter.
 
K

Karl Burrows

It is tough when builder names vary from one word names to initials and name
and the multiple word names.

Well, perhaps I'm not being imaginative enough, but it is difficult for
me to conceive of a formula that will be able to treat Ryan Townhomes
like a duplicate of Ryan and not treat KB Home like a duplicate of KB.

Alan Beban
 
H

Harlan Grove

Karl Burrows wrote...
It is in a database, but am using Excel to pull some formatted reports.
Here is a better example, for multiple builders:

Ryan
Ryan Townhomes
Ryan 60'
Mulvaney - Greenbrier
Mulvaney - 80
KB Home
KB Home 70'

I need to extract the unique values to give me:

Ryan
Mulvaney
KB Home
....

It's not too difficult to get Ryan and KB Home using the following udf,
which returns an array.


Function foo(r As Range) As Variant
Dim d As Object
Dim c As Variant, x As Variant, t As String

Set d = CreateObject("Scripting.Dictionary")

For Each c In r
t = c.Text
If d.Exists(t) Then d.Item(t) = d.Item(t) + 1 _
Else d.Add Key:=t, Item:=1
Next c

For Each c In d.Keys
For Each x In d.Keys
If x <> c And x Like c & "*" Then d.Remove Key:=x
Next x
Next c

foo = Application.WorksheetFunction.Transpose(d.Keys)
End Function


But reducing out Mulvaney is much more difficult in general because
left substrings could be common to several distinct records, e.g.,

John Smith Builders
John Smith & Sons Construction

There may be approaches you could take using fuzzy string matching, but
you may find it expedient to use the udf above to filter out most
'duplicates', then remove the remaining ones manually.
 
D

Domenic

If A2:A14 contains...

Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

....and D2:D10 contains...

Beazer
C.P. Morgan
D.R. Horton
KB Home
Lennar
McCar
Mulvaney
Ryan
Westminster

....enter the following formula in B2, and copy down...

=INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2:$D$10,A2)),0))

....confirmed with CONTROL+SHIFT+ENTER. You should get the following
results...

Beazer
D.R. Horton
Ryan
Ryan
Ryan
Westminster
KB Home
C.P. Morgan
Lennar
McCar
Mulvaney
Mulvaney
Mulvaney

Then, if you want a unique list -- it only just dawned on me that this
is probably what you want :) -- you can either use 'Advanced Filter'
and check 'Unique records only' or use the following formula...

C2, copied down until you get #N/A:

=INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14),0))

....confirmed with CONTROL+SHIFT+ENTER.

Hope this helps!
 
K

Karl Burrows

Problem is I don't have a column of named builders like you have in D2:D10.
Maybe this would be easier. Here is the exact list of builders they have.
This can also change as new subdivisions are added, but the list is
populated from an Access database. Builders may have more than one type of
lot in one subdivision or may have lots in multiple subdivisions, which is
what creates the need to identify builders this way for reporting. I don't
think this is going to be possible since Excel would have no way of knowing
that "Greenbrier" was not part of the builder name.

Beazer Homes
Beazer Homes 50'
Beazer Homes 60'
C.P. Morgan 40'
C.P. Morgan 50'
D.R. Horton
D.R. Horton
D.R. Horton 40
D.R. Horton 50
KB Home
KB Home 40
KB Home 50
Lennar Homes 40
Lennar Homes 50
Lennar Homes 55
Lennar Homes 60
McCar Homes
McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes
Pulte
Pulte
Ryan
Ryan - Aldridge
Ryan - Greenbrier
Ryan 60
Ryan 66
Ryan Townhomes
Ryan Townhomes
Saussy Burbank
Unsold Townhome Lots
Westminster
Westminster Townhomes


Thanks!

If A2:A14 contains...

Beazer Homes 50' & 60'
D.R. Horton 40 & 50
Ryan - Greenbrier
Ryan - Aldridge
Ryan Townhomes
Westminster Townhomes
KB Home 40 & 50
C.P. Morgan 40' & 50'
Lennar Homes 40, 50, 55 & 60
McCar Homes & McCar Townhomes
Mulvaney 60
Mulvaney Homes
Mulvaney Townhomes

....and D2:D10 contains...

Beazer
C.P. Morgan
D.R. Horton
KB Home
Lennar
McCar
Mulvaney
Ryan
Westminster

....enter the following formula in B2, and copy down...

=INDEX($D$2:$D$10,MATCH(TRUE,ISNUMBER(SEARCH($D$2:$D$10,A2)),0))

....confirmed with CONTROL+SHIFT+ENTER. You should get the following
results...

Beazer
D.R. Horton
Ryan
Ryan
Ryan
Westminster
KB Home
C.P. Morgan
Lennar
McCar
Mulvaney
Mulvaney
Mulvaney

Then, if you want a unique list -- it only just dawned on me that this
is probably what you want :) -- you can either use 'Advanced Filter'
and check 'Unique records only' or use the following formula...

C2, copied down until you get #N/A:

=INDEX(B2:$B$14,MATCH(0,COUNTIF($C$1:C1,B2:$B$14),0))

....confirmed with CONTROL+SHIFT+ENTER.

Hope this helps!
 
D

Domenic

Karl Burrows said:
Problem is I don't have a column of named builders like you have in D2:D10.

I should have said, 'enter a list of builders in D2:D10', as described.
So with the list of builders you've now provided, enter the following
list in D2:D13...

Beazer
C.P. Morgan
D.R. Horton
KB
Lennar
McCar
Mulvaney
Pulte
Ryan
Saussy
Unsold
Westminster

....and change the references in the formula accordingly. Does this help?
 
A

Alan Beban

I don't know why you picked "Greenbrier" as a problem. Harlan Grove
provided code that would eliminate it when operating on the list below.
The real problem seems to be that your list of "builders names" does not
include all the names of the builders. In particular, it does not include

C.P. Morgan
Lennar Homes
McCar
Mulvaney

each of which appears from your descriptions to be a builder's name.
(I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
name.)

Alan Beban
 
K

Karl Burrows

I think we have gotten away from the actual problem. It doesn't matter who
the builder is (I only originally posted a partial list of builders), I want
to programmatically or through a formula remove all the uncommon information
from any builder that may be there at some point in time. Builders may have
a suffix to further define the lots they own in a neighborhood, so it
doesn't matter who is in my list now. I also don't want to create a list of
the builders to match to the list to create the unique values. I am doing
that already and want to eliminate that portion of the worksheet input.

It boils down to taking a column of builder names and stripping off the data
that is not common to all values for that builder, whether that means
removing 60', - 60, - Greenbrier, or any other designation they may come up
with. I think that is where the hang up is. Unless you can compare all the
builder values and then say well these are pretty much alike other than the
"60" at the end or "townhome" or other, so let's remove that portion. I
haven't had a chance to try to coding and maybe that is what it does.

Does all this make sense? Thanks for everyone's patience.

I don't know why you picked "Greenbrier" as a problem. Harlan Grove
provided code that would eliminate it when operating on the list below.
The real problem seems to be that your list of "builders names" does not
include all the names of the builders. In particular, it does not include

C.P. Morgan
Lennar Homes
McCar
Mulvaney

each of which appears from your descriptions to be a builder's name.
(I am assuming that "Unsold Townhome Lots" is equivalent to a builder's
name.)

Alan Beban
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top