Find closest text match for each unique entry in a list

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

Trying to find closest match in column B for each unique entry in column A.

I have a list like this (very long):
ParentSKU ManufacturerSKU
10028 A054150
10028 A054150
1002 AVCRGYGWT
1002 AVCRGNGGD
1002 AVCRBNGBZ
10000 2075492
10000 2075493
10000 2075492
9997 2075488
9997 2075499

For each unique "ParentSKU" in column A, I need to find the longest common
text among all of the same ParentSKU's in column B.

To demonstrate what result is desired, it's easy to eye it up and find the
answer. For the above, the following results are desired:
ParentSKU Answer
10028 A054150
1002 AVCR
10000 207549
9997 20754

Is there a way to do this in excel to avoid having to manually go through
the entire list and eye up each ParentSKU answer?

Each unique "ParentSKU" has between 1 and 54 rows and there are 29,000 total
rows. Additionally, "ManufacturerSKU's" can be either text, numbers, or a
combination of text and numbers.
 
try this

Sub getlongest()

StartRow = 1
Sh1Rowcount = 1
Sh2Rowcount = 1
With Sheets("Sheet1")
Do While .Cells(Sh1Rowcount, "A") <> ""
If .Cells(Sh1Rowcount, "A") <> _
.Cells(Sh1Rowcount + 1, "A") Then

ParentSKU = .Cells(Sh1Rowcount, "A")
Longstr = _
getlongeststr(StartRow, Sh1Rowcount)
With Sheets("Sheet2")
.Cells(Sh2Rowcount, "A") = _
ParentSKU
.Cells(Sh2Rowcount, "B") = _
Longstr
End With
StartRow = Sh1Rowcount + 1
Sh2Rowcount = Sh2Rowcount + 1
End If

Sh1Rowcount = Sh1Rowcount + 1
Loop

End With

End Sub
Function getlongeststr(StartRow, EndRow)

CharacterCount = 1
With Sheets("Sheet1")
Do While (True)

For RowCount = StartRow To (EndRow - 1)
a = Len(.Cells(RowCount, "B"))
b = Len(.Cells(RowCount + 1, "B"))
c = Mid(.Cells(RowCount, "B"), _
CharacterCount, 1)
d = Mid(.Cells(RowCount + 1, "B"), _
CharacterCount, 1)
If Len(.Cells(RowCount, "B")) < _
CharacterCount Then Exit Do

If Len(.Cells(RowCount + 1, "B")) < _
CharacterCount Then Exit Do

If Mid(.Cells(RowCount, "B"), _
CharacterCount, 1) <> _
Mid(.Cells(RowCount + 1, "B"), _
CharacterCount, 1) Then

Exit Do
End If
Next RowCount

CharacterCount = CharacterCount + 1
Loop
If CharacterCount = 1 Then
getlongeststr = ""
Else
getlongeststr = Left( _
.Cells(RowCount, "B"), _
CharacterCount - 1)
End If
End With
End Function
 
Assuming that A2:B11 contains the data, let D2:D5 contain 10028, 1002,
10000, and 9997. Then try the following...

E2, copied down:

=SUMPRODUCT(--($A$2:$A$11=$D2),--($B$2:$B$11<>""))

Then select F2 and define the following...

Insert > Name > Define

Name: SS

Refers to:

=INDEX($B$2:$B$11,MATCH(MIN(IF($A$2:$A$11=$D2,IF($B$2:$B$11<>"",LEN($B$2:
$B$11)))),IF($A$2:$A$11=$D2,LEN($B$2:$B$11)),0))

Click Add

Name: SL

Refers to:

=MIN(IF($A$2:$A$11=$D2,IF($B$2:$B$11<>"",LEN($B$2:$B$11))))

Click Add

Name: BigNum

Refers to:

=9.99999999999999E+307

Click Add

Name: Array1

Refers to:

=IF($A$2:$A$11=$D2,MID($B$2:$B$11,TRANSPOSE(ROW(INDIRECT("1:"&SL))),1),0)

Click Add

Name: Array2

Refers to:

=MID(SS,TRANSPOSE(ROW(INDIRECT("1:"&SL))),1)

Click Add

Name: Array3

Refers to:

=(Array1=Array2)+0

Click Add

Name: Array4

Refers to:

=MMULT(TRANSPOSE(ROW($A$2:$A$11)^0),Array3)

Click Ok

Then, enter the following formula in F2 and copy down:

=IF(N(E2),MID(SS,1,LOOKUP(BigNum,CHOOSE({1,2},SL,MATCH(TRUE,Array4<>E2,0)
-1))),"")

....confirmed with CONTROL+SHIFT+ENTER, not just ENTER.

Hope this helps!
 
Back
Top