Find position of first non-matching character

G

Greg Lovern

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?


Thanks,

Greg
 
G

Gary''s Student

Try this small UDF:

Function divergence(r1 As Range, r2 As Range) As Integer
v1 = r1.Value
v2 = r2.Value
divergence = 0
i = Application.WorksheetFunction.Max(Len(v1), Len(v2))
For j = 1 To i
ch1 = Mid(v1, j, 1)
ch2 = Mid(v2, j, 1)
If ch1 = ch2 Then
Else
divergence = j
Exit Function
End If
Next
End Function

If the two strings exactly match, it will return a zero. It will also catch
case changes as well.
 
C

Charles Williams

You could try this UDF:

Option Explicit

Public Function MissMatch(String1 As Variant, String2 As Variant)
Dim aByte1() As Byte
Dim aByte2() As Byte

Dim k1 As Long
Dim k2 As Long


aByte1 = CStr(String1)
aByte2 = CStr(String2)
MissMatch = 0
If Len(CStr(String1)) > 0 Then
For k1 = LBound(aByte1) To UBound(aByte1) Step 2
k2 = k2 + 1
If k1 > UBound(aByte2) Then
MissMatch = k2
Exit For
Else
If aByte1(k1) <> aByte2(k1) Then
MissMatch = k2
Exit For
End If
End If
Next k1
End If
End Function


Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com
 
G

Greg Lovern

Thanks, but I should have mentioned that I can do it in a UDF too.

What I want is to do it with Excel's built-in worksheet functions. Can
it be done that way?


Thanks,

Greg
 
B

Bernie Deitrick

Greg,

Array enter (enter using Ctrl-Shift-Enter):

=SUM(MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),1)<>MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2))))

With those strings in cells A2 and A3.

HTH,
Bernie
MS Excel MVP
 
B

Bernie Deitrick

Greg,

Yes, it can be done with an array function. See my post.

HTH,
Bernie
MS Excel MVP
 
G

Greg Lovern

Hi Bernie,

Thanks again -- BTW, I'm not sure what the SUM is doing; it seems to
work as well for me like this:

=MIN(IF(MID(A2,ROW(INDIRECT("A1:A"&LEN(A2))),
1)<>MID(A3,ROW(INDIRECT("A1:A"&LEN(A2))),
1),ROW(INDIRECT("A1:A"&LEN(A2))),LEN(A2)))


Thanks,

Greg
 
B

Bernie Deitrick

Greg,

You're right - <Blush> I modified another function that used the SUM and forgot to take it off of
there...

Bernie
MS Excel MVP
 
R

Ron Rosenfeld

I have two strings that start with the same characters, but at some
unknown and varying point they begin to differ. I want the position
number of the first non-matching character.

For example:
-- table and chair
-- table plus chair
-- position of first non-matching character is 7 (the "a" in
"and", and the "p" in "plus").

In the above example, both strings start out "table ", then start to
differ beginning with the 7th character. How can I return the position
of the first non-matching character, which in this example is 7?


Thanks,

Greg

You could use this **array** function.

Make sure the "99" is greater than the length of your longest string.

=MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$99),1),0)

To enter an **array** function, hold down <ctrl><shift> while hitting <enter>.
Excel will place braces {...} around the formula.

--ron
 
G

Greg Lovern

Excellent, thanks! :)

Greg


You could use this **array** function.

Make sure the "99" is greater than the length of your longest string.

=MATCH(FALSE,MID(A1,ROW($1:$99),1)=MID(A2,ROW($1:$99),1),0)

To enter an **array** function, hold down <ctrl><shift> while hitting <enter>.
Excel will place braces {...} around the formula.

--ron
 
R

Ron Rosenfeld

Excellent, thanks! :)

Greg

You're welcome. Glad to help.

By the way, the formula I gave you will return a #NA error if the two strings
match exactly. There are a variety of methods of handling this issue if that
is not desired.

Probably the simplest is to just embed it in an IF statement. e.g:

=IF(A1=A2,"Match",formula)

Also note that the formula, as provided, is NOT case sensitive. If you wanted
a case-sensitive formula, and also to check for a complete Match, then
something like:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW(1:99),1),MID(A2,ROW(1:99),1)),0))

should work.
--ron
 
R

Ron Rosenfeld

Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1)),0))



--ron
 
H

Harlan Grove

Ron Rosenfeld said:
Should be:

=IF(EXACT(A1,A2),"Strings Match",MATCH(FALSE,
EXACT(MID(A1,ROW($1:$99),1),MID(A2,ROW($1:$99),1)),0))

Variation on this. First, it's convenient to use a defined name like
seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,256,1))

or also define a name like NCHARS referring to 256 or something else
and define seq as

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,NCHARS,1))

Then use an array formula like

=MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1),0)

for case insensitive matches or

=MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",seq,1)),0)

for case sensitive matches. These return LEN(A1)+1 if A1 = A2.
 
R

Ron Rosenfeld

Variation on this. First, it's convenient to use a defined name like
seq referring to a formula like

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,256,1))

or also define a name like NCHARS referring to 256 or something else
and define seq as

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,NCHARS,1))

Then use an array formula like

=MATCH(FALSE,MID(A1&"x",seq,1)=MID(A2&"y",seq,1),0)

for case insensitive matches or

=MATCH(FALSE,EXACT(MID(A1&"x",seq,1),MID(A2&"y",seq,1)),0)

for case sensitive matches. These return LEN(A1)+1 if A1 = A2.


Certainly a different approach, to return LEN(A1)+1 if A1=A2

I have a question, Harlan:

If using a defined name like seq to return a sequential array of numbers, what
is the advantage of using your longer formula:

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,256,1))

versus

=ROW($1:$256)

At least in Excel 2007, it seems that your formula requires that Sheet1 exist
in the workbook at the time the defined name is created or else it will give a
#REF error.
--ron
 
T

T. Valko

Ron Rosenfeld said:
Certainly a different approach, to return LEN(A1)+1 if A1=A2

I have a question, Harlan:

If using a defined name like seq to return a sequential array of numbers,
what
is the advantage of using your longer formula:

=ROW(INDEX(Sheet1!$1:$65536,1,1):INDEX(Sheet1!$1:$65536,256,1))

versus

=ROW($1:$256)

At least in Excel 2007, it seems that your formula requires that Sheet1
exist
in the workbook at the time the defined name is created or else it will
give a
#REF error.
--ron

My interpretation....

ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of
that but is volatile:

ROW(INDIRECT("1:256"))

ROW(INDEX(....)) is an alternative to using INDIRECT.
 
R

Ron Rosenfeld

My interpretation....

ROW($1:$256) is susceptible to row insertions. Using INDIRECT takes care of
that but is volatile:

ROW(INDIRECT("1:256"))

ROW(INDEX(....)) is an alternative to using INDIRECT.

Good point. Thanks
--ron
 
L

Lori

That's true but there are several other issues with the defined name method:

- Using a reference to all of the sheet is less efficient since the formula
recalculates whenever any change is made anywhere on the same sheet.

- Dynamic ranges recalculate at startup, causing the save changes prompt to
appear by default when a workbook is closed (even if nothing has changed).

- If you delete all cells on the sheet by right-clicking the top left corner
and clicking delete, the name will return #ref!.

A simple way around might be to refer to a hidden sheet where no changes are
made but this causes additional overhead.

Another option is to use an active sheet name: =row(!$1:$256)
This is more efficient for recalculation and robust to row insertions, etc.
but can cause an error when copied between workbooks.

You could also try using a relatively defined name: =row(1:256)-row(1:1) but
this causes wraparound problems when you reach the end.

In summary all methods have drawbacks and the straightforward method often
works fine provided few changes are planned.
 
H

Harlan Grove

Lori said:
That's true but there are several other issues with the defined name method:

And there are counterissues when using alternatives.
- Using a reference to all of the sheet is less efficient since the formula
recalculates whenever any change is made anywhere on the same sheet.

As also happens when using INDIRECT. However, formulas referring to
the entire worksheet don't recalculate when making changes in OTHER
worksheets, but formulas using INDIRECT still do.

There's no way to eliminate problems with row/column insertion/
deletion other than using INDIRECT or INDEX:INDEX.
- Dynamic ranges recalculate at startup, causing the save changes prompt to
appear by default when a workbook is closed (even if nothing has changed).

Same using either INDIRECT or INDEX:INDEX, but there's no way to avoid
this AND protect against row/column insertion/deletion. It's a trade-
off, but if macros were allowed, this could be addressed by something
like this.

Private Sub Workbook_Open()
Me.Saved = True
End Sub
- If you delete all cells on the sheet by right-clicking the top left corner
and clicking delete, the name will return #ref!.

Yes, and that's the ONLY way to affect references to $1:$65536 (Excel
97 to Excel 2003) other than deleting the entire worksheet. If you
have users that make a habit of deleting all cells in worksheets, you
have my sympathy, but in my experience this happens rarely enough that
it doesn't require special handling. But if you must guard against
this, once again event handlers to the rescue.

Private Sub Workbook_SheetSelectionChange(ByVal Sh As Object, ByVal
Target As Range)
Application.EnableCancelKey = xlDisabled
Application.EnableEvents = False
If Target.Address(0, 0, xlA1, 0) = "1:65536" Then
MsgBox Prompt:="You may not select the entire worksheet.", _
Title:="No! No! No! Bad user!!"
Target.Cells(1, 1).Select
End If
Application.EnableEvents = True
Application.EnableCancelKey = xlInterrupt
End Sub
A simple way around might be to refer to a hidden sheet where no changes are
made but this causes additional overhead.

This is a good idea. There are usually many things that could and
should be stored in hidden worksheets, and the overhead isn't all that
much as long as you don't to any formatting in that hidden worksheet.
Another option is to use an active sheet name: =row(!$1:$256)
This is more efficient for recalculation and robust to row insertions, etc.
but can cause an error when copied between workbooks.

There are other problems besides copying between workbooks.
You could also try using a relatively defined name: =row(1:256)-row(1:1) but
this causes wraparound problems when you reach the end.

This is also susceptible to selecting all cells and deleting.
In summary all methods have drawbacks and the straightforward method often
works fine provided few changes are planned.

Using a hidden worksheet is the best of these possibilities. The
overhead should be negligible compared to the benefits.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top