Find the n-th occurance of a sustring within a string

T

TomCon

One can use SEARCH() or FIND() to find a string within another. And, if you
wanted to find the first few, you can nest the calls and it probably works
ok. But, this has its limits. Lets say you wanted to find the 10th
occurance of a particular string within a string, now the formula gets
really hairy, if you nest.

Any clever ideas out there to find the n-th occurance of a string within
another?

Thanks,
tom
 
M

macropod

Hi Tom,

Suppose the string you want to search is in A1, the substring you want to find is in A2 and the substring instance you want to find
is in A3, then the following formula will return the position where the nth occurrence of the substring being searched for starts:
=FIND(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

Cheers
 
S

Stan Brown

Sun, 22 Apr 2007 09:51:25 +1000 from macropod
Suppose the string you want to search is in A1, the substring you
want to find is in A2 and the substring instance you want to find
is in A3, then the following formula will return the position where
the nth occurrence of the substring being searched for starts:
=FIND(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

I tried this in Excel 2003 and got #VALUE.

I don't know of any way to do what the OP wants, without going to
VBA.

--
Stan Brown, Oak Road Systems Tompkins County, New York, USA
http://OakRoadSystems.com
You need any friends you can get. The only thing standing
between you and a watery grave is your wits, and that's not
my idea of adequate protection. -- /Beat the Devil/ (1954)
 
E

Earl Kiosterud

That is slick. It can be fooled, however, in the unlikely event any instance of the string
you're looking for happens to have the instance number following it. This might happen in
numbers combining alpha characters and digits (like part numbers). For example:

A1: CC3C
A2: C
A3: 3

It finds the second C, and returns 2.

You can use this modification:
=FIND(A2&"|"&A3,SUBSTITUTE(A1,A2,A2&"|"&A3,A3))

You use a character known never to appear in the data, such as the pipe character in this
example. The added character can be an unlikely string, like !#$%.

=FIND(A2&"!#$%"&A3,SUBSTITUTE(A1,A2,A2&"!#$%"&A3,A3))

This unlikely string occurs only when Sarge is swearing (and beating the stuffing out of
Beetle).
--
Earl Kiosterud
www.smokeylake.com

Note: Some folks prefer bottom-posting.
But if they bottom-post to a reply that's
already top-posted, the thread gets messy. When in Rome...

-----------------------------------------------------------------------
macropod said:
Hi Tom,

Suppose the string you want to search is in A1, the substring you want to find is in A2
and the substring instance you want to find is in A3, then the following formula will
return the position where the nth occurrence of the substring being searched for starts:
=FIND(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

Cheers
--
macropod
[MVP - Microsoft Word]
-------------------------

TomCon said:
One can use SEARCH() or FIND() to find a string within another. And, if you wanted to
find the first few, you can nest the calls and it probably works ok. But, this has its
limits. Lets say you wanted to find the 10th occurance of a particular string within a
string, now the formula gets really hairy, if you nest.

Any clever ideas out there to find the n-th occurance of a string within another?

Thanks,
tom
 
R

Ron Rosenfeld

One can use SEARCH() or FIND() to find a string within another. And, if you
wanted to find the first few, you can nest the calls and it probably works
ok. But, this has its limits. Lets say you wanted to find the 10th
occurance of a particular string within a string, now the formula gets
really hairy, if you nest.

Any clever ideas out there to find the n-th occurance of a string within
another?

Thanks,
tom

In general, to find the nth occurrence of one string in another, you can use
this formula:

=FIND(CHAR(1),SUBSTITUTE(String_to_search,String_to_find,CHAR(1),Occurrence_num))

However, you should also check to ensure that the nth occurrence exists. You
could just wrap the above in an IF(ISERR(...),"error text here",<formula>).

Or you could count the number of occurrences and return a message saying "only
N occurrences".


=IF(((LEN(String_to_search)-LEN(SUBSTITUTE(String_to_search,String_to_find,"")))/
LEN(String_to_find))<Occurrence_num,"Only "& (LEN(String_to_search)-LEN(
SUBSTITUTE(String_to_search,String_to_find,"")))/LEN(String_to_find) &
" occurrences in string",
FIND(CHAR(1),SUBSTITUTE(String_to_search,String_to_find,CHAR(1),Occurrence_num)))


--ron
 
M

macropod

Hi Earl,

Well spotted. I like your idea. To be on the safe side, one could use a single character that would be most unlikely to appear in
any 'normal' text string (eg an ASCII 135: ‡), without the need to concoct an 'unlikely' string - though I do accept that doing so
renders false matches much less likely.

Plus, in order to deal with errors, one could return '0', via:
=IF(ISERROR(FIND(A2&"‡"&A3,SUBSTITUTE(A1,A2,A2&"‡"&A3,A3))),0,FIND(A2&"‡"&A3,SUBSTITUTE(A1,A2,A2&"‡"&A3,A3)))

Cheers
--
macropod
[MVP - Microsoft Word]
-------------------------

Earl Kiosterud said:
That is slick. It can be fooled, however, in the unlikely event any instance of the string you're looking for happens to have the
instance number following it. This might happen in numbers combining alpha characters and digits (like part numbers). For
example:

A1: CC3C
A2: C
A3: 3

It finds the second C, and returns 2.

You can use this modification:
=FIND(A2&"|"&A3,SUBSTITUTE(A1,A2,A2&"|"&A3,A3))

You use a character known never to appear in the data, such as the pipe character in this example. The added character can be an
unlikely string, like !#$%.

=FIND(A2&"!#$%"&A3,SUBSTITUTE(A1,A2,A2&"!#$%"&A3,A3))

This unlikely string occurs only when Sarge is swearing (and beating the stuffing out of Beetle).
--
Earl Kiosterud
www.smokeylake.com

Note: Some folks prefer bottom-posting.
But if they bottom-post to a reply that's
already top-posted, the thread gets messy. When in Rome...

-----------------------------------------------------------------------
macropod said:
Hi Tom,

Suppose the string you want to search is in A1, the substring you want to find is in A2 and the substring instance you want to
find is in A3, then the following formula will return the position where the nth occurrence of the substring being searched for
starts:
=FIND(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

Cheers
--
macropod
[MVP - Microsoft Word]
-------------------------

TomCon said:
One can use SEARCH() or FIND() to find a string within another. And, if you wanted to find the first few, you can nest the
calls and it probably works ok. But, this has its limits. Lets say you wanted to find the 10th occurance of a particular
string within a string, now the formula gets really hairy, if you nest.

Any clever ideas out there to find the n-th occurance of a string within another?

Thanks,
tom
 
M

macropod

Hi Stan,

Are you sure you had at least 'n' occurrences of the substring in your search string? Note too that the test is case-sensitive. To
disregard the case, you could use:
=SEARCH(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

Cheers
 
B

Bernd

Hi Ron,

If we want to make (almost) sure that no error occurs I suggest to
use:
=FIND(CHAR(1),SUBSTITUTE(String_to_search&REPT(CHAR(1),Occurrence­
_num),String_to_find,CHAR(1),Occurrence­_num))

We just define that any result > length of String_to_search means "not
found".

Regards,
Bernd
 
S

Stan Brown

Sun, 22 Apr 2007 14:46:09 +1000 from macropod
Are you sure you had at least 'n' occurrences of the substring in
your search string?

Hmm ... I'm not sure what happened last time, but when I tried it
again it worked this time.
 
E

Earl Kiosterud

macropod,

I agree. A non-keyboard character like ? is indeed even better. You could use a string of
them to make it even more likely. To keep the formula from getting so long and unwieldy
(and make it easier to edit the formula), you could name the string with Insert- Name -
Define, giving it the name "US" . "US" stands for "unlikely string."

=FIND(D2& US &D3,SUBSTITUTE(D1,D2,D2& US &D3,D3))

You could use technique this in your error-checking formula.

We should work for the CIA.
--
Earl Kiosterud
www.smokeylake.com

Note: Some folks prefer bottom-posting. But if you bottom-post to a reply that's already
top-posted, the thread gets messy. When in Rome.

-----------------------------------------------------------------------
macropod said:
Hi Earl,

Well spotted. I like your idea. To be on the safe side, one could use a single character
that would be most unlikely to appear in any 'normal' text string (eg an ASCII 135: ?),
without the need to concoct an 'unlikely' string - though I do accept that doing so
renders false matches much less likely.

Plus, in order to deal with errors, one could return '0', via:
=IF(ISERROR(FIND(A2&"?"&A3,SUBSTITUTE(A1,A2,A2&"?"&A3,A3))),0,FIND(A2&"?"&A3,SUBSTITUTE(A1,A2,A2&"?"&A3,A3)))

Cheers
--
macropod
[MVP - Microsoft Word]
-------------------------

Earl Kiosterud said:
That is slick. It can be fooled, however, in the unlikely event any instance of the
string you're looking for happens to have the instance number following it. This might
happen in numbers combining alpha characters and digits (like part numbers). For
example:

A1: CC3C
A2: C
A3: 3

It finds the second C, and returns 2.

You can use this modification:
=FIND(A2&"|"&A3,SUBSTITUTE(A1,A2,A2&"|"&A3,A3))

You use a character known never to appear in the data, such as the pipe character in this
example. The added character can be an unlikely string, like !#$%.

=FIND(A2&"!#$%"&A3,SUBSTITUTE(A1,A2,A2&"!#$%"&A3,A3))

This unlikely string occurs only when Sarge is swearing (and beating the stuffing out of
Beetle).
--
Earl Kiosterud
www.smokeylake.com

Note: Some folks prefer bottom-posting.
But if they bottom-post to a reply that's
already top-posted, the thread gets messy. When in Rome...

-----------------------------------------------------------------------
macropod said:
Hi Tom,

Suppose the string you want to search is in A1, the substring you want to find is in A2
and the substring instance you want to find is in A3, then the following formula will
return the position where the nth occurrence of the substring being searched for starts:
=FIND(A2&A3,SUBSTITUTE(A1,A2,A2&A3,A3))

Cheers
--
macropod
[MVP - Microsoft Word]
-------------------------

One can use SEARCH() or FIND() to find a string within another. And, if you wanted to
find the first few, you can nest the calls and it probably works ok. But, this has its
limits. Lets say you wanted to find the 10th occurance of a particular string within a
string, now the formula gets really hairy, if you nest.

Any clever ideas out there to find the n-th occurance of a string within another?

Thanks,
tom
 
R

Ron Rosenfeld

Hi Ron,

If we want to make (almost) sure that no error occurs I suggest to
use:
=FIND(CHAR(1),SUBSTITUTE(String_to_search&REPT(CHAR(1),Occurrence­
_num),String_to_find,CHAR(1),Occurrence­_num))

We just define that any result > length of String_to_search means "not
found".

Regards,
Bernd

That's true, but I'm not sure if it is any more efficient than just:

=IF(ISERR(FIND(CHAR(1),SUBSTITUTE(String_to_search,String_to_find,CHAR(1),
Occurrence_num))),"not found",FIND(CHAR(1),
SUBSTITUTE(String_to_search,String_to_find,CHAR(1),Occurrence_num)))


--ron
 
H

Harlan Grove

Ron Rosenfeld said:
That's true, but I'm not sure if it is any more efficient than just:

=IF(ISERR(FIND(CHAR(1),SUBSTITUTE(String_to_search,String_to_find,CHAR(1),
Occurrence_num))),"not found",FIND(CHAR(1),
SUBSTITUTE(String_to_search,String_to_find,CHAR(1),Occurrence_num)))

If you want efficient, then simplify the 1st arg to IF.

=IF(SUBSTITUTE(string,substring,"",n)=string,"not found",
FIND(CHAR(1),SUBSTITUTE(string,substring,CHAR(1),n)))
 
H

Harlan Grove

Earl Kiosterud said:
I agree. A non-keyboard character like ? is indeed even better. You could use a
string of them to make it even more likely. . . . ....
....

If you want to protect against truly general strings, then no
contrived substring replacement would be foolproof.

The truly robust way to do this is to look at every substring of the
appropriate length. Easiest to start with a name like seq defined as

=ROW(INDEX($1:$65536,1,1):INDEX($1:$65536,1024,1))

and use the following formula to count the instances of the substring
sought (sss),

=SUMPRODUCT(--(MID(s,seq,LEN(sss))=sss))

and use the following array formula to return the position of the n_th
instance,

=MATCH(n,MMULT(--(seq>=TRANSPOSE(seq)),--(MID(s,seq,LEN(sss))=sss)),0)
 
R

Ron Rosenfeld

If you want efficient, then simplify the 1st arg to IF.

=IF(SUBSTITUTE(string,substring,"",n)=string,"not found",
FIND(CHAR(1),SUBSTITUTE(string,substring,CHAR(1),n)))

I like that. Thanks.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top