How to extract capitalized group in text string

N

newbie

How would I extract the capitalized, 3 character group from a text string?

I have 283 rows of text of varying lengths, each with a 3 character upper
case string imbedded at any point, followed by more characters. I would like
to isolate the 3 characters of upper case text and place them in a single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare
 
R

Ron Rosenfeld

How would I extract the capitalized, 3 character group from a text string?

I have 283 rows of text of varying lengths, each with a 3 character upper
case string imbedded at any point, followed by more characters. I would like
to isolate the 3 characters of upper case text and place them in a single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare

You can do it with a User Defined Function:

<alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.

To extract the first "stand-alone" three character code, use the following
formula:

=ReExtr(A1,"\b[A-Z]{3}\b")

==============================
Option Explicit
Function ReExtr(str As String, sPattern As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
If re.test(str) = True Then
Set mc = re.Execute(str)
ReExtr = mc(0).Value
End If
End Function
===============================

--ron
 
N

newbie

I inserted the code in VBA, inserted =ReExtr(A1,"\b[A-Z]{3}\b") at B1, and
got #NAME
--
newwbie


Ron Rosenfeld said:
How would I extract the capitalized, 3 character group from a text string?

I have 283 rows of text of varying lengths, each with a 3 character upper
case string imbedded at any point, followed by more characters. I would like
to isolate the 3 characters of upper case text and place them in a single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare

You can do it with a User Defined Function:

<alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.

To extract the first "stand-alone" three character code, use the following
formula:

=ReExtr(A1,"\b[A-Z]{3}\b")

==============================
Option Explicit
Function ReExtr(str As String, sPattern As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
If re.test(str) = True Then
Set mc = re.Execute(str)
ReExtr = mc(0).Value
End If
End Function
===============================

--ron
 
N

newbie

I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left, whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need GCE)

--
newwbie


Ron Rosenfeld said:
How would I extract the capitalized, 3 character group from a text string?

I have 283 rows of text of varying lengths, each with a 3 character upper
case string imbedded at any point, followed by more characters. I would like
to isolate the 3 characters of upper case text and place them in a single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare

You can do it with a User Defined Function:

<alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.

To extract the first "stand-alone" three character code, use the following
formula:

=ReExtr(A1,"\b[A-Z]{3}\b")

==============================
Option Explicit
Function ReExtr(str As String, sPattern As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
If re.test(str) = True Then
Set mc = re.Execute(str)
ReExtr = mc(0).Value
End If
End Function
===============================

--ron
 
R

Rick Rothstein \(MVP - VB\)

Here is UDF (implement it the same way you did Ron's UDF)...

Function ThreeUpperCaseLetters(R As Range) As String
Dim X As Long
Dim C As Range
Dim Words() As String
If R.Count = 1 Then
For Each C In R
Words = Split(" " & C.Value & " ")
For X = UBound(Words) To 0 Step -1
If Words(X) Like "[A-Z][A-Z][A-Z]" Then
ThreeUpperCaseLetters = Words(X)
Exit Function
End If
Next
Next
Else
ThreeUpperCaseLetters = Range("A0")
End If
End Function

If there are more than one, the UDF always returns the last grouping of
three-upper-case letters in the cell's text. It generates a #VALUE! error if
you specify a range consisting of more than one cell (I wanted to force a
#REF error, but I wasn't sure how to do that).

Rick


newbie said:
I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of
the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left,
whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need GCE)

--
newwbie


Ron Rosenfeld said:
How would I extract the capitalized, 3 character group from a text
string?

I have 283 rows of text of varying lengths, each with a 3 character
upper
case string imbedded at any point, followed by more characters. I would
like
to isolate the 3 characters of upper case text and place them in a
single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare

You can do it with a User Defined Function:

<alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.

To extract the first "stand-alone" three character code, use the
following
formula:

=ReExtr(A1,"\b[A-Z]{3}\b")

==============================
Option Explicit
Function ReExtr(str As String, sPattern As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
If re.test(str) = True Then
Set mc = re.Execute(str)
ReExtr = mc(0).Value
End If
End Function
===============================

--ron
 
R

Rick Rothstein \(MVP - VB\)

I should mention, in case your examples were not representative, that the
group of three-upper-case letters that the UDF finds can be located anywhere
in the text, but if it is not located at the beginning or end, then it must
be separated from the other text by one or more spaces.

Rick

Rick Rothstein (MVP - VB) said:
Here is UDF (implement it the same way you did Ron's UDF)...

Function ThreeUpperCaseLetters(R As Range) As String
Dim X As Long
Dim C As Range
Dim Words() As String
If R.Count = 1 Then
For Each C In R
Words = Split(" " & C.Value & " ")
For X = UBound(Words) To 0 Step -1
If Words(X) Like "[A-Z][A-Z][A-Z]" Then
ThreeUpperCaseLetters = Words(X)
Exit Function
End If
Next
Next
Else
ThreeUpperCaseLetters = Range("A0")
End If
End Function

If there are more than one, the UDF always returns the last grouping of
three-upper-case letters in the cell's text. It generates a #VALUE! error
if you specify a range consisting of more than one cell (I wanted to force
a #REF error, but I wasn't sure how to do that).

Rick


newbie said:
I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of
the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left,
whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need
GCE)

--
newwbie


Ron Rosenfeld said:
On Tue, 26 Aug 2008 18:52:00 -0700, newbie
<[email protected]>
wrote:

How would I extract the capitalized, 3 character group from a text
string?

I have 283 rows of text of varying lengths, each with a 3 character
upper
case string imbedded at any point, followed by more characters. I would
like
to isolate the 3 characters of upper case text and place them in a
single
column. Some examples are:

iPath S&P GSCI Crude Oil Tot Ret Idx OIL Specialty - Natural Resources
UltraShort Financials ProShares SKF Specialty - Income holdings
PowerShares DB Energy DBE Specialty - Healthcare

You can do it with a User Defined Function:

<alt-F11> opens the VB Editor.

Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.

To extract the first "stand-alone" three character code, use the
following
formula:

=ReExtr(A1,"\b[A-Z]{3}\b")

==============================
Option Explicit
Function ReExtr(str As String, sPattern As String) As String
Dim re As Object, mc As Object
Set re = CreateObject("vbscript.regexp")
re.Pattern = sPattern
If re.test(str) = True Then
Set mc = re.Execute(str)
ReExtr = mc(0).Value
End If
End Function
===============================

--ron
 
R

Ron Rosenfeld

I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left, whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need GCE)

Glad you figured out where to place the code.

There are a variety of methods of changing this so it will pick up the LAST
instance.

One method is to change the pattern in the argument:

"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)"

That translates into a command to

return any three-character uppercase string
that is not followed by another three-character upper case string

So your formula would now look like:

=ReExtr(A1,"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)")



--ron
 
N

newbie

That works very well.
Thank you very much.
--
newbie


Ron Rosenfeld said:
I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left, whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need GCE)

Glad you figured out where to place the code.

There are a variety of methods of changing this so it will pick up the LAST
instance.

One method is to change the pattern in the argument:

"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)"

That translates into a command to

return any three-character uppercase string
that is not followed by another three-character upper case string

So your formula would now look like:

=ReExtr(A1,"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)")



--ron
 
N

newbie

I assumed as much. I tried modifying the command with what I could gather
from other threads on similar objectives. I'm not VBA savvy, so I wasn't
successful. Thank you very much.
--
newwbie


Ron Rosenfeld said:
I didn't follow your instructions too well the first time. By going
INSERT/MODULE, it works. Now I found a different instance where some of the
text strings have more than one occurrence of 3 uppercase characters in
varying locations. The code returns the first instance from the left, whereas
I need the last instance in the string. (first from the right).

Claymore CEF GS Connect ETN GCE Asset Allocation (returns CEF, need GCE)

Glad you figured out where to place the code.

There are a variety of methods of changing this so it will pick up the LAST
instance.

One method is to change the pattern in the argument:

"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)"

That translates into a command to

return any three-character uppercase string
that is not followed by another three-character upper case string

So your formula would now look like:

=ReExtr(A1,"\b[A-Z]{3}\b(?!.*\b[A-Z]{3}\b)")



--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top