Extracting word from phrase within column

K

KH_GS

Hi All,

I have a column containing phrases. I need to match each word of the
phrase that end with letter "d", copy such words and paste onto a new
column.

Do I have to delimit the words by spaces first so I have only 1 word in
each cell?

How can I do it?
 
T

Tim Williams

You could use a regexp to do this. Search for help on the VBscript RegExp
object.

Or just split on spaces and check each word to see if it's like "*d"
 
R

Ron Rosenfeld

Hi All,

I have a column containing phrases. I need to match each word of the
phrase that end with letter "d", copy such words and paste onto a new
column.

Do I have to delimit the words by spaces first so I have only 1 word in
each cell?

How can I do it?

Give some examples of cell contents and expected output.
Multiple words in each source cell or a single word per cell?

Also what kind of data size do you have (how many characters per cell; how many
cells, on average)?
--ron
 
K

KH_GS

I have data that filled the whole worksheet, like 65536 rows of data and
2nd worksheet almost filled up to the max too. Cell contents are words,
some single word and some are phrases.

Example:

Column A
red apple
green apple
green apple with seed
orange
pear

Basically I want to scan EVERY single word in each phrase in each cell
and then copy each word that meets my criteria on a new column. So, say
I want to look for words that end with "d", my output will be:

red
seed


More details, I will not know how many words does each cell contain,
therefore delimiting it might create many columns if for example one of
the cells contain a phrase of 10 words.
 
R

Ron Rosenfeld

I have data that filled the whole worksheet, like 65536 rows of data and
2nd worksheet almost filled up to the max too. Cell contents are words,
some single word and some are phrases.

Example:

Column A
red apple
green apple
green apple with seed
orange
pear

Basically I want to scan EVERY single word in each phrase in each cell
and then copy each word that meets my criteria on a new column. So, say
I want to look for words that end with "d", my output will be:

red
seed


More details, I will not know how many words does each cell contain,
therefore delimiting it might create many columns if for example one of
the cells contain a phrase of 10 words.


You can use the VBA Regular expressions module, but I have loaded and installed
Longre's free morefunc.xll add-in which is simpler for me to implement. It can
be distributed with any workbook, so you don't have to rely on users to install
it separately.

You can download it from http://xcell05.free.fr

If you don't install the addin (Tools/Addins) then you'll have to register it
to use it in VBA. See HELP for morefunc for instructions.

A VBA routine like the following will do what you describe. Should give you
some ideas to get started:

========================
Option Explicit

Sub EndWithD()
Dim c As Range
Dim output As Range
Dim wrd As String
Dim i As Long, o As Long

Set output = [b1]
o = -1

For Each c In Selection
i = 1
Do Until i > Run([REGEX.COUNT], c.Text, "\b\w+d\b")
wrd = Run([regex.mid], c.Text, "\b\w+d\b", i)
If wrd <> "" Then
o = o + 1
output.Offset(o, 0).Value = wrd
End If
i = i + 1
Loop
Next c
End Sub
====================



--ron
 
T

Tim Williams

What if a cell has >1 D word?

Tim

KH_GS said:
I have data that filled the whole worksheet, like 65536 rows of data and
2nd worksheet almost filled up to the max too. Cell contents are words,
some single word and some are phrases.

Example:

Column A
red apple
green apple
green apple with seed
orange
pear

Basically I want to scan EVERY single word in each phrase in each cell
and then copy each word that meets my criteria on a new column. So, say
I want to look for words that end with "d", my output will be:

red
seed


More details, I will not know how many words does each cell contain,
therefore delimiting it might create many columns if for example one of
the cells contain a phrase of 10 words.
 
R

Ron Rosenfeld

I have data that filled the whole worksheet, like 65536 rows of data and
2nd worksheet almost filled up to the max too. Cell contents are words,
some single word and some are phrases.

Example:

Column A
red apple
green apple
green apple with seed
orange
pear

Basically I want to scan EVERY single word in each phrase in each cell
and then copy each word that meets my criteria on a new column. So, say
I want to look for words that end with "d", my output will be:

red
seed


More details, I will not know how many words does each cell contain,
therefore delimiting it might create many columns if for example one of
the cells contain a phrase of 10 words.


You can use the VBA Regular expressions module, but I have loaded and installed
Longre's free morefunc.xll add-in which is simpler for me to implement. It can
be distributed with any workbook, so you don't have to rely on users to install
it separately.

You can download it from http://xcell05.free.fr

If you don't install the addin (Tools/Addins) then you'll have to register it
to use it in VBA. See HELP for morefunc for instructions.

A VBA routine like the following will do what you describe. Should give you
some ideas to get started:

========================
Option Explicit

Sub EndWithD()
Dim c As Range
Dim output As Range
Dim wrd As String
Dim i As Long, o As Long

Set output = [b1]
o = -1

For Each c In Selection
i = 1
Do Until i > Run([REGEX.COUNT], c.Text, "\b\w+d\b")
wrd = Run([regex.mid], c.Text, "\b\w+d\b", i)
If wrd <> "" Then
o = o + 1
output.Offset(o, 0).Value = wrd
End If
i = i + 1
Loop
Next c
End Sub
====================



--ron

Hmmm, the IF...Then is superfluous. It was there in a preliminary version, but
not required, so should be:

==============================
Option Explicit

Sub EndWithD()
Dim c As Range
Dim output As Range
Dim wrd As String
Dim i As Long, o As Long

Set output = [b1]
o = -1

For Each c In Selection
i = 1
Do Until i > Run([REGEX.COUNT], c.Text, "\b\w+d\b")
wrd = Run([regex.mid], c.Text, "\b\w+d\b", i)
o = o + 1
output.Offset(o, 0).Value = wrd
i = i + 1
Loop
Next c
End Sub

========================

--ron
 
R

Ron Rosenfeld

What if a cell has >1 D word?

Tim

No problem.

That's the reason for the REGEX.COUNT function.

Note that the 'i' argument in the REGEX.MID function is for the instance of the
occurrence.

Note also that the If...Then in the middle is superfluous. It was present in
an earlier version, but checking the number of words with the COUNT function
eliminated the requirement to test the output; as the entire cell will be
skipped if there is no D word.
--ron
 
R

Ron Rosenfeld

I have data that filled the whole worksheet, like 65536 rows of data and
2nd worksheet almost filled up to the max too. Cell contents are words,
some single word and some are phrases.

Example:

Column A
red apple
green apple
green apple with seed
orange
pear

Basically I want to scan EVERY single word in each phrase in each cell
and then copy each word that meets my criteria on a new column. So, say
I want to look for words that end with "d", my output will be:

red
seed


More details, I will not know how many words does each cell contain,
therefore delimiting it might create many columns if for example one of
the cells contain a phrase of 10 words.


You can use the VBA Regular expressions module, but I have loaded and installed
Longre's free morefunc.xll add-in which is simpler for me to implement. It can
be distributed with any workbook, so you don't have to rely on users to install
it separately.

You can download it from http://xcell05.free.fr

If you don't install the addin (Tools/Addins) then you'll have to register it
to use it in VBA. See HELP for morefunc for instructions.

A VBA routine like the following will do what you describe. Should give you
some ideas to get started:

========================
Option Explicit

Sub EndWithD()
Dim c As Range
Dim output As Range
Dim wrd As String
Dim i As Long, o As Long

Set output = [b1]
o = -1

For Each c In Selection
i = 1
Do Until i > Run([REGEX.COUNT], c.Text, "\b\w+d\b")
wrd = Run([regex.mid], c.Text, "\b\w+d\b", i)
If wrd <> "" Then
o = o + 1
output.Offset(o, 0).Value = wrd
End If
i = i + 1
Loop
Next c
End Sub
====================



--ron

Hmmm, the IF...Then is superfluous. It was there in a preliminary version, but
not required, so should be:

==============================
Option Explicit

Sub EndWithD()
Dim c As Range
Dim output As Range
Dim wrd As String
Dim i As Long, o As Long

Set output = [b1]
o = -1

For Each c In Selection
i = 1
Do Until i > Run([REGEX.COUNT], c.Text, "\b\w+d\b")
wrd = Run([regex.mid], c.Text, "\b\w+d\b", i)
o = o + 1
output.Offset(o, 0).Value = wrd
i = i + 1
Loop
Next c
End Sub

========================

One other warning: the routine above is case sensitive. If you want it to be
case insensitive, that's a minor change.
--ron
 
T

Tim Williams

You will have to adjust this to however you want to handle multiple matches
in one piece of text.
Note if you adjust the regex pattern you need to recompile since the object
is static.

Performance is OK: ~2 sec for 20k random strings in my testing.
If you turn off screenupdating and calculation it might help a bit.


Tim.


'******************************************
Sub Extract()
Dim matches
Dim i As Integer
Dim rngText As Range

Set rngText = ActiveSheet.Range("A1")

Do While rngText.Value <> ""

Set matches = GetDwords(rngText.Value)
If Not matches Is Nothing Then
For i = 0 To matches.Count - 1
rngText.Offset(0, i + 1).Value = matches(i)
Next i
End If

Set rngText = rngText.Offset(1, 0)
Loop

End Sub


Function GetDwords(val) As Object

Static regEx As Object
Dim m
Dim i As Integer

If regEx Is Nothing Then
Set regEx = CreateObject("vbscript.regexp")
regEx.Pattern = "\b(\w+d)\b"
regEx.Global = True
regEx.IgnoreCase = True
End If

Set m = regEx.Execute(val)
If m.Count > 0 Then
Set GetDwords = m
Else
Set GetDwords = Nothing
End If

End Function
'**************************************
 
T

Tim Williams

Sorry Ron - my question was for the OP. It was unclear how this was to be
handled.
I was a little behind you with my regex post: yours is much more concise...

Cheers
Tim
 
N

NickHK

KH_GS,
Private Sub CommandButton1_Click()
Dim Cell As Range
Dim Words As Variant
Dim i As Long

For Each Cell In Range(Range("A1"), Range("A1").End(xlDown))
Words = Split(Cell.Value, " ")
For i = 0 To UBound(Words)
If Right(Words(i), 1) = "d" Then
Debug.Print Words(i)
'Or do what you want with the word
End If
Next
Next
End Sub

NickHK
 
K

KH_GS

Hi NickHK

Thanks for the input! That about catches the whole thing! ;)

I just added an input value line to list out the words:

ActiveSheet.Cells(x, y).Value = Words(i)
x = x + 1


Why do u make it private though?
 
N

NickHK

KH_GS,
Private: The stub that Excel generates for a command button.
But probably better to make it a public function, passing in the range to
search, letter to find and an array to fill with matching words, returning
the number of words found. Then dump the array to the desired location.

NickHK
 
K

KH_GS

hi NickHK

I tried changing the if statement to:

If Left(Words(i), 3) = Left(Words(i + 1), 3) Then

Theres a runtime error 9, out of range.
I want to try comparing cell content instead of matching with my
specified word.


Also, how do I change this line

For Each Cell In Range(Range("A2"), Range("A2").End(xlDown))

to something more dynamic, where the range begins at the cell i click
on before running the macro.



Code:
--------------------
Private Sub CommandButton1_Click()
Dim Cell As Range
Dim Words As Variant
Dim i As Long

For Each Cell In Range(Range("A1"), Range("A1").End(xlDown))
Words = Split(Cell.Value, " ")
For i = 0 To UBound(Words)
If Right(Words(i), 1) = "d" Then
Debug.Print Words(i)
'Or do what you want with the word
End If
Next
Next
End Sub
 
N

NickHK

KH_GS,
That's because you are trying to access an element of the array beyond its
UBound, which obviously does not exist.
i.e. If i refers to the last element in the array, then i+1 will beyond the
array's bound.
So you would need some check to make sure i<UBound(Words()), then i+1 cannot
be >UBound(Words()).

Assuming you want it from the selected cell to the end of the list:
For Each Cell In Range(Selection, Selection.End(xlDown))

NickHK
 
T

Tim Williams

KH_GS

To satisfy our (my) curiosity, can you let us know why you're doing
this?
Seems like a v. large amount of data you are analyzing...

Thanks
Tim
 
K

KH_GS

Part of a project, this stage is something about catching word i
different tenses.

Any idea to go about it?
 
T

Tim Williams

Not my field at all, but wouldn't you be looking for words ending in "ed"
rather than just "d" ? Still, neither would distinguish "go/went",
"come/came" and so on. You might want to implement some kind of lookup for
those kinds of cases.

Also, if you're going to be processing that much data you're probably better
off just reading directly from a text file rather than cramming it all into
Excel.
 
N

NickHK

KH_GS,
If this is more to do with grammar than specific words/letters, I suspect
there are better ways.
An easy way would automate Word's spelling/grammar checker.
Or there are a lot of grammar components out there.

NickHK
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top