Wildcards

Stephen Glynn · May 12, 2004

I've come across an invaluable article by Dave Rado on the MVP Word site
answering the question "I have a “Name” column which I want to split into
“FirstName”, “LastName” – how can I do it?"

http://word.mvps.org/FAQs/TblsFldsFms/SplitFirstNameLastName.htm

It's too long to summarise here, but it describes a very handy way of
doing this: convert the column to text, use Find & Replace with some
wildcards and special characters rather than code, and then convert it
back into a table.

Anyway, at one point you need to allow for the possibility that the
Name field contains middle names or initials. To deal with this you

"select the “Use wildcards” check box in the Find and Replace dialog, and:

In the Find what box type: (<*) ([! ]@)^13
In the Replace with box type: \1^t\2^p
Click Replace All"

This certainly works. It separates out the last word (surname) with a
tab and leaves everything to the left of it separated by spaces.
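For readers more at home with regular expressions, here is a rough Python
emulation of the FAQ's Find and Replace. It is only a sketch under stated
assumptions: Python's re module stands in for Word's wildcard engine, "\r"
stands in for the paragraph mark (^13), and the symbol-by-symbol
translation in the comments is approximate (for example, \b only roughly
corresponds to Word's <).

```python
import re

# A few sample names, one per paragraph. Word stores its paragraph mark
# (^13) as a carriage return, so "\r" stands in for it here (assumption).
text = "Stephen Jay Gould\rCharles Darwin\rWilliam D. Hamilton\rJ. B. S. Haldane\r"

# Approximate translation of the FAQ's Find what: (<*) ([! ]@)^13
#   <      start of a word         -> \b (approximation)
#   *      any run of characters   -> [^\r]*?  (kept within one paragraph)
#   " "    the space before surname -> a literal space
#   [! ]@  one or more non-spaces  -> [^ \r]+
#   ^13    paragraph mark          -> \r
faq_find = re.compile(r"\b([^\r]*?) ([^ \r]+)\r")

# Replace with: \1^t\2^p -> given names, tab, surname, paragraph mark.
# The paragraph mark is outside both groups, so it has to be put back.
faq_result = faq_find.sub(r"\1\t\2\r", text)

for paragraph in faq_result.split("\r")[:-1]:
    print(paragraph.split("\t"))
```

Each printed row has the given name(s) in the first cell and the surname
in the second, which is the tab-separated layout the FAQ converts back
into a table.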

I'm afraid I don't understand wildcards very well, so I've no idea how
it works!

Can anyone please enlighten me?

TIA

Steve

Klaus Linke · May 13, 2004

Hi Stephen,

It's good to have some example text visible:

Stephen Jay Gould¶
Charles Darwin¶
William D. Hamilton¶
J. B. S. Haldane¶

We want to replace the blank between the given name(s) and the family name
with a tab.
So we look for a pattern that matches each paragraph in turn, and separates
-- the given name(s)
-- the blank between that and the last name
-- the last name and the paragraph mark.

We have to have some rule for either the given name(s) or the family name
that allows us to separate them.

Now one pretty safe assumption about the family name is that it doesn't
contain blanks (spaces).
So if we match the last word in a paragraph, that should be the family
name.

Using wildcards, we look for a space, followed by one or more non-space
characters (that's what the @ after [!^32] means), followed by the
paragraph mark ^13:

Find what: ^32[!^32]@^13
(where I used ^32 for a space so it's easier to see).
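The same search can be tried outside Word. This is a minimal Python
sketch, assuming "\r" for the paragraph mark and "[^ \r]+" as the
counterpart of [!^32]@; the capture group is an addition of mine, used
only so that findall returns just the surname.

```python
import re

text = "Stephen Jay Gould\rCharles Darwin\rWilliam D. Hamilton\rJ. B. S. Haldane\r"

# ^32[!^32]@^13 : a space, one or more non-spaces, then the paragraph mark.
# "\r" excluded from the class so the word cannot run past the paragraph.
last_word = re.compile(r" ([^ \r]+)\r")

surnames = last_word.findall(text)
print(surnames)  # ['Gould', 'Darwin', 'Hamilton', 'Haldane']
```

Spaces inside the given names (after "Stephen", "D.", "J.") don't produce
a match, because the word that follows them is not directly followed by a
paragraph mark.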

So far the paragraph mark ^13 is the last character in the match, and
that is what we need: once a name has been matched (and possibly
replaced by something), the search continues at the start of the next
paragraph.

For the given name(s) we can just use the wildcard for "any text", *.
Since we know each match starts at the beginning of a paragraph, it
doesn't really matter what appears there until the "space/family
name/para mark" in that paragraph is matched:
Find what: *^32[!^32]@^13
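To see that each match covers exactly one paragraph, and so ends on the
paragraph mark, a small Python sketch can list the matches. It assumes
"[^\r]*?" as a stand-in for *, confined to one paragraph; that is an
approximation of Word's behaviour rather than an exact equivalent.

```python
import re

text = "Stephen Jay Gould\rCharles Darwin\rWilliam D. Hamilton\rJ. B. S. Haldane\r"

# *^32[!^32]@^13 : any text, a space, the last word, the paragraph mark.
# [^\r]*? stands in for Word's *; excluding \r keeps it inside one paragraph.
whole_name = re.compile(r"[^\r]*? [^ \r]+\r")

matches = []
for m in whole_name.finditer(text):
    # Each match ends on the paragraph mark, so the next search
    # resumes at the first character of the following paragraph.
    matches.append(m.group())
    print(repr(m.group()), m.start(), m.end())

# Together, the matches cover the whole text with nothing left over.
print("".join(matches) == text)
```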

We want to get rid of the space and replace it with a tab, and we want
to re-use everything else.
So we put everything but the space into two bracketed groups:
Find what: (*)^32([!^32]@^13)
Replace with: \1^t\2

(*) will match the given names, and is re-inserted with \1.
Then we insert ^t, then the (family name + para mark) bracket, \2.
This looks slightly different from what's in the FAQ article (the FAQ
leaves ^13 outside the second bracket and puts it back with ^p, and adds
< so the match starts at the beginning of a word), but the differences
don't matter. There are usually lots of ways to achieve the same result
with wildcard matches ...and quite a few ways in which things can go
wrong; you often have to experiment a bit until you get it right. But if
you *like* to solve puzzles, wildcard searches are definitely fun.
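A quick way to check that the differences really don't matter is to run
both versions side by side. This Python sketch assumes "\r" for ^13 and
uses approximate regex translations of the two wildcard patterns
("[^\r]*?" for *, "[^ \r]+" for [!^32]@, \b for <); the final split into
rows only mimics converting the text back into a table.

```python
import re

text = "Stephen Jay Gould\rCharles Darwin\rWilliam D. Hamilton\rJ. B. S. Haldane\r"

# Klaus's version: Find (*)^32([!^32]@^13), Replace \1^t\2
# Group 2 includes the paragraph mark, so \2 carries it over.
klaus_result = re.sub(r"([^\r]*?) ([^ \r]+\r)", r"\1\t\2", text)

# FAQ version: Find (<*) ([! ]@)^13, Replace \1^t\2^p
# The paragraph mark is outside the groups and is re-inserted at the end.
faq_result = re.sub(r"\b([^\r]*?) ([^ \r]+)\r", r"\1\t\2\r", text)

# Both routes give the same tab-separated text.
print(klaus_result == faq_result)

# One row per paragraph, one cell per tab-separated field.
rows = [line.split("\t") for line in klaus_result.split("\r")[:-1]]
print(rows)
```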

Regards,
Klaus

Stephen Glynn · May 13, 2004

Dear Klaus

Many thanks for the helpful explanation.

One "Name" field we've never got right where I work is that for a rather
illustrious customer, "HRH Prince Philip, Duke of Edinburgh". He gets
his newsletters and catalogues from us hand-addressed rather than
generated by mailmerge!

Steve

Suzanne S. Barnhill · May 13, 2004

This should work fine unless you have surnames such as Du Pont. In the U.S.,
names with prefixes such as de, du, van, and von are often spelled solid,
but even when they aren't, alphabetization is based on the full name, not
just the last part. Same with double-barreled (but unhyphenated) surnames
such as Lloyd Webber and Vaughan Williams.
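One possible extension, not something proposed in the thread, is to let
the surname group start with a known particle. The Python sketch below
assumes a hypothetical, deliberately short particle list and "\r" for the
paragraph mark; it uses regex features (alternation and an optional
group) that go beyond the wildcard patterns discussed here.

```python
import re

text = "Pierre Du Pont\rLudwig van Beethoven\rCharles Darwin\rAndrew Lloyd Webber\r"

# Hypothetical list of surname particles; extend it for your own data.
PARTICLES = ["de", "du", "van", "von"]

# Same idea as (*)^32([!^32]@^13), but the surname group may begin with
# one particle followed by a space. IGNORECASE lets "du" also match "Du".
particle = "|".join(PARTICLES)
pattern = re.compile(rf"([^\r]*?) ((?:(?:{particle}) )?[^ \r]+\r)", re.IGNORECASE)

result = pattern.sub(r"\1\t\2", text)
rows = [line.split("\t") for line in result.split("\r")[:-1]]
print(rows)
```

As the last row shows, an unhyphenated double surname such as Lloyd
Webber is still split in the wrong place, so a rule like this reduces
manual checking rather than removing it.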

--
Suzanne S. Barnhill
Microsoft MVP (Word)
Words into Type
Fairhope, Alabama USA

Email cannot be acknowledged; please post all follow-ups to the newsgroup so
all may benefit.

Stephen Glynn · May 13, 2004

Thanks for the clarification. I do most of my work in MS Access, where
this can be a real pain when you're trying to turn legacy databases into
something usable, so I'm always looking for ways to simplify the job of
decomposing name fields.

I don't think there's ever going to be a foolproof solution that doesn't
involve manual checking and editing at some point: quite apart from the
problem of prefixes, you've got to worry about suffixes (Junior, II,
Ph.D.), and as soon as you've worked that one out you run into
Major-General Smith, DSO, and his friend The Very Reverend Bishop of
Somewhere.

Steve