It seems like this ought to be easy, but I'm stumped.
I received several thousand address records in the following format--all in
one column of an Excel worksheet:
Smith, John & Mary 123 Main St Chicago, IL 60601
Williams, Bill & Cindy 45 54th Drive Madison, WI 51375
I need to parse these into:
Last Name
First Name(s)
Street Address
City
State
Zip
I can get the last name using =LEFT(B2, FIND(",",B2)-1) and the Zip using
=RIGHT(B2,5) but I can't come up with a way to get the other fields. A quick
scan of the records shows all the street addresses seem to start with a
number, and there are no numbers in any of the first names, so it seems like
if I can find the first number reading from the left, I could grab the first
names, but neither FIND nor SEARCH seem to work. I'm using Excel 2007.
Could anyone point me to a possible approach. Seems as if this is something
others must have solved many times before.
Many thanks,
Tom K
If your layout is always in the same format, and in particular if there are
only single first names, then this can be done using Regular Expressions.
The main problem is to differentiate the first name(s) from the beginning of
the address. The rule that I used in the example below is this pattern:
<comma><space>
<word>
then, optionally:
<space>
ampersand (&)
<space>
<word>
But this will fail if you have a "double" first name. For example:
Smith, Mary Jane 47 Edgecomb rd. ...
If that is the case, we would need to come up with a different rule. But try
this, anyway, and see if it works.
If your addresses always start with a number, we could incorporate that as a
rule and get around the issue of double first names.
Also, you need to list your cities by name. In the Pattern I devised, I listed
Chicago and Madison. You can add more possibilities by extending the
pipe-delimited list in Pattern.
First, enter the UDF into your workbook:
<alt-F11> opens the VB Editor
Ensure your project is highlighted in the Project Explorer window, then
Insert/Module and paste the code below into the window that opens.
The following formulas should work:
Last Name: =remid(A1,"^[^,]*")
First Name(s): =remid(A1,Pattern,1)
Street Address: =remid(A1,Pattern,3)
City: =remid(A1,Pattern,4)
State: =remid(A1,Pattern,5)
Zip: =remid(A1,Pattern,6)
Pattern:
,\s+(\w+(\s+&\s+\w+)?)\s+(.*)\s+(Chicago|Madison),\s+([A-Z]{2})\s+(\d{5}(-\d{4})?)
=============================================
Option Explicit
Function reMid(str As String, spattern As String, Optional Index) As String
Dim re As Object
Dim mc As Object
Set re = CreateObject("vbscript.regexp")
re.Global = True
re.ignorecase = True
re.Pattern = spattern
If re.test(str) = True Then
Set mc = re.Execute(str)
If IsMissing(Index) Then
reMid = mc(0)
Else
reMid = mc(0).submatches(Index - 1)
End If
End If
End Function
=============================================
This should handle any entries that are in the pattern you presented. Examples
that don't work would be useful in trying to refine this routine.
--ron