extracting info from an address block

B

Bruce

I recieve info from clients in outlook.
The info is client addresses and phone.
I would like to build a little function that when I drop the information
into an excel worksheet, that it would process the information to extract
the
NAME ADDRESS1 ADDRESS2 CITY ST ZIP PHONE PHONE2


the info may come like
name
address1
city, st zip

name
addr1
add2
city, st zip

It my have 0-2 phones on the line after the city, st zip

I am not sure if it would be best to paste into the formula bar (and all
take 1 cell), or to paste into a cell, and then it would be 3-7 cells.
Any idea how to handle this?
How could I tell if there is 1 or 2 address lines?

Would it be best to just do something like:
NAME = A1
ADDRESS =A2
ADDRESS2 =if(search(",",A3)>0,"",A3)
City
=if(search(",",A3)>0,left(A3,search(","A3)),left(A4,search(",",A4)))
St =??
ZIP =??
phone1 =if(search(",",A3)>0,A5,A6) 'how would I format it as a phone?
Phone2 =if(search(",",A3)>0,A7,A8)

Thanks
Bruce
 
R

Ron Rosenfeld

I recieve info from clients in outlook.
The info is client addresses and phone.
I would like to build a little function that when I drop the information
into an excel worksheet, that it would process the information to extract
the
NAME ADDRESS1 ADDRESS2 CITY ST ZIP PHONE PHONE2


the info may come like
name
address1
city, st zip

name
addr1
add2
city, st zip

It my have 0-2 phones on the line after the city, st zip

I am not sure if it would be best to paste into the formula bar (and all
take 1 cell), or to paste into a cell, and then it would be 3-7 cells.
Any idea how to handle this?
How could I tell if there is 1 or 2 address lines?

Would it be best to just do something like:
NAME = A1
ADDRESS =A2
ADDRESS2 =if(search(",",A3)>0,"",A3)
City
=if(search(",",A3)>0,left(A3,search(","A3)),left(A4,search(",",A4)))
St =??
ZIP =??
phone1 =if(search(",",A3)>0,A5,A6) 'how would I format it as a phone?
Phone2 =if(search(",",A3)>0,A7,A8)

Thanks
Bruce

I'm not sure what you mean when you write you "recieve info from clients in
outlook."

However, why not just export the information from Outlook in Excel format?
--ron
 
B

Bruce

Sorry,

These come to me in the body of an email, so they are just straight text.
And they come to me usually as 2 or three address in the body per email.

Thanks
Bruce
 
R

Ron Rosenfeld

I recieve info from clients in outlook.
The info is client addresses and phone.
I would like to build a little function that when I drop the information
into an excel worksheet, that it would process the information to extract
the
NAME ADDRESS1 ADDRESS2 CITY ST ZIP PHONE PHONE2


the info may come like
name
address1
city, st zip

name
addr1
add2
city, st zip

It my have 0-2 phones on the line after the city, st zip

I am not sure if it would be best to paste into the formula bar (and all
take 1 cell), or to paste into a cell, and then it would be 3-7 cells.
Any idea how to handle this?
How could I tell if there is 1 or 2 address lines?

Would it be best to just do something like:
NAME = A1
ADDRESS =A2
ADDRESS2 =if(search(",",A3)>0,"",A3)
City
=if(search(",",A3)>0,left(A3,search(","A3)),left(A4,search(",",A4)))
St =??
ZIP =??
phone1 =if(search(",",A3)>0,A5,A6) 'how would I format it as a phone?
Phone2 =if(search(",",A3)>0,A7,A8)

Thanks
Bruce

This is a complicated problem, in part for the reasons you lay out.

One way to solve this is by using regular expressions. This would depend on
having some rules to define the various fields.

For example, let us paste the data into the function bar, so all the date will
be in one cell (e.g. A2). Then we can parse out the data into the subsequent
columns.

Label:

B1: Name
C1: Address 1
D1: Address 2
E1: City
F1: State
G1: Zip
H1: Phone 1
H2: Phone 2

We have to make some assumptions though, and the accuracy of the parsing will
depend on the accuracy of those assumptions.

I made the following assumptions:

Name: Is the first line
Address 1: Is the second line that contains ONLY letters, numbers or spaces.
Address 2: Is the third line that contains ONLY letters, numbers or spaces.
City: Is the first line that starts with a sequence of characters -- up to but
not including a comma.
State: The sequence of <non-space>'s that follows the combination of
comma<space>
Zip: The series of digits and or dash that follows the State. It needs to be
formatted as 5 digits, optional dash, optional four digits. And it may be
absent.
Phone1: First line that has a sequence of digits, dashes, spaces and
parentheses with a minimum of seven characters.
Phone2: Second line with the above sequence.

Comments:
The phone number definition could be more robust, but that may not be
necessary.

It is fairly critical that the City, state zip line be the only line that
contains a comma.

The easiest way to implement the above is to download and install Longre's free
morefunc.xll add-in from http://xcell05.free.fr/

Then use these formulas:

B2: =REGEX.MID(A2,".*")
C2: =REGEX.MID($A2,"(?m)^[\w ]*$",2)
D2: =REGEX.MID($A2,"(?m)^[\w ]*$",3)
E2: =REGEX.MID(A2,"(?m)^.*(?=,)")
F2: =REGEX.MID(A2,"(?<=,\s)\S+")
G2: =REGEX.SUBSTITUTE(A2,"(?s).*\w+,\s+\w+\s+(\d{5}-?(\d{4})?)?.*","[1]")
H2: =REGEX.MID(A2,"[-()\d ]{7,}")
I2: =REGEX.MID(A2,"[-()\d ]{7,}",2)


If the formulas do not work, then I have probably made some assumptions about
your data that are not so. So post back with sanitized examples of the problem
data.

Let me know if this works for you.
--ron
 
R

Ron Rosenfeld

Sorry,

These come to me in the body of an email, so they are just straight text.
And they come to me usually as 2 or three address in the body per email.

Thanks
Bruce

Two changes already <g> for the phone numbers:

H2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$")
I2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$",2)

I had to ensure that the ONLY characters on that line were digits, dash,
parentheses, or spaces.
--ron
 
B

Bruce

Thank you very much Ron!
Sorry I have not responded before, but I have been out of town and had no
'net access.
Seems to be getting close!
There is a little issue with phone nums, especailly if the address contains
a numerical street name:

Joe Doe
1259 98TH AVE
Anytown, CA 92111
(886) 123 3332

gives me the results: (note the 1259 98 after the zip)
Joe Doe 1259 98TH AVE Anytown CA 92111 1259 98 (886) 123 3332

Now if the address is a non numerical street, it works fine
Joe Doe
1259 Main AVE
Anytown, CA 92111
(886) 123 3332

Does this look like it will work properly? (it seems to, but I am not as
familiar with this addin as you are!)
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}")
and
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}",2)

Also, I am getting problems after I added this addin. What seems to happen
if I edit this so that I have the formula wizard (what you get when you
select the funtion from the function dropdown), I get an error that excel
has stopped responding and then it restarts. I have Excel 2007.


Again, thanks for your assistance!
Bruce




Ron Rosenfeld said:
I recieve info from clients in outlook.
The info is client addresses and phone.
I would like to build a little function that when I drop the information
into an excel worksheet, that it would process the information to extract
the
NAME ADDRESS1 ADDRESS2 CITY ST ZIP PHONE PHONE2


the info may come like
name
address1
city, st zip

name
addr1
add2
city, st zip

It my have 0-2 phones on the line after the city, st zip

I am not sure if it would be best to paste into the formula bar (and all
take 1 cell), or to paste into a cell, and then it would be 3-7 cells.
Any idea how to handle this?
How could I tell if there is 1 or 2 address lines?

Would it be best to just do something like:
NAME = A1
ADDRESS =A2
ADDRESS2 =if(search(",",A3)>0,"",A3)
City
=if(search(",",A3)>0,left(A3,search(","A3)),left(A4,search(",",A4)))
St =??
ZIP =??
phone1 =if(search(",",A3)>0,A5,A6) 'how would I format it as a phone?
Phone2 =if(search(",",A3)>0,A7,A8)

Thanks
Bruce

This is a complicated problem, in part for the reasons you lay out.

One way to solve this is by using regular expressions. This would depend
on
having some rules to define the various fields.

For example, let us paste the data into the function bar, so all the date
will
be in one cell (e.g. A2). Then we can parse out the data into the
subsequent
columns.

Label:

B1: Name
C1: Address 1
D1: Address 2
E1: City
F1: State
G1: Zip
H1: Phone 1
H2: Phone 2

We have to make some assumptions though, and the accuracy of the parsing
will
depend on the accuracy of those assumptions.

I made the following assumptions:

Name: Is the first line
Address 1: Is the second line that contains ONLY letters, numbers or
spaces.
Address 2: Is the third line that contains ONLY letters, numbers or
spaces.
City: Is the first line that starts with a sequence of characters -- up
to but
not including a comma.
State: The sequence of <non-space>'s that follows the combination of
comma<space>
Zip: The series of digits and or dash that follows the State. It needs
to be
formatted as 5 digits, optional dash, optional four digits. And it may be
absent.
Phone1: First line that has a sequence of digits, dashes, spaces and
parentheses with a minimum of seven characters.
Phone2: Second line with the above sequence.

Comments:
The phone number definition could be more robust, but that may not be
necessary.

It is fairly critical that the City, state zip line be the only line that
contains a comma.

The easiest way to implement the above is to download and install Longre's
free
morefunc.xll add-in from http://xcell05.free.fr/

Then use these formulas:

B2: =REGEX.MID(A2,".*")
C2: =REGEX.MID($A2,"(?m)^[\w ]*$",2)
D2: =REGEX.MID($A2,"(?m)^[\w ]*$",3)
E2: =REGEX.MID(A2,"(?m)^.*(?=,)")
F2: =REGEX.MID(A2,"(?<=,\s)\S+")
G2: =REGEX.SUBSTITUTE(A2,"(?s).*\w+,\s+\w+\s+(\d{5}-?(\d{4})?)?.*","[1]")
H2: =REGEX.MID(A2,"[-()\d ]{7,}")
I2: =REGEX.MID(A2,"[-()\d ]{7,}",2)


If the formulas do not work, then I have probably made some assumptions
about
your data that are not so. So post back with sanitized examples of the
problem
data.

Let me know if this works for you.
--ron
 
R

Ron Rosenfeld

Thank you very much Ron!
Sorry I have not responded before, but I have been out of town and had no
'net access.
Seems to be getting close!
There is a little issue with phone nums, especailly if the address contains
a numerical street name:

Joe Doe
1259 98TH AVE
Anytown, CA 92111
(886) 123 3332

gives me the results: (note the 1259 98 after the zip)
Joe Doe 1259 98TH AVE Anytown CA 92111 1259 98 (886) 123 3332

Now if the address is a non numerical street, it works fine
Joe Doe
1259 Main AVE
Anytown, CA 92111
(886) 123 3332

Does this look like it will work properly? (it seems to, but I am not as
familiar with this addin as you are!)
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}")
and
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}",2)

Also, I am getting problems after I added this addin. What seems to happen
if I edit this so that I have the formula wizard (what you get when you
select the funtion from the function dropdown), I get an error that excel
has stopped responding and then it restarts. I have Excel 2007.

Bruce,

It looks as if, perhaps because my posts threaded oddly, that you are using the
original and not the revised version of the phone number extraction functions.
It should be --

H2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$")
I2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$",2)

Those should work.

So far as I know, the add-in has not been tested with Excel 2007, and I have
Excel 2003, so cannot check it. So I suppose you might be seeing an
incompatibility.

Similar functions to those that Longre uses can be written in VBA, if that is
an issue.

But first, see if the revised phone number functions work properly on your
data. Then we can see about translating to VBA.




--ron
 
R

Ron Rosenfeld

Thank you very much Ron!
Sorry I have not responded before, but I have been out of town and had no
'net access.
Seems to be getting close!
There is a little issue with phone nums, especailly if the address contains
a numerical street name:

Joe Doe
1259 98TH AVE
Anytown, CA 92111
(886) 123 3332

gives me the results: (note the 1259 98 after the zip)
Joe Doe 1259 98TH AVE Anytown CA 92111 1259 98 (886) 123 3332

Now if the address is a non numerical street, it works fine
Joe Doe
1259 Main AVE
Anytown, CA 92111
(886) 123 3332

Does this look like it will work properly? (it seems to, but I am not as
familiar with this addin as you are!)
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}")
and
=REGEX.MID(RIGHT(B2,LEN(B2) - SEARCH(I3,B2)),"[-()\d ]{7,}",2)

Also, I am getting problems after I added this addin. What seems to happen
if I edit this so that I have the formula wizard (what you get when you
select the funtion from the function dropdown), I get an error that excel
has stopped responding and then it restarts. I have Excel 2007.

Bruce,

It looks as if, perhaps because my posts threaded oddly, that you are using the
original and not the revised version of the phone number extraction functions.
It should be --

H2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$")
I2: =REGEX.MID(A2,"(?m)^[-()\d ]{7,}$",2)

Those should work.

So far as I know, the add-in has not been tested with Excel 2007, and I have
Excel 2003, so cannot check it. So I suppose you might be seeing an
incompatibility.

Similar functions to those that Longre uses can be written in VBA, if that is
an issue.

But first, see if the revised phone number functions work properly on your
data. Then we can see about translating to VBA.




--ron


Here's a VBA variant. It will parse the contents of "Selection" into the cells
to the right:

=============================================
Option Explicit

Sub ParseAdrBlock()
Dim c As Range

Const pName As String = ".*"
Const pAdr1 As String = "^[\w ]*[A-Za-z]+[\w ]*$" 'Set Multiline = True; Index
2
Const pAdr2 As String = "^[\w ]*[A-Za-z]+[\w ]*$" 'Set Multiline = True; Index
3
Const pCity As String = "^.*(?=,)" 'Set Multiline = True
Const pState As String = "[\s\S]+,\s(\S+)[\s\S]+" 'Mulitline True; Return $1;
resub
Const pZip As String = "[\s\S]+,\s(\S+)[\s\S]+" 'multiline True; resub; Return
$1
Const pPhone1 As String = "^[-()\d ]{7,}$" 'Multiline True
Const pPhone2 As String = "^[-()\d ]{7,}$" 'Multiline True; Index 2

For Each c In Selection
c.Range("B1", "I1").ClearContents
c.Offset(0, 1).Value = REMid(c.Text, pName)
c.Offset(0, 2).Value = REMid(c.Text, pAdr1, 2, , True)
c.Offset(0, 3).Value = REMid(c.Text, pAdr2, 3, , True)
c.Offset(0, 4).Value = REMid(c.Text, pCity, , , True)
c.Offset(0, 5).Value = RESub(c.Text, pState, "$1")
c.Offset(0, 6).Value = RESub(c.Text, pZip, "$1")
c.Offset(0, 7).Value = REMid(c.Text, pPhone1, , , True)
c.Offset(0, 8).Value = REMid(c.Text, pPhone2, 2, , True)
Next c
End Sub

Function RESub(str As String, SrchFor As String, ReplWith As String) As String
Dim objRegExp As RegExp

Set objRegExp = New RegExp
objRegExp.Pattern = SrchFor
objRegExp.IgnoreCase = True
objRegExp.Global = True
objRegExp.MultiLine = True

RESub = objRegExp.Replace(str, ReplWith)

End Function
Function REMid(str As String, Pattern As String, _
Optional Index As Variant = 1, _
Optional CaseSensitive As Boolean = True, _
Optional MultiLin As Boolean = False) _
As Variant 'Variant as value may be string or array


Dim objRegExp As RegExp
Dim objMatch As Match
Dim colMatches As MatchCollection

Dim i As Long 'counter
Dim t() As String 'container for array results

' Create a regular expression object.
Set objRegExp = New RegExp

'Set the pattern by using the Pattern property.
objRegExp.Pattern = Pattern

' Set Case Insensitivity.
objRegExp.IgnoreCase = Not CaseSensitive

'Set global applicability.
objRegExp.Global = True

'Set multiline
objRegExp.MultiLine = MultiLin

'Test whether the String can be compared.
If (objRegExp.Test(str) = True) Then

'Get the matches.
Set colMatches = objRegExp.Execute(str) ' Execute search.

On Error Resume Next 'return null string if a colmatch index is non-existent
If IsArray(Index) Then
ReDim t(1 To UBound(Index))
For i = 1 To UBound(Index)
t(i) = colMatches(Index(i) - 1)
Next i
REMid = t()
Else
REMid = CStr(colMatches(Index - 1))
If IsEmpty(REMid) Then REMid = ""
End If
On Error GoTo 0 'reset error handler
Else
REMid = ""
End If
End Function

===================================
--ron
 
R

Ron Rosenfeld

Here's a VBA variant. It will parse the contents of "Selection" into the cells
to the right:

I negelected to mention that you will need to set a reference in VBA

Select, from the main menu bar in the VB Editor:

Tools/References

Then SELECT Microsoft VBScript Regular Expressions 5.5


--ron
 
B

Bruce

Thanks!
It seems to work well! I have not tested it to much yet.

Thank you for all of your help!
Bruce
 
R

Ron Rosenfeld

Thanks!
It seems to work well! I have not tested it to much yet.

Thank you for all of your help!
Bruce

Let me know if there are any problems.

I noted there were some line wrap issues in how it came through, but if you've
been using it, I presume you got it worked out.

The differences in the regex's between the Morefunc and the VBA solutions is
due to small differences in how these different programs handle regular
expressions. Especially with regard to the mulitiline command.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top