Spliting rather mixed up address

J

Jeffery B Paarsa

Hello,

On an spread sheet on a single column I receive rather mixed up address that
I need to separate the street address into one column and city in another and
state into 3rd column and off course zip on 4th column... Here is some off
the address I receive:
1295 GAUGUIN CIR ALISOVIEJO CA 926563875
93316 S ARAPAHO DR SANTA ANA CA 927042403
9345 GAUGUIN Cm ALISOVIEJO CA 926563875
2344268 ORANGE AVE APT 3 LAKE FOREST CA 926304886
361449 S EL C AMINO REAL APT 4 SAN CLEMENTE CA 926723451
235521 EL REPOSA ALISOVIEJO CA 926561109
106556 CAIXE DEL CERRO UNIT 1414 SAN CLEMENTE CA 926726075
204475 S BIRCH ST SANTA ANA CA 927072703
125542 HALLADAY ST SANTA ANA CA 927071407
103319 W MYRTLE ST SANTA ANA CA 927033911
3342 ENTERPRISE APT 4106 ALISOVIEJO CA 926567091
6100EDINGER AVE APT 326 HUNTJNGTNBCH CA 926473266
As you may have noticed not only there are a lot of errors on the spelling
but also city names sometimes come in two words i.e. Santa Ana or ALISOVIEJO
which should be "Aliso Viejo" and as you see sometimes "6100 Edinger" comes
like "6100EDINGER"

I search for "Split Address" on the Forum and I did not see anything that
can tackle such a complicated situation... Splitting each Word into a
separate column does not work because sometimes city names are two word as I
explained above.

I know I need to do some kind of hand cleaning before/after I run any
special macro that can handle such a task but at this time I am kind of
baffled how to this huge task...

I came up with idea that if I scan the address column and take the last
piece of character with a leading and trailing space i.e. 926473266 and stick
it into Zip column and prior leading/trailing space as State and stick it
into State Column then I have one more to strip of the Address strings and
that would be City name which unfortunately because I have miss spelling and
sometimes two word as city name I have problem…

For city name sometimes I have to pick up two words prior to State and Zip
sometimes only one word and sometime I have spelling problem…

On line one of the sample address above for example 927042403 can be picked
up and stick into Zip code column, CA can be picked up and stick into State
Column but there is no city name by the name of ANA in California so I have
to pick up Word before ANA too which is SANTA ANA and stick it into city name
column and for the last example address line HUNTJNGTNBCH has spelling
problem along with two word city name and the correct name is Huntington
Beach or Bch….

A macro that can look up the zip code and give me a spelling suggestion or
correct city name would be awesome.

Honestly programming such a complicated macro is way above my knowledge of
VBA programming strengths. I thought if I posted it somebody may give me
some suggestion or there is an already Add-in macro exist etc to solve such a
problem…

Regards
 
G

Gary''s Student

There is more than one way to skin a cat!

i suggest you get a ZIP code database. They are available on the Web.
Since your records include the ZIP code, you can lookup the state/city
information from the database rather than try to parse the string.

You will stil have to parse the front-end of the string to get street
information, but this simplifies the task for you.
 
E

eliano

Any code or macro already avilable anywhere to download or purchase?

Hi Jeff,
I believe that you problem have not an easy solution; however try:

First split your data using Text in Column
Then try to construct your DB with this macro:

Option Explicit

Private Sub Worksheet_BeforeDoubleClick(ByVal _
Target As Range, Cancel As Boolean)
Cancel = True
Dim C As Long, R As Long, X As Long
C = Target.Column
R = Target.Row
C = C - 1
For X = C To 1 Step -1
If Cells(R, X).Value = "" Then
Else
Target.Value = Cells(R, X).Value & " " & Target.Value
Cells(R, X).Value = ""
Exit Sub
End If
Next
End Sub

I try to explain the routine also if my english is very poor.:))
By your sample, text in column will arrange your data in the first 12
columns,
that is from column A to L.
Effect the doubleclic on the column N and that column will contain
the
number I supose to be the Zip Code. (Old zip code is blanked)
Doubleclic on column M = State (old state is blanked)
Doubleclic on column L = City (old city is blanked)
If the city is written as SANTA ANA the doubleclic on
the same column L, that now contain ANA = SANTA ANA
an so on,
Continue up to column B if necessary.

Regards
Eliano
 
E

eliano

Hi Jeff.
I forget: for the error in writing, here in Italy we have not the
"crystall ball"
and you had to provide manually.
Eliano
 
R

Ron Rosenfeld

Hello,

On an spread sheet on a single column I receive rather mixed up address that
I need to separate the street address into one column and city in another and
state into 3rd column and off course zip on 4th column... Here is some off
the address I receive:
1295 GAUGUIN CIR ALISOVIEJO CA 926563875
93316 S ARAPAHO DR SANTA ANA CA 927042403
9345 GAUGUIN Cm ALISOVIEJO CA 926563875
2344268 ORANGE AVE APT 3 LAKE FOREST CA 926304886
361449 S EL C AMINO REAL APT 4 SAN CLEMENTE CA 926723451
235521 EL REPOSA ALISOVIEJO CA 926561109
106556 CAIXE DEL CERRO UNIT 1414 SAN CLEMENTE CA 926726075
204475 S BIRCH ST SANTA ANA CA 927072703
125542 HALLADAY ST SANTA ANA CA 927071407
103319 W MYRTLE ST SANTA ANA CA 927033911
3342 ENTERPRISE APT 4106 ALISOVIEJO CA 926567091
6100EDINGER AVE APT 326 HUNTJNGTNBCH CA 926473266
As you may have noticed not only there are a lot of errors on the spelling
but also city names sometimes come in two words i.e. Santa Ana or ALISOVIEJO
which should be "Aliso Viejo" and as you see sometimes "6100 Edinger" comes
like "6100EDINGER"

I search for "Split Address" on the Forum and I did not see anything that
can tackle such a complicated situation... Splitting each Word into a
separate column does not work because sometimes city names are two word as I
explained above.

I know I need to do some kind of hand cleaning before/after I run any
special macro that can handle such a task but at this time I am kind of
baffled how to this huge task...

I came up with idea that if I scan the address column and take the last
piece of character with a leading and trailing space i.e. 926473266 and stick
it into Zip column and prior leading/trailing space as State and stick it
into State Column then I have one more to strip of the Address strings and
that would be City name which unfortunately because I have miss spelling and
sometimes two word as city name I have problem…

For city name sometimes I have to pick up two words prior to State and Zip
sometimes only one word and sometime I have spelling problem…

On line one of the sample address above for example 927042403 can be picked
up and stick into Zip code column, CA can be picked up and stick into State
Column but there is no city name by the name of ANA in California so I have
to pick up Word before ANA too which is SANTA ANA and stick it into city name
column and for the last example address line HUNTJNGTNBCH has spelling
problem along with two word city name and the correct name is Huntington
Beach or Bch….

A macro that can look up the zip code and give me a spelling suggestion or
correct city name would be awesome.

Honestly programming such a complicated macro is way above my knowledge of
VBA programming strengths. I thought if I posted it somebody may give me
some suggestion or there is an already Add-in macro exist etc to solve such a
problem…

Regards

Here's some code that seems to work on your presented data.

It is slow because it is doing its zip code lookup at the USPS site. If this
is going to be a frequent occurrence for you, with many entries, you would be
better off purchasing a zip code database and keeping it on your machine. But
it'll give you an idea of what might be doable.

Be sure to read the notes in the code.

And also look at the set rng command which is just one way of setting up the
range to be processed. There are other methods than hard-coding it. Also note
that the code clears out the five columns to the left of the data, so as to
allow space for the room.

Not knowing how much further you want to go with this, I left out stuff like
checking for invalid zip codes (or no zip codes). That should probably be
added on depending on your specific requirements.

Enjoy.

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro
by name, and <RUN>.


========================================
'*** IMPORTANT NOTE ***
'MUST SET REFERENCE (TOOLS/REFERENCES IN MAIN MENU BAR)
'TO:
' Microsoft VBScript Regular Expressions 5.5
' Microsoft Internet Controls
' In Excel 2007, this is called "Microsoft Browser Helpers"
Option Explicit
Sub ParseAdr()
Dim c As Range, rng As Range
Dim s As String
Dim ZIP9 As String, ZIP5 As String
Dim sCity As String, sState As String
Dim sStreet As String, sAddrNum As String
Dim sTemp
Dim i As Long

'just an example of one method of selecting a range
' to process
Set rng = Range("a2:a13")

'clear adjacent cells of old data
Set c = rng.Offset(0, 1).Resize(rng.Rows.Count, 5)
c.Clear

'process each entry
For Each c In rng
s = Application.WorksheetFunction.Trim(c.Value)
ZIP9 = RegexMid(s, "\d+$")
ZIP5 = Left(ZIP9, 5)
c.Offset(0, 5).Value = Val(ZIP9)
c.Offset(0, 5).NumberFormat = "[<100000]00000;00000-0000"

If ZIP5 = "" Then
Debug.Print c.Address, c.Value
Exit Sub
End If

sTemp = RevZip(ZIP5)
sCity = sTemp(0, 0)
sState = sTemp(1, 0)
For i = 1 To UBound(RevZip(ZIP5), 2)
'check for which city in the zipcode might match.
'if none match, use the primary city
'remove spaces before comparison to allow for that error in the data
If InStr(1, Replace(s, " ", ""), Replace(sTemp(0, i), " ", ""), _
vbTextCompare) > 0 Then
sCity = sTemp(0, i)
sState = sTemp(1, i)
Exit For
End If
Next i
c.Offset(0, 3).Value = sCity
c.Offset(0, 4).Value = sState
sAddrNum = RegexMid(s, "^\d+")
c.Offset(0, 1).Value = sAddrNum
sStreet = RegexMid(s, sAddrNum & "\s?(.*?)\s?(" & Left(sCity, 7) & "|" _
& Left(Replace(sCity, " ", ""), 7) & ")", , 1)
c.Offset(0, 2).Value = sStreet
Next c
End Sub

Private Function RevZip(sZip5 As String) As Variant
'returns 2D array of each city/state pair
'in the zip code
Dim IE As InternetExplorer
Const sURL As String = "http://zip4.usps.com/zip4/citytown_zip.jsp"
Dim sHTML As String
Dim sTemp() As String
Dim i As Long

' Group2 = City Group3=State IGNORE CASE
Const rePattern As String = "headers=pre>(<b>)?([^,]+),\s([^<]+)"
Dim lNumCities As Long

Application.Cursor = xlWait
Set IE = New InternetExplorer
IE.Navigate sURL
IE.Visible = False
Do While IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While IE.Busy = True
DoEvents
Loop

IE.Document.all("zip5").Value = sZip5
IE.Document.all("Submit").Click
Do While IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While IE.Busy = True
DoEvents
Loop
sHTML = IE.Document.body.innerhtml
IE.Quit
Application.Cursor = xlDefault

'Note that the USPS site can return multiple
'cities for each zip code. So we need to
'return them all
lNumCities = RegexCount(sHTML, rePattern)
ReDim sTemp(0 To 1, 0 To lNumCities - 1)
For i = 0 To lNumCities - 1
sTemp(0, i) = RegexMid(sHTML, rePattern, i + 1, 2)
sTemp(1, i) = RegexMid(sHTML, rePattern, i + 1, 3)
Next i
RevZip = sTemp
End Function
Private Function RegexMid(s As String, sPat As String, _
Optional Index As Long = 1, _
Optional Subindex As Long, _
Optional CaseIgnore As Boolean = True, _
Optional Glbl As Boolean = True, _
Optional Multiline As Boolean = False) As String

Dim re As Object, mc As Object
Dim i As Long

Set re = CreateObject("vbscript.regexp")
re.Pattern = sPat
re.IgnoreCase = CaseIgnore
re.Global = Glbl
re.Multiline = Multiline

If re.test(s) = True Then
Set mc = re.Execute(s)
If Subindex = 0 Then
RegexMid = mc(Index - 1)
ElseIf Subindex <= mc(Index - 1).SubMatches.Count Then
RegexMid = mc(Index - 1).SubMatches(Subindex - 1)
End If
End If
Set re = Nothing
End Function
Private Function RegexCount(s As String, sPat As String) As Long
Dim re As RegExp, mc As MatchCollection
Set re = New RegExp
re.Pattern = sPat
re.Global = True
re.IgnoreCase = True
Set mc = re.Execute(s)
RegexCount = mc.Count
Set re = Nothing
End Function
=============================
--ron
 
C

Compile error: User-defined type not def

I tried to run this Module and I got error on
Dim IE As InternetExplorer on Private Function RevZip(sZip5 As String)
Sounds like one can not define a viriable as InternetExplorer there is no
such a definition and I tried Application and I could not get it to work
either...

Regards

Jeff P.


Ron Rosenfeld said:
Hello,

On an spread sheet on a single column I receive rather mixed up address that
I need to separate the street address into one column and city in another and
state into 3rd column and off course zip on 4th column... Here is some off
the address I receive:
1295 GAUGUIN CIR ALISOVIEJO CA 926563875
93316 S ARAPAHO DR SANTA ANA CA 927042403
9345 GAUGUIN Cm ALISOVIEJO CA 926563875
2344268 ORANGE AVE APT 3 LAKE FOREST CA 926304886
361449 S EL C AMINO REAL APT 4 SAN CLEMENTE CA 926723451
235521 EL REPOSA ALISOVIEJO CA 926561109
106556 CAIXE DEL CERRO UNIT 1414 SAN CLEMENTE CA 926726075
204475 S BIRCH ST SANTA ANA CA 927072703
125542 HALLADAY ST SANTA ANA CA 927071407
103319 W MYRTLE ST SANTA ANA CA 927033911
3342 ENTERPRISE APT 4106 ALISOVIEJO CA 926567091
6100EDINGER AVE APT 326 HUNTJNGTNBCH CA 926473266
As you may have noticed not only there are a lot of errors on the spelling
but also city names sometimes come in two words i.e. Santa Ana or ALISOVIEJO
which should be "Aliso Viejo" and as you see sometimes "6100 Edinger" comes
like "6100EDINGER"

I search for "Split Address" on the Forum and I did not see anything that
can tackle such a complicated situation... Splitting each Word into a
separate column does not work because sometimes city names are two word as I
explained above.

I know I need to do some kind of hand cleaning before/after I run any
special macro that can handle such a task but at this time I am kind of
baffled how to this huge task...

I came up with idea that if I scan the address column and take the last
piece of character with a leading and trailing space i.e. 926473266 and stick
it into Zip column and prior leading/trailing space as State and stick it
into State Column then I have one more to strip of the Address strings and
that would be City name which unfortunately because I have miss spelling and
sometimes two word as city name I have problem…

For city name sometimes I have to pick up two words prior to State and Zip
sometimes only one word and sometime I have spelling problem…

On line one of the sample address above for example 927042403 can be picked
up and stick into Zip code column, CA can be picked up and stick into State
Column but there is no city name by the name of ANA in California so I have
to pick up Word before ANA too which is SANTA ANA and stick it into city name
column and for the last example address line HUNTJNGTNBCH has spelling
problem along with two word city name and the correct name is Huntington
Beach or Bch….

A macro that can look up the zip code and give me a spelling suggestion or
correct city name would be awesome.

Honestly programming such a complicated macro is way above my knowledge of
VBA programming strengths. I thought if I posted it somebody may give me
some suggestion or there is an already Add-in macro exist etc to solve such a
problem…

Regards

Here's some code that seems to work on your presented data.

It is slow because it is doing its zip code lookup at the USPS site. If this
is going to be a frequent occurrence for you, with many entries, you would be
better off purchasing a zip code database and keeping it on your machine. But
it'll give you an idea of what might be doable.

Be sure to read the notes in the code.

And also look at the set rng command which is just one way of setting up the
range to be processed. There are other methods than hard-coding it. Also note
that the code clears out the five columns to the left of the data, so as to
allow space for the room.

Not knowing how much further you want to go with this, I left out stuff like
checking for invalid zip codes (or no zip codes). That should probably be
added on depending on your specific requirements.

Enjoy.

To enter this Macro (Sub), <alt-F11> opens the Visual Basic Editor.
Ensure your project is highlighted in the Project Explorer window.
Then, from the top menu, select Insert/Module and
paste the code below into the window that opens.

To use this Macro (Sub), <alt-F8> opens the macro dialog box. Select the macro
by name, and <RUN>.


========================================
'*** IMPORTANT NOTE ***
'MUST SET REFERENCE (TOOLS/REFERENCES IN MAIN MENU BAR)
'TO:
' Microsoft VBScript Regular Expressions 5.5
' Microsoft Internet Controls
' In Excel 2007, this is called "Microsoft Browser Helpers"
Option Explicit
Sub ParseAdr()
Dim c As Range, rng As Range
Dim s As String
Dim ZIP9 As String, ZIP5 As String
Dim sCity As String, sState As String
Dim sStreet As String, sAddrNum As String
Dim sTemp
Dim i As Long

'just an example of one method of selecting a range
' to process
Set rng = Range("a2:a13")

'clear adjacent cells of old data
Set c = rng.Offset(0, 1).Resize(rng.Rows.Count, 5)
c.Clear

'process each entry
For Each c In rng
s = Application.WorksheetFunction.Trim(c.Value)
ZIP9 = RegexMid(s, "\d+$")
ZIP5 = Left(ZIP9, 5)
c.Offset(0, 5).Value = Val(ZIP9)
c.Offset(0, 5).NumberFormat = "[<100000]00000;00000-0000"

If ZIP5 = "" Then
Debug.Print c.Address, c.Value
Exit Sub
End If

sTemp = RevZip(ZIP5)
sCity = sTemp(0, 0)
sState = sTemp(1, 0)
For i = 1 To UBound(RevZip(ZIP5), 2)
'check for which city in the zipcode might match.
'if none match, use the primary city
'remove spaces before comparison to allow for that error in the data
If InStr(1, Replace(s, " ", ""), Replace(sTemp(0, i), " ", ""), _
vbTextCompare) > 0 Then
sCity = sTemp(0, i)
sState = sTemp(1, i)
Exit For
End If
Next i
c.Offset(0, 3).Value = sCity
c.Offset(0, 4).Value = sState
sAddrNum = RegexMid(s, "^\d+")
c.Offset(0, 1).Value = sAddrNum
sStreet = RegexMid(s, sAddrNum & "\s?(.*?)\s?(" & Left(sCity, 7) & "|" _
& Left(Replace(sCity, " ", ""), 7) & ")", , 1)
c.Offset(0, 2).Value = sStreet
Next c
End Sub

Private Function RevZip(sZip5 As String) As Variant
'returns 2D array of each city/state pair
'in the zip code
Dim IE As InternetExplorer
Const sURL As String = "http://zip4.usps.com/zip4/citytown_zip.jsp"
Dim sHTML As String
Dim sTemp() As String
Dim i As Long

' Group2 = City Group3=State IGNORE CASE
Const rePattern As String = "headers=pre>(<b>)?([^,]+),\s([^<]+)"
Dim lNumCities As Long

Application.Cursor = xlWait
Set IE = New InternetExplorer
IE.Navigate sURL
IE.Visible = False
Do While IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While IE.Busy = True
DoEvents
Loop

IE.Document.all("zip5").Value = sZip5
IE.Document.all("Submit").Click
Do While IE.ReadyState <> READYSTATE_COMPLETE
DoEvents
Loop
Do While IE.Busy = True
DoEvents
Loop
sHTML = IE.Document.body.innerhtml
IE.Quit
Application.Cursor = xlDefault

'Note that the USPS site can return multiple
'cities for each zip code. So we need to
'return them all
lNumCities = RegexCount(sHTML, rePattern)
ReDim sTemp(0 To 1, 0 To lNumCities - 1)
For i = 0 To lNumCities - 1
sTemp(0, i) = RegexMid(sHTML, rePattern, i + 1, 2)
sTemp(1, i) = RegexMid(sHTML, rePattern, i + 1, 3)
Next i
RevZip = sTemp
End Function
Private Function RegexMid(s As String, sPat As String, _
Optional Index As Long = 1, _
Optional Subindex As Long, _
Optional CaseIgnore As Boolean = True, _
Optional Glbl As Boolean = True, _
Optional Multiline As Boolean = False) As String

Dim re As Object, mc As Object
Dim i As Long

Set re = CreateObject("vbscript.regexp")
re.Pattern = sPat
re.IgnoreCase = CaseIgnore
re.Global = Glbl
re.Multiline = Multiline

If re.test(s) = True Then
Set mc = re.Execute(s)
If Subindex = 0 Then
RegexMid = mc(Index - 1)
ElseIf Subindex <= mc(Index - 1).SubMatches.Count Then
RegexMid = mc(Index - 1).SubMatches(Subindex - 1)
End If
End If
Set re = Nothing
End Function
Private Function RegexCount(s As String, sPat As String) As Long
Dim re As RegExp, mc As MatchCollection
Set re = New RegExp
re.Pattern = sPat
re.Global = True
re.IgnoreCase = True
Set mc = re.Execute(s)
RegexCount = mc.Count
Set re = Nothing
End Function
=============================
--ron
 
R

Ron Rosenfeld

I tried to run this Module and I got error on
Dim IE As InternetExplorer on Private Function RevZip(sZip5 As String)
Sounds like one can not define a viriable as InternetExplorer there is no
such a definition and I tried Application and I could not get it to work
either...

Regards

Jeff P.

You don't provide much useful information in your post.
What kind of module did you place it in?
What error did you receive?
What does "tried Application" mean?
What does "could not get it to work" mean?



With the limited information you provide, I would guess that either you did not
enter the macro in accordance with the directions I posted; and/or didn't set
the references called for at the very beginning of the macro.

We can check the latter by having you select Tools/References (from the main
VBA Editor menu bar) and post here the references that you selected.
--ron
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top