European Address Data

M

Mark S

Hi and thanks in advance for any help.

I have 700,000 European Companies in a csv file.

Unfortunately the COUNTRY is separate but all other information (street,
zip/postal, city, region/state) is in one ADDRESS column.

I would like to be able to "find and copy/move" info to separate
street/city/zip columns as best as possible

For example for French or German addresses the zip code is five digits so I
want to search the the ability to move '#####' ("five number characters in a
row")strings to a ZIP column

Ideally this would also allow the ability to also take everything AFTER the
"five number characters in a row" string (i.e. Paris and/or Chalonnes sur
Loire) to a CITY column given the European ordering standard of
address-zip-city)

Suggestions???
 
P

Peo Sjoblom

You almost have an impossible task, for instance in your example you can see
how hard it would be given that you have 2 cities in France, one use one
word the other 3 words, how would you know where one city ends? What about
UK postal codes or is this just for Germany and France?
 
M

Mark S

Yes, I do realize (and have been working at it for a couple of days
already). FWIW I've had "some success" replacing some spaces with _ then
using the spaces to separate into different coloumns but obviously very time
consuming -- and british codes are the worst, what with not only a space
between but variable total length of characters . . . so as before, I'm
open to suggestions!
 
N

NickHK

Mark,
You mean you have addresses like these 2 (bad) examples, i.e. no delimeter
between the various parts:

2 Rue de boulogne Chalonnes sur Loire 28544, France
17 Cul de Sac Taunton Somerset TA2 7HF, england

There are list of country names (or create your own limited ones of
interest) that you can search through.
Once you have the country name, you could search a list of that country's
postal/area/zip codes.
May be also a list of country's counties/departements and towns/cities.
Split out the relevant part each time a section is resolved.
Road names may well be a problem, but that may be all you are left with
after you have removed the other information.

Such list are commercially available. Some free lists are also available,
but quality may be patchy.
You could try throwing the address at some suitable web site that deals in
locating/mapping.

Not easy all round, but doing 700K manually would also take a while.

NickHK
 
T

Tim Williams

This is a job for regular expressions. Try this out:

'***********************
Sub ExtractZipAndCity()
Dim regEx As Object
Dim m
Dim i As Integer
Dim rngText As Range
Dim s As String

Set regEx = CreateObject("vbscript.regexp")
regEx.Pattern = "\b(\d{5})\b"
regEx.Global = True
regEx.IgnoreCase = True

Set rngText = ActiveSheet.Range("A1")

Do While rngText.Value <> ""

s = rngText.Value
Set m = regEx.Execute(s)
If Not m Is Nothing Then
If m.Count = 1 Then
rngText.Offset(0, 1).Value = m(0)
rngText.Offset(0, 2).Value = Trim(Split(s, m(0))(1))
End If
End If

Set rngText = rngText.Offset(1, 0)
Loop

End Sub
'***************************

However, with 700k records you've little hope of being able to check the
results
in any meaningful way, so your final version will need to be a lot more
robust.

Tim
 
G

Guest

There is no simple way to do this - there is no "European standard" for
addresses. You can either try a data driven approach - parse out differently
for cases where you find "england", "france" and so on, but then you have to
be cute to look for spelling mistakes.

Sorry to be a doom merchant, but if I was in your shoes I'd either grit my
teeth, roll up my sleeves and prepare for a long long job OR go back to the
source of the data and ask for the data in a better format. If you cannot get
the data in a better format then good luck.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top