Keep in mind that addresses don't always follow that (or any similar)
format. Here are a few examples:

John Smith
Smith Enterprises
P.O. Box 12345
Anytown, Nebraska
00000

Jack and Jill Hill
RR 5 Box 909
Podunk, WI 12345-7890

MR S HOLMES
2978 W MAIN ST # 12
MINNEAPOLIS MN 23976-4542

May December
Bowers Holiday Village
Bldg 91 Apt. 2-A
12 31st Street
Baltimore, Maryland
79797
USA

Herrn
Günther Meyer
Goethestraße 25
20002 HAMBURG
Federal Republic of Germany

SGT NICK FURY
HEADQUARTERS COMPANY
7TH ARMY TRAINING CENTER
ATTN: AETT-AG
UNIT 28130
APO AE 09114

CUSTOMS ATTACHE
AMERICAN EMBASSY CARACAS
UNIT 4964
APO AA 34037

MS HELEN SAUNDERS
1010 CLEAR STREET
OTTAWA ON K1A 0B1
CANADA

MS JOYCE BROWNING
2045 ROYAL ROAD
06570 ST PAUL
FRANCE

MS JOYCE BROWNING
2045 ROYAL ROAD
LONDON WIP 6HQ
ENGLAND

RUFUS LANGDON
LAW DEPARTMENT
US POSTAL SERVICE
475 L'ENFANT PLZ SW RM 6627
WASHINGTON DC 20260-1120
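
For the one-line street format in your question, a pattern along the lines below is one way to keep the direction out of the number. It is only a rough sketch, written in Python (which spells named groups `(?P<name>...)` where .NET uses `(?<name>...)`). The direction and street-type lists are illustrative assumptions, not complete tables, and `parse_street_line` is a made-up name. The idea is to make the unit letter on the number optional and lazy, so the direction gets first claim on a lone "S", and to refuse a street name that itself starts with a direction, which pushes the first "S" of "709 S S Milton ave" back onto the number.

```python
import re

# Illustrative, incomplete lists (assumptions, not a full postal table).
DIRECTION = r"(?:north|south|east|west|ne|nw|se|sw|n|s|e|w)\.?"
STREET_TYPE = r"(?:street|place|drive|avenue|ave|av|st|pl|dr)\.?"

ADDRESS_RE = re.compile(
    rf"""
    ^\s*
    (?P<number>
        \d+                      # house number
        (?:\s\d/\d)?             # optional fraction, as in "12 1/2"
        (?:[\s-]?[A-Za-z])??     # optional unit letter, tried last (lazy)
    )
    \s+
    (?:(?P<direction>{DIRECTION})\s+)?   # optional direction
    (?P<street>
        (?!{DIRECTION}\s)        # the name may not start with a direction
        \S.*?
    )
    \s+
    (?P<type>{STREET_TYPE})
    \s*$
    """,
    re.IGNORECASE | re.VERBOSE,
)


def parse_street_line(line):
    """Return a dict with number, direction, street and type, or None."""
    match = ADDRESS_RE.match(line)
    return match.groupdict() if match else None


print(parse_street_line("709 S Milton ave"))
print(parse_street_line("709 S S Milton ave"))
```

A line that does not fit, such as "P.O. Box 12345" or "2978 W MAIN ST # 12" from the examples above, simply returns None. The lookahead also means a street whose whole name is a direction word ("12 South St") is rejected, which is one of the trade-offs of doing this in a single pattern.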
I have found a few references for you. However, again, this is a huge task.
There is commercial software out there that you can buy to do this sort of
parsing; a web search for address-parsing or address-standardization software
will turn it up. Here are some links to references:
http://www.columbia.edu/kermit/postal.html
http://pe.usps.com/text/pub28/welcome.htm
http://www.grcdi.nl/whitepapers.htm
http://aurora.regenstrief.org/v3dt/PAS.html
http://www.cicc.or.jp/english/hyoujy...tabook/219.htm
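
If you would rather not fight one big regex, another way to approach the simple case is to split the line into tokens and classify them from both ends: the number comes off the front, the street type comes off the back, and whatever is left is direction plus name. The Python sketch below is an assumption-laden illustration, not a complete parser. The lookup tables are tiny stand-ins for the much longer abbreviation tables in the postal standards, and `split_street_line` and `_key` are hypothetical names.

```python
# Illustrative lookup tables (assumptions; real tables are far longer).
DIRECTIONS = {
    "n": "N", "north": "N", "s": "S", "south": "S",
    "e": "E", "east": "E", "w": "W", "west": "W",
    "ne": "NE", "nw": "NW", "se": "SE", "sw": "SW",
}
STREET_TYPES = {
    "street": "ST", "st": "ST", "avenue": "AVE", "ave": "AVE", "av": "AVE",
    "drive": "DR", "dr": "DR", "place": "PL", "pl": "PL",
}


def _key(token):
    """Normalize a token for table lookup: drop a trailing dot, lowercase."""
    return token.rstrip(".").lower()


def split_street_line(line):
    """Split 'number [direction] street type' into parts, or return None."""
    tokens = line.split()
    if len(tokens) < 3 or not tokens[0][0].isdigit():
        return None
    number, rest = tokens[0], tokens[1:]

    # The last token must be a street type we know about.
    street_type = STREET_TYPES.get(_key(rest[-1]))
    if street_type is None:
        return None
    rest = rest[:-1]

    # "709 S S Milton": two direction-like tokens in a row, the first a
    # single letter, so treat the first one as a unit letter on the number.
    if (len(rest) > 2 and len(rest[0]) == 1 and rest[0].isalpha()
            and _key(rest[0]) in DIRECTIONS and _key(rest[1]) in DIRECTIONS):
        number = number + " " + rest[0]
        rest = rest[1:]

    # A leading direction only counts if a street name is left after it.
    direction = None
    if len(rest) > 1 and _key(rest[0]) in DIRECTIONS:
        direction = DIRECTIONS[_key(rest[0])]
        rest = rest[1:]

    return {"number": number, "direction": direction,
            "street": " ".join(rest), "type": street_type}


print(split_street_line("709 S Milton ave"))
```

Because each decision is an ordinary `if`, special cases such as a street that is itself named "South" are easier to add here than inside a pattern. Anything that is not a plain street line (PO boxes, rural routes, military or foreign addresses) still comes back as None and needs its own handling.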
Good luck!
--
HTH,
Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist
A lifetime is made up of
Lots of short moments.
<(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Thanks guys... a couple of responses...
>
> 1) 709 S | Milton Ave is not as valid as 709 | S | Milton Ave because
> they want the direction separate. 709 S is not the street number, 709
> is; and S Milton is not the street, Milton is.
>
> 2) Kevin, yeah, that's what I was suspecting but not wanting to think about.
> The alternative for the client is to have 4 separate fields on the UI
> [number] [direction] [street] [type], but I hate this because it's
> not intuitive, or a web standard.
>
> thanks for your input guys
>
> mike
>
> Kevin Spencer wrote:
>> The first thing you've got to do is figure out all of the possible
>> permutations of combinations of tokens that may comprise an "address."
>> You have apparently noticed only one or two. In fact, an "address" can
>> take many combinations of many forms, and include many combinations of
>> abbreviations of various kinds. In addition, the elements (tokens) in an
>> address can be ordered in any number of ways, particularly if these
>> addresses come from different countries, and especially if these
>> addresses have been provided by human beings rather than machines.
>>
>> IOW, you've opened up a huge can of worms for yourself. What you need is
>> not just a regular expression, but a bit of AI to solve this problem. I
>> have seen it done, but I'm not sure *how* it's done. MapPoint and Google
>> Maps can do it fairly well, but Microsoft and Google have a lot of money
>> to throw at this sort of problem.
>>
>> --
>> HTH,
>>
>> Kevin Spencer
>> Microsoft MVP
>> Professional Chicken Salad Alchemist
>>
>> A lifetime is made up of
>> Lots of short moments.
>>
>> <(E-Mail Removed)> wrote in message
>> news:(E-Mail Removed)...
>> > Hello all
>> >
>> > have a regex question... I want to split an address into discrete parts
>> >
>> > so
>> >
>> > 709 S Milton Ave is split into
>> > number = 709
>> > Direction = S
>> > Name = Milton
>> > Type = Ave
>> >
>> > So I have the following regex
>> >
>> > (?<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s(?<direction>(n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\.|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw|North|East|West|South|north|south|west|east)*)(?<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)(?<type>.*)
>> >
>> > Which works for the following address
>> >
>> > 709 S S Milton ave (as in 709 S South Milton ave)
>> >
>> > as that S is part of the number
>> >
>> > but does not work for
>> >
>> > 709 S Milton ave
>> > because it thinks that the S is part of the number and not the
>> > direction....
>> >
>> > any ideas
>> >
>