PC Review



address regex help

 
 
mikewolfbaltimore@gmail.com
Guest
Posts: n/a
 
      14th Jun 2006
Hello all

I have a regex question... I want to split an address into discrete parts

so

709 S Milton Ave is split into
number = 709
Direction = S
Name = Milton
Type = Ave

So I have the following regex

(?<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s(?<direction>(n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\.|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw|North|East|West|South|north|south|west|east)*)(?<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)(?<type>.*)

Which works for the following address

709 S S Milton ave (as in 709 S South Milton ave)

as the first S is part of the number

but does not work for

709 S Milton ave
because it thinks that the S is part of the number and not the
direction....

Any ideas?

 
Ben Voigt
Guest
Posts: n/a
 
      14th Jun 2006

Without having a database to find out whether the city has a "South Milton
Avenue", it's ambiguous. Why isn't number "709 S" on "Milton Ave" as valid
as number "709" on "S Milton Ave"?

Moreover, your regex is going to go crazy over
P.O. Box 6000
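
To make both points concrete, here is a rough Python port of the pattern from the question. It is only a sketch: the one intended change is the named-group syntax (.NET writes `(?<name>...)`, Python writes `(?P<name>...)`), and the names `ORIGINAL`, `DIRECTIONS` and `show` are made up for this example. The number group requires one of the suffix alternatives, and `\s\w` is tried first, so it swallows " S" before the direction group ever gets a chance.

```python
import re

# Direction alternatives copied from the question's pattern.
DIRECTIONS = (
    r"n\.|N\.|s\.|S\.|E\.|e\.|W\.|w\.|NE\.|ne\.|SE\.|se\.|NW\.|nw\.|SW\.|sw\."
    r"|n|N|s|S|E|e|W|w|NE|ne|SE|se|NW|nw|SW|sw"
    r"|North|East|West|South|north|south|west|east"
)

# Port of the original pattern; only (?<name>...) became (?P<name>...).
ORIGINAL = re.compile(
    r"(?P<number>^\d*(\s\w|\w|\-\w|\s\d/\d))\s"
    r"(?P<direction>(" + DIRECTIONS + r")*)"
    r"(?P<street>(.*[^street|place|drive|st|pl|dr|ave|av])*)"
    r"(?P<type>.*)"
)


def show(address):
    """Return the named groups for an address, or None if nothing matches."""
    match = ORIGINAL.match(address)
    return match.groupdict() if match else None


result = show("709 S Milton ave")
print(result["number"])     # the number group takes "709 S"
print(repr(result["direction"]))  # the direction group is left empty
print(show("P.O. Box 6000"))      # no match at all
```

Note also that `[^street|place|drive|st|pl|dr|ave|av]` is a negated character class, i.e. "any single character that is not one of these letters or `|`", not a list of words to exclude.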

 
Kevin Spencer
Guest
Posts: n/a
 
      14th Jun 2006
The first thing you've got to do is figure out all of the possible
combinations of tokens that may comprise an "address." You have apparently
noticed only one or two. In fact, an "address" can take many forms and
include many kinds of abbreviations. In addition, the elements (tokens) in
an address can be ordered in any number of ways, particularly if these
addresses come from different countries, and especially if these addresses
have been provided by human beings rather than machines.

IOW, you've opened up a huge can of worms for yourself. What you need is not
just a regular expression, but a bit of AI to solve this problem. I have
seen it done, but I'm not sure *how* it's done. MapPoint and Google Maps can
do it fairly well, but Microsoft and Google have a lot of money to throw at
this sort of problem.

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Chicken Salad Alchemist

A lifetime is made up of
Lots of short moments.

 
mikewolfbaltimore@gmail.com
Guest
Posts: n/a
 
      15th Jun 2006
Thanks guys... a couple of responses....

1) 709 S | Milton Ave is not as valid as 709 | S | Milton Ave because
they want the direction separate... 709 S is not the street number, 709
is, and S Milton is not the street, Milton is.

2) Kevin, yeah, that's what I was suspecting but not wanting to think about.
The alternative for the client is to have 4 separate fields on the UI
[number] [direction] [street] [type] .... but I hate this, as it's
not intuitive.... or a web standard.
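
The rule in 1) could be sketched with plain token splitting instead of one big regex. This is only a heuristic under stated assumptions: a lone direction-like token after the number is the direction, and only when two direction-like tokens appear in a row is the first one treated as part of the number. The names `split_address`, `DIRECTIONS` and `STREET_TYPES` are hypothetical, and the word lists are just the ones from the original pattern.

```python
# Word lists taken from the pattern in the question (lower-cased, no dots).
DIRECTIONS = {"n", "s", "e", "w", "ne", "nw", "se", "sw",
              "north", "south", "east", "west"}
STREET_TYPES = {"street", "st", "place", "pl", "drive", "dr", "ave", "av"}


def _norm(token):
    """Lower-case a token and drop a trailing dot, so "S." compares as "s"."""
    return token.rstrip(".").lower()


def split_address(address):
    """Split '709 S Milton Ave' style input; return None for anything else."""
    tokens = address.split()
    if not tokens or not tokens[0][0].isdigit():
        return None  # e.g. "P.O. Box 6000" is not handled here

    number, rest = tokens[0], tokens[1:]
    direction = ""
    street_type = ""

    # Two direction-like tokens in a row: assume the first is a number suffix.
    if (len(rest) >= 3 and _norm(rest[0]) in DIRECTIONS
            and _norm(rest[1]) in DIRECTIONS):
        number = number + " " + rest[0]
        rest = rest[1:]

    # A direction-like token followed by at least one more token.
    if len(rest) >= 2 and _norm(rest[0]) in DIRECTIONS:
        direction, rest = rest[0], rest[1:]

    # A known street type at the end, with at least one name token before it.
    if len(rest) >= 2 and _norm(rest[-1]) in STREET_TYPES:
        street_type, rest = rest[-1], rest[:-1]

    return {"number": number, "direction": direction,
            "street": " ".join(rest), "type": street_type}


print(split_address("709 S Milton Ave"))
print(split_address("709 S S Milton ave"))
```

This does not remove the ambiguity Ben pointed out; it just picks one reading and sticks to it.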

thanks for your input guys

mike

 
Kevin Spencer
Guest
Posts: n/a
 
      15th Jun 2006
Keep in mind that addresses don't always follow that (or any similar)
format. Here are a few examples:

John Smith
Smith Enterprises
P.O. Box 12345
Anytown, Nebraska
00000

Jack and Jill Hill
RR 5 Box 909
Podunk, WI 12345-7890

MR S HOLMES
2978 W MAIN ST # 12
MINNEAPOLIS MN 23976-4542

May December
Bowers Holiday Village
Bldg 91 Apt. 2-A
12 31st Street
Baltimore, Maryland
79797
USA

Herrn
Günther Meyer
Goethestraße 25
20002 HAMBURG
Federal Republic of Germany

SGT NICK FURY
HEADQUARTERS COMPANY
7TH ARMY TRAINING CENTER
ATTN: AETT-AG
UNIT 28130
APO AE 09114

CUSTOMS ATTACHE
AMERICAN EMBASSY CARACAS
UNIT 4964
APO AA 34037

MS HELEN SAUNDERS
1010 CLEAR STREET
OTTAWA ON K1A 0B1
CANADA

MS JOYCE BROWNING
2045 ROYAL ROAD
06570 ST PAUL
FRANCE

MS JOYCE BROWNING
2045 ROYAL ROAD
LONDON W1P 6HQ
ENGLAND

RUFUS LANGDON
LAW DEPARTMENT
US POSTAL SERVICE
475 L'ENFANT PLZ SW RM 6627
WASHINGTON DC 20260-1120
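
One way to cope with this variety is to classify each line first and only hand the lines that look like a street address to the number/direction/street/type splitter. The sketch below is an assumption-laden illustration, not a complete solution: `classify_line` and `LINE_KINDS` are made-up names, and the patterns cover only the shapes seen in the examples above.

```python
import re

# (kind, pattern) pairs, checked in order. Patterns are deliberately narrow
# and cover only line shapes that appear in the examples in this post.
LINE_KINDS = [
    ("po_box", re.compile(r"^P\.?\s*O\.?\s*Box\s+\d+$", re.IGNORECASE)),
    ("rural_route", re.compile(r"^RR\s+\d+\s+Box\s+\d+$", re.IGNORECASE)),
    ("military", re.compile(r"^APO\s+(AA|AE)\s+\d{5}$", re.IGNORECASE)),
    # Anything that starts with digits followed by more text.
    ("street", re.compile(r"^\d+\s+\S.*$")),
]


def classify_line(line):
    """Return the first matching kind for a single address line."""
    text = line.strip()
    for kind, pattern in LINE_KINDS:
        if pattern.match(text):
            return kind
    return "unknown"


for sample in ["P.O. Box 12345", "RR 5 Box 909", "APO AE 09114",
               "2978 W MAIN ST # 12", "Goethestraße 25"]:
    print(sample, "->", classify_line(sample))
```

Even this small classifier gets fooled: a line such as "20002 HAMBURG" starts with digits and would be reported as "street", which is exactly the kind of case that makes this a huge task.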

I have found a few references for you. However, again, this is a huge task.
There is commercial software out there that you can buy to do this sort of
parsing. Just Google for it. Here are some links to references:

http://www.columbia.edu/kermit/postal.html
http://pe.usps.com/text/pub28/welcome.htm
http://www.grcdi.nl/whitepapers.htm
http://aurora.regenstrief.org/v3dt/PAS.html
http://www.cicc.or.jp/english/hyoujy...tabook/219.htm

Good luck!

--
HTH,

Kevin Spencer
