Regular Expression in C#


Jianwei Sun

Hi, guys,

I am trying to parse out the apartment number with a regular expression:

If I use

Regex regex = new
Regex(@"(?<AddressWithoutApartment>.+)(?<ApartmentSeperator>\bAPT|#|UNIT\b)(?<Apartment>.+)",
RegexOptions.ExplicitCapture);

I will be able to parse out "100 main ST # C".

But if I move the "#" to a different position in the regular expression, I can
no longer parse out the same address.

Regex regex = new
Regex(@"(?<AddressWithoutApartment>.+)(?<ApartmentSeperator>\bAPT|UNIT\#\b)(?<Apartment>.+)",
RegexOptions.ExplicitCapture);

Any suggestions, gurus?

Thanks,
JW
 

Greg Bacon

: I am trying to parse out the apartment number with a regular expression:
:
: If I use
:
: Regex regex = new
: Regex(@"[...](?<ApartmentSeperator>\bAPT|#|UNIT\b)[...]",
: RegexOptions.ExplicitCapture);
:
: I will be able to parse out "100 main ST # C".
:
: But if I move "#" position in regular expression, I won't be able to
: parse out the same address anymore.
:
: Regex regex = new
: Regex(@"[...](?<ApartmentSeperator>\bAPT|UNIT\#\b)[...]",
: RegexOptions.ExplicitCapture);

The latter fails to match because there's no word boundary (\b) between
an octothorpe and a space. Remember, a word boundary occurs between a
\w and a \W or vice versa, but '#' and ' ' both match \W.
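A small C# console program makes the word-boundary rule easy to check (a sketch; the class name `WordBoundaryDemo` is only for illustration). It also shows a second effect: `|` has the lowest precedence in a regex, so the second pattern as posted consists of just two alternatives, `\bAPT` and `UNIT\#\b`, and a bare `#` is never tried on its own. Giving `#` its own alternative without a trailing `\b` lets the same address match.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WordBoundaryDemo
{
    public static void Main()
    {
        string address = "100 main ST # C";

        // '#' and ' ' are both \W, so there is no \b right after the '#'.
        Console.WriteLine(Regex.IsMatch(address, @"#\b"));            // False

        // A '#' followed directly by a word character does have a \b after it.
        Console.WriteLine(Regex.IsMatch("100 main ST #C", @"#\b"));   // True

        // Alternation binds loosest: this means "\bAPT" OR "UNIT\#\b",
        // so a lone '#' is not one of the alternatives.
        Console.WriteLine(Regex.IsMatch(address, @"\bAPT|UNIT\#\b")); // False

        // '#' as its own alternative, with no \b after it, matches.
        Console.WriteLine(Regex.IsMatch(address, @"\bAPT|UNIT|\#"));  // True
    }
}
```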

A start in the right direction is (line breaks inserted):

(?<AddressWithoutApartment>.+)
(?<ApartmentSeparator>\b(APT|UNIT)\b|#(?=\s+))
(?<Apartment>.+)
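One way to wire this pattern into C# is sketched below; it is not a complete address parser, and `ApartmentParser` and `Parse` are hypothetical names. To keep the pattern on several lines as shown, the sketch adds `RegexOptions.IgnorePatternWhitespace`, which makes the engine ignore unescaped whitespace and treat an unescaped `#` as the start of a comment, so the `#` is written `\#`. Grouping `\b(APT|UNIT)\b` applies both boundaries to both keywords, whereas in `\bAPT|#|UNIT\b` each `\b` binds only to its neighbouring alternative.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ApartmentParser
{
    // Greg's pattern on three lines. IgnorePatternWhitespace drops the line
    // breaks and indentation; '#' must be escaped because it starts a comment
    // in that mode. ExplicitCapture keeps (APT|UNIT) from being captured.
    private static readonly Regex AddressRegex = new Regex(
        @"(?<AddressWithoutApartment>.+)
          (?<ApartmentSeparator>\b(APT|UNIT)\b|\#(?=\s+))
          (?<Apartment>.+)",
        RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);

    // Returns the address and apartment with surrounding spaces trimmed,
    // or null when no separator is found.
    public static (string Address, string Apartment)? Parse(string input)
    {
        Match m = AddressRegex.Match(input);
        if (!m.Success) return null;
        return (m.Groups["AddressWithoutApartment"].Value.Trim(),
                m.Groups["Apartment"].Value.Trim());
    }

    public static void Main()
    {
        string[] samples = { "100 main ST # C", "100 main ST APT 5", "100 main ST UNIT 2B", "100 main ST" };
        foreach (string s in samples)
        {
            var parts = Parse(s);
            Console.WriteLine(parts is null
                ? $"{s} -> no apartment"
                : $"{s} -> [{parts.Value.Address}] [{parts.Value.Apartment}]");
        }
    }
}
```

Because the leading `.+` is greedy, the separator that wins is the last one in the string, and the captured pieces keep their neighbouring spaces, which is why the sketch trims them. The lookahead `#(?=\s+)` only accepts a `#` followed by whitespace, so "100 main ST #C" is not split, and matching is case-sensitive unless `RegexOptions.IgnoreCase` is added.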

Hope this helps,
Greg
 

Jianwei Sun

Thanks, Greg!