regular expression

Tony Johansson

Hi!

I'm new to regular expressions.
This simple one should find all words that begin with a and end with
ion, with 0 or more characters in between. But this doesn't work:
string pattern = @"\ba*ion\b";

The pattern has to be
string pattern = @"\ba\S*ion\b";

So my question is: why can't the pattern have this format,
string pattern = @"\ba*ion\b";
when looking for a word that begins with a, ends with ion, and has
0 or more characters of any kind in between?
I mean, why doesn't this * mean any characters, one or more?

//Tony
 
Alberto Poblacion

Tony Johansson said:
I mean, why doesn't this * mean any characters, one or more?

No, this is wrong. "*" means "Repeat the preceding character zero or
more times". So "a*" means "any number of a's".

To indicate "zero or more characters of any kind" you use ".*" because
"." means "any character" (except a newline, unless RegexOptions.Singleline is set).
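Here is a minimal C# sketch of the difference. The input string and the FindAll helper are made up for illustration; the three patterns are the original a* one, Tony's working \S* version, and Alberto's .* version. One thing to watch: .* is greedy, so it can run across several words, which is why \S* keeps the match inside a single word.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string input = "ion aaion action nation aviation";

// Hypothetical helper: collects the text of every match, left to right.
static string[] FindAll(string pattern, string text) =>
    Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// a* means "zero or more a's", so this only finds words like "ion" or "aaion".
string[] starOnly = FindAll(@"\ba*ion\b", input);

// \S* means "zero or more non-whitespace characters", so the match stays in one word.
string[] nonSpace = FindAll(@"\ba\S*ion\b", input);

// .* means "zero or more of any character" and is greedy, so it spans several words.
string[] dotStar = FindAll(@"\ba.*ion\b", input);

Console.WriteLine(string.Join(", ", starOnly));
Console.WriteLine(string.Join(", ", nonSpace));
Console.WriteLine(string.Join(", ", dotStar));
```

With this input, the a* pattern finds only "ion" and "aaion", the \S* pattern finds "aaion", "action" and "aviation", and the .* pattern returns a single match, "aaion action nation aviation".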
 
Harlan Messinger

Tony said:
I mean, why doesn't this * mean any characters, one or more?

Don't confuse regular expressions with the wildcard syntax that operating
systems use for file names. In file name wildcards, ? means "any one
character" and * means "any number of arbitrary characters". In regular
expressions, ? means "0 or 1 of the preceding item", * means "0 or
more of the preceding item", and + means "1 or more of the preceding item".
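A small C# sketch of the difference. The sample strings are made up, and WildcardToRegex is a hypothetical helper written only for this illustration: it escapes the wildcard with Regex.Escape, then turns the wildcard * into .* and the wildcard ? into a single dot, anchored at both ends.

```csharp
using System;
using System.Text.RegularExpressions;

// Regex quantifiers apply to the item immediately before them.
bool optionalU = Regex.IsMatch("color", @"^colou?r$");   // ? : "u" 0 or 1 times -> true
bool zeroBs    = Regex.IsMatch("ac", @"^ab*c$");         // * : "b" 0 or more times -> true
bool plusNoB   = Regex.IsMatch("ac", @"^ab+c$");         // + : "b" 1 or more times -> false
bool plusBs    = Regex.IsMatch("abbbc", @"^ab+c$");      // + : three b's -> true

// Hypothetical helper: translate a file-name wildcard into a regex.
// Wildcard * ("any run of characters") becomes .*, and wildcard ? ("any one
// character") becomes a single dot. Everything else is escaped so it matches literally.
static string WildcardToRegex(string wildcard) =>
    "^" + Regex.Escape(wildcard).Replace(@"\*", ".*").Replace(@"\?", ".") + "$";

string txtPattern = WildcardToRegex("*.txt");            // ^.*\.txt$
bool isTxt = Regex.IsMatch("notes.txt", txtPattern);     // true
bool isDoc = Regex.IsMatch("notes.doc", txtPattern);     // false

Console.WriteLine($"{optionalU} {zeroBs} {plusNoB} {plusBs}");
Console.WriteLine($"{txtPattern} {isTxt} {isDoc}");
```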
 
Tony Johansson

Let us consider an example: the \(?\d{3}\?)[-s\.]?\d{3}[-.]\d{4} regular
expression, which is supposed to verify that a telephone number is entered
in the correct format. The following strings should match this regular expression:
(314).555-4000
314-555-4000
314 555-4000

But it doesn't work. I get an exception saying
parsing \(?\d{3}\?)[-s\.]?\d{3}[-.]\d{4} - Too many ).


//Tony
 
Peter Duniho

Tony said:
[...]
But it doesn't work. I get an exception saying
parsing \(?\d{3}\?)[-s\.]?\d{3}[-.]\d{4} - Too many ).

Oddly enough, the exception is telling you exactly what's wrong. There
are too many ')' characters in your expression.

More specifically, you've escaped the first one, so it's taken
literally, but you didn't escape the second one, so Regex is trying to
use it to close a preceding unescaped '(', which doesn't exist.

Hence, you have too many ')'.

Change ) to \) and it should be fine.

Pete
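A small C# sketch of Pete's point, assuming a hypothetical Parses helper: constructing a Regex from the original pattern throws an ArgumentException (in recent .NET versions the concrete type is RegexParseException, which derives from ArgumentException), while the version with the second ) escaped parses. This only checks parsing; whether the escaped pattern actually matches the phone numbers is a separate question.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: true if the pattern is a valid .NET regex.
static bool Parses(string pattern)
{
    try
    {
        _ = new Regex(pattern);
        return true;
    }
    catch (ArgumentException ex)
    {
        // The message carries the parser's complaint, e.g. "Too many )'s".
        Console.WriteLine(ex.Message);
        return false;
    }
}

// Original: the second ) is unescaped, so it has no ( to close.
bool original = Parses(@"\(?\d{3}\?)[-s\.]?\d{3}[-.]\d{4}");

// Pete's fix: escape that ) as \) so it is a literal character.
bool escaped = Parses(@"\(?\d{3}\?\)[-s\.]?\d{3}[-.]\d{4}");

Console.WriteLine($"original parses: {original}, escaped parses: {escaped}");
```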
 
Harlan Messinger

Tony said:
But it doesn't work. I get an exception saying
parsing \(?\d{3}\?)[-s\.]?\d{3}[-.]\d{4} - Too many ).

Let's break this down:

\(? matches 0 or 1 left parenthesis,

\d{3} matches exactly three digits.

\? matches a question mark.

And right there is your first problem, and the one triggering the error,
because it's followed by a lone right parenthesis that is being treated as
a group delimiter in the regular expression rather than as a character
to be matched. If the problem is that you typed the question mark and the
right parenthesis in the wrong order, what you meant was

\)?, which matches 0 or 1 right parentheses.

Next:

[-s\.]? matches 0 or 1 character that is a hyphen, an "s", or a period.
I'm assuming you aren't trying to match an "s". If you mean a
whitespace character, it's "\s".
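A short C# sketch of the two points in the breakdown, using made-up inputs anchored with ^ and $ so each check covers the whole string: \? demands a literal question mark while \)? makes a right parenthesis optional, and [-s\.] accepts the letter s but not a space, while [-\s.] accepts whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

// \? is a literal question mark; \)? is an optional right parenthesis.
bool literalQ = Regex.IsMatch("314?", @"^\d{3}\?$");    // true: needs the "?"
bool noQ      = Regex.IsMatch("314)", @"^\d{3}\?$");    // false
bool optParen = Regex.IsMatch("314)", @"^\d{3}\)?$");   // true
bool noParen  = Regex.IsMatch("314",  @"^\d{3}\)?$");   // true: the ) is optional

// Inside [...], a bare s is the letter s; \s is any whitespace character.
bool letterS      = Regex.IsMatch("314s555", @"^\d{3}[-s\.]?\d{3}$");  // true
bool spaceWithS   = Regex.IsMatch("314 555", @"^\d{3}[-s\.]?\d{3}$");  // false
bool spaceWithWs  = Regex.IsMatch("314 555", @"^\d{3}[-\s.]?\d{3}$");  // true
bool letterWithWs = Regex.IsMatch("314s555", @"^\d{3}[-\s.]?\d{3}$");  // false

Console.WriteLine($"{literalQ} {noQ} {optParen} {noParen}");
Console.WriteLine($"{letterS} {spaceWithS} {spaceWithWs} {letterWithWs}");
```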

I'll let you fix those problems and continue from there. There's another
issue: while you want both "609" and "(609)" to be accepted for the area
code, you probably don't want to accept "(609" or "609)". So you would
probably want to have

(\d{3}|(\(\d{3}\)))

which matches either \d{3} OR \(\d{3}\).
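Putting Harlan's suggestions together, one possible corrected pattern looks like the C# sketch below. The ^ and $ anchors, the [-\s.]? separator after the area code, and the two rejected test strings are assumptions added for illustration; the area-code alternation is the one from the post.

```csharp
using System;
using System.Text.RegularExpressions;

// Area code: either 314 or (314), never a lone parenthesis (Harlan's alternation).
// Then an optional hyphen, whitespace character, or period; three digits;
// a hyphen or period; four digits. ^ and $ force the whole string to match.
const string Phone = @"^(\d{3}|(\(\d{3}\)))[-\s.]?\d{3}[-.]\d{4}$";

string[] shouldMatch = { "(314).555-4000", "314-555-4000", "314 555-4000" };
string[] shouldNotMatch = { "(314-555-4000", "314)555-4000" };

foreach (string s in shouldMatch)
    Console.WriteLine($"{s,-16} {Regex.IsMatch(s, Phone)}");
foreach (string s in shouldNotMatch)
    Console.WriteLine($"{s,-16} {Regex.IsMatch(s, Phone)}");
```

The alternation is what rejects "(314" and "314)": the first branch cannot start with a parenthesis, and the second branch needs both of them.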
 
Tony Johansson

Peter Duniho said:
Change ) to \) and it should be fine.

Now I don't get any runtime error, but why don't I get any matches for these?
(314).555-4000
314-555-4000
314 555-4000

//Tony
 
Harlan Messinger

Tony said:
Now I don't get any runtime error, but why don't I get any matches for these?
(314).555-4000
314-555-4000
314 555-4000

Pete only answered your question about what was causing the error. If
the *only* change you made was the one he suggested, then you're right:
your regular expression will not match a phone number, at least not in
any country where phone numbers are written without question marks.
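To make Harlan's point concrete, here is a C# sketch with a made-up test string containing a question mark: with only the ) escaped, \? still requires a literal "?" after the area code, so none of the intended numbers match.

```csharp
using System;
using System.Text.RegularExpressions;

// Tony's pattern after escaping only the second ) (Pete's fix, nothing else).
const string EscapedOnly = @"\(?\d{3}\?\)[-s\.]?\d{3}[-.]\d{4}";

string[] intended = { "(314).555-4000", "314-555-4000", "314 555-4000" };
foreach (string s in intended)
    Console.WriteLine($"{s,-16} {Regex.IsMatch(s, EscapedOnly)}");  // all False

// A made-up string with the literal "?" and ")" the pattern demands.
Console.WriteLine(Regex.IsMatch("(314?)-555-4000", EscapedOnly));  // True
```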
 
