regular expression problem

Keith G Hicks

I'm writing some vb.net code to create a CLR for my SQL app. I've got most
of it running fine but the regular expression I'm using to locate a dollar
value in a TEXT column is not working right.

Here is the expression I'm using:
\(\$((\d{1,3})(\,\d{3})*)|(\d+)(\.\d{2})?\)

I need it to locate anything like the following:

($61245788)
($944848.23)
($1,984,545)
($5,654.23)

But I need it to reject anything that is NOT in parentheses, does not have a
dollar symbol in front, or has more than 2 digits to the right of the decimal;
and if it has commas, the digits must of course be in groups of 3.
The parentheses in the value to be found are not there to indicate negative
values. The amount is simply in parentheses for readability.

I'm having a heck of a time getting the regex to work right. It's rejecting
values I think it should pick up and sometimes only getting part of the
value. I'm not very good with these yet. Any help would be greatly
appreciated.

Thanks,

Keith
 
Cor Ligthert[MVP]

Keith,

It seems to me that regular expressions will forever be a hobby for some people.

However, they are certainly not part of the VB.NET language. Moreover, most VB
programmers try to get around them as quickly as possible.

Can you describe the problem you are actually trying to solve? Right now it
seems as if using a regular expression is itself the challenge.

Cor
 
Branco

Keith said:
I'm writing some vb.net code to create a CLR for my SQL app. I've got most
of it running fine but the regular expression I'm using to locate a dollar
value in a TEXT column is not working right.

Here is the expression I'm using:
\(\$((\d{1,3})(\,\d{3})*)|(\d+)(\.\d{2})?\)

I need it to locate anything like the following:

($61245788)
($944848.23)
($1,984,545)
($5,654.23)

But I need it to reject anything that is NOT in parentheses, does not have a
dollar symbol in front, or has more than 2 digits to the right of the decimal;
and if it has commas, the digits must of course be in groups of 3.
The parentheses in the value to be found are not there to indicate negative
values. The amount is simply in parentheses for readability.
<snip>

You have the parentheses wrong. In regex syntax, concatenation binds
tighter than "|", and this actually turns your pattern into *two*
patterns:

a) \(\$((\d{1,3})(\,\d{3})*) -- which would match ($612, ($944,
($1,984,545 and ($5,654 (from the set of examples you provided); and

b) (\d+)(\.\d{2})?\) -- which would match 45788), 848.23) and 23)
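
To see the split in action, here is a minimal sketch. It is in Python rather than VB.NET, purely so it is quick to run; as far as I know, the constructs in this pattern behave the same way in .NET's Regex class. It runs the original pattern over the four samples and lists every match found. The helper name all_matches is made up for illustration.

```python
import re

# Original pattern from the question. The top-level "|" splits it into
# two independent alternatives, (a) and (b) above.
ORIGINAL = r"\(\$((\d{1,3})(\,\d{3})*)|(\d+)(\.\d{2})?\)"

SAMPLES = ["($61245788)", "($944848.23)", "($1,984,545)", "($5,654.23)"]


def all_matches(pattern, text):
    """Return every non-overlapping match of pattern in text, as strings."""
    return [m.group(0) for m in re.finditer(pattern, text)]


for sample in SAMPLES:
    print(sample, "->", all_matches(ORIGINAL, sample))

# ($61245788) -> ['($612', '45788)']
# ($944848.23) -> ['($944', '848.23)']
# ($1,984,545) -> ['($1,984,545']
# ($5,654.23) -> ['($5,654', '23)']
```

Every sample is either cut into two partial matches or loses its closing parenthesis, which fits the symptoms described in the question.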

The actual regex that you need seems to be:

\(\$((\d{1,3}(\,\d{3})*)|(\d+))(\.\d{2})?\)

Or, if you want to explicitly discard the capture groups:

\(\$(?:(?:\d{1,3}(\,\d{3})*)|\d{4,})(?:\.\d{2})?\)

(Capture groups are the elements between parens in your regex, which
are set aside for later reference by the regex engine. To prevent the
regex from capturing a parenthesized expression, the opening paren must
be followed by "?:".)
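
If you want to sanity-check the corrected pattern against the stated requirements, something like the following sketch might help. It is in Python for convenience (the pattern itself should carry over to .NET's Regex unchanged, but that is an assumption worth verifying). The sample sentence, the negative cases, and the helper name find_amounts are all made up for illustration.

```python
import re

# Corrected pattern: the alternation is wrapped in its own group, so
# "\(\$" and the optional decimals plus "\)" apply to both alternatives.
FIXED = re.compile(r"\(\$((\d{1,3}(\,\d{3})*)|(\d+))(\.\d{2})?\)")

SHOULD_MATCH = ["($61245788)", "($944848.23)", "($1,984,545)", "($5,654.23)"]

# Hypothetical negative cases, invented to mirror the requirements:
# no parentheses, no dollar sign, three decimals, bad comma grouping.
SHOULD_REJECT = ["$5,654.23", "(5,654.23)", "($5,654.234)", "($1,98,545)"]

# Hypothetical free text, standing in for the contents of the TEXT column.
TEXT = ("Lot A sold for ($1,984,545), lot B for $5,654.23, "
        "fees (944848.23) and ($61245788).")


def find_amounts(text):
    """Return every parenthesized dollar amount found in text."""
    return [m.group(0) for m in FIXED.finditer(text)]


for s in SHOULD_MATCH + SHOULD_REJECT:
    print(s, "->", find_amounts(s))

print(find_amounts(TEXT))
# ['($1,984,545)', '($61245788)']
```

Each entry in SHOULD_MATCH comes back whole, and each entry in SHOULD_REJECT yields an empty list.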

HTH.

Regards,

B.
 
eBob.com

I'm not one of the regex gurus here but I think that the gurus will agree
with my advice to get Expresso from UltraPico. It's free, easy to use, and
really helpful in debugging regular expressions.

Using Expresso I started to play with your expression and data and found
right away that your expression is not picking up the ending close
parenthesis. Then I noticed that the Expresso "Regex Analyzer" window was
showing that your expression was selecting from two alternatives. You may
have balanced parentheses, but I don't think that you have them right. You
want a match on the closing parenthesis, right?

Get Expresso and I think that you'll have your expression straightened out
in a few minutes. If not let us know.

Good Luck, Bob
 
eBob.com

Branco,

(I am not the OP, but ...) thanks so much for making me aware of the ability
to have a parenthesized expression which is not a capture group. I have this
bad habit of reading just enough documentation to get the job done and
hadn't come across that.

So I went to my Expresso session and played with the expression with the
non-capturing parenthesized expressions which you included in your post and
discovered that you seem to have missed one "?:". I think that the complete
expression without capturing is:

\(\$(?:(?:\d{1,3}(?:\,\d{3})*)|\d{4,})(?:\.\d{2})?\)
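
For anyone without Expresso handy, the difference can also be checked in a few lines of code. This is a rough sketch in Python (chosen only because it is quick to run; the "?:" syntax is the same in .NET). It counts the capture groups in each version and confirms that both still match the four samples. The constant and function names are made up for illustration.

```python
import re

# Version from the earlier post: the comma group still captures.
ONE_CAPTURE_LEFT = r"\(\$(?:(?:\d{1,3}(\,\d{3})*)|\d{4,})(?:\.\d{2})?\)"
# Version with every group marked non-capturing.
NON_CAPTURING = r"\(\$(?:(?:\d{1,3}(?:\,\d{3})*)|\d{4,})(?:\.\d{2})?\)"

SAMPLES = ["($61245788)", "($944848.23)", "($1,984,545)", "($5,654.23)"]


def capture_group_count(pattern):
    """Number of capturing groups in pattern (the whole match is not counted)."""
    return re.compile(pattern).groups


def matches_all(pattern, samples):
    """True if pattern matches every sample in its entirety."""
    return all(re.fullmatch(pattern, s) for s in samples)


print(capture_group_count(ONE_CAPTURE_LEFT))  # 1
print(capture_group_count(NON_CAPTURING))     # 0
print(matches_all(ONE_CAPTURE_LEFT, SAMPLES), matches_all(NON_CAPTURING, SAMPLES))
# True True
```

Both versions accept the same samples; the only difference is whether the engine keeps the last comma group around for later reference.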

Thanks again for furthering my very basic knowledge of regular expressions.

Bob
 
Branco

eBob.com wrote:
So I went to my Expresso session and played with the expression with the
non-capturing parenthesized expressions which you included in your post and
discovered that you seem to have missed one "?:". I think that the complete
expression without capturing is:

\(\$(?:(?:\d{1,3}(?:\,\d{3})*)|\d{4,})(?:\.\d{2})?\)
<snip>

Ouch! Thanks for pointing it out.

Don't you just hate regular expressions? The more I look at them, the
more they look like a lot of noise! =))

Regards,

B.
 
