Regex Issues

Mike Labosh

I have the following System.Text.RegularExpressions.Regex that is supposed
to remove this predefined list of garbage characters from contact names that
come in on import files:

Dim _dropContactGarbage As New Regex( _
"([" & Chr(0) & "-" & Chr(31) & "]+)|" & _
"([" & Chr(33) & "-" & Chr(38) & "]+)|" & _
"([" & Chr(40) & "-" & Chr(44) & "]+)|" & _
"([" & Chr(47) & "-" & Chr(47) & "]+)|" & _
"([" & Chr(58) & "-" & Chr(64) & "]+)|" & _
"([" & Chr(91) & "-" & Chr(96) & "]+)|" & _
"([" & Chr(123) & "-" & Chr(127) & "]+)|" & _
"([" & Chr(152) & "]+)|" & _
"([" & Chr(155) & "-" & Chr(159) & "]+)|" & _
"([" & Chr(166) & "-" & Chr(224) & "]+)|" & _
"([" & Chr(226) & "-" & Chr(255) & "]+)")

We use it like this:

value = _dropContactGarbage.Replace(value, "")

But the Regex constructor is throwing an ArgumentException whose Message
property says only "Parse ([". There is no inner exception. Normally, if I
have a string expression that's wrong, I would Console.WriteLine() it. But
in this case, it doesn't WriteLine correctly, because some of the characters
in the expression are control characters, so what it displays is not
visually correct.

I have slaved over this issue for hours and hours and I can only guess that
one of the items must be escaped with a "\" or something, but I cannot
figure it out. I have already been all over the MSDN help topics for the
Regex Class.

Help?

--
Peace & happy computing,

Mike Labosh, MCSD
"After very careful consideration, I have come
to the conclusion that this new system SUCKS"
-- General Barringer, from WARGAMES
 
Some Guy

I think you're having problems with all those &'s between the brackets.
Maybe?

"([Chr(0) & "-" & Chr(31) ]+)|"
 
Mike Labosh

No. It's not the concatenation. The dumb thing *used to work*, but they
want me to change a couple of the character ranges. So all I have done is
changed a couple of the character codes passed to the Chr() function. Now
it's b0rken.

What I am currently attempting is to create a Regex from each single line of
my OP so I can find which one is causing the issue, then perhaps I can
determine a workaround.
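
The line-by-line check described above could be sketched roughly as follows. This is only an illustration: the module name, the `Visible` helper, and the two sample fragments are hypothetical and not from the original code. Printing each character as `\uXXXX` also works around the problem that control characters do not show up in Console.WriteLine() output.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module FragmentCheck
    ' Show control and non-ASCII characters as \uXXXX so the pattern can be printed.
    Function Visible(pattern As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In pattern
            Dim code As Integer = AscW(c)
            If code < 32 OrElse code > 126 Then
                sb.Append("\u").Append(code.ToString("X4"))
            Else
                sb.Append(c)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' Hypothetical fragments; the second one is reversed on purpose.
        Dim fragments As String() = {
            "([" & ChrW(0) & "-" & ChrW(31) & "]+)",
            "([z-a]+)"
        }
        For Each fragment As String In fragments
            Try
                ' The constructor parses the pattern and throws if it is invalid.
                Dim probe As New Regex(fragment)
                Console.WriteLine("ok:     " & Visible(fragment))
            Catch ex As ArgumentException
                Console.WriteLine("failed: " & Visible(fragment) & " : " & ex.Message)
            End Try
        Next
    End Sub
End Module
```

To test the real expression, substitute the fragments from the original post, built with the same Chr() calls, so that the characters being checked are the ones the program actually produces.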
 
jg

Make sure the new character codes do not have a special meaning in regex. If
they do, put the escape prefix before the " & Chr...

Sorry, I don't know the details, but I am sure you can look it up in MSDN
under regex.
 
Mike Labosh

That's what I'm trying to do. Each line of the expression seems to work by
itself. So I am now trying varying combinations.

heh. If you saw all the MSDN printouts on my desk, you would hurt me for
killing trees :)
 
Chris Burgess

It barks at me until I remove this line:
"([" & Chr(155) & "-" & Chr(159) & "]+)|" & _

I'm not sure why.
 
Chris Burgess

parsing "([>-Y]+)|" - [x-y] range in reverse order.
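
One likely explanation, assuming the machine's ANSI code page is Windows-1252 (the thread does not say): on that code page Chr(155) returns "›" (U+203A) and Chr(159) returns "Ÿ" (U+0178). The range therefore starts at a higher code point than it ends, and the console probably shows the two characters as ">" and "Y" because it cannot display them. The sketch below uses ChrW with those code points so that it does not depend on the code page. The module and variable names are made up for illustration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ReverseRangeDemo
    Sub Main()
        ' Assumption: Windows-1252, where byte 155 maps to U+203A and byte 159 to U+0178.
        Dim rangeStart As Char = ChrW(&H203A)
        Dim rangeEnd As Char = ChrW(&H178)

        ' The start of the range has the larger code point.
        Console.WriteLine(AscW(rangeStart) > AscW(rangeEnd))   ' True

        Try
            Dim broken As New Regex("([" & rangeStart & "-" & rangeEnd & "]+)")
            Console.WriteLine("compiled")
        Catch ex As ArgumentException
            ' The parser rejects a class whose range runs backwards.
            Console.WriteLine(ex.Message)
        End Try

        ' Naming the code points with \u escapes keeps the range in order.
        Dim ordered As New Regex("([\u009B-\u009F]+)")
        Console.WriteLine(ordered.Replace("Smith" & ChrW(&H9B), ""))
    End Sub
End Module
```

The `\u009B-\u009F` form assumes the intent was to drop the code points 155 to 159 themselves. If the goal was to drop the Windows-1252 characters that those bytes display as, they would have to be listed individually, since they are not contiguous in Unicode.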
 
Jay B. Harlow [MVP - Outlook]

Mike,
Rather than literally using Chr(0), Chr(31), Chr(33), ..., I would recommend
the RegEx character escape sequences.

http://msdn.microsoft.com/library/d...en-us/cpgenref/html/cpconcharacterescapes.asp

Something like:

' With ASCII character escapes
Dim _dropContactGarbage As New Regex( _
"([\x00-\x1F]+)|" & _
"([\x21-\x26]+)|" & _
"([\x28-\x2C]+)|" & _
...

Of course, you may have problems with Chr(128) and above: Chr() maps those
values through the ANSI code page, while Regex works with ASCII and/or Unicode.
As you know, ASCII is 7-bit (0 to 127), and a Unicode escape in RegEx needs
four hex digits (\u0000).

' with Unicode character escapes
Dim _dropContactGarbage As New Regex( _
"([\u0000-\u001F]+)|" & _
"([\u0021-\u0026]+)|" & _
"([\u0028-\u002C]+)|" & _
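
As a rough sketch of the escape-sequence approach, the ASCII ranges from the original post can also be merged into one character class instead of eleven alternated groups. The ranges above 127 are deliberately left out here, because the thread does not establish which code points they are meant to cover. The module and field names are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module EscapedClassDemo
    ' ASCII ranges copied from the original post, merged into one character class.
    ' Ranges above 127 are omitted because their intended code points are not stated.
    Private ReadOnly DropAsciiGarbage As New Regex(
        "[\x00-\x1F\x21-\x26\x28-\x2C\x2F\x3A-\x40\x5B-\x60\x7B-\x7F]+")

    Sub Main()
        Dim value As String = "O'Brien, #1 <Mary-Jane>"
        ' The comma, #, < and > are removed; space, apostrophe and hyphen are kept.
        Console.WriteLine(DropAsciiGarbage.Replace(value, ""))
        ' prints: O'Brien 1 Mary-Jane
    End Sub
End Module
```

Because every character is written as an escape, the pattern can be printed and read as-is, and none of the characters inside the class can be mistaken for regex syntax.
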


It might be "easier" if you used the predefined character classes (\s \w
\W \s ...) instead:

http://msdn.microsoft.com/library/d...en-us/cpgenref/html/cpconcharacterclasses.asp

Something like:
Dim _dropContactGarbage As New Regex("\W")

Which says match any nonword character...
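
One caveat worth checking before switching to "\W": the original ranges skip character codes 32, 39, 45 and 46, so spaces, apostrophes, hyphens and periods are kept, whereas \W removes them too. A small sketch, with a made-up module name and sample contact name:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NonWordDemo
    Sub Main()
        Dim contact As String = "Mary-Jane O'Brien"
        ' \W matches anything that is not a letter, digit, or connector such as "_",
        ' so the hyphen, the space, and the apostrophe are all removed.
        Console.WriteLine(Regex.Replace(contact, "\W", ""))
        ' prints: MaryJaneOBrien
    End Sub
End Module
```

If that is too aggressive for contact names, a negated class such as "[^\w\s'.-]" might be closer to the original intent, though that is a guess about the requirements.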

Expresso & RegEx Workbench both have wizards of varying degrees to help you
build your expression, plus they allow you to test your expressions, also
the analyzer/interpreter in each is rather handy.

Expresso:
http://www.ultrapico.com/Expresso.htm

RegEx Workbench:
http://www.gotdotnet.com/Community/...pleGuid=c712f2df-b026-4d58-8961-4ee2729d7322A

tutorial & reference on using regular expressions:
http://www.regular-expressions.info/

The MSDN's documentation on regular expressions:
http://msdn.microsoft.com/library/d...l/cpconRegularExpressionsLanguageElements.asp

Hope this helps
Jay
 