RegEx question

cody

Is it possible to make all chars in a text uppercase (except some special
letters which I define)?
I know I could solve this without RegEx, but I want to learn it and I am also
curious whether it is possible with regular expressions.

So, for example, "p.x + p.y + p.z" should result in "p.X + p.Y + p.Z".
 
Rob Perkins

cody said:
Is it possible to make all chars in a text uppercase (except some special
letters which I define)?
I know I could solve this without RegEx, but I want to learn it and I am also
curious whether it is possible with regular expressions.

So, for example, "p.x + p.y + p.z" should result in "p.X + p.Y + p.Z".

Sure, I'll give it a shot. Though there's probably more than one way
to do it.

Your regex in this sample case is [xyz], I think, which (someone
correct me if I'm wrong) will always match on single characters x, y,
or z. The most flexible way I can think of is to use a MatchEvaluator
delegate (predefined for you by the Framework) to process each of your
matches.

In your processing routine, do this:

Private Function MatchEval(ByVal m as Match)
    ' ToUpper is an instance method, so call it on the matched text
    Return m.Value.ToUpper()
End Function

And you invoke it as below:

' Requires: Imports System.Text.RegularExpressions
Public Function ReplaceText(ByVal sourceText as String) As String
    Dim rx as New Regex("[xyz]")
    Return rx.Replace(sourceText, AddressOf MatchEval)
End Function

That's all there is to that. There's a C# translator someplace that
will make it C#-ish, if you want, I'm sure.
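
For anyone following along without Visual Studio, here is a minimal sketch of the same idea in Python rather than VB.NET. Python's re.sub accepts a function in place of a replacement string, which plays roughly the role MatchEvaluator plays above. The names match_eval and replace_text are illustrative only.

```python
import re

def match_eval(m):
    # Called once per match; the return value replaces the matched text
    return m.group(0).upper()

def replace_text(source_text):
    # [xyz] matches a single character that is x, y, or z
    return re.sub(r"[xyz]", match_eval, source_text)

print(replace_text("p.x + p.y + p.z"))  # p.X + p.Y + p.Z
```

Characters the pattern does not match, such as the "p" and the punctuation, pass through unchanged.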

Rob
 
Rob Perkins

Rob Perkins said:
Private Function MatchEval(ByVal m as Match)

Whoops!

That should have been

Private Function MatchEval(ByVal m as Match) As String

Rob
 
cody

Your regex in this sample case is [xyz], I think, which (someone
correct me if I'm wrong) will always match on single characters x, y,
or z.

And how can I exclude letters?
 
Rob Perkins

cody said:
Your regex in this sample case is [xyz], I think, which (someone
correct me if I'm wrong) will always match on single characters x, y,
or z.

And how can I exclude letters?

I'm not sure what you mean by that. If you mean exclude letters from
being matched, you simply don't include them between the brackets. The
regex:

[xyz]

excludes all the letters which are not x, y, or z. Have a look here:
<http://www.regular-expressions.info/> for a good tutorial on how it
all works.

Rob
 
cody

And how can I exclude letters?
I'm not sure what you mean by that. If you mean exclude letters from
being matched, you simply don't include them between the brackets.

Thank you, but xyz was just an example. I want to make _all_
characters in a text uppercase and exclude some characters, so it would
not be a good idea to put all existing characters in the world in the brackets,
leaving out the ones I don't want.
I know I could do [a-z] and I can do [^p] if I don't want p, but how can I
put these two in one character class?
 
Brian Davis

Try something like this:

[^\W_0-9A-Zp]


Brian Davis
http://www.knowdotnet.com



 
cody

And how can I exclude letters?
Try something like this:

[^\W_0-9A-Zp]

No. This expression excludes everything that is in the brackets. I want to
exclude only a few characters.
 
Brian Davis

You said you wanted to match [a-z] and [^p] in a single character class,
which means "any lower case letter except the letter 'p'".

[^\W_0-9A-Zp] does in fact match "any lower case letter except the letter
'p'". Just try it and see. If you want to match every lower case letter
except the letter "a", then use [^\W_0-9A-Za]. If you want to match every
lower case letter except the letters "a" and "p", then use
[^\W_0-9A-Zap]...etc.
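
The "append the letters you want to skip" recipe above can be wrapped in a small helper. This is a sketch in Python, not .NET code; upper_except and its keep parameter are hypothetical names, and it assumes keep contains only lower case ASCII letters.

```python
import re

def upper_except(text, keep):
    # keep: lower case ASCII letters that must stay untouched (may be empty)
    if keep and not re.fullmatch(r"[a-z]+", keep):
        raise ValueError("keep must contain only lower case ASCII letters")
    # Word characters, minus "_", digits, A-Z, and the letters in keep
    pattern = r"[^\W_0-9A-Z" + keep + "]"
    return re.sub(pattern, lambda m: m.group(0).upper(), text)

print(upper_except("p.x + p.y + p.z", "p"))  # p.X + p.Y + p.Z
print(upper_except("apple pie", "ap"))       # appLE pIE
```

Restricting keep to plain letters avoids accidentally inserting characters such as "]" or "-" that have a special meaning inside a character class.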

\W means "non-word character", so [^\W] = \w, which for plain ASCII text is
[a-zA-Z0-9_] (in .NET, \w also matches non-ASCII letters and digits unless
RegexOptions.ECMAScript is set). Because you do not need to convert numbers,
underscores, and capital letters to upper case, they are included in the
negated character class. After these, you can add any letters that you don't
want to match (like "p").
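
One caveat, sketched here in Python because its \w is Unicode-aware by default, much like .NET's: the class [^\W_0-9A-Zp] also matches letters outside a-z, such as é or É. That is harmless when the goal is upper-casing, but it means the class equals "[a-z] without p" only for ASCII input. In Python the re.ASCII flag restricts \w to [a-zA-Z0-9_]; in .NET, RegexOptions.ECMAScript has a similar effect. The sample string is made up for illustration.

```python
import re

PATTERN = r"[^\W_0-9A-Zp]"
sample = "caf\u00e9 \u00c9p\u00e9e"  # "café Épée", written with escapes

# Default (Unicode-aware) \w: accented letters count as word characters
unicode_hits = re.findall(PATTERN, sample)

# re.ASCII: \w is exactly [a-zA-Z0-9_], so accented letters are excluded
ascii_hits = re.findall(PATTERN, sample, flags=re.ASCII)

print(unicode_hits)
print(ascii_hits)
```

In both modes the "p" is skipped; only the treatment of the accented letters differs.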


Brian Davis
http://www.knowdotnet.com




 
cody

[^\W_0-9A-Zp] does in fact match "any lower case letter except the letter
'p'". Just try it and see. If you want to match every lower case letter
except the letter "a", then use [^\W_0-9A-Za]. If you want to match every
lower case letter except the letters "a" and "p", then use
[^\W_0-9A-Zap]...etc.


No. [^\W_0-9A-Zp] matches nothing. It simply says "Match all characters
except: word characters, underscores, characters in the range 0-9 and
characters in the range A-Z and p". I tried it and it matches nothing. How
should it? Everything in the brackets together forms one character class,
negated by the ^ character.
 
Brian Davis

Run this program:

Imports System.Text.RegularExpressions

Module Module1
    Sub Main()
        Console.WriteLine(ReplaceText("p.x + p.y + p.z"))
        Console.ReadLine()
    End Sub

    Public Function ReplaceText(ByVal sourceText As String) As String
        ' Word characters minus "_", digits, A-Z, and "p"
        Dim re As New Regex("[^\W_0-9A-Zp]")
        Return re.Replace(sourceText, AddressOf MatchEval)
    End Function

    ' Called once per match; returns the upper-cased match
    Private Function MatchEval(ByVal m As Match) As String
        Return m.ToString.ToUpper
    End Function
End Module


The output, as expected, is:

p.X + p.Y + p.Z


\W (note the capital "W") is the opposite of \w (lower case "w"). \w means
"word characters", so \W means "NON-word characters". \W will match things
like spaces and punctuation marks, but not letters, numbers, or the
underscore character.

For ASCII text (in .NET, \w also covers non-ASCII letters and digits):

\w = [a-zA-Z0-9_]
\W = [^a-zA-Z0-9_]

This means that the expression matches any character that is:
(a) not a NON-word character (double negative means that it must be a word
character - this is equivalent to \w, or [a-zA-Z0-9_])
(b) not an underscore (combined with (a) means [a-zA-Z0-9])
(c) not a number (combined with (a) and (b) means [a-zA-Z])
(d) not a capital letter (combined with (a), (b), and (c) means [a-z])
(e) not a "p" (combined with (a), (b), (c), and (d) means [a-z] and also
[^p] )

This, by definition, is any lower case letter other than "p".
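
The narrowing in steps (a) to (e) can be checked mechanically. The sketch below is Python, not VB.NET, and it tests each intermediate class against the printable ASCII characters only (an assumption that sidesteps non-ASCII letters); matched is a hypothetical helper name.

```python
import re
import string

# ASCII-only sample alphabet: digits, letters, punctuation, whitespace
CHARS = string.printable

def matched(pattern):
    # Return every sample character that the one-character pattern matches
    return "".join(c for c in CHARS if re.fullmatch(pattern, c))

# One pattern per step (a) to (e); each adds one more exclusion
STEPS = [r"[^\W]", r"[^\W_]", r"[^\W_0-9]", r"[^\W_0-9A-Z]", r"[^\W_0-9A-Zp]"]

for pattern in STEPS:
    print(pattern, "->", matched(pattern))
```

Each printed line should be a subset of the previous one, and the last should list the lower case letters with "p" missing.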


Hope this clears things up,

Brian Davis
http://www.knowdotnet.com



 
Rob Perkins

Brian Davis said:
The \W means "non-word character", so [^\W] = \w = [a-zA-Z0-9_]. Because
you do not need to convert numbers, underscores, and capital letters to
upper case, they are included in the negated character class. After these,
you can add any letters that you don't want to match (like "p").

Brian, thanks very much for that explanation. Things are beginning to
be much clearer to me, now.

Rob
 
cody

\W (note the capital "W") is the opposite of \w (lower case "w"). \w means
"word characters", so \W means "NON-word characters". \W will match things
like spaces and punctuation marks, but not letters, numbers, or the
underscore character.

\w = [a-zA-Z0-9_]
\W = [^a-zA-Z0-9_]

This means that the expression matches any character that is:
(a) not a NON-word character (double negative means that it must be a word
character - this is equivalent to \w, or [a-zA-Z0-9_])
(b) not an underscore (combined with (a) means [a-zA-Z0-9])
(c) not a number (combined with (a) and (b) means [a-zA-Z])
(d) not a capital letter (combined with (a), (b), and (c) means [a-z])
(e) not a "p" (combined with (a), (b), (c), and (d) means [a-z] and also
[^p] )

This, by definition, is any lower case letter other than "p".


Now I understand! Sorry, I should have read your posting more carefully.
 
