Ignoring spaces in regular expression matching

Mark Rae · May 21, 2006

Hi,

I'm trying to construct a RegEx pattern which will validate a string so that
it can contain:

- only the numerical characters from 0 to 9, i.e. no decimal points, negative
  signs, exponentials etc.
- only the 26 letters of the standard Western alphabet, in either upper or
  lower case
- spaces, i.e. ASCII character 32

I seem to be doing OK with the first two criteria, but am having trouble
with the space character.

E.g. the following works perfectly:

Regex.IsMatch("ThisIsThe2ndString", @"[^0-9][^a-z][^A-Z]")

However, this doesn't work:

Regex.IsMatch("This Is The 2nd String", @"[^0-9][^a-z][^A-Z]")

I've tried various combinations of [\s] and [^\s] but with little success.

However, the following works, though I don't really understand why:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Any assistance gratefully received.

Mark

Paul E Collins · May 21, 2006

Mark said:
> I'm trying to construct a RegEx pattern which will
> validate a string so that it can contain [only digits,
> letters and spaces]

I think you want something like this:
^[a-zA-Z0-9 ]*$
i.e. every character between ^ start and $ end must be in the [group],
and there can be * zero or more of them (you'd use + if you want at
least one character in there). Be aware that "\s" would match some
things that aren't spaces (like tabs and newlines).
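A minimal sketch of Paul's anchored pattern using .NET's Regex.IsMatch. The class and method names are hypothetical. It contrasts * with +, shows that the literal space rejects a tab while \s accepts it, and adds a \z variant, because in .NET $ also matches just before a final newline.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are for illustration only.
static class AnchoredPatterns
{
    // ^...$ forces every character of the string to fall inside the class;
    // * also allows the empty string.
    public static bool ZeroOrMore(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9 ]*$");

    // + requires at least one allowed character.
    public static bool OneOrMore(string s) => Regex.IsMatch(s, "^[a-zA-Z0-9 ]+$");

    // \s accepts any white space (tabs, newlines, ...), not just ASCII 32.
    public static bool AnyWhiteSpace(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9\s]*$");

    // \z only matches at the very end, so a trailing "\n" is rejected.
    public static bool StrictEnd(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9 ]*\z");
}
```

For example, ZeroOrMore("") is true while OneOrMore("") is false, and ZeroOrMore("Hello\n") is true while StrictEnd("Hello\n") is false.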

Of course, if you're having special trouble with spaces, you could do
s.Replace(" ", "") first to get rid of them in your validator.
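The "strip the spaces first" idea might look like this. It is only a sketch with a hypothetical class name. Only ASCII 32 is removed, so a tab still fails, and \z is used instead of $ so that a trailing newline is not accepted.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the "remove spaces, then validate" idea.
static class StripSpaces
{
    public static bool IsValid(string s)
    {
        // Remove literal spaces (ASCII 32) only; tabs stay and are rejected below.
        string withoutSpaces = s.Replace(" ", "");

        // With the spaces gone, only letters and digits need to be allowed.
        // \z rather than $ so that a trailing newline is not accepted.
        return Regex.IsMatch(withoutSpaces, @"^[a-zA-Z0-9]*\z");
    }
}
```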

Finally, I'm not convinced that regexes are ideal in .NET for this
kind of trivial check (as opposed to something complicated like nested
expressions and optional segments), because they're a special library
call and not a native operator as in Perl, which I suspect you might
have come from. I expect a loop like this would be more efficient:

bool valid = true;
for (int i = 0; i < s.Length; i++)
{
    char c = s[i];
    // Reject anything that is not A-Z, a-z, 0-9 or a literal space.
    if (!((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
          || (c >= '0' && c <= '9') || c == ' '))
    {
        valid = false;
        break;
    }
}

Eq.

Tasos Vogiatzoglou · May 21, 2006

string[] strs = new string[] { "ABC123", "ABC1.1", "ABC 123", "ABC 123 .." };

string srx = @"[^\.]+|[\w\s\d]+";
Regex rx = new Regex(srx, RegexOptions.ECMAScript);

foreach (string str in strs)
{
    Console.WriteLine("{0} {1}", str,
        rx.Match(str).Length == str.Length);
}

This works (if I understood your problem correctly). IsMatch returns
true for any match in the string, so I don't think it is the one you
want.
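A sketch of the difference between a partial and a whole-string match, assuming "ABC#" as a string that ought to be rejected (the class name is hypothetical). The last method reproduces Tasos's length check. Because its first alternative [^\.]+ matches any run of characters other than a dot, only strings containing a dot fail it.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class comparing ways of applying a pattern to a string.
static class PartialMatchDemo
{
    // Unanchored: succeeds if ANY part of the string matches, e.g. "ABC" in "ABC#".
    public static bool Unanchored(string s) => Regex.IsMatch(s, "[a-zA-Z0-9 ]+");

    // Anchored: the whole string has to consist of allowed characters.
    public static bool Anchored(string s) => Regex.IsMatch(s, @"^[a-zA-Z0-9 ]+\z");

    // Tasos's check: the first match has to be as long as the whole string.
    public static bool TasosCheck(string s)
    {
        var rx = new Regex(@"[^\.]+|[\w\s\d]+", RegexOptions.ECMAScript);
        return rx.Match(s).Length == s.Length;
    }
}
```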

Regards,
Tasos

Kevin Spencer · May 21, 2006

You can use a literal space in your character set:

(?i)[^a-z 0-9]

The "(?i)" indicates case-insensitivity. Note the literal space between
"a-z" and "0-9". This excludes the space character as well.

The "\s" indicates *any* white-space character, including such things as
tabs. If that is what you want, use:

(?i)[^a-z\s0-9]
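Note that these patterns describe a *bad* character: IsMatch returns true when the string contains something other than a letter, digit or space, so the validation result is the negation. A minimal sketch assuming that reading (class and method names are hypothetical):

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper built on the negated character classes above.
static class NegatedClass
{
    // The pattern finds ONE character that is not a letter, digit or space,
    // so the string is valid when no such character exists.
    public static bool IsValid(string s) => !Regex.IsMatch(s, "(?i)[^a-z 0-9]");

    // Same idea, but any white-space character (tab, newline, ...) is allowed.
    public static bool IsValidAnyWhiteSpace(string s) => !Regex.IsMatch(s, @"(?i)[^a-z\s0-9]");
}
```

An empty string counts as valid here; add a length check if at least one character is required.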

--
HTH,

Kevin Spencer
Microsoft MVP
Professional Numbskull

The man who questions opinions is wise.
The man who quarrels with facts is a fool.

Mark Rae · May 21, 2006


Excellent! Thanks very much.

Mark Rae · May 21, 2006

> I think you want something like this:
> ^[a-zA-Z0-9 ]*$
> i.e. every character between ^ start and $ end must be in the [group], and
> there can be * zero or more of them (you'd use + if you want at least one
> character in there).

Doesn't work...

> Of course, if you're having special trouble with spaces, you could do
> s.Replace(" ", "") first to get rid of them in your validator.

I could do that, or even not do any validation at all...

> Finally, I'm not convinced that regexes are ideal in .NET for this kind of
> trivial check (as opposed to something complicated like nested expressions
> and optional segments), because they're a special library call and not a
> native operator as in Perl, which I suspect you might have come from.

I've never written a line of Perl in my life...

> I expect a loop like this would be more efficient:

I wouldn't know...

Mark Rae · May 21, 2006

> This works (if I understood correctly your problem).

It doesn't.

> IsMatch returns true for any match in the string so I don't think this is
> the one you want.

There you go, then... :-)

Jon Skeet [C# MVP] · May 21, 2006

Mark Rae said:
> It doesn't.

When a proposed solution doesn't work, could you explain in what way?
It makes life a lot easier for people who want to make further
suggestions.

Mark Rae · May 21, 2006

> When a proposed solution doesn't work, could you explain in what way?

I'm afraid I can't in this case, other than to say it always seems to find a
match no matter what string I pass into it...

I simply don't know enough about regular expressions to make a valuable
response - I don't mind confessing that it remains one area of coding which
I find very difficult to get my head around, to the extent where I still
find it difficult to look at even the simplest of patterns and understand
instinctively what it's trying to do...

> It makes life a lot easier for people who want to make further
> suggestions.

I couldn't agree more! However, in this case, Kevin Spencer has solved my
problem completely.

Jon Skeet [C# MVP] · May 21, 2006

Mark Rae said:
> I'm afraid I can't in this case, other than to say it always seems to find a
> match no matter what string I pass into it...

That's enough - just an example of something which should fail but
passes would be good.

> I simply don't know enough about regular expressions to make a valuable
> response

A sample which doesn't do what you want it to is the most valuable
response you can make in this case.

> I couldn't agree more! However, in this case, Kevin Spencer has solved my
> problem completely.

Right. I'd still be interested in an example which should fail but
passes, so I can try to beef up my own regex experience.

Mark Rae · May 21, 2006

> That's enough - just an example of something which should fail but
> passes would be good.

> A sample which doesn't do what you want it to is the most valuable
> response you can make in this case.

See the reply I'm referring to:

> IsMatch returns true for any match in the string so I don't think this is
> the one you want.

That's correct - no matter what string I pass into it, it always returns
true...

Kevin Spencer · May 22, 2006

Hi Mark,

I may be able to help you there. It helps to understand how the Regular
Expressions Engine works. First, it evaluates a character at a time, and it
is procedural in nature. A regular expression is like a series of
instructions, rather than a real single pattern. In your case:

Regex.IsMatch("This is the 2nd string", @"[^0-9][^a-z][^A-Z]",
RegexOptions.IgnoreCase)

Basically, this is using character classes. A character class is a series of
tokens inside square brackets, and it can be translated as "this type of
character or this type of character or this type of character..." In other
words, multiple character types or literals are joined with an implicit "or"
operator:

[\dA!] literally means "any single digit or an 'A' or an '!' character".
Note that it also implies a singular value, that is, one character.
Quantifiers indicate that characters from the class may be repeated 0, 1
or more times, as in:

[\dA!] (any of these characters 1 time)
[\dA!]* (any of these characters 0 or more times)
[\dA!]+ (any of these characters 1 or more times)
etc.
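Kevin's three quantifier examples can be checked with a short sketch. The ^ and \z anchors are an addition so that each pattern has to account for the whole string rather than any part of it; the class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo of the [\dA!] examples, anchored at both ends
// so the quantifier decides how many characters the whole string may have.
static class QuantifierDemo
{
    // Exactly one digit, 'A' or '!'.
    public static bool ExactlyOne(string s) => Regex.IsMatch(s, @"^[\dA!]\z");

    // Any number of them, including none.
    public static bool ZeroOrMore(string s) => Regex.IsMatch(s, @"^[\dA!]*\z");

    // At least one of them.
    public static bool OneOrMore(string s) => Regex.IsMatch(s, @"^[\dA!]+\z");
}
```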

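When it is the first character inside the square brackets, the '^' is the
logical "Not" operator, which means "Not any of these characters." (Outside
a character class, as in ^[a-zA-Z0-9 ]*$, it instead anchors the match to
the start of the string.)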

So, you had at first "[^0-9]" (Not a digit between 0 and 9)
followed by "[^a-z]" (Not a character between a and z)
and followed by "[^A-Z]" (Not a character between A and Z)

Now, remember that it's looking for a match. A match satisfies *all* of the
criteria you specify, in order, so you can think of this as joining all of
these character classes with "AND", with each class consuming the next
character:

"Not a digit between 0 and 9, AND THEN not a character between a and z, AND
THEN not a character between A and Z."

Note that the space character is none of those, so it can satisfy any of the
three classes. Using negation is tricky. In fact, *any* three consecutive
characters that each fall outside the corresponding character set would be a
match, and nothing else in the string is checked.
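To see what the original pattern actually matches, here is a minimal sketch that returns the first substring Regex.Match finds (the class name OriginalPatternDemo and the method FirstMatch are hypothetical). Each class consumes one character, so the match is always three characters long, and a string such as "###" matches even though it breaks every rule.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: what does [^0-9][^a-z][^A-Z] actually match?
static class OriginalPatternDemo
{
    const string Pattern = "[^0-9][^a-z][^A-Z]";

    // Returns the first matched substring, or null if there is no match.
    public static string FirstMatch(string s, RegexOptions options = RegexOptions.None)
    {
        Match m = Regex.Match(s, Pattern, options);
        return m.Success ? m.Value : null;
    }
}
```

For "ThisIsThe2ndString" the first match is "sIs"; for "This is the 2nd string" with RegexOptions.IgnoreCase it is "e 2"; a two-character string can never match.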

The character class is used to apply the same rules to a set of characters.
The only time you need to separate them into groups is when the rules
(specifically logical Not or quantifiers) do not apply the same to all of
the characters.

Also, as a regular expression is basically procedural (although it does
employ backtracking), you should be careful about the order of the matches.
The following 2 sets are NOT the same:

[\dA!][0X]
[0X][\dA!]

In the first case, "X3A" would *not* match. In the second case it would.
This is because the string and the pattern are evaluated in sequence. One
term for this is "consumption" - a regular expression "consumes" a string as
it evaluates it.
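A quick check of the ordering point, using the string "X3A" as an assumed example (the class name is hypothetical). IsMatch may start its search at any position, but within one attempt the two classes are applied strictly left to right.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: the same two classes in a different order.
static class OrderDemo
{
    // Needs a digit, 'A' or '!' followed by '0' or 'X'.
    public static bool DigitFirst(string s) => Regex.IsMatch(s, @"[\dA!][0X]");

    // Needs '0' or 'X' followed by a digit, 'A' or '!'.
    public static bool ZeroOrXFirst(string s) => Regex.IsMatch(s, @"[0X][\dA!]");
}
```

In "X3A", no digit, 'A' or '!' is followed by '0' or 'X', so DigitFirst is false, while "X3" satisfies ZeroOrXFirst.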


Mark Rae · May 22, 2006

> I may be able to help you there.

Very interesting - thanks.

I still find it really hard to get my head round it, though...

Jon Skeet [C# MVP] · May 22, 2006

Mark Rae said:
> That's correct - no matter what string I pass into it, it always returns
> true...

Well, I've only tried the version that Paul Collins gave (which you
replied to with the same "doesn't work" answer), and that seems to
work:

using System;
using System.Text.RegularExpressions;

class Test
{
    static void Main()
    {
        Regex r = new Regex("^[a-zA-Z0-9 ]*$");
        Console.WriteLine (r.IsMatch ("Hello"));
        Console.WriteLine (r.IsMatch ("Hello there"));
        Console.WriteLine (r.IsMatch ("Hell#o"));
    }
}

Produces:
True
True
False

This is why it's important to give a specific example of something that
fails - preferably with a short but complete program which
demonstrates what you've been trying it with.

Kevin Spencer · May 22, 2006

Hi Mark,

You may find the following article informative:

http://www.codeproject.com/csharp/regex.asp


Mark Rae · May 22, 2006

> You may find the following article informative:
>
> http://www.codeproject.com/csharp/regex.asp

I love it - it's almost "RegEx for Dummies"... :-)

Just what I need!

Similar Threads

Regular expression	4	Feb 21, 2011
Regular expression	4	Jan 16, 2013
Regular Expression Hangs	5	May 18, 2007
Regular expression for validating [GrandTotal]=4*[TotalCharges]+[currentCharges]+2	10	Nov 6, 2007
regex Help	3	Oct 7, 2007
Regular Expressions	4	Aug 15, 2005
Obfuscate Email	1	Nov 12, 2011
simple regex?	1	Feb 21, 2007
