Regex Question

AMP · Apr 21, 2008

Hello,
I am coming back to a project and I don't remember what the following
regex does.
I do know it removes all \r\n from the string, but I don't see how.
Can someone explain this one?

Regex re = new Regex(@"([\x00-\x1F\x7E-\xFF]+)",
RegexOptions.Compiled);
string op = re.Replace(FileToParse, "");

Thanks
Mike

Gilles Kohl [MVP] · Apr 21, 2008

How does it work? The outer parentheses are redundant IMHO. The regex
boils down to a positive character group with two ranges, whose start
and end points are written as hexadecimal escapes: \x00-\x1F (0 to 31
in decimal) and \x7E-\xFF (126 to 255 in decimal). With the appended
"+", it means "one or more characters in the range 0-31 or 126-255".
\r (13) and \n (10) both fall in the first range, which is why they
disappear.
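To make the range argument concrete, here is a minimal sketch that tests single characters against the same character group (with the redundant parentheses dropped). The class and method names (ControlCharDemo, IsRemoved, Strip) are made up for illustration and are not part of the original project.

```csharp
using System;
using System.Text.RegularExpressions;

static class ControlCharDemo
{
    // The pattern from the question, minus the redundant outer parentheses.
    static readonly Regex Original =
        new Regex(@"[\x00-\x1F\x7E-\xFF]+", RegexOptions.Compiled);

    // Hypothetical helper: true when the pattern matches the character c.
    public static bool IsRemoved(char c) => Original.IsMatch(c.ToString());

    // Hypothetical helper: applies the same replacement as the question.
    public static string Strip(string input) => Original.Replace(input, "");

    static void Main()
    {
        Console.WriteLine(IsRemoved('\r'));  // True: code 13 is inside 0-31
        Console.WriteLine(IsRemoved('\n'));  // True: code 10 is inside 0-31
        Console.WriteLine(IsRemoved('A'));   // False: code 65 is in neither range
        Console.WriteLine(Strip("line one\r\nline two"));  // line oneline two
    }
}
```

Note that the line break is removed without anything being put in its place, so the text on either side of it runs together.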

Replacing all these occurrences with nothing (empty string) does far
more than just remove \r and \n: it removes all characters in the
ranges 0-31 and 126-255. The intention is probably to kill anything
that is not printable ASCII. Unfortunately, it also kills the
tilde "~" (126), which is printable ASCII.

It will also remove e.g. accented characters and umlauts in the range
128-255. What it will NOT remove are Unicode characters from 256
upwards.

Try e.g.

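// Needs: using System.Text.RegularExpressions; using System.Windows.Forms;
// \u00e7 is code 231 (inside 126-255); \u0107 is code 263 (outside it).
string originalString = "Testing <\u00e7> <\u0107> ";

Regex re = new Regex(@"([\x00-\x1F\x7E-\xFF]+)", RegexOptions.Compiled);
string replacedString = re.Replace(originalString, "");

MessageBox.Show(originalString);   // before
MessageBox.Show(replacedString);   // after: only \u00e7 has been removed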

The first "special" character, a lowercase c with cedilla (U+00E7),
will be removed. The second one, a lowercase c with acute accent
(U+0107), will not be affected.

(My suggestion, if your intention is to remove anything not in the
range 32-126, would be to use this:

Regex re = new Regex(@"[^\x20-\x7E]+", RegexOptions.Compiled);

instead.)
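A small side-by-side sketch of the two patterns on one sample string may help show the difference. The class and method names (PatternComparison, StripOriginal, StripSuggested) and the sample string are assumptions chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

static class PatternComparison
{
    // Pattern from the question: removes codes 0-31 and 126-255.
    public static string StripOriginal(string s) =>
        Regex.Replace(s, @"[\x00-\x1F\x7E-\xFF]+", "");

    // Suggested pattern: removes everything outside codes 32-126.
    public static string StripSuggested(string s) =>
        Regex.Replace(s, @"[^\x20-\x7E]+", "");

    static void Main()
    {
        // Tilde (126), c with cedilla (231), c with acute (263), then CR LF.
        string sample = "a~b\u00e7\u0107\r\n";

        // Loses the tilde but keeps \u0107: returns "ab\u0107".
        Console.WriteLine(StripOriginal(sample));

        // Keeps the tilde and drops both accented characters: returns "a~b".
        Console.WriteLine(StripSuggested(sample));
    }
}
```

The negated group states the intent directly ("keep only 32-126"), so it needs no upper bound and also covers characters from 256 upwards.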

Regards,
Gilles.
