Regex question (easy?)

sb

Hello,
I have a text file which contains plain text with the normal
carriage-return/linefeed line terminators. In that file I want to find
every occurrence of "%R" (case-sensitive) on any line that does _not_ start
with "#", so that I can replace it with something else later using
Regex.Replace().

An example file would look something like this (the real file is
~100 KB, several hundred lines):

--- start of file ---

# description line...if there are any %R's on this line, ignore them\r\n
hey %Rthere\r\n
how's it %Rgoin?\r\n
%Rfine, you?\r\n

--- end of file ---

In the above, the regular expression should match all occurrences of %R
except the one on the line that starts with "#". Can someone tell me what
the right regex search string would be? I'm sure this is very easy to do
but I'm still a newbie to regex.

Thanks in advance!
sb
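
For reference, one pattern that appears to do what is asked here, as a minimal sketch rather than a definitive answer. It relies on .NET allowing variable-length lookbehind; the class and method names (CommentAwareTagRegex, ReplaceTag, CountTag) and the replacement text are made up for illustration.

```csharp
using System.Text.RegularExpressions;

public static class CommentAwareTagRegex
{
    // (?m)       Multiline mode, so ^ matches at the start of every line.
    // (?<= ... ) Lookbehind: the text before %R must match, but is not consumed.
    // ^(?!#).*   A line start that is not followed by '#', then any characters
    //            on the same line ('.' never matches '\n').
    private static readonly Regex TagOnNonCommentLine =
        new Regex(@"(?m)(?<=^(?!#).*)%R");

    // Replaces every %R that sits on a line not starting with '#'.
    // A MatchEvaluator is used so that a '$' in the replacement is taken literally.
    public static string ReplaceTag(string text, string replacement)
    {
        return TagOnNonCommentLine.Replace(text, m => replacement);
    }

    // Counts the %R occurrences that would be replaced.
    public static int CountTag(string text)
    {
        return TagOnNonCommentLine.Matches(text).Count;
    }
}
```

Because the lookbehind consumes nothing, several %R tags on the same line are each matched separately. Matching is case-sensitive by default, so "%r" is left alone.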
 
Jon Skeet [C# MVP]

I wouldn't use regex at all. The regular expression is going to be
harder to understand than straight calls to String methods. Assuming
you've already read a line (with StreamReader.ReadLine) you can just
do:

if (!line.StartsWith ("#") && line.Contains ("%R"))
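
Put into a loop, the check above might look like the following sketch. It assumes the input is available through any TextReader (a StreamReader over the file, or a StringReader for testing); LineScanner and FindTaggedLines are hypothetical names, and the ordinal comparison on StartsWith is an added choice to avoid culture-sensitive matching.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineScanner
{
    // Returns the 1-based numbers of the lines that contain "%R"
    // and do not start with '#'.
    public static List<int> FindTaggedLines(TextReader reader)
    {
        var hits = new List<int>();
        int lineNumber = 0;
        string line;

        // ReadLine strips the "\r\n" terminator and returns null at end of input.
        while ((line = reader.ReadLine()) != null)
        {
            lineNumber++;
            if (!line.StartsWith("#", StringComparison.Ordinal) && line.Contains("%R"))
            {
                hits.Add(lineNumber);
            }
        }
        return hits;
    }
}
```

With a real file this would be called as FindTaggedLines(new StreamReader(path)) inside a using block, where path is whatever file name applies.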
 
sb

Thanks for the response Jon. I coded it your way before I posted...which
works fine of course. However, I think I over-simplified my original post
:)

I'm essentially building an RTF file from a text file (generated by another
app, not mine) that contains a lot of proprietary tags like %R, %Y,
etc., which represent color tags. To build a proper RTF color table, I
need to know that a color is actually used within the file. So in short,
I need to perform the replacements for each color tag and also know that at
least one replacement was actually made before I add that tag's color to the
color table. I realize that I can accomplish this with simple string
functions, i.e. Replace(), IndexOf(), Contains(). However, I figured that
using a regex may be quicker (and more readable) in that I can do the
parsing once during the creation of a MatchCollection and then just perform
the replacements using the Groups within that MatchCollection.

I admit that maybe I'm optimizing too early here :)

-sb
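
The single-pass idea described here could be sketched with Regex.Replace and a MatchEvaluator that records each tag as it is replaced. This is only an illustration under assumptions: the tag-to-replacement dictionary, the names RegexTagReplacer and ReplaceTags, and the rule that lines starting with "#" are skipped (carried over from the first post) are not taken from the real application.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTagReplacer
{
    // Replaces every known tag in one pass and adds each tag that was
    // actually replaced to usedTags. Lines starting with '#' are skipped.
    public static string ReplaceTags(
        string text,
        IDictionary<string, string> replacements,
        ISet<string> usedTags)
    {
        if (replacements.Count == 0)
        {
            return text; // nothing to look for
        }

        // Build an alternation such as (?:%R|%Y) from the dictionary keys.
        var alternatives = new List<string>();
        foreach (string tag in replacements.Keys)
        {
            alternatives.Add(Regex.Escape(tag));
        }

        // The lookbehind requires a line start that is not followed by '#'.
        string pattern =
            @"(?m)(?<=^(?!#).*)(?:" + string.Join("|", alternatives) + ")";

        return Regex.Replace(text, pattern, m =>
        {
            usedTags.Add(m.Value);          // remember that this color is in use
            return replacements[m.Value];   // hypothetical replacement text
        });
    }
}
```

After the call, usedTags holds exactly the tags that occurred outside comment lines, so only those colors would need an entry in the color table.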
 
Jon Skeet [C# MVP]

It *may* be faster to use a regular expression - but I wouldn't worry
about that until you've got a working implementation to start with. It
would possibly be more readable to use a regular expression if the
reader is very familiar with regular expressions - but *much* harder
for people who don't use regular expressions very often.

I suggest you write the code in the most obvious way, using Replace,
IndexOf etc, get all your unit tests in place (preferably doing that
before implementing the code, in fact) and then you can safely change
to using regular expressions if you feel there'll be a benefit.
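
A rough sketch of that "obvious" version, with a few framework-free checks that could later become proper unit tests. The names (PlainTagReplacer, ReplaceTagsInLine, RunChecks) and the bracketed placeholder replacements are assumptions for illustration, not the real RTF output.

```csharp
using System;
using System.Collections.Generic;

public static class PlainTagReplacer
{
    // Replaces every known tag in one line and records which tags were seen.
    // Lines starting with '#' are returned untouched.
    public static string ReplaceTagsInLine(
        string line,
        IDictionary<string, string> replacements,
        ISet<string> usedTags)
    {
        if (line.StartsWith("#", StringComparison.Ordinal))
        {
            return line;
        }

        foreach (KeyValuePair<string, string> pair in replacements)
        {
            // IndexOf tells us whether a replacement will actually happen.
            if (line.IndexOf(pair.Key, StringComparison.Ordinal) >= 0)
            {
                usedTags.Add(pair.Key);
                line = line.Replace(pair.Key, pair.Value);
            }
        }
        return line;
    }

    // Simple checks that throw on failure; each would be a test case
    // in a real test project.
    public static void RunChecks()
    {
        var replacements = new Dictionary<string, string>
        {
            { "%R", "[red]" },      // placeholder output, not real RTF
            { "%Y", "[yellow]" }
        };
        var used = new HashSet<string>();

        Check(ReplaceTagsInLine("# %R stays", replacements, used) == "# %R stays");
        Check(used.Count == 0);
        Check(ReplaceTagsInLine("hey %Rthere", replacements, used) == "hey [red]there");
        Check(used.Contains("%R") && !used.Contains("%Y"));
    }

    private static void Check(bool condition)
    {
        if (!condition) throw new InvalidOperationException("check failed");
    }
}
```

One caveat of this approach: if a replacement string itself contained another tag, a later iteration would replace it again. With the checks in place, swapping the body of ReplaceTagsInLine for a regular expression later is a contained change.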
 
