RegExp.Replace

  • Thread starter: David P. Donahue

David P. Donahue

I'm using RegExp.Replace(string, string, string) to remove some pieces
of large strings. But I seem to be having trouble getting it to match
"all characters up to and including xxxxx" (which I thought would be
".*xxxxx" but it's not working). I can get it to match more precise
segments no problem, but not "all characters up to and including xxxxx"
or "xxxxx and all characters after it." Any ideas?


Regards,
David P. Donahue
(e-mail address removed)
http://www.cyber0ne.com
 

You probably need the Singleline option, to force . to match \n.
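A minimal C# sketch of that suggestion, assuming "RegExp" means System.Text.RegularExpressions.Regex and using the question's placeholder xxxxx as the marker. Without Singleline, . stops at \n, so .*xxxxx can only remove text from the start of the marker's own line. With Singleline, the match can begin at the very start of the string. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // Default options: '.' does not match '\n', so the match cannot
    // reach back past the start of the line the marker is on.
    public static string StripPrefixDefault(string input) =>
        Regex.Replace(input, ".*xxxxx", "");

    // Singleline: '.' also matches '\n', so ".*" can span every line
    // before the marker. "xxxxx" is the placeholder from the question.
    public static string StripPrefixSingleline(string input) =>
        Regex.Replace(input, ".*xxxxx", "", RegexOptions.Singleline);
}
```

For the input "line1\nline2 xxxxx tail", the default version leaves "line1\n tail", while the Singleline version leaves " tail".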
 
It was worth a try, but whenever I specify Singleline mode in the
options ("Replace(string, string, string, options)") the thread always
times out. The web server just returns a "thread was being aborted"
error after a bit of time passes.

I figured maybe I can use String.Replace to strip out all the carriage
returns, since I don't care if the string loaded with HTML code is all
one line or not, but that doesn't seem to be working either. If I use
String.Replace(Char.ToString((Char)13), "") it doesn't replace, but it
does match because if I use a non-empty string it adds it at each
carriage return.
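One possible explanation, not confirmed in the thread and assuming the file uses Windows line endings: each line break is "\r\n", so removing only (char)13 leaves (char)10 behind. The text is still multi-line, and . still refuses to match \n. Note also that String.Replace returns a new string rather than modifying the original, so the result has to be assigned. A minimal sketch (class and method names are hypothetical):

```csharp
public static class NewlineDemo
{
    // Removes only carriage returns (char 13); every '\n' (char 10) remains,
    // so the text is still split into lines as far as the regex is concerned.
    public static string RemoveCarriageReturns(string html) =>
        html.Replace("\r", "");

    // Removes both CR and LF. Each Replace call returns a new string;
    // the string passed in is left unchanged.
    public static string RemoveAllLineBreaks(string html) =>
        html.Replace("\r", "").Replace("\n", "");
}
```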

Any ideas?


 

What, exactly, are you using as a Regex pattern?
 
You'll probably need to be more specific about the problem you are
having, the structure/size of the text you are trying to replace, and
what you are trying to accomplish. What trouble exactly are you having?
For instance, are you trying to replace multiple instances of this
pattern?

The pattern you supplied will work just fine to replace a single
instance (with the Singleline option).
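One plausible reason for the hang, offered as an assumption rather than a diagnosis: Regex.Replace replaces every match, so after the first one it keeps scanning the rest of the string. With Singleline, each remaining start position lets .* run to the end of the input and then backtrack all the way, so on a large HTML string the work grows roughly with the square of the remaining length. Pinning the pattern to the start with \A and asking for at most one replacement avoids that. A sketch, with xxxxx again standing in for the real marker and hypothetical names:

```csharp
using System.Text.RegularExpressions;

public static class AnchoredDemo
{
    // \A pins the match to the start of the input, so no other start
    // positions are worth trying. The lazy ".*?" stops at the first
    // occurrence of the marker instead of running to the end and backing up.
    private static readonly Regex PrefixThroughMarker =
        new Regex(@"\A.*?xxxxx", RegexOptions.Singleline);

    public static string StripPrefix(string input) =>
        PrefixThroughMarker.Replace(input, "", 1); // at most one replacement
}
```

Lazy and greedy only differ if the marker occurs more than once; the thread says the markers are unique.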
 
Basically, I read in a multi-line HTML file into a string (using
StreamReader). Once I have that string, I want to eliminate a lot of
header and footer stuff. In the HTML file itself, I have comments to
mark where I want this to happen. So, basically, I need to remove all
text (characters, special characters, whitespace, newlines, etc.) up to
the first comment and after the last comment. The comments themselves
are unique, so the first one and last one can be easily identified and
occur only once in the string.


 
David P. Donahue said:
".*<--------------------START HERE-------------------->"

As per Paul Walls' msg, that does work just fine with the Singleline
option.

A couple of thoughts:

1) It does NOT work with the IgnorePatternWhitespace option (which my
Regex Explorer sets by default) because the pattern contains white
space.

2) Do your HTML files actually contain the HTML 'comment'
"<--------------------START HERE-------------------->"? Because that's
not really a comment. More to the point, if they actually contain
spaces between the last - and the START, or an extra space between
START and HERE, or a space between HERE and the -, then your pattern
won't work.
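On point 1): IgnorePatternWhitespace is only in effect when it is passed in the options (or switched on inline with (?x)); Regex does not enable it by default. A small sketch of what it does: an unescaped space in the pattern is dropped, while an escaped space or \s still matches. The helper name is hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceOptionDemo
{
    // Runs the pattern with IgnorePatternWhitespace switched on.
    // Under this option, "START HERE" is read as "STARTHERE", while
    // @"START\ HERE" and @"START\s+HERE" keep their whitespace matching.
    public static bool Matches(string input, string pattern) =>
        Regex.IsMatch(input, pattern, RegexOptions.IgnorePatternWhitespace);
}
```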

Try
 
.* said:
As per Paul Walls' msg, that does work just fine with the Singleline
option.

I can't imagine why it wouldn't, except that whenever I use the Singleline
option it just hangs on that line of code until the web server times out.
1) It does NOT work with the IgnorePatternWhitespace option (which my
Regex Explorer sets by default) because the pattern contains white
space.

I'm not sure what a "Regex Explorer" is, but is this option always on by
default? If so, is there a way to turn it off when I specify options in
the function call?
2) Do your HTML files actually contain the HTML 'comment'
"<--------------------START HERE-------------------->"? Because that's
not really a comment. More to the point, if they actually contain
spaces between the last - and the START, or an extra space between
START and HERE, or a space between HERE and the -, then your pattern
won't work.

Actually, I typed my reply a bit too fast that time, just to give an
idea of the pattern. It's actually:
"<!-- --------------- START HERE --------------- -->"
And that's copied and pasted directly from the HTML file into Visual
Studio, so it's exact.
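Since the marker text is known exactly, one approach (a sketch, not something proposed in the thread) is to let Regex.Escape build that part of the pattern, so no character of the comment needs hand-escaping. Regex.Escape also escapes spaces, so the result still works if IgnorePatternWhitespace happens to be on. The class name is hypothetical; the marker is the comment quoted above.

```csharp
using System.Text.RegularExpressions;

public static class EscapedMarkerDemo
{
    // Exact marker text as quoted in the thread.
    public const string Marker = "<!-- --------------- START HERE --------------- -->";

    // Removes everything up to and including the first occurrence of Marker.
    // \A plus a count of 1 keeps the engine from rescanning the rest of the text.
    public static string StripThroughMarker(string html)
    {
        var re = new Regex(@"\A.*?" + Regex.Escape(Marker), RegexOptions.Singleline);
        return re.Replace(html, "", 1);
    }
}
```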

Visual Studio is returning the error "Unrecognized escape sequence" when
I try to compile with that pattern. I also tried just replacing the
white spaces in my pattern with \s* but it gives the same error.


 
1) It does NOT work with the IgnorePatternWhitespace option (which my
Regex Explorer sets by default) because the pattern contains white
space.
I'm not sure what a "Regex Explorer" is, but is this option always on by
default? If so, is there a way to turn it off when I specify options in
the function call?

An app I wrote to experiment with regexes.
Visual Studio is returning the error "Unrecognized escape sequence" when
I try to compile with that pattern. I also tried just replacing the
white spaces in my pattern with \s* but it gives the same error.

Well, duh. You're mixing English quoting with C# quoting: in an ordinary
C# string literal, \s is an unrecognized escape sequence. Use a verbatim
string:

@".* < ! -+ \s* START \s+ HERE -+ >"
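The spaces in that pattern imply IgnorePatternWhitespace (together with Singleline); doubling the backslash ("\\s") is the non-verbatim alternative to the @ form. As written, the pattern expects the dashes to run straight into START and HERE, so it would not match the spaced-out comment quoted earlier, "<!-- --------------- START HERE --------------- -->". A hedged sketch of a variant that also allows spaces among the dashes; [-\s]* is an assumption about the marker's shape, not something from the thread, and the class name is hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class MarkerPatternDemo
{
    // Both patterns rely on these options: spaces in the pattern are
    // ignored, and '.' matches '\n'.
    private const RegexOptions Opts =
        RegexOptions.Singleline | RegexOptions.IgnorePatternWhitespace;

    // The pattern as posted: dashes must run directly into START and HERE.
    public const string Posted = @".* < ! -+ \s* START \s+ HERE -+ >";

    // Variant: [-\s]* accepts any mix of dashes and whitespace around the words.
    public const string Spaced = @".* <!-- [-\s]* START \s+ HERE [-\s]* -->";

    // Removes everything up to and including the marker (at most once).
    public static string Strip(string html, string pattern) =>
        new Regex(pattern, Opts).Replace(html, "", 1);
}
```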
 