Peter Duniho
So, I'm trying to learn how the Regex class works, and I've been trying to
use it to do what I think ought to be simple things. Except I can't
figure out how to do everything I want.
If I want to take a string and break it into individual lines based on a
specific pattern ("\r\n" in this case, but I don't think it matters), I
can easily write a loop that does this by scanning through the string
accumulating characters and spitting out a new string each time it hits
the "\r\n". But I figured Regex ought to be able to do the scanning for
me, so that all I have to loop through are the matches.
I've tried a wide variety of expression strings, but the ones that seem to
come closest to what I want are:
"(.+)\r\n" -- works great, except that if the string doesn't terminate
in a "\r\n", the last line isn't matched
"(.+)(\r\n)*" -- the idea being to allow the last line to match if no
"\r\n" is found. Works great, except that the "\r" winds up getting
captured as well (presumably because the second capture group is just
ignored and everything up to the "\n" gets captured by the first capture
group because the default is to match greedily)
"(.+?)(\r\n)*" -- works great, except that it's _too_ lazy, and
happily matches just a single character at a time
(Note: I'm using a replacement string specifying the first capture group
so that I can toss out the "\r\n", but if there's a way to match the
"\r\n" without it winding up in the match itself while at the same time
preventing it from being included in the subsequent match attempt, that
would be wonderful).
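To make the three attempts concrete, here is a minimal sketch that reproduces them. It is an illustration only: it uses Python's re module rather than the Regex class, on the assumption that "." (any character except "\n") and the greedy/lazy quantifiers behave the same way for these patterns. The sample text and the helper name first_groups are made up for the example.

```python
import re

# Hypothetical sample input: three lines, the last one without a trailing "\r\n".
TEXT = "one\r\ntwo\r\nthree"


def first_groups(pattern, text=TEXT):
    """Return the first capture group of every match, in order."""
    return [m.group(1) for m in re.finditer(pattern, text)]


# Attempt 1: the "\r\n" is mandatory, so the unterminated last line is skipped.
required = first_groups(r"(.+)\r\n")

# Attempt 2: "." matches "\r" (only "\n" is excluded), so the greedy ".+"
# swallows the "\r" and the optional "(\r\n)*" group then matches nothing.
greedy = first_groups(r"(.+)(\r\n)*")

# Attempt 3: the lazy ".+?" is satisfied by a single character every time.
lazy = first_groups(r"(.+?)(\r\n)*")

print(required)  # the last line is missing
print(greedy)    # a trailing "\r" is left on the terminated lines
print(lazy)      # one character per match
```

Each list holds what the first capture group picked up for that pattern, which is the same thing I'm pulling out with the replacement string.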
I also tried using single-line mode, trying to work around the problem in
the second example, but when I do that, the expression happily and
greedily captures _everything_ up to the very last "\r\n".
What I'm looking for is the expression that represents "capture all text
up to the first \r\n pair, allowing for the possibility of one last match
without the \r\n pair at the end of the string".
Is this actually impossible using Regex, or is there some combination of
options that will allow me to match the first \r\n pair without requiring
a \r\n pair at the end of the last match?
Thanks,
Pete