Regular Expression Help

NvrBst

I want to match sections in a multi-line string. For example:

-----TEXT File-----
SECTION
hello this is my section
it can use SECTION or ENDSECTION if I want it to
and can be multi lined
ENDSECTION
SECTION
Another Section, can be any amount of sections in a file
ENDSECTION
-----EOF-----

In other words, I want a pattern that matches "^SECTION$", then anything
except "^ENDSECTION$", and then "^ENDSECTION$". Here is what I have so far.
Note: my "$" are actually "\r?$", but I wrote them as "$" so they are easier
to read.

--Attempt One---
@"^SECTION$.*^ENDSECTION$", RegexOptions.Multiline |
RegexOptions.Singleline
--Problem--
It matches the first "^SECTION$" and the very last "^ENDSECTION$". I
want it to match the next "^ENDSECTION$" not the last.

--Attempt Two---
@"^SECTION$[^(ENDSECTION)]*^ENDSECTION$", RegexOptions.Multiline
--Problem--
[^(ENDSECTION)] behaves like [^ENDSECTION], a negated set of single
characters; the parentheses do no grouping. The "^" and "$" anchors would
also have to be part of what is excluded.
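
For anyone reading along, here is a minimal sketch of why Attempt Two behaves that way. The class name and test strings are made up for illustration. Inside square brackets, parentheses are ordinary characters, so the class excludes single characters rather than the word.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    public static void Main()
    {
        // Inside [...], parentheses are literal characters, so this class means
        // "one character that is not '(', ')', or any letter of ENDSECTION".
        Regex notInSet = new Regex(@"^[^(ENDSECTION)]*$");

        Console.WriteLine(notInSet.IsMatch("hello")); // True: no uppercase letters from the set
        Console.WriteLine(notInSet.IsMatch("NOTE"));  // False: N, O, T and E are all in the set
        Console.WriteLine(notInSet.IsMatch("a(b)"));  // False: the parentheses are in the set too
    }
}
```

So a section body containing any of those uppercase letters stops the match early, which is what happens with the sample file.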


How would I match any character/word/etc except the pattern
"^ENDSECTION$", or maybe the word "\nENDSECTION\n" if it is easier?
Thanks.
 
NvrBst

I found a solution:

@"^SECTION$.*?^ENDSECTION$", RegexOptions.Multiline |
RegexOptions.Singleline

Using ".*?" instead of ".*" makes it match as few characters as possible. I
could also probably do something like (^ENDSECTION$){1} at the end
instead. Thanks. If someone has a better or more efficient way to do it,
or any other comments, feel free to post.
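
A small sketch comparing the two quantifiers, assuming a two-section input with "\n" line endings (the sample string and class name are illustrative, not from the original file):

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyVersusGreedy
{
    public static void Main()
    {
        // Illustrative input; the \r? in the patterns also tolerates "\r\n" endings.
        string text = "SECTION\nfirst\nENDSECTION\nSECTION\nsecond\nENDSECTION\n";
        RegexOptions options = RegexOptions.Multiline | RegexOptions.Singleline;

        // Greedy: .* runs to the end of the input, then backtracks to the LAST ENDSECTION line.
        int greedy = Regex.Matches(text, @"^SECTION\r?$.*^ENDSECTION\r?$", options).Count;

        // Lazy: .*? grows one character at a time and stops at the NEXT ENDSECTION line.
        int lazy = Regex.Matches(text, @"^SECTION\r?$.*?^ENDSECTION\r?$", options).Count;

        Console.WriteLine("greedy matches: " + greedy); // 1 (both sections swallowed as one match)
        Console.WriteLine("lazy matches: " + lazy);     // 2 (one match per section)
    }
}
```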
 
Arne Vajhøj

See example below.

Arne

=============================

using System;
using System.Text.RegularExpressions;

namespace E
{
    public class Program
    {
        // Singleline lets "." cross line breaks; Multiline makes "^" match at each line start.
        // \r?\n accepts both Windows and Unix line endings in the input.
        private static readonly Regex re = new Regex(
            @"(?:^SECTION\r?\n)(.*?)(?:^ENDSECTION\r?\n)",
            RegexOptions.Multiline | RegexOptions.Singleline | RegexOptions.Compiled);

        public static void Parse(string s)
        {
            foreach (Match m in re.Matches(s))
            {
                // Group 1 is the section body between the two marker lines.
                Console.WriteLine("section=" + m.Groups[1].Value);
            }
        }

        public static void Main(string[] args)
        {
            string s = @"SECTION
hello this is my section
it can use SECTION or ENDSECTION if I want it to
and can be multi lined
ENDSECTION
SECTION
Another Section, can be any amount of sections in a file
ENDSECTION
";
            Parse(s);
            Console.ReadKey();
        }
    }
}
 
NvrBst

Yup, the ".*?" solution has been working great for the current problem
I had, thanks. For future reference though, is there an easy way to
match anything but a word? Something like [^ ], but for words or
patterns instead of characters? I remember using something like
"~(WORD)" in a regular expression in the past to match anything except
"WORD", but I forget which tool that was in; it doesn't seem to work with
the .NET Regex class.

The problem I had is already solved. However, "not matching something"
tends to be how I first think about these problems, and I haven't been
able to express it as a regular expression other than by expanding it;
i.e., anything but "IN" could be something like
"(([^I]N) | (I[^N]) | ([^IN]))*", which would be tedious for larger words.

Also, to correct myself: after thinking about it a little more,
"(^ENDSECTION$){1}" wouldn't work for the above ;)
 
Eps

If you have any control over the data format at all, you may want to
consider storing the data in a database. The overhead of SQLite is very
low considering the advantages it offers.
 
Jesse Houwing

Hello NvrBst,

You can use a negative lookahead for that:

(?:(?!word).)* matches any run of characters that does not contain the
specified word.
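
As a rough sketch of how that construct could be applied to the SECTION/ENDSECTION case: the sample input, class name and exact pattern below are assumptions for illustration, not something posted in this thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegativeLookaheadDemo
{
    public static void Main()
    {
        // Illustrative input with "\n" line endings.
        string text = "SECTION\nfirst\nENDSECTION\nSECTION\nsecond\nENDSECTION\n";

        // (?:(?!^ENDSECTION\r?$).)* consumes one character at a time, but only while
        // an ENDSECTION line does not begin at the current position.
        Regex section = new Regex(
            @"^SECTION\r?\n((?:(?!^ENDSECTION\r?$).)*)^ENDSECTION\r?$",
            RegexOptions.Multiline | RegexOptions.Singleline);

        foreach (Match m in section.Matches(text))
        {
            // Group 1 is the section body; TrimEnd drops its trailing line break.
            Console.WriteLine("section=" + m.Groups[1].Value.TrimEnd());
        }
        // Prints:
        // section=first
        // section=second
    }
}
```

The lookahead is checked before every character, so this form tends to do more work than ".*?" for the same result. It is mainly useful when the "anything but this word" part has to sit in the middle of a larger pattern.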
 
Arne Vajhøj

I think the reluctant quantifier is the way to do it.

Arne
 
Arne Vajhøj

Jesse said:
You can use negative look arounds for that:

(?:(?!word).)* would match anything but the word specified.

I don't think that will work in this case.

A negative lookahead refuses to match when something is followed
by the word, but what is needed here is to refuse to match when
something is the word.

Arne
 
