Regular Expression

Guest

Dear all,

I want to read a file's data block by block using a regular expression. The
file contents look like this:

MWH ........
.................
.....................
MWH .................
...........................
...........................
.....................
MWH ....

Each block starts with 'MWH' and has a different length. How can I compose
the regular expression pattern so that I get all the matches?

Thanks for any help.

Tedmond
 
Brian Gideon

Tedmond,

What about using a negative lookahead assertion?

MWH.*(?!MWH)

I didn't actually test it, but that's the first thing that came to
mind.

Brian
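
Since the pattern above was posted untested, here is a minimal sketch of one way to check it. It assumes Python's re module as a stand-in, because the thread does not name a regex engine, and the sample text is made up for illustration. In this sketch the lookahead does not split the blocks: without the "dot matches newline" option each match is only the header line, and with it the greedy .* runs to the end of the text.

```python
import re

# Hypothetical sample data: three blocks, each starting with 'MWH'.
sample = "MWH a\nrow 1\nMWH b\nrow 2\nrow 3\nMWH c\n"

# Default mode: '.' stops at a newline, so each match covers one line only.
line_matches = re.findall(r"MWH.*(?!MWH)", sample)

# DOTALL mode: the greedy '.*' consumes everything up to the end of the text,
# where the negative lookahead trivially succeeds, so there is a single match.
dotall_matches = re.findall(r"MWH.*(?!MWH)", sample, flags=re.DOTALL)

print(line_matches)
print(len(dotall_matches))
```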
 
Kevin Spencer

Here you go:

(?s)(MWH)(?:.(?!\1))*

Working with negations of literal sequences of characters is tricky (as
you've apparently observed!). In fact, it took me a while to work this one
out, so I added it to my library (it will work with any sequence of
characters).

Here's the low-down:

The first expression indicates that the dot ('.') matches new lines. After
that there is a single Group, Group 1, which contains the sequence 'MWH'.

This is followed by a non-capturing group which begins with the dot ('.')
character (any character), and a negative look-ahead, which indicates that
the character may *not* be followed by a match of Group 1. The non-capturing
group has a quantifier at the end which states that it may be repeated 0 or
more times. That is, any character *not* followed by 'MWH' is a match. The
quantifier is for the group, not for the character, so, in effect, it limits
the character to any character *not* followed by 'MWH' as many times as
possible.
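
The explanation above can be tried out with a short sketch. It assumes Python's re module (the thread does not say which engine is in use), passes re.DOTALL in place of the inline (?s), and uses made-up sample text. One thing the sketch makes visible: because the look-ahead is checked after each character, the character immediately before the next 'MWH' (here the newline) is left out of the match. The second pattern, which checks the look-ahead before each character, is a common alternative and not the pattern from the post.

```python
import re

# Hypothetical sample data: three blocks, each starting with 'MWH'.
sample = "MWH a\nrow 1\nMWH b\nrow 2\nrow 3\nMWH c\n"

# Pattern from the post; re.DOTALL plays the role of the inline (?s).
posted = re.compile(r"(MWH)(?:.(?!\1))*", re.DOTALL)
blocks = [m.group(0) for m in posted.finditer(sample)]

# Alternative form: test the look-ahead before consuming each character,
# so the newline in front of the next 'MWH' stays inside the block.
alternative = re.compile(r"MWH(?:(?!MWH).)*", re.DOTALL)
alt_blocks = [m.group(0) for m in alternative.finditer(sample)]

for block in blocks:
    print(repr(block))
```

With the alternative form the blocks concatenate back to the full sample text, which can be handy when nothing in the file should be dropped.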

In this case, the 'MWH' does not have to be at the beginning of a line. If
that sequence can also occur inside a block, and must not be recognized
unless it is at the beginning of a line, you would use:

(?s)(?m)(^MWH)(?:.(?!\1))*

This has an additional directive at the beginning ("(?m)"), indicating that
the caret (^) and dollar sign ($) match either at the beginning/ending of
the string, or at the beginning/ending of a line. The caret precedes the
'MWH' in the first Group, indicating that it must be at the beginning of a
line to form a match.
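
A small sketch for checking the line-anchored version, again assuming Python's re module with re.DOTALL and re.MULTILINE standing in for (?s) and (?m), and with made-up sample text. A backreference repeats only the captured text ('MWH'), not the caret, so in this sketch an 'MWH' in the middle of a line still ends the block early. Writing the anchor inside the look-ahead, as in the second pattern, is one possible adjustment; it is not from the original post.

```python
import re

# Hypothetical sample: the second line contains 'MWH' in the middle of a line.
sample = "MWH a\nsee MWH note\nMWH b\n"
flags = re.DOTALL | re.MULTILINE

# Pattern from the post: \1 matches the text 'MWH' anywhere, anchored or not.
posted = re.compile(r"(^MWH)(?:.(?!\1))*", flags)
posted_blocks = [m.group(0) for m in posted.finditer(sample)]

# Possible adjustment: spell out '^MWH' in the look-ahead so that only a
# line-initial 'MWH' ends the current block.
adjusted = re.compile(r"^MWH(?:(?!^MWH).)*", flags)
adjusted_blocks = [m.group(0) for m in adjusted.finditer(sample)]

print(posted_blocks)
print(adjusted_blocks)
```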

--
HTH,

Kevin Spencer
Microsoft MVP
Software Composer
http://unclechutney.blogspot.com

A watched clock never boils.