Capturing text between two words with Regex

  • Thread starter: ffreino

Hi,
I'm trying to capture text between two words in a multi-line string.
For example:
89. This is the text I would like to capture.
This is another line.
90. End of the example.

I would like to capture text between the words (numbers in this case)
'89' and '90'

Any ideas?
Thanks in advance.
 
I have not tested this in code, but try this:
string ResultString = null;
try {
    // The lookbehind and lookahead keep "89" and "90" out of the match itself.
    ResultString = Regex.Match(SubjectString, "(?<=89)[a-zA-Z\\. \\r\\n0-9]*(?=90)").Value;
} catch (ArgumentException ex) {
    // The pattern was not a valid regular expression.
}
You can replace 89 and 90 with whatever your two words are.
If other special characters show up in between, you need to
place them inside the [] brackets.
Hope this gets you started at least...
 
The Regex to do that needs to specify a group to be captured and also allow
multiline captures.

Try this

Regex MyRegex = new Regex(@"89(?<GroupToCapture>.*)90", RegexOptions.Singleline);
// The parentheses define a group and the ?<> names it.
// RegexOptions.Singleline lets "." match newlines, so ".*" can span multiple lines.

string Captured = MyRegex.Match(Mytext).Groups["GroupToCapture"].Value;


I am pretty sure that is correct, but I did it from memory, so my syntax may
be off a little in the line to retrieve the group. I am almost certain the
Regex is correct.

Ethan
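
To see what the option changes, here is a minimal sketch that runs the same named-group pattern with and without RegexOptions.Singleline on the sample from the question. The class and method names (SinglelineDemo, Capture) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SinglelineDemo
{
    // The named-group pattern discussed in the post above.
    public const string Pattern = @"89(?<GroupToCapture>.*)90";

    // Returns the captured text, or null when the pattern does not match.
    public static string Capture(string text, RegexOptions options)
    {
        Match m = Regex.Match(text, Pattern, options);
        return m.Success ? m.Groups["GroupToCapture"].Value : null;
    }

    public static void Main()
    {
        // The sample from the original question.
        string text = "89. This is the text I would like to capture.\n"
                    + "This is another line.\n"
                    + "90. End of the example.";

        // Without Singleline, "." stops at "\n", so "90" is never reached.
        Console.WriteLine(Capture(text, RegexOptions.None) ?? "(no match)");

        // With Singleline, "." also matches "\n" and the capture spans both lines.
        Console.WriteLine(Capture(text, RegexOptions.Singleline));
    }
}
```

Note that the option is spelled Singleline (lowercase "l") in .NET, and that the captured text still starts with the "." that follows "89".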
 
Regex(@"89(?<GroupToCapture>.*)90", RegexOptions.Singleline);

You have to be careful with .*, since it's "greedy". It will grab as many
characters as it can, so the capture runs through to the last "90" in the
target text, swallowing any earlier "90"s along the way.

You need to use .*? to make the search nongreedy:

89(?<GroupToCapture>.*?)90

///ark
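
A small sketch of the difference, assuming a made-up sample in which "90" appears twice. The class and method names (GreedyVsLazy, Capture) are illustrative only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Returns whatever the named group captures for the given pattern.
    public static string Capture(string text, string pattern)
    {
        Match m = Regex.Match(text, pattern, RegexOptions.Singleline);
        return m.Groups["GroupToCapture"].Value;
    }

    public static void Main()
    {
        // Hypothetical sample: the number "90" appears twice.
        string text = "89. First line.\n90. Second line.\n90. Third line.";

        // Greedy: the capture extends to the last "90".
        string greedy = Capture(text, @"89(?<GroupToCapture>.*)90");

        // Lazy: the capture stops at the first "90".
        string lazy = Capture(text, @"89(?<GroupToCapture>.*?)90");

        Console.WriteLine("Greedy: [" + greedy + "]");
        Console.WriteLine("Lazy:   [" + lazy + "]");
    }
}
```

With this sample the greedy capture includes the whole "90. Second line." line, while the lazy capture ends right after "First line." and its newline.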
 
Thanks a lot.
I wanted to capture questions from a text exam. The exam has this
format:
1. First question
a. Answer number one
b. Answer number two
2. Second question
a. ...
b. ...
And so on...
I didn't really understand the meaning of (?<=...) at first, but now I think
it's like an anchor, isn't it?
I also had problems with the greedy .* that Mark Wilden pointed out.

Finally, this is the pattern I use to capture the questions of the text exam:
(?<=\d+)(.+?)(?=\d+) with RegexOptions.Singleline.

Thanks.

 
There is a problem with the pattern (?<=\d+)(.+?)(?=\d+)
I can't capture the question number, because group 0 contains only the '(.+?)'
part. The first part of the pattern, '(?<=\d+)', does not capture anything.
How can I capture the question number?
Thanks a lot.
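
The reason is that a lookbehind only checks the text before the current position and does not consume it, so the digits never become part of the match. One possible way around it, sketched below on a made-up two-question excerpt, is to swap the lookbehind for an ordinary capturing group. The class, constant, and method names are illustrative, not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookbehindDemo
{
    // Pattern from the thread: the digits are checked but not consumed.
    public const string LookbehindPattern = @"(?<=\d+)(.+?)(?=\d+)";

    // Variation: (\d+) consumes the number and stores it in group 1.
    public const string CapturingPattern = @"(\d+)(.+?)(?=\d+)";

    public static Match FirstMatch(string text, string pattern)
    {
        return Regex.Match(text, pattern, RegexOptions.Singleline);
    }

    public static void Main()
    {
        // Hypothetical excerpt in the exam format described in the thread.
        string text = "1. First question\na. Answer number one\n2. Second question";

        Match lookbehind = FirstMatch(text, LookbehindPattern);
        // The match begins after the "1", so the number is not in it.
        Console.WriteLine("Lookbehind match starts at index " + lookbehind.Index);

        Match capturing = FirstMatch(text, CapturingPattern);
        Console.WriteLine("Number: " + capturing.Groups[1].Value);
        Console.WriteLine("Body:   " + capturing.Groups[2].Value);
    }
}
```

In the capturing version the question number is in Groups[1] and the question text moves to Groups[2].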


 
Ok.
I have tried this pattern (\d+)\.\s+(.+?)\s+(?=\d+\.) with
RegexOptions.Singleline and it works perfectly.
Thanks a lot
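
For reference, a minimal sketch of how that pattern could be applied to a whole exam with Regex.Matches. The sample exam text, the ExamParser class, and the second pattern (which adds \z to the lookahead) are assumptions for illustration and are not from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ExamParser
{
    // The pattern reported as working in the thread.
    public const string ThreadPattern = @"(\d+)\.\s+(.+?)\s+(?=\d+\.)";

    // Assumed variation (not from the thread): also accept end of input,
    // so the final question is not dropped.
    public const string WithEndOfInput = @"(\d+)\.\s+(.+?)\s+(?=\d+\.|\z)";

    // Returns (number, body) pairs for every question the pattern finds.
    public static List<(string Number, string Body)> Parse(string exam, string pattern)
    {
        var questions = new List<(string Number, string Body)>();
        foreach (Match m in Regex.Matches(exam, pattern, RegexOptions.Singleline))
        {
            questions.Add((m.Groups[1].Value, m.Groups[2].Value));
        }
        return questions;
    }

    public static void Main()
    {
        // Hypothetical exam text in the format described in the thread.
        string exam = "1. First question\na. Answer number one\nb. Answer number two\n"
                    + "2. Second question\na. Yes\nb. No\n"
                    + "3. Third question\na. Maybe\n";

        foreach (var q in Parse(exam, WithEndOfInput))
        {
            Console.WriteLine("Question " + q.Number + ":");
            Console.WriteLine(q.Body);
        }
    }
}
```

One thing to watch: the lookahead (?=\d+\.) requires another numbered question to follow, so with the thread's pattern the last question in the text is not matched. The \z variation picks it up, provided the text ends with a newline or other whitespace. Either pattern can also be thrown off by an answer line that itself starts with a number followed by a period.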


 
