Regex first match

Al

Hi,

I have the following code:

string sData=@"{\*\bkmkstart adresse1} FORMTEXT }{\rtlch \af0 \ltrch
\f1\fs22\insrsid16733897\charrsid10567516 {\*\datafield
000000000000000008616472657373653100000000000000000000000000}{\*\formfield{\fftype0\fftypetxt0{\*\ffname
adresse1}}}}}{\fldrslt {\rtlch \af0 \ltrch
\f1\fs22\lang1024\langfe1024\noproof\insrsid10567516
\u8194\'20\u8194\'20\u8194\'20\u8194\'20\u8194\'20}}}{\rtlch \af0
\ltrch \f1\fs22\insrsid14636597 {\*\bkmkend adresse1}";

string sReg = @"\{\\fldrslt \{\\rtlch.*sid[0-9]+ ";

Regex reg = new Regex(sReg, RegexOptions.IgnoreCase | RegexOptions.Singleline);

MatchCollection mc = reg.Matches(sData);

string sResult = mc[0].Value;


sResult contains the following:
{\fldrslt {\rtlch \af0 \ltrch
\f1\fs22\lang1024\langfe1024\noproof\insrsid10567516
\u8194\'20\u8194\'20\u8194\'20\u8194\'20\u8194\'20}}}{\rtlch \af0
\ltrch \f1\fs22\insrsid14636597

In other words, the match extends to the !!! last !!! 'sid' followed by
one or more digits and a space, instead of stopping inside the first
'{\rtlch' group of the '{\fldrslt'.


But what I want is the following result:
{\fldrslt {\rtlch \af0 \ltrch
\f1\fs22\lang1024\langfe1024\noproof\insrsid10567516

In other words, I want the regex to stop at the first 'sid' inside the
'{\rtlch' group that is itself inside the '{\fldrslt', where 'sid' is
followed by one or more digits and then a space.

Can anyone give me the solution?
Thanks
Al
 
Jon Shemitz

Al said:
But what I want is the following result:
{\fldrslt {\rtlch \af0 \ltrch
\f1\fs22\lang1024\langfe1024\noproof\insrsid10567516

But the sData doesn't contain that string! Your regex is indeed
finding the ONLY \fldrslt in your sample data.
 
Jesse Houwing

* Al wrote, On 30-11-2006 16:33:
In other words, I want the regex to stop at the first 'sid' inside the
'{\rtlch' group that is itself inside the '{\fldrslt', where 'sid' is
followed by one or more digits and then a space.

Can anyone give me the solution?
Thanks
Al


You're running into the fact that regex quantifiers are greedy by default:
.* first grabs as much text as it can and only gives characters back
(backtracks) as far as needed for the rest of the pattern to match. With
RegexOptions.Singleline the dot also matches line breaks, so .* runs to the
end of sData and then backs up to the last 'sid' followed by digits and a
space.
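The greedy behaviour can be reproduced on a much smaller string. The sketch below is only an illustration: the input text and the names GreedyDemo, Input and GreedyMatch are made up for this example and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class GreedyDemo
{
    // Hypothetical, shortened input: two "sid<digits> " runs on separate lines.
    public const string Input = "start \\insrsid111 middle\n\\insrsid222 end";

    // Returns the first match of a greedy pattern under the given options.
    public static string GreedyMatch(RegexOptions options) =>
        Regex.Match(Input, @"start.*sid[0-9]+ ", options).Value;

    static void Main()
    {
        // Singleline: . also matches '\n', so .* reaches the end of the input
        // and backtracks only as far as the LAST "sid<digits> ".
        Console.WriteLine(GreedyMatch(RegexOptions.Singleline).EndsWith("sid222 ")); // True

        // Without Singleline, .* stops at the line break, so the match ends
        // at the last "sid<digits> " on the first line.
        Console.WriteLine(GreedyMatch(RegexOptions.None).EndsWith("sid111 "));       // True
    }
}
```

So the surprising result comes from the combination of a greedy quantifier and Singleline, not from either one alone.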

To solve your issue, make the quantifier reluctant (lazy) instead of
greedy, i.e. change .* to .*?:

@"\{\\fldrslt \{\\rtlch.*?sid[0-9]+ ";
                       ^^^

This makes .*? consume as few characters as possible, so the match ends at
the first 'sid' followed by digits and a space.
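To check the difference against the data from the question, here is a minimal sketch. It assumes the line wraps in the posted string are really single spaces, and it uses only the tail of sData starting at {\fldrslt, with the run of \u8194 characters shortened. FirstSidDemo, Data and FirstMatch are names invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstSidDemo
{
    // Assumption: condensed tail of the sample data, wraps replaced by spaces.
    public const string Data =
        @"{\fldrslt {\rtlch \af0 \ltrch \f1\fs22\lang1024\langfe1024\noproof\insrsid10567516 " +
        @"\u8194\'20\u8194\'20}}}{\rtlch \af0 \ltrch \f1\fs22\insrsid14636597 {\*\bkmkend adresse1}";

    // Regex.Match returns only the first match, which is all that is needed here.
    public static string FirstMatch(string pattern) =>
        Regex.Match(Data, pattern, RegexOptions.IgnoreCase | RegexOptions.Singleline).Value;

    static void Main()
    {
        string greedy = FirstMatch(@"\{\\fldrslt \{\\rtlch.*sid[0-9]+ ");
        string lazy   = FirstMatch(@"\{\\fldrslt \{\\rtlch.*?sid[0-9]+ ");

        Console.WriteLine(greedy.EndsWith("insrsid14636597 ")); // True: runs to the last sid
        Console.WriteLine(lazy.EndsWith("insrsid10567516 "));   // True: stops at the first sid
    }
}
```

Regex.Match is used here instead of Matches(...)[0] only for brevity; both give the same first match.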

Jesse Houwing
 
Al

Thanks Jesse,
this works perfectly.
Al
 
Jon Shemitz

Jon said:
But the sData doesn't contain that string! Your regex is indeed
finding the ONLY \fldrslt in your sample data.

Ooops - misread the question. (Hadn't had my coffee yet.)
 