More regex questions: how to avoid capturing leading empty lines

GS

How can one avoid capturing leading empty or blank lines?

The data I deal with looks like this:

"will be paid on the dates you specified.

xyz supplier [123445797891]
amount: $100.52 when: September 07, 2007 reference #: 0415
from: operating account [236424735]


abc, Jane'S CHOICE [0089456881545]
amount: $487.61 when: September 08, 2007 reference #: 0416
from: finess [0236454514]



"
The regex options are:
Multiline, ExplicitCapture, IgnoreCase, Singleline (dotall), IgnorePatternWhitespace

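The regex used for capturing:
(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\].{4,8}amount:\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)\s*reference\s*\#\:\s*(?<RefNbr>\d*)\s*.{2,4}\s*from:\s(?<FromAcctName>\w{1,}(\s\w*)*)\s\[(?<FromAcctNbr>\d*)\]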

The expression used in Result(strGrps):
${AcctName} ${Amt} ${Dt2Pay} ${RefNbr} PCF ${FromAcctName} ${FromAcctNbr}
The result is:
"
xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735


abc, Jane'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"
However, the desired result is tab-delimited lines without the extra leading
blank lines:
"xyz supplier 100.52 September 07, 2007 0415 PCF operating account 236424735
abc, Jane'S CHOICE 487.61 September 08, 2007 0416 PCF finess 0236454514"

What do I have to adjust in the regex expression?

Or do I have to change the code used?

// compile
string strRegex = textBoxRegex.Text;
bool bCompiled = false;

try
{
    RegexOptions regexOptn = RegexOptions.Singleline
        | RegexOptions.Multiline | RegexOptions.IgnoreCase
        | RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace;
    myRegex = new Regex(strRegex, regexOptn); // try compile with options
    bCompiled = true;
    bMatched = false;
    setStatusText("Regex Compiled.");
}
catch (Exception ex)
{
    setMsg("Error in regex compilation or combination of regex options. " + ex.Message);
}

// match
MatchCollection myMatch = null;
if (bCompiled) {
    myMatch = myRegex.Matches(textBoxInput.Text);
}

// capturing result
bool bSuccess = false;
if (myMatch == null || myMatch.Count <= 0) {
    // also covers the case where the regex failed to compile
    setStatusText("No match Found");
    return bSuccess;
}

// turn the comma-separated group list into the Result() template
// (the separator literal was split by line wrapping; a tab is assumed here)
string strMatchGrpVarName = textBoxGroupName.Text.Replace(",", "\t");
string mybuf = "";
int i = 0;

foreach (Match match in myMatch)
{
    i++;
    if (i > 1) mybuf += csCrLf;   // line break between results
    mybuf += match.Result(strMatchGrpVarName);
    if (bSingle) break;           // only the first match is wanted
}
MessageBox.Show("count=" + strMatchGrpVarName.Length + csCrLf + mybuf);



Thank you for your time and expertise.
 
Kevin Spencer

If you use the caret (^) character with RegexOptions.Multiline, it will
match at the beginning of a line. You can use that in your individual
matches to specify the start of a line before the match.
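
A minimal sketch of how that anchor behaves, using a simplified pattern that is an assumption for illustration and not the poster's full expression. With RegexOptions.Multiline, ^ also matches at the start of a blank line, so the sketch additionally requires the name to begin with a word character and allows only spaces or tabs inside it. The class name CaretDemo and the method Names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaretDemo
{
    // Hypothetical, simplified pattern covering only the "name [number]" line.
    // ^      start of a line (because of RegexOptions.Multiline)
    // \w     the name must start with a word character, so a blank line cannot begin a match
    // [ \t]  only horizontal whitespace is allowed inside the name, never \r or \n
    public const string Pattern =
        @"^(?<AcctName>\w[\w,']*(?:[ \t]+[\w,']+)*)[ \t]*\[(?<AcctNbr>\d+)\]";

    // Returns the captured account names, one per match.
    public static string[] Names(string input)
    {
        MatchCollection matches = Regex.Matches(
            input, Pattern, RegexOptions.Multiline | RegexOptions.ExplicitCapture);
        string[] names = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
            names[i] = matches[i].Groups["AcctName"].Value;
        return names;
    }

    public static void Main()
    {
        string input = "will be paid on the dates you specified.\r\n\r\n" +
                       "xyz supplier [123445797891]\r\n\r\n\r\n" +
                       "abc, Jane'S CHOICE [0089456881545]\r\n";
        // prints [xyz supplier] then [abc, Jane'S CHOICE], with no leading line breaks
        foreach (string name in Names(input))
            Console.WriteLine("[" + name + "]");
    }
}
```

The brackets in the output make any captured leading line break visible. In the posted pattern, \w* can match nothing and \s can match \r and \n, which is presumably why a match can begin on a blank line.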

--
HTH,

Kevin Spencer
Microsoft MVP

DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net

GS

Thank you. I tried that, but I still get the extra empty or blank lines:

^(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s(?:\[)(?<AcctNbr>\d*)\].{4,8}^\s*(?:amount):\s\$(?<Amt>\b[0-9][0-9,]*\.\d\d)\s*when:\s*(?<Dt2Pay>[ADFJMNOS][aceopu][bcglnprtvy][ya-v]{0,9}\s\d{1,2},\s\d\d\d\d\b)\s*reference\s*\#\:\s*(?<RefNbr>\d*)\s*.{2,4}^\s*(?:from\:\s)(?<FromAcctName>\w{1,}(\s\w*)*)\s\[(?<FromAcctNbr>\d*)\]

Right now I kludge it by giving the user the option of removing all empty and
blank lines. When the user checks the Remove Blank Line check box, the
application performs one more regex pass over the result to remove any
blank or empty lines. It is kludgy and crude, but it works.
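
A rough sketch of two ways such a cleanup could look, under stated assumptions: the class BlankLineCleanup and both method names are hypothetical, and the pattern and data in Main are shortened versions of the ones posted. FormatMatches trims each formatted match before joining, so line breaks captured at the start of the first group never reach the output. RemoveBlankLines is one possible form of the extra pass described above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlankLineCleanup
{
    // Formats every match with the given Result() template and trims it,
    // then joins the results with a single CR/LF between them.
    public static string FormatMatches(string input, string pattern, string template)
    {
        RegexOptions options = RegexOptions.Singleline | RegexOptions.Multiline
            | RegexOptions.IgnoreCase | RegexOptions.ExplicitCapture
            | RegexOptions.IgnorePatternWhitespace;
        List<string> lines = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern, options))
            lines.Add(match.Result(template).Trim());
        return string.Join("\r\n", lines.ToArray());
    }

    // Deletes every line that holds only spaces or tabs, together with
    // its line terminator.
    public static string RemoveBlankLines(string text)
    {
        return Regex.Replace(text, @"^[ \t]*\r?\n", "", RegexOptions.Multiline);
    }

    public static void Main()
    {
        // Shortened version of the posted pattern and data, for illustration only.
        string pattern = @"(?<AcctName>^\w*,{0,1}(\s\w*('s){0,1},{0,1})*)\s\[(?<AcctNbr>\d*)\]";
        string input = "will be paid on the dates you specified.\r\n\r\n" +
                       "xyz supplier [123445797891]\r\n\r\n\r\n" +
                       "abc, Jane'S CHOICE [0089456881545]\r\n";
        Console.WriteLine(FormatMatches(input, pattern, "${AcctName}\t${AcctNbr}"));
    }
}
```

Trim() also removes trailing whitespace, so a template that ends in a tab followed by an empty group would lose that tab. If the trailing columns matter, TrimStart() may be the safer choice.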
