Regular Expression problem

Barry · Jul 30, 2007

Hello

Regex regex = new
Regex("((?<field>[^\",\r\n]+)|\"(?<field>([^\"]|\"\")+)\")(,|(?<rowbreak>\\r\\n|\\n|$))");

The above regular expression returns 24 fields instead of 42 for the record
below; it ignores empty fields like ,,,"Hello World",,,,,

1002,Jack,Wack,23772 East Rd,,Georgestown Twp,VA,48183,United
States,999-966-9735,Jack,Wack,23772 East Rd,,Georgtown Twp,VA,12283,United
States,519-966-9735,501,,Y,0,4/27/2007 15:04,5/10/2007
12:50,Shipped,,,,Regular Processing,,,,,,,,,,,

Can some regex expert help?

TIA
Barry

Jesse Houwing · Jul 30, 2007

[^\",\r\n]+ in your field definition requires at least one character to
match (+ means one or more). Change this to * (zero or more) and
things should start working.

Regex regex = new
Regex("((?<field>[^\",\r\n]*)|\"(?<field>([^\"]|\"\")*)\")(,|(?<rowbreak>\\r\\n|\\n|$))");
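To make the difference between + and * concrete, here is a small sketch that runs both versions of the pattern over a short made-up input. The class name QuantifierDemo, the helper Fields, and the sample input are illustrative only and not from the original post; the patterns are the two shown above, written as verbatim strings.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Original pattern: every field needs at least one character (+).
    public const string PlusPattern =
        @"((?<field>[^"",\r\n]+)|""(?<field>([^""]|"""")+)"")(,|(?<rowbreak>\r\n|\n|$))";

    // Suggested change: a field may be empty (*).
    public const string StarPattern =
        @"((?<field>[^"",\r\n]*)|""(?<field>([^""]|"""")*)"")(,|(?<rowbreak>\r\n|\n|$))";

    // Collects the "field" group of every match, in input order.
    public static List<string> Fields(string pattern, string input)
    {
        List<string> result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Groups["field"].Value);
        }
        return result;
    }
}
```

For the input a,,b the + pattern yields a and b, while the * pattern yields a, an empty string, and b. Because the * pattern can also match the empty string at the very end of the input, it may report one extra empty field after a record whose last field is not empty, so it is worth checking the count.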

From the regex it looks like you're trying to read multiple lines with
one Regex.Match call. This could become very expensive.

You could also give this a try:
(?:(?:^|,)(?:"(?<field>(?:[^"]|"")*)"|(?<field>[^"\r\n,]*)))*\r?$

(\r? at the end compensates for how the .NET regex engine treats $: even
in multiline mode it matches only before \n, so the \r of a \r\n line
ending has to be consumed explicitly.)

It will match a whole line in one match object. You can extract the
values with the following code:

Regex rx = new Regex("...", RegexOptions.Multiline);

Match m = rx.Match(input);
if (m.Success) // while (m.Success)
{
    foreach (Capture c in m.Groups["field"].Captures)
    {
        string extracted = c.Value;
    }
    // m = m.NextMatch();
}

If you have an input string that contains multiple lines you can use the
while(m.Success) in combination with m = m.NextMatch(); which I've
commented out in the code above to loop through all the results in the
input.
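A minimal, self-contained sketch of the capture loop for a single line might look like the following. It assumes the whole-line pattern was meant to open with (?:(?:^|,) since the start of it appears to have been mangled in posting. The class name CsvLine, the method Parse, and the step that collapses doubled quotes are my own additions for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CsvLine
{
    // Whole-line pattern; both alternatives capture into the same "field" group.
    private static readonly Regex LinePattern = new Regex(
        @"(?:(?:^|,)(?:""(?<field>(?:[^""]|"""")*)""|(?<field>[^""\r\n,]*)))*\r?$",
        RegexOptions.Multiline);

    // Returns the fields of one CSV line, empty fields included.
    public static List<string> Parse(string line)
    {
        List<string> fields = new List<string>();
        Match m = LinePattern.Match(line);
        if (m.Success)
        {
            foreach (Capture c in m.Groups["field"].Captures)
            {
                // Quoted fields still contain doubled quotes; collapse "" to ".
                fields.Add(c.Value.Replace("\"\"", "\""));
            }
        }
        return fields;
    }
}
```

Calling CsvLine.Parse on the line a,,"Hello, World",, gives five fields: a, an empty string, Hello, World (with its comma intact), and two more empty strings. When looping over a multi-line input with NextMatch, the pattern can also succeed with a zero-length match that has no captures at all, so it may be sensible to skip matches whose Captures collection is empty.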

I'm not sure exactly what you're doing with the values you've captured,
but you might also want to have a look at the OleDB delimited text
driver which allows you to load the contents of a comma delimited text
file into a dataset, saving you the trouble. Or you could try and use
SQL Server Integration Services to load the data directly into a database.
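As a rough sketch of the OleDB route, the code below loads a delimited file into a DataSet through the Jet text driver. It assumes the Microsoft.Jet.OLEDB.4.0 provider is installed and usable from the process (typically 32-bit Windows); the class name CsvViaOleDb, its method names, and the table names "Records" and "Record" are hypothetical.

```csharp
using System.Data;
using System.Data.OleDb;
using System.IO;

public static class CsvViaOleDb
{
    // For the text driver, Data Source is the FOLDER that contains the file,
    // and the file name is used as the table name in the query.
    public static string BuildConnectionString(string folder, bool hasHeaderRow)
    {
        string hdr = hasHeaderRow ? "Yes" : "No";
        return "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" + folder +
               ";Extended Properties=\"text;HDR=" + hdr + ";FMT=Delimited\"";
    }

    // Reads every row of the file into a DataSet with one table.
    public static DataSet Load(string csvPath, bool hasHeaderRow)
    {
        string folder = Path.GetDirectoryName(Path.GetFullPath(csvPath));
        string file = Path.GetFileName(csvPath);
        DataSet ds = new DataSet("Records");
        using (OleDbConnection conn =
                   new OleDbConnection(BuildConnectionString(folder, hasHeaderRow)))
        using (OleDbDataAdapter adapter =
                   new OleDbDataAdapter("SELECT * FROM [" + file + "]", conn))
        {
            // Fill opens and closes the connection itself.
            adapter.Fill(ds, "Record");
        }
        return ds;
    }
}
```

Since a DataSet can serialize itself, something like CsvViaOleDb.Load(path, false).WriteXml(outputPath) may be enough to produce an XML file from the records, with path and outputPath standing in for your own file locations.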

Jesse

Barry · Jul 30, 2007

Hi

Thanks for your quick reply.

My project is to read a file of comma-separated records and process them to
create an XML file.

I am keen to explore this OleDB delimited text thing. Can you give me some
clue on what search text I should look for in MSDN?

Barry


Jesse Houwing · Jul 30, 2007

> Thanks for your quick reply.

You're welcome.

> My project is to read a file of comma-separated records and process them to
> create an XML file.
>
> I am keen to explore this OleDB delimited text thing. Can you give me some
> clue on what search text I should look for in MSDN?

Hey Barry,

have a look at:
http://www.aurigma.com/Support/DocViewer/5/AddingDatafromTextFile.htm.aspx

Jesse


Jesse Houwing · Jul 31, 2007

* Barry wrote, On 31-7-2007 22:13:

> Hi Jesse,
>
> I searched the internet yesterday after posting my previous message and
> found some code snippets.
>
> I must say that you provided me with an excellent tip. All this time I have
> been processing record by record and field by field, with a whole lot of
> parsing problems. All of that has been solved by the tip you provided; I
> have even rewritten my code to use OleDb delimited text.
>
> You deserve a big thank you.

Thank you

You're very welcome.

Jesse


Barry · Jul 31, 2007

Hi Jesse,

I searched the internet yesterday after posting my previous message and
found some code snippets.

I must say that you provided me with an excellent tip. All this time I have
been processing record by record and field by field, with a whole lot of
parsing problems. All of that has been solved by the tip you provided; I
have even rewritten my code to use OleDb delimited text.

You deserve a big thank you.
Barry
