question on regex for splitting a csv file

Kristopher K. Kruger · Aug 11, 2004

I was hoping to parse a CSV file with a regular expression. Here is
what I have:

using System.Text.RegularExpressions;
using System.Windows.Forms; // for MessageBox

string input = "8/5/2004,WITHDRAWAL,,\"($2,000.00)\",";
string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";
Regex regex = new Regex(pattern);
string[] sa = regex.Split(input);
string all = "";
int count = 0;
foreach (string s in sa)
{
    all += count.ToString() + ".\t" + s + "\r\n"; // Windows line break is \r\n, not \n\r
    count++;
}
MessageBox.Show(all);

That code works, except that it returns empty elements where there were
none in the source.
For instance, I get the following:
0. <blank>
1. 8/5/2004
2. <blank>
3. WITHDRAWAL
4. <blank>
5. <blank>
6. <blank>
7. "($2,000.00)"
8. <blank>
9. <blank>
10. <blank>

There should be only the one blank, the one appearing between WITHDRAWAL
and the dollar amount, where there were back-to-back commas in the source.

What I would want back instead:
0. 8/5/2004
1. WITHDRAWAL
2. <blank>
3. "($2,000.00)"
4. <blank>

(though the 4. <blank> is optional as far as I am concerned since it is
at the end)
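The extra blanks come from Regex.Split itself: it returns the text between consecutive matches (empty here, because the matches sit back to back) and, because the pattern has a capturing group, it also inserts each captured group into the array. A minimal sketch of one alternative, assuming a console program (Console output stands in for the MessageBox only for illustration): iterate the matches and keep just group 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "8/5/2004,WITHDRAWAL,,\"($2,000.00)\",";
string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";

// Walk the matches instead of splitting on them, so the (empty) text
// between matches never ends up in the result.
var fields = new List<string>();
foreach (Match m in Regex.Matches(input, pattern))
{
    // Group 1 holds the field; the leading comma is outside the group.
    fields.Add(m.Groups[1].Value);
}

for (int i = 0; i < fields.Count; i++)
{
    Console.WriteLine(i + ".\t" + fields[i]);
}
```

On the sample line this yields the five elements listed above: 8/5/2004, WITHDRAWAL, an empty string, "($2,000.00)" (quotes included) and a trailing empty string.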

Thanks.

Sijin Joseph · Aug 12, 2004

Try using the Regulator tool to tweak your regex:
http://regulator.sourceforge.net/
The problem is that your input also contains commas inside individual
fields.
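The effect of a comma inside a field is easy to see with a plain String.Split(','), which knows nothing about quoting. A small sketch using the sample line from the question:

```csharp
using System;

string input = "8/5/2004,WITHDRAWAL,,\"($2,000.00)\",";

// Splitting on every comma ignores the quotes around the amount,
// so the comma inside "($2,000.00)" breaks that field in two.
string[] naive = input.Split(',');

for (int i = 0; i < naive.Length; i++)
{
    Console.WriteLine(i + ".\t" + naive[i]);
}
```

This prints six elements instead of five, with the amount broken into "($2 and 000.00)", which is why a quote-aware pattern is needed.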

Kristopher K. Kruger · Aug 12, 2004

Yes, thanks for mentioning that; I have actually been using the
Regulator and getting familiar with it.

I was under the impression that there were a few different 'formats'
for CSV files: the more basic ones contain no commas except those that
separate fields, and the more complex ones put quotes around any field
that contains a comma.

The regex I posted handles that fine; here it is again with my notes on
each part:

string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";
// (?:^|,)
// (?: ... ) is a non-capturing group: it groups without saving the text
// for later use. ^|, means "start of the string, or a comma".

// (\"
// opens capturing group 1 (the field itself), then an opening quote

// (?:[^\"]+|\"\")*
// zero or more repetitions of: one or more non-quote characters,
// or a doubled quote ("") standing for a literal quote

// \"|[^,]*)
// the closing quote; or, for an unquoted field, zero or more
// characters that are not commas; then group 1 closes

It does not handle quotes within quotes beyond the doubled-quote ("")
form the pattern allows inside a quoted field, but at this point I am
fine with that.
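For reference, the pattern's doubled-quote alternative does match a quote written CSV-style as "" inside a quoted field. A sketch under stated assumptions: the sample line is made up, and Unquote is a hypothetical helper (not part of the original code) that strips the outer quotes and turns each "" back into a single quote.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";
// Hypothetical sample line: the second field holds an escaped quote ("")
// and a comma, and is wrapped in quotes.
string line = "1,\"say \"\"hi\"\", then leave\",x";

var fields = new List<string>();
foreach (Match m in Regex.Matches(line, pattern))
{
    fields.Add(Unquote(m.Groups[1].Value));
}

foreach (string f in fields)
{
    Console.WriteLine(f);
}

// Hypothetical helper: removes the surrounding quotes of a quoted field
// and collapses each doubled quote ("") into one quote character.
static string Unquote(string field)
{
    if (field.Length >= 2 && field[0] == '"' && field[field.Length - 1] == '"')
    {
        return field.Substring(1, field.Length - 2).Replace("\"\"", "\"");
    }
    return field;
}
```

The middle field comes back as say "hi", then leave, with its comma and embedded quotes intact.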

It also returns a blank between each pair of real fields, so until I
figure out or otherwise obtain a more robust regex, I will just ignore
the even-indexed elements (0, 2, 4, ...) of the returned array; those
are the blanks that did not exist in the input CSV.
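The Split-based workaround can be written as a short loop. This is a sketch that assumes the matches are back to back, as they are for this pattern on the sample line; in that case the empty "between" strings land at the even indices and the captured fields at the odd indices (1, 3, 5, ...).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

string input = "8/5/2004,WITHDRAWAL,,\"($2,000.00)\",";
string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";

// Regex.Split yields: text before match 1, group 1 of match 1, text
// between match 1 and match 2, group 1 of match 2, ..., text after the
// last match.
string[] parts = Regex.Split(input, pattern);

// Keep only the captured groups, which sit at the odd indices.
var fields = new List<string>();
for (int i = 1; i < parts.Length; i += 2)
{
    fields.Add(parts[i]);
}

Console.WriteLine(string.Join(" | ", fields));
```

If the matches were ever separated by unmatched text, the even slots would no longer be empty, so iterating the matches directly is the sturdier choice.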

Thanks again for your suggestion.

Kristopher
