Question on regex for splitting a CSV file

  • Thread starter Kristopher K. Kruger

Kristopher K. Kruger

I was hoping to parse a CSV file with a regular expression. This is what I have:

using System.Text.RegularExpressions;
using System.Windows.Forms;

string input = "8/5/2004,WITHDRAWAL,,\"($2,000.00)\",";
string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";
Regex regex = new Regex(pattern);
string[] sa = regex.Split(input);
string all = "";
int count = 0;
foreach (string s in sa)
{
    // number each element so the blanks are visible; Windows line breaks are \r\n
    all += count.ToString() + ".\t" + s + "\r\n";
    count++;
}
MessageBox.Show(all);


That code works fine, except that it returns empty elements where there
were none in the source.
For instance, I get the following:
0. <blank>
1. 8/5/2004
2. <blank>
3. WITHDRAWAL
4. <blank>
5. <blank>
6. <blank>
7. "($2,000.00)"
8. <blank>
9. <blank>
10. <blank>

There should be only the one blank, the one appearing between WITHDRAWAL
and the dollar amount, where there were back-to-back commas in the source.

What I would want back instead:
0. 8/5/2004
1. WITHDRAWAL
2. <blank>
3. "($2,000.00)"
4. <blank>

(though the 4. <blank> is optional as far as I am concerned, since it is
at the end)

Thanks.
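One way to get exactly the list above, sketched under the assumption that the pattern itself stays as posted: Regex.Split returns the text *between* matches and, because the pattern has a capturing group, also the captured text, which is where the extra blanks come from. Iterating over Regex.Matches and reading group 1 of each match gives one element per field instead. The class name CsvMatchSplitter and the method SplitFields are hypothetical, made up for this sketch.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CsvMatchSplitter
{
    // Same pattern as in the question.
    const string Pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";

    // Returns one string per field: the text captured by group 1 of each match.
    // Unlike Regex.Split, nothing from between the matches is added.
    public static List<string> SplitFields(string line)
    {
        var fields = new List<string>();
        foreach (Match m in Regex.Matches(line, Pattern))
        {
            fields.Add(m.Groups[1].Value);
        }
        return fields;
    }
}
```

For the sample input this should give 8/5/2004, WITHDRAWAL, an empty string, "($2,000.00)" (quotes included) and a final empty string, i.e. the five elements listed above.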

Kristopher K. Kruger

Yes, thanks for mentioning that. I have actually been using The
Regulator and getting familiar with it.

I was under the impression that there are a few different 'formats'
for CSV files: the more basic ones contain no commas except those that
separate fields, and the more complex ones put quotes around any field
that contains a comma.

The regex I posted handles that fine. Here it is again, with notes on
what each part does:

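string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";

//(?:^|,)
// (?: ... ) is a non-capturing group: it groups without saving a capture
// (captured groups could be used later); ^|, means the start of the string or a comma

//(\"
// opens capturing group 1, then matches a quote

//(?:[^\"]+|\"\")*
// zero or more repetitions of either one or more non-quote characters
// or a doubled quote ("") standing for a literal quote

//\"|[^,]*)
// the closing quote; or, as the alternative to the whole quoted branch,
// zero or more characters that are not commas; the ) closes group 1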
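The same pattern can also be written with RegexOptions.IgnorePatternWhitespace, which lets each piece carry its own comment inside the pattern itself. A sketch, with the class name AnnotatedCsvPattern being hypothetical; since whitespace and # comments are ignored in that mode, it should match exactly as the compact one-line pattern does. Note that in a verbatim @"..." string each quote character is written as "".

```csharp
using System.Text.RegularExpressions;

static class AnnotatedCsvPattern
{
    // Whitespace and everything after # on a line are ignored by the regex engine
    // when RegexOptions.IgnorePatternWhitespace is set.
    public const string Pattern = @"
        (?:^|,)              # non-capturing group: start of string, or a comma
        (                    # capturing group 1: the field value
            ""               #   opening quote
            (?:[^""]+|"""")* #   runs of non-quote chars, or a doubled quote pair
            ""               #   closing quote
          |                  # or
            [^,]*            #   zero or more non-comma chars: an unquoted field
        )";

    public static readonly Regex Csv =
        new Regex(Pattern, RegexOptions.IgnorePatternWhitespace);
}
```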

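It does not fully handle quotes within quotes (it accepts the doubled ""
form, but leaves it and the surrounding quotes in the captured text), but
at this point I am fine with that.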
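If quotes within quotes do matter later, a small character-by-character parser is often simpler than a regex. This is a swapped-in technique, not a fix to the regex above; the class CsvLineParser is hypothetical, and it assumes one record per line (no line breaks inside quoted fields). Unlike the regex, it strips the surrounding quotes and turns a doubled quote into a single one.

```csharp
using System.Collections.Generic;
using System.Text;

static class CsvLineParser
{
    // Walks the line one character at a time. Commas inside double quotes are
    // kept as data, a doubled quote inside a quoted field becomes one quote,
    // and the surrounding quotes are dropped.
    // Assumption: no line breaks inside quoted fields.
    public static List<string> Parse(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        current.Append('"'); // escaped quote
                        i++;                 // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;    // closing quote
                    }
                }
                else
                {
                    current.Append(c);       // includes commas inside quotes
                }
            }
            else if (c == '"')
            {
                inQuotes = true;             // opening quote
            }
            else if (c == ',')
            {
                fields.Add(current.ToString()); // end of field
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }
        fields.Add(current.ToString()); // last field (empty if the line ends with a comma)
        return fields;
    }
}
```

On the sample line it should return 8/5/2004, WITHDRAWAL, an empty string, ($2,000.00) without its quotes, and a trailing empty string.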

It also returns a blank before, between, and after the real fields of
the CSV, so until I figure out or otherwise obtain a more robust regex,
I will just ignore the even-indexed elements (0, 2, 4, ...) in the array
it gives back, since those are the blanks that did not exist in the input
CSV; the real fields are at the odd indices.
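Going by the numbered output in the question, the real fields land at the odd indices (1, 3, 5, 7, 9) and the spurious blanks at the even ones, so the workaround amounts to keeping every second element starting at index 1. A sketch, with SplitWorkaround and FieldsFromSplit as hypothetical names; it relies on the layout Regex.Split produces when the pattern has one capturing group (text between matches, then the captured field, alternating).

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class SplitWorkaround
{
    // Regex.Split with one capturing group yields:
    // [text before match 1, capture 1, text between, capture 2, ..., text after last match].
    // For lines like the sample, the between-match text is always empty,
    // so the fields are the odd-indexed elements.
    public static List<string> FieldsFromSplit(string line)
    {
        string pattern = "(?:^|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)";
        string[] parts = Regex.Split(line, pattern);
        var fields = new List<string>();
        for (int i = 1; i < parts.Length; i += 2)
        {
            fields.Add(parts[i]);
        }
        return fields;
    }
}
```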


Thanks again for your suggestion.

Kristopher