parsing words in a string

  • Thread starter: asrs63

Hi,

Is there a class that can split a string on commas while ignoring any
commas that appear inside quotes?

I know the Text::ParseWords module does this in Perl, but I am new to
C#/.NET and couldn't find anything similar.

For example, given

string str = "one field, two field, \"field val one, field val two, field val three\", three field";

I should get the following in str_arr:

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Thanks in advance,

Ashoo.
 
string[] str_arr = str.Split(',');

does not work; I had already tried it. It gives me

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "\"field val one"
str_arr[3] = "field val two"
str_arr[4] = "field val three\""
str_arr[5] = "three field"

instead of

str_arr[0] = "one field"
str_arr[1] = "two field"
str_arr[2] = "field val one, field val two, field val three"
str_arr[3] = "three field"

Look at the str_arr[2] value: the quoted field is broken apart at its
inner commas.

Thanks,
Ashoo
 
At this time, using Split more than once seems to be the only option. I
can't recall any other function that would do this more quickly.
How do you use that function in Perl? Post some code; it may ring a
bell for some C# users.

thanks
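
For readers on later framework versions, one possible alternative is the TextFieldParser class in the Microsoft.VisualBasic.FileIO namespace. It is not available in the .NET 1.1 / VS2003 setup discussed in this thread, so treat the following as a minimal sketch under that assumption. The names FieldParserDemo and ParseLine are hypothetical and exist only for illustration.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO; // assumption: .NET 2.0 or later

public static class FieldParserDemo
{
    // Hypothetical helper: parses one delimited line into its fields.
    public static string[] ParseLine(string line)
    {
        using (TextFieldParser parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Commas inside double quotes stay part of the field.
            parser.HasFieldsEnclosedInQuotes = true;
            // Strip leading and trailing whitespace from each field.
            parser.TrimWhiteSpace = true;
            return parser.ReadFields();
        }
    }
}
```

Calling FieldParserDemo.ParseLine(str) on the string from the original question should return the fields as a string array, with the quoted field kept whole.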
 
Hi,
you can use a regular expression to get this result. Here is one that
will fetch what you want. I assumed that you are using VS2003.

(?:(?<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$

The output for your example will look like this:

SubMatch: [Vals]
1:one field
2:two field
3:field val one, field val two,field val three
4:three field

Try it out.
 
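This is how you could do it in Perl:

#!/usr/bin/perl
use strict;
use warnings;
use Text::ParseWords;

my $line = "one field, two field, \"field val one, field val two,field val three\", three field";

# parse_line(delimiter regex, keep quotes?, line); keep = 0 strips the quotes.
my @fields = parse_line(',', 0, $line);

# Print each parsed field on its own line.
print "$_\n" for @fields;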

Thanks,
Ashoo
 
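// Splits s on commas, ignoring commas inside double quotes.
// The quote characters themselves are dropped from the output.
public System.Collections.ArrayList parseWords(string s)
{
    if (s == null)
    {
        return null;
    }

    bool bQuote = false; // true while we are inside a quoted section
    System.Collections.ArrayList al = new System.Collections.ArrayList();
    System.Text.StringBuilder sTemp = new System.Text.StringBuilder();

    for (int i = 0; i < s.Length; i++)
    {
        switch (s[i])
        {
            case ',':
                if (bQuote == false)
                {
                    // Separator outside quotes: the current field is complete.
                    al.Add(sTemp.ToString());
                    sTemp.Length = 0;
                }
                else
                {
                    // Comma inside quotes belongs to the field.
                    sTemp.Append(s[i]);
                }
                break;
            case '\"':
                // Toggle the quoted state.
                bQuote = !bQuote;

                // requirement: remove the quote character
                // sTemp.Append(s[i]);
                break;
            default:
                sTemp.Append(s[i]);
                break;
        }
    }

    // Add the last field, which has no trailing comma.
    if (sTemp.Length > 0)
    {
        al.Add(sTemp.ToString());
        sTemp.Length = 0;
    }

    return al;
}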
 
Have you tried using regular expressions, instead of the Split()?

 
Hi,

I tried the regular expressions and I am using VS.Net 2003.
This is how I have used it.

Regex regEx = new
Regex("(??<Vals>.*?),)+\"(?<Vals>.*?)\"(?:,(?<Vals>.*?))+$");
string [] text1 = regEx.Split(text);

I am getting the following run-time error

"An unhandled exception of type 'System.ArgumentException' occurred in
system.dll

Additional information: parsing
"(??<Vals>.*?),)+"(?<Vals>.*?)"(?:,(?<Vals>.*?))+$" - Unrecognized
grouping construct."

Can you please advise as to what I am doing wrong?

Thanks,
Ashoo
 
This is good stuff. How would the rest of the code look? I tried just
using plain regex but I couldn't get it to return the array.

Ron
 
This one removes all the unwanted characters as well:



// Requires: using System; using System.Collections;
public Form1()
{
    InitializeComponent();
    ArrayList al = ParseString(" \"M1, M2, M3, M4, \"S1, S2, S3 , S4,\"M5, M6,\" S5, S6, \" M7, M8, M9 \"");
    foreach (string aItem in al)
    {
        Console.WriteLine(aItem);
    }
}


public ArrayList ParseString(string strInput)
{
    string strTemp = "";
    string ModString = "";
    bool bQuote = false; // true while we are inside a quoted section
    ArrayList aParsedString = new ArrayList();
    for (int i = 0; i < strInput.Length; i++)
    {
        char c = strInput[i];
        if (c == '\"' && bQuote == false)
        {
            bQuote = true;
        }
        else if (c == '\"' && bQuote == true)
        {
            // Closing quote: emit the quoted text without quotes or padding.
            ModString = strTemp.Replace("\"", "").Trim();
            aParsedString.Add(ModString);
            strTemp = "";
            bQuote = false;
        }
        if (c != ',')
        {
            strTemp += c;
        }
        else
        {
            strTemp += c;
            if (bQuote == false)
            {
                // Comma outside quotes: emit the field, stripped of the
                // comma, any stray quote and surrounding whitespace.
                ModString = strTemp.Replace(",", " ").Replace("\"", "").Trim();
                aParsedString.Add(ModString);
                strTemp = "";
            }
        }
    }
    return aParsedString;
}
 
hi Ashoo,

Here is the implementation of the expression:


Regex reg = new Regex("(?:(?.*?),)+[\\s]\"(?.*?)\"(?:,[\\s](?.*?))+$");
MatchCollection MatchColl;
MatchColl = reg.Matches("one field, two field, \"field val one, field
val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl) {
vals = Array.CreateInstance(typeof(string),
mat.Groups["Vals"].Captures.Count);
int i = 0;
foreach (Capture cap in mat.Groups["Vals"].Captures) {
vals(i) = cap.Value();
i++;
}
}

I wrote this in VB.NET and converted it to C# for you, so check the
syntax. The code does run, though.

Let me know if you have any questions about it.

Lucky
 
Also note that I've slightly modified the expression. You can set the
Regex options to ignore case or to use multiline mode as your
requirements dictate, but in your case I think only "ignore case" is needed.
 
Lucky,

I wish I could get this to work but I'm getting two major errors:

parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.

And the looping part is generating a whole series of errors.
 
parsing "(?.*?),)+[\s]"(?.*?)"(?:,[\s](?.*?))+$" - Unrecognized grouping
construct.

Looks like the poster missed a '(' at the beginning. The second ')' is
unmatched. I haven't tested, but try just adding another '(' at the very
beginning.

-mdb
 
Hi,
as I said, I wrote this in VB.NET and converted it for you, but the
converter missed some parts, so I've now written the code by hand in
C#. Here it is. Try it and let me know.


Regex reg = new Regex("(?:(?<Vals>.*?),)+[\\s]\"(?<Vals>.*?)\"(?:,[\\s](?<Vals>.*?))+$");

MatchCollection MatchColl;
MatchColl = reg.Matches("one field, two field, \"field val one, field val two, field val three\", three field");
string[] vals;
foreach (Match mat in MatchColl)
{
    // One array slot per capture of the repeated "Vals" group.
    vals = new string[mat.Groups["Vals"].Captures.Count];
    int i = 0;
    foreach (Capture cap in mat.Groups["Vals"].Captures)
    {
        vals[i] = cap.Value;
        i++;
    }
}

You need to import this namespace in order to use this code:

using System.Text.RegularExpressions;

Lucky
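
A different regex technique that does not depend on where the quoted field sits in the line is to split on a comma only when an even number of quote characters follows it. The sketch below is one way to write that, not the approach posted above. QuotedCsv and SplitFields are hypothetical names, and it assumes balanced quotes with no escaped or doubled quotes inside a field.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuotedCsv
{
    // A comma is a separator only if the rest of the string contains an
    // even number of quote characters, i.e. the comma is outside quotes.
    private static readonly Regex Separator =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    // Hypothetical helper: splits the input and cleans up each field.
    public static string[] SplitFields(string input)
    {
        string[] fields = Separator.Split(input);
        for (int i = 0; i < fields.Length; i++)
        {
            // Drop surrounding whitespace first, then the enclosing quotes.
            fields[i] = fields[i].Trim().Trim('"');
        }
        return fields;
    }
}
```

The lookahead rescans the remainder of the string at every comma, so this is best suited to short lines. For long inputs a character-by-character loop like the one earlier in the thread avoids that repeated scanning.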
 
