regex split

  • Thread starter William Stacey [MVP]
William Stacey [MVP]

Would like help with what is (I think) a common regex split example. Thanks
in advance for your example. Cheers!

Source Data Example:
one "two three" four

Optional, but would also like to ignore pairs of brackets like:
"one" <tab> "two three" ( four "five six" )

Want fields like:
field1:one
field2:two three
field3:four

field1:one
field2:two three
field3:four
field4:five six

Thanks much!!
 
William Stacey [MVP]

Should clarify a little. Basically, I want to split a line on whitespace
(space, tab), except where the whitespace is enclosed in quotes. Anything
in a quote pair is one field. A non-escaped quote (i.e. one not written as \")
that does not have a closing quote is an error. Same with the "(" parens: if a
paren is not inside a quote, then it is special and needs a closing paren. If
the paren stuff makes this too hard, forget it and please help with the first
requirement. Again thanks!
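
The first requirement (split on spaces and tabs, keep quoted text together, reject an unclosed quote) can also be handled without a regex. The following is a minimal sketch in C#, not something posted in the thread: the names QuotedSplitter and Tokenize are hypothetical, it assumes \" is the only escape, and it ignores the paren requirement entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedSplitter
{
    // Splits on spaces and tabs, keeping text inside double quotes as one field.
    // Throws FormatException if a quote is never closed.
    public static List<string> Tokenize(string line)
    {
        if (line == null) throw new ArgumentNullException("line");

        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;   // between an opening and a closing quote
        bool hasField = false;   // a field has started (lets "" yield an empty field)

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (c == '\\' && i + 1 < line.Length && line[i + 1] == '"')
            {
                current.Append('"');   // \" is a literal quote, not a delimiter
                hasField = true;
                i++;
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;  // the quote itself is not part of the field
                hasField = true;
            }
            else if (!inQuotes && (c == ' ' || c == '\t'))
            {
                if (hasField) fields.Add(current.ToString());
                current.Length = 0;
                hasField = false;
            }
            else
            {
                current.Append(c);
                hasField = true;
            }
        }

        if (inQuotes) throw new FormatException("Unclosed quote.");
        if (hasField) fields.Add(current.ToString());
        return fields;
    }
}
```

With this sketch, QuotedSplitter.Tokenize("one \"two three\" four") returns the three fields one, two three and four, while an input such as one "two throws a FormatException.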
 
William Stacey [MVP]

Thanks Uri. However, I need to preserve an arg like "this is one arg" as one
field rather than four, since I can't work out after the fact that it was
one argument. I will probably have to parse this manually, but thought there
might be an easy way using regex. Cheers!

--
William Stacey, MVP

 
Uri Dor

I think it's a common match example, but not a common split example.
split assumes you have a single regex that matches the splitting text,
but you don't, since the text between *one* and *two* must start and end
with a double quote because there's one before *one*.

you can, however, match something like

(("[^"]+")|(\w+))( \<[^>]*\>)*

and iterate on the matches.
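
A minimal sketch of the match-and-iterate idea in C#, under some assumptions: the pattern is simplified to just the quoted-or-word alternation (the angle-bracket part is dropped), and the names MatchDemo and Fields are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchDemo
{
    // Group 1: contents of a quoted field; group 2: a bare word.
    private static readonly Regex Field = new Regex("\"([^\"]+)\"|(\\w+)");

    public static List<string> Fields(string line)
    {
        var result = new List<string>();
        foreach (Match m in Field.Matches(line))
        {
            // Exactly one of the two groups takes part in each match.
            result.Add(m.Groups[1].Success ? m.Groups[1].Value : m.Groups[2].Value);
        }
        return result;
    }
}
```

Calling MatchDemo.Fields("one \"two three\" four") gives one, two three and four. Unlike a split, nothing here reports an unclosed quote; unmatched text is simply skipped.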


HTH
 
William Stacey [MVP]

Here is a cool little method that I modified from a VB example. Does exactly
what I wanted. It can split on any delimiter or multiple delimiters, and can
quote using any pair of chars. Very cool. I have not tested all possible
failures, etc., but it appears to work well. Some clever (and generous) pattern
person may want to modify this to allow an array of quote pairs, so you
could quote on "one two" or {one two} or (one two) in the same call. If you
do, please post an update. Cheers!
==
/// <summary>
/// Split a string, dealing correctly with quoted items.
/// The quotes parameter is the character pair used to quote strings
/// (default is "", the double quote).
/// You can also use a character pair (e.g. "{}") if the opening
/// and closing quotes are different.
///
/// For example, you can split the following string:
/// string[] fields = SplitQuoted("[one,two],three,[four,five]", ",", "[]")
/// into 3 items, because commas inside [] are not taken into account.
/// </summary>
/// <remarks>
/// Consecutive separators are collapsed, so splitting "a,,b" using a comma as
/// the separator returns two fields, not three. To get an empty field,
/// you could use ", " (comma and space) as separators with the default quotes
/// and pass a string like ' a, "", b '.
/// You could also use a comma as the only separator and put a space between
/// the commas, as in 'a, ,b', to get a field containing a single space.
/// </remarks>
/// <param name="text">String to split.</param>
/// <param name="seperators">The separator char(s) as a string.</param>
/// <param name="quotes">The char pair used to quote a string.</param>
/// <returns>string[]</returns>
private string[] SplitQuoted(string text, string seperators, string quotes)
{
    // Default seperators is a space and tab (e.g. " \t").
    // All seperators not inside quote pair are ignored.
    // Default quotes pair is two double quotes ( e.g. '""' ).
    if ( text == null )
        throw new ArgumentNullException("text", "text is null.");
    if ( seperators == null || seperators.Length < 1 )
        seperators = " \t";
    if ( quotes == null || quotes.Length < 1 )
        quotes = "\"\"";
    ArrayList res = new ArrayList();

    // Get the open and close chars, escape them for use in regular expressions.
    string openChar = Regex.Escape(quotes[0].ToString());
    string closeChar = Regex.Escape(quotes[quotes.Length - 1].ToString());
    // Build the pattern that searches for both quoted and unquoted elements
    // notice that the quoted element is defined by group #2
    // and the unquoted element is defined by group #3.
    string pattern = @"\s*(" + openChar + "([^" + closeChar + "]*)" +
        closeChar + @"|([^" + seperators + @"]+))\s*";

    // Search the string.
    foreach ( System.Text.RegularExpressions.Match m in
        System.Text.RegularExpressions.Regex.Matches(text, pattern) )
    {
        string g3 = m.Groups[3].Value;
        if ( g3 != null && g3.Length > 0 )
            res.Add(g3);
        else
        {
            // get the quoted string, but without the quotes.
            res.Add(m.Groups[2].Value);
        }
    }
    return (string[])res.ToArray(typeof(string));
}
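
The array-of-quote-pairs variation asked for above could be approached roughly like this. It is only a sketch under stated assumptions, not a tested replacement: MultiQuoteSplitter, Split and CharCode are hypothetical names, every pair must be exactly two characters, and each character is written into the pattern as a \uXXXX escape so that no separator or quote character can be misread as regex syntax. It relies on .NET allowing the same group name ("quoted") in several alternatives.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class MultiQuoteSplitter
{
    // quotePairs: two-character strings such as "\"\"", "{}" or "()".
    public static string[] Split(string text, string separators, params string[] quotePairs)
    {
        if (text == null) throw new ArgumentNullException("text");
        if (string.IsNullOrEmpty(separators)) separators = " \t";
        if (quotePairs == null || quotePairs.Length == 0) quotePairs = new[] { "\"\"" };

        var pattern = new StringBuilder();
        foreach (string pair in quotePairs)
        {
            if (pair == null || pair.Length != 2)
                throw new ArgumentException("Each quote pair must be two characters.", "quotePairs");
            // One alternative per pair: open, anything but close, close.
            pattern.Append(CharCode(pair[0]))
                   .Append("(?<quoted>[^").Append(CharCode(pair[1])).Append("]*)")
                   .Append(CharCode(pair[1]))
                   .Append('|');
        }
        // Final alternative: a run of non-separator characters.
        pattern.Append("(?<bare>[^");
        foreach (char c in separators) pattern.Append(CharCode(c));
        pattern.Append("]+)");

        var result = new List<string>();
        foreach (Match m in Regex.Matches(text, pattern.ToString()))
        {
            result.Add(m.Groups["bare"].Success
                ? m.Groups["bare"].Value
                : m.Groups["quoted"].Value);
        }
        return result.ToArray();
    }

    // \uXXXX is accepted by .NET both inside and outside a character class.
    private static string CharCode(char c)
    {
        return "\\u" + ((int)c).ToString("X4");
    }
}
```

For example, MultiQuoteSplitter.Split("\"one two\" {three four} (five six) seven", null, "\"\"", "{}", "()") returns one two, three four, five six and seven. As with the method above, an opening quote character with no matching close is not reported as an error; it ends up inside an unquoted field.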
 
Jeffrey Tan[MSFT]

Hi William,

I am glad you got what you wanted. Do you still have any concerns about this
issue?

Please feel free to send feedback. Thanks

Best regards,
Jeffrey Tan
Microsoft Online Partner Support
Get Secure! - www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
William Stacey [MVP]

Yes thanks. Here is a better one that treats "\" as an escape for any
character, including a quote inside a quote pair:
public static string[] SplitQuoted(string text, string seperators)
{
    // Pattern (shown with the default seperators):
    //   "([^"\\]*(?:\\.[^"\\]*)*)"    quoted field; \x escapes any char x
    //   |
    //   ([^ \t]+)                     unquoted field
    // Group 1 is the quoted text without the surrounding quotes (the escape
    // backslashes are left in place); group 2 is the unquoted text.
    // Default seperators is a space and tab (e.g. " \t").
    // All seperators not inside quote pair are ignored.
    if ( text == null )
        throw new ArgumentNullException("text", "text is null.");
    if ( seperators == null || seperators.Length < 1 )
        seperators = " \t"; // Default is space and tab.

    ArrayList res = new ArrayList();

    // The escape part is a non-capturing group, (?:...), so that the
    // unquoted field stays group 2.
    string pattern =
        @"""([^""\\]*(?:\\.[^""\\]*)*)""" +
        "|" +
        @"([^" + seperators + @"]+)";

    // Search the string.
    foreach ( System.Text.RegularExpressions.Match m in
        System.Text.RegularExpressions.Regex.Matches(text, pattern) )
    {
        string g1 = m.Groups[1].Value;
        string g2 = m.Groups[2].Value;
        if ( g2 != null && g2.Length > 0 )
        {
            res.Add(g2);
        }
        else
        {
            // Get the quoted string, but without the quotes, from g1.
            res.Add(g1);
        }
    }
    return (string[])res.ToArray(typeof(string));
}
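
A quoted field captured this way still contains its escape backslashes, so a\"b comes back as a\"b rather than a"b. If the unescaped text is wanted, a small post-processing step could do it. This is a sketch only: FieldUnescaper and Unescape are hypothetical names, and it assumes every backslash simply escapes the character that follows it.

```csharp
using System.Text.RegularExpressions;

public static class FieldUnescaper
{
    // Replaces each backslash-plus-character pair with the character alone,
    // so a\"b becomes a"b and a\\b becomes a\b.
    public static string Unescape(string field)
    {
        return Regex.Replace(field, @"\\(.)", "$1");
    }
}
```

It would be applied to each quoted field before adding it to the result, for example res.Add(FieldUnescaper.Unescape(g1)).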

--
William Stacey, MVP

 
Jeffrey Tan[MSFT]

Hi William,

Thanks for sharing your information with the community!!

Best regards,
Jeffrey Tan
 