complex regexp problem

papa.coen

Hi,

I need to split/match the following type of (single-line) syntax on all
commas (or match the text in between) that are not between quotes:
A,'B,B',C,,'E',F

The text between quotes can be _any_ text (except for newlines).

The regexp must either match: A | 'B,B' | C | [empty] | 'E' | F
(the pipe serves as separator here) or split on the first, third and the
remaining commas.
I can't get it done; I can't figure out how to exclude something from an
expression except for single characters.

I've tried several options, but all of them fail on one or more of the
following cases:
- 'A' only
- B only.

or they do not catch the last item (F).

BTW, what's the reason most regexp testing apps do not support/offer
split functionality? Should I only use matching?
 
Try the following code :
string myString="A,'B,B',C,,'E',F";
foreach(string s in Regex.Split(myString,@",(?![^']*',)(?<!,'[^']*)")) {
MessageBox.Show(s);
}

It gives : A | 'B,B' | C | [empty] | 'E' | F

Here is an explanation of the regex ,(?![^']*',)(?<!,'[^']*) :
, means that you want to split on a comma
(?![^']*',) means that the comma must not be followed by some non-quote
characters, a closing quote and then a comma
(?<!,'[^']*) means that the comma must not be preceded by a comma, an
opening quote and some non-quote characters
The entire expression means that you want to split on every comma except
those that are enclosed in quotes.

Hope this helps

Ludovic SOEUR.
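
On the question of matching instead of splitting: the following is a minimal sketch, not part of the original answer, that collects the fields with Regex.Matches. The class name FieldMatcher and the pattern are assumptions made for illustration. It treats a field as either a quoted run or a run without commas and quotes, anchored between commas or the ends of the line, and it makes no attempt to handle unbalanced quotes.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FieldMatcher
{
    // A field starts at the beginning of the line or right after a comma,
    // ends right before a comma or at the end of the line, and is either
    // a quoted run ('...') or a (possibly empty) run without commas or quotes.
    private static readonly Regex Field =
        new Regex(@"(?<=^|,)('[^']*'|[^,']*)(?=,|$)");

    public static string[] Fields(string line)
    {
        var hits = new List<string>();
        foreach (Match m in Field.Matches(line))
            hits.Add(m.Value);
        return hits.ToArray();
    }

    public static void Main()
    {
        // prints: A | 'B,B' | C |  | 'E' | F
        Console.WriteLine(string.Join(" | ", Fields("A,'B,B',C,,'E',F")));
        // prints: 'A'
        Console.WriteLine(string.Join(" | ", Fields("'A'")));
        // prints: B
        Console.WriteLine(string.Join(" | ", Fields("B")));
    }
}
```

Because the lookbehind (?<=^|,) refuses to start a match in the middle of a field, the comma inside 'B,B' is never treated as a field boundary.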
 
Thanks a lot, it did the trick.
I never thought of opening/closing quotes, doh!
I also added (?<![,]$) to the expression, which skips the last comma if
nothing follows.
BTW: What exactly does the '<' in the second group do?
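
For reference on the '<' question: in .NET regular expressions, (?!...) is a negative lookahead (it inspects the text after the current position) and (?<!...) is a negative lookbehind (it inspects the text before it). The small sketch below contrasts the two on a made-up input; the class and method names are hypothetical and only serve the illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Negative lookbehind: split on commas NOT preceded by ';'.
    public static string[] SplitNotAfterSemicolon(string s)
    {
        return Regex.Split(s, @"(?<!;),");
    }

    // Negative lookahead: split on commas NOT followed by ';'.
    public static string[] SplitNotBeforeSemicolon(string s)
    {
        return Regex.Split(s, @",(?!;)");
    }

    public static void Main()
    {
        string input = "a,b;,c,;d";
        // prints: a | b;,c | ;d
        Console.WriteLine(string.Join(" | ", SplitNotAfterSemicolon(input)));
        // prints: a | b; | c,;d
        Console.WriteLine(string.Join(" | ", SplitNotBeforeSemicolon(input)));
    }
}
```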
 
Oops : the expression cannot handle starting/ending commas between
quotes. It ignores the commas around X;

a,'b,b',X,',c,',d,',Y,

My addition to the expression results in a 'Y,', which is not desired
('Y' is; not including the (last) comma).

Can you extend your expression to match the commas around X?
 
You are right. It does not work for this tricky case.
I can try to correct it but I have to know exactly what you want :

To me, a,'b,b',X,',c,',d,',Y, can't be split correctly because the
opening and closing quotes don't match.

With a,'b,b',X,',c,',d,',Y,' I would split like this:
a | 'b,b' | X | ',c,' | d | ',Y,'
Or, if you consider ,', to be correct, with the same meaning as '', I
would split like this:
a | 'b,b' | X | ' | c | ' | d | ' | Y


What is the behavior you would like to have?

Ludovic Soeur.
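
To make the ambiguity concrete, here is a minimal non-regex sketch (an illustration only, not the approach used in this thread) that walks the line and toggles an "inside quotes" flag at every quote character. The class name QuoteAwareSplitter is hypothetical. With balanced quotes it produces the first split shown above; with the unbalanced input, the last quote simply swallows the rest of the line.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    public static List<string> Split(string line)
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        foreach (char c in line)
        {
            // Every quote flips the state; quotes stay part of the field.
            if (c == '\'')
                inQuotes = !inQuotes;

            if (c == ',' && !inQuotes)
            {
                // An unquoted comma ends the current field.
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Whatever remains is the last field (possibly empty or unterminated).
        fields.Add(current.ToString());
        return fields;
    }

    public static void Main()
    {
        // Balanced quotes, prints: a | 'b,b' | X | ',c,' | d | ',Y,'
        Console.WriteLine(string.Join(" | ", Split("a,'b,b',X,',c,',d,',Y,'")));
        // Unbalanced quotes, prints: a | 'b,b' | X | ',c,' | d | ',Y,
        Console.WriteLine(string.Join(" | ", Split("a,'b,b',X,',c,',d,',Y,")));
    }
}
```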
 
If you want
a | 'b,b' | X | ' | c | ' | d | ' | Y
you can use this one: ,(?![^']*',)(?<!,'[^']*)|,(?=',)|,(?<=',)
It's a trick to deal with ,',

Hope it helps.

Ludovic SOEUR.
 
: Oops : the expression cannot handle starting/ending commas between
: quotes. It ignores the commas around X;
:
: a,'b,b',X,',c,',d,',Y,
:
: My addition to the expression results in a 'Y,', which is not desired
: ('Y' is; not including the (last) comma).
:
: Can you extend your expression to match the commas around X?

I borrowed the pattern from "How can I split a [character] delimited
string except when inside [character]?" in section 4 of the Perl FAQ[*]:

[*] http://xrl.us/ifw4 (perldoc.perl.org)

using System;
using System.Collections;
using System.Text.RegularExpressions;

using NUnit.Framework;

namespace Lib
{
    public class Record
    {
        static private readonly Regex extract =
            new Regex(
                @"^(?:
                    (?<field>'[^\'\\]*(?:\\.[^\'\\]*)*'),?
                  | (?<field>[^,]*),
                  | (?<field>[^,]+),?
                  | ,
                )+$",
                RegexOptions.IgnorePatternWhitespace);

        string[] fields;

        public Record(string line)
        {
            Match m = extract.Match(line);

            if (m.Success)
            {
                ArrayList hits = new ArrayList();

                foreach (Capture field in m.Groups["field"].Captures)
                    hits.Add(field.Value);

                fields = (string[]) hits.ToArray(typeof(string));
            }
        }

        public string[] Fields
        {
            get { return fields; }
        }
    }

    [TestFixture]
    public class RecordTest
    {
        [Test]
        public void OnlyAInQuotes()
        {
            string input = "'A'";
            string[] expect = { input };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void OnlyB()
        {
            string input = "B";
            string[] expect = { input };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void CommaInQuotes()
        {
            string input = "A,'B,B',C,,'E',F";
            string[] expect = { "A", "'B,B'", "C", "", "'E'", "F" };

            Assert.AreEqual(expect, new Record(input).Fields);
        }

        [Test]
        public void LeadingAndTrailingCommasInFields()
        {
            string input = "a,'b,b',X,',c,',d,',Y,";
            string[] expect = { "a", "'b,b'", "X", "',c,'", "d", "'", "Y" };

            Assert.AreEqual(expect, new Record(input).Fields);
        }
    }
}

Hope this helps,
Greg
 