Hello Flomo,
From your post, my understanding of this issue is that you want to use a
regular expression to replace the commas that are not contained within a
pair of quotes. If I'm off base, please feel free to let me know.
I think you can use the following regular expression to replace the
commas. (However, regex is not the recommended approach for this problem;
see the performance comparison at the end of my reply.)
(?<head>".*?")*(?<remove>,*)(?<tail>".*?")*
Here is how it works:
The first group, (?<head>".*?")*, matches any quoted pairs in front of the
commas to be replaced.
The last group, (?<tail>".*?")*, matches any quoted pairs behind those
commas.
Because the quoted pairs are consumed by head and tail, the commas captured
by the middle group, (?<remove>,*), lie outside any quotes, and only those
are replaced with '\t'.
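To see what the three named groups actually capture, the sketch below runs the same pattern over a small made-up input, "x,y",z,"p,q", and lists the groups of every non-empty match. The class name RegexGroupDemo and the method DescribeMatches are hypothetical, chosen only for illustration. Zero-length matches (the pattern also matches nothing at positions such as the z) are skipped, because they contribute nothing to the replacement.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class RegexGroupDemo
{
    // Same pattern as in the reply: quoted pairs go to head/tail,
    // the unquoted commas between them go to remove
    static readonly Regex Pattern =
        new Regex("(?<head>\".*?\")*(?<remove>,*)(?<tail>\".*?\")*");

    // Returns one "head=[..] remove=[..] tail=[..]" line per non-empty match
    public static List<string> DescribeMatches(string input)
    {
        var lines = new List<string>();
        foreach (Match m in Pattern.Matches(input))
        {
            if (m.Length == 0)
                continue; // zero-length matches capture nothing to replace
            lines.Add($"head=[{m.Groups["head"].Value}] " +
                      $"remove=[{m.Groups["remove"].Value}] " +
                      $"tail=[{m.Groups["tail"].Value}]");
        }
        return lines;
    }

    static void Main()
    {
        foreach (string line in DescribeMatches("\"x,y\",z,\"p,q\""))
            Console.WriteLine(line);
    }
}
```

For this input there are two non-empty matches: the first has head = "x,y" and remove = the comma before z, the second has remove = the comma after z and tail = "p,q". The commas inside the quotes never land in remove.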
The complete C# code is listed below:
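using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string test = "\"a,,,,,\",\"j,dd,\"b\",\",\"";
        // head/tail consume quoted pairs; remove captures the unquoted commas
        Regex regex = new Regex("(?<head>\".*?\")*(?<remove>,*)(?<tail>\".*?\")*");
        MatchEvaluator myEvaluator = new MatchEvaluator(Program.ReplaceFunction);
        Console.WriteLine(regex.Replace(test, myEvaluator));
    }

    // Keep the quoted pairs unchanged and turn only the unquoted commas into tabs
    public static string ReplaceFunction(Match m)
    {
        return m.Groups["head"].Value
            + m.Groups["remove"].Value.Replace(',', '\t')
            + m.Groups["tail"].Value;
    }
}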
An alternative way to accomplish the task is to operate purely on the chars
of the string. By iterating over the characters once, the task can be done
in O(n) time, where n is the length of the string.
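string test = "\"a,,,,,\",\"j,dd,\"b\",\",\"";
char[] str = test.ToCharArray();
bool isInQuotes = false;   // true while we are between an opening and a closing quote
for (int i = 0; i < str.Length; i++)
{
    // A comma outside any quotes: replace it with a tab
    if (!isInQuotes && str[i] == ',')
    {
        str[i] = '\t';
        continue;
    }
    // Every quote switches between "inside" and "outside"
    if (str[i] == '\"')
        isInQuotes = !isInQuotes;
}
Console.WriteLine(str);   // the char[] overload prints the characters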
In the code above, isInQuotes is a flag indicating whether the current
char is contained within a pair of quotes. If isInQuotes is false and the
char is a comma, then we should replace it with a '\t'.
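If the char-based approach is needed in more than one place, it can be wrapped in a small helper. The following is only an illustrative generalization, not code from the original reply: the names QuoteAwareReplacer and ReplaceUnquoted are hypothetical, and the separator and replacement characters are made parameters. It keeps the same rule as the loop above (every double quote toggles the inside/outside state) and builds the result with a StringBuilder, so it still runs in O(n).

```csharp
using System;
using System.Text;

static class QuoteAwareReplacer
{
    // Replaces every 'separator' that lies outside double quotes with
    // 'replacement'. Each '"' toggles the isInQuotes flag, as in the loop above.
    public static string ReplaceUnquoted(string input, char separator, char replacement)
    {
        var sb = new StringBuilder(input.Length);
        bool isInQuotes = false;
        foreach (char c in input)
        {
            if (c == '"')
                isInQuotes = !isInQuotes;

            // Outside quotes, swap the separator; otherwise copy the char unchanged
            sb.Append(!isInQuotes && c == separator ? replacement : c);
        }
        return sb.ToString();
    }

    static void Main()
    {
        string test = "\"a,,,,,\",\"j,dd,\"b\",\",\"";
        Console.WriteLine(ReplaceUnquoted(test, ',', '\t'));
    }
}
```

Note that the test string contains an odd number of quotes, so the final quote leaves the flag set; any text after it would be treated as quoted. Whether that is the desired behavior depends on how your input is produced.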
Here is a performance comparison of the two approaches. I ran each method
100,000 times on the same test string:
string test = "\"a,,,,,\",\"j,dd,\"b\",\",\"";
The Regex version took 6380 ms, while the char-based version took only
39 ms. Therefore, I recommend the latter.
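The timing harness itself is not shown in the reply, so the following is only a sketch of how such a comparison could be reproduced with System.Diagnostics.Stopwatch. The class and method names (ReplaceBenchmark, WithRegex, WithCharScan, Time) are hypothetical. It assumes the Regex object is built once and reused across iterations; constructing it inside the loop would add pattern-parsing cost to every call. Absolute numbers will differ from the 6380 ms / 39 ms above depending on machine and .NET version. Before timing, it checks that both approaches produce the same output on the test string.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

static class ReplaceBenchmark
{
    // Regex approach from the reply; built once and reused (assumption)
    static readonly Regex Pattern =
        new Regex("(?<head>\".*?\")*(?<remove>,*)(?<tail>\".*?\")*");

    public static string WithRegex(string input) =>
        Pattern.Replace(input, m => m.Groups["head"].Value
            + m.Groups["remove"].Value.Replace(',', '\t')
            + m.Groups["tail"].Value);

    // Char-scan approach from the reply
    public static string WithCharScan(string input)
    {
        char[] str = input.ToCharArray();
        bool isInQuotes = false;
        for (int i = 0; i < str.Length; i++)
        {
            if (!isInQuotes && str[i] == ',')
            {
                str[i] = '\t';
                continue;
            }
            if (str[i] == '"')
                isInQuotes = !isInQuotes;
        }
        return new string(str);
    }

    // Calls 'action' the given number of times and returns elapsed milliseconds
    static long Time(Func<string, string> action, string input, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action(input);
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        string test = "\"a,,,,,\",\"j,dd,\"b\",\",\"";
        if (WithRegex(test) != WithCharScan(test))
            throw new InvalidOperationException("The two approaches disagree.");

        const int iterations = 100000;
        Console.WriteLine($"Regex:     {Time(WithRegex, test, iterations)} ms");
        Console.WriteLine($"Char scan: {Time(WithCharScan, test, iterations)} ms");
    }
}
```

The equality check also serves as a small warm-up, so JIT compilation of both methods is not counted in the timed loops.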
Regex is useful in complicated cases, such as matching email addresses,
but it can sometimes be resource-consuming. Thus, when a problem can be
solved in a single pass over the string, operating directly on the chars
is recommended.
Please feel free to let me know if you have any other concerns.
Sincerely,
Jialiang Ge ([email protected], remove 'online.')
Microsoft Online Community Support
==================================================
For MSDN subscribers whose posts are left unanswered, please check this
document: http://blogs.msdn.com/msdnts/pages/postingAlias.aspx
Get notification to my posts through email? Please refer to
http://msdn.microsoft.com/subscriptions/managednewsgroups/default.aspx#notifications.
If you are using Outlook Express/Windows Mail, please make sure
you clear the check box "Tools/Options/Read: Get 300 headers at a time" to
see your reply promptly.
Note: The MSDN Managed Newsgroup support offering is for non-urgent issues
where an initial response from the community or a Microsoft Support
Engineer within 1 business day is acceptable. Please note that each follow
up response may take approximately 2 business days as the support
professional working with you may need further investigation to reach the
most efficient resolution. The offering is not appropriate for situations
that require urgent, real-time or phone-based interactions or complex
project analysis and dump analysis issues. Issues of this nature are best
handled working with a dedicated Microsoft Support Engineer by contacting
Microsoft Customer Support Services (CSS) at
http://msdn.microsoft.com/subscriptions/support/default.aspx.
==================================================
This posting is provided "AS IS" with no warranties, and confers no rights.