Parsing C# string using RegEx

  • Thread starter: Natalia DeBow

Natalia DeBow

Hi there,

I have another question for .NET RegEx experts.

I am reading in a C# file line by line and I am trying to detect
comments that start with either // or ///. What I am particularly
interested in is the comment text itself: I want some stats on the
amount of comments in the file (comment bytes).

So, I tried several regular expressions, but they don't work in all
cases.

Here are the cases that I need to cover:

a. /// comments or // comments
b. /// <xml-tag> comments </xml-tag>
c. /// <xml-tag> comments <another xml-tag> comments </another xml-tag>
comments </xml-tag>
d. /// <xml-tag>
e. /// </xml-tag>

I need to be able to capture the comments but not the xml tags.

Here are a few of the regular expressions that I have tried, without
success.

@"^.*?///?\s*((</?.+>)*(?<comments>.*))*$"
@"///?\s*(</?.+>)*(?<comments>.*)"

I am having difficulty capturing multiple comments if they are separated
by xml tags. For some odd reason, if I have more than one set of tags,
the returned result is always the rightmost set of comments.

Thanks so much for any input!
Natalia
 
Natalia, you need to grab the comments together with their XML and then
post-process what you've grabbed using an XML DOM. You can easily modify
the last regex that I sent to allow for /// documentation comments, then
append all such instances into a string to process as XML.

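// Requires: using System.Text.RegularExpressions;
Regex regex = new Regex(
    "(?ms)(?# Specify our options )" +
    "^.*?((?<lineComment>///?)|/\\*)" +
    "(?<comments>.*?)" +
    "(?(lineComment)$|\\*/)");

// 'source' is assumed to hold the full text of the C# file.
string xmlString = "";
foreach (Match match in regex.Matches(source)) {
    // Keep only /// documentation comments for later XML processing.
    if (match.Groups["lineComment"].Value == "///") {
        xmlString += match.Groups["comments"].Value;
    }
}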
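
As a rough sketch of the post-processing step, the collected /// text could be loaded into an XML tree and only its text nodes kept. This uses LINQ to XML (XElement) rather than XmlDocument, which is a substitution on my part. The DocCommentText class and the synthetic <doc> wrapper element are assumptions for illustration, and XElement.Parse will throw if the collected text is not well-formed XML.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class DocCommentText
{
    // Hypothetical helper: returns the comment text without the XML tags.
    public static string ExtractText(string xmlFragments)
    {
        // Wrap the fragments in one root element so they parse as a
        // single document ("doc" is an arbitrary name).
        XElement root = XElement.Parse("<doc>" + xmlFragments + "</doc>");

        // Keep only the text nodes, trimmed, and join them with spaces.
        var pieces = root.DescendantNodes()
                         .OfType<XText>()
                         .Select(t => t.Value.Trim())
                         .Where(s => s.Length > 0);
        return string.Join(" ", pieces);
    }
}
```

Calling DocCommentText.ExtractText(xmlString) on the string built above would then give the text whose bytes can be counted.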

Regular expressions are not a jack of all trades, nor are they the best
or fastest parsing tool for all cases. Use the right tool for the job.
Hope this helps in your endeavor.
 
Hi, inline

Natalia DeBow said:
@"^.*?///?\s*((</?.+>)*(?<comments>.*))*$"
@"///?\s*(</?.+>)*(?<comments>.*)"

Problems:
1) '.+' inside "</?.+>" is greedy and matches any character, including
'>', so it can run past the end of the tag.
2) '.*' inside (?<comments>.*) matches any character, including '<', so
it swallows any tags that follow.
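
The first problem can be seen in a small sketch. GreedyDemo and its method names are made up for illustration; the only difference between the two patterns is '.+' versus the negated class '[^>]+'.

```csharp
using System.Text.RegularExpressions;

public static class GreedyDemo
{
    // Original tag pattern: ".+" is greedy and may run past the first '>'.
    public static string MatchGreedy(string input)
    {
        return Regex.Match(input, @"</?.+>").Value;
    }

    // Negated character class: the tag match stops at the first '>'.
    public static string MatchTag(string input)
    {
        return Regex.Match(input, @"</?[^>]+>").Value;
    }
}
```

For the input "<summary> text </summary>", MatchGreedy returns the whole string, while MatchTag returns only "<summary>".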

I suggest trying this:

strRex = @"///?\s(?:(?:<[^>]+>)|(?<comments>[^<]+))*";

For cases d and e the comments group will be empty, because those lines
don't contain any comment text you want.
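
One thing to watch with this pattern: because the comments group sits inside a repeated group, Groups["comments"].Value returns only the last piece it captured, which would explain always seeing the rightmost comment. In .NET the earlier pieces are still available through the group's Captures collection. Here is a minimal sketch; the CommentStats class and its method names are hypothetical, and counting bytes as UTF-8 is an assumption about what "comment bytes" should mean.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class CommentStats
{
    // Pattern from the post above: skip XML tags, capture the text between them.
    private static readonly Regex CommentRegex =
        new Regex(@"///?\s(?:(?:<[^>]+>)|(?<comments>[^<]+))*");

    // Returns every piece of comment text found on one source line.
    public static List<string> GetCommentParts(string line)
    {
        var parts = new List<string>();
        Match match = CommentRegex.Match(line);
        if (match.Success)
        {
            // Captures holds one entry per iteration of the repeated group;
            // Groups["comments"].Value would give only the last one.
            foreach (Capture capture in match.Groups["comments"].Captures)
            {
                string text = capture.Value.Trim();
                if (text.Length > 0) parts.Add(text);
            }
        }
        return parts;
    }

    // Size of the comment text in bytes (UTF-8 assumed).
    public static int CountCommentBytes(string line)
    {
        int total = 0;
        foreach (string part in GetCommentParts(line))
        {
            total += Encoding.UTF8.GetByteCount(part);
        }
        return total;
    }
}
```

Note that the pattern requires exactly one whitespace character after the slashes, so a line such as "///<summary>" would need '\s' changed to '\s*'.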

HTH,
greetings
 
