regex challenge

Jarlaxle

I'd like to issue a challenge (I already have a class to do it so I'm not
asking just to get the code)...

Write one regex expression that removes all comments from a string.

Hints:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. I have seen some that claim to remove all comments but are inadequate for
one reason...(QUOTES!)
 
Jon Skeet [C# MVP]

Jarlaxle said:
I'd like to issue a challenge (I already have a class to do it so I'm not
asking just to get the code)...

Write one regex expression that removes all comments from a string.

Hints:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. I have seen some that claim to remove all comments but are inadequate for
one reason...(QUOTES!)

Well, you need to specify the quote behaviour as well. Hint: C# quote
handling would be different to Java handling. (In fact, that goes for
various other non-quote cases, given the way Java handles Unicode
escape sequences.)
 
Jarlaxle

We can keep it to C#:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. quotes must be supported
   a. all text from a starting quote to the closing quote must not be matched
   b. must support the \" escape sequence inside quotes
 
Paul E Collins

Jarlaxle said:
I'd like to issue a challenge (I already have a class to do it so I'm
not asking just to get the code)... Write one regex expression that
removes all comments from a string.

How does your class handle code like this?

/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

I'm pretty sure it's impossible with a regular expression; you'd need a
true C# parser.

Eq.
 
Paul E Collins

Paul E Collins said:
How does your class handle code like this?
/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

Hey, here's something weird.

If you type my (top-of-head) code above into Visual Studio, it produces
one long line that's somehow a single comment, and none of it gets
compiled as actual code.

/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

But if you put a line break where you'd expect the first multi-line
comment to end -- without changing anything else -- then the second
statement gets syntax-coloured and compiled as C# code.

/* string s = "/* hello */ // how are you?"; */
string s = "/* // */";

So I think I've accidentally found a bug. What do I win?

Eq.

P.S. My question to the original poster still stands, bug or none!
 
Paul E Collins

Blah, never mind :) I worked out what's going on with that line.

This can serve as a lesson to anyone who would combine // and /**/
comments.

Eq.
 
Jon Skeet [C# MVP]

Jarlaxle said:
we can keep it to c#:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. quotes must be supported
   a. all text from a starting quote to the closing quote must not be matched
   b. must support the \" escape sequence inside quotes

What about @"\"//This is a comment ?
 
Jesse Houwing

Hello Jarlaxle,
I'd like to issue a challenge (I already have a class to do it so I'm
not asking just to get the code)...

Write one regex expression that removes all comments from a string.

Hints:

1. /* a block can contain newlines and must match till next */
2. // must match until end of line
3. I have seen some that claim to remove all comments but are
inadequate for
one reason...(QUOTES!)

On the one hand I would love to take up this challenge, but I lack the time
and incentive to do so right now.

There's a lot to keep in mind even when only considering C# as target language
for this exercise.

I say:

A) that this can be done with regular expressions,
B) that it will be bloody hard and unreadable,
C) that regex isn't the best tool for this job,
D) that if you mix comment-like code with string constants as in the provided
samples, you're either crazy or need total job security ;)

So for those that want to try:

A) Verbatim strings are a pain in the ass for this, as they remove the security
of the normal line ends (a verbatim string can span several lines).
B) You need to understand balancing groups to get this to work.
C) And a lot of lookaheads/lookbehinds.
D) And greedy matching to solve performance issues.

The best way is probably to make a regex that matches from the start to the
end and use a MatchEvaluator to null all the comments found... but that wouldn't
be a pure regex solution would it?
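
The "match everything, then null the comments in a MatchEvaluator" idea could be sketched roughly as below. This is only a minimal illustration, not anyone's actual class from this thread: the names CommentStripper, Strip and Token are made up for the example, and it assumes only classic C# literals (regular strings, verbatim strings and char literals), with no preprocessor directives or newer string forms. The trick is that the alternation tries the string and char alternatives before the comment alternatives, so comment markers inside a literal are consumed as part of the literal and kept.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class CommentStripper
{
    // Order matters: literals are tried before comments at every position.
    private static readonly Regex Token = new Regex(
        @"(?<keep>@""(?:[^""]|"""")*""" +   // verbatim string; "" is an escaped quote
        @"|""(?:\\.|[^""\\\r\n])*""" +      // regular string with backslash escapes
        @"|'(?:\\.|[^'\\\r\n])*')" +        // char literal, so '"' does not open a string
        @"|(?<block>/\*.*?\*/)" +           // block comment, ends at the first */
        @"|(?<line>//[^\r\n]*)",            // line comment, runs to end of line
        RegexOptions.Singleline);

    public static string Strip(string source)
    {
        // The MatchEvaluator keeps literals unchanged and replaces comments with nothing.
        return Token.Replace(source, m =>
            m.Groups["keep"].Success ? m.Value : string.Empty);
    }
}
```

With this sketch, CommentStripper.Strip applied to the line @"\"//This is a comment leaves just @"\" because the verbatim-string alternative closes at the second quote. It is still one regex plus a callback rather than a pure regex solution, and an unterminated literal simply falls through unmatched.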
 
Ben Voigt [C++ MVP]

Paul E Collins said:
Hey, here's something weird.

If you type my (top-of-head) code above into Visual Studio, it produces
one long line that's somehow a single comment, and none of it gets
compiled as actual code.

/* string s = "/* hello */ // how are you?"; */ string s = "/* // */";

But if you put a line break where you'd expect the first multi-line
comment to end -- without changing anything else -- then the second
statement gets syntax-coloured and compiled as C# code.

This just goes to show that where you'd expect the first comment to end is
not, in fact, where it does end.

Here is the first comment:
/* string s = "/* hello */

The */ inside the quotes is NOT skipped, because it is NOT inside a string
literal: a quote inside a comment is just part of the comment, not the
beginning of a string literal.
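
To make that concrete, here is a small sketch (FirstCommentDemo and FirstBlockComment are hypothetical names, used only for this illustration) that finds the end of a block comment the same way the rule above describes: once inside /*, nothing is special except the next */.

```csharp
using System;

// Hypothetical helper for illustration only.
public static class FirstCommentDemo
{
    // Returns the block comment that starts at index 0, or null if there is none.
    public static string FirstBlockComment(string source)
    {
        if (!source.StartsWith("/*", StringComparison.Ordinal)) return null;

        // Search from index 2 so that "/*/" is not mistaken for a complete comment.
        // Quotes are deliberately ignored: inside a comment they mean nothing.
        int end = source.IndexOf("*/", 2, StringComparison.Ordinal);
        return end < 0 ? null : source.Substring(0, end + 2);
    }
}
```

Fed the one-line example from earlier in the thread, this returns /* string s = "/* hello */ and nothing more; the // that follows then comments out the rest of the line, which is why none of it compiles until the line break is added.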
 
Ben Voigt [C++ MVP]

Paul E Collins said:
Blah, never mind :) I worked out what's going on with that line.

This can serve as a lesson to anyone who would combine // and /**/
comments.

You absolutely should combine // and /**/ comments. If you need to comment
out a block of code, you need to prefix each line of the block with //,
because /* */ style comments do not nest.
 
