a quick regexp question

yoni

Hey,
I am trying to find the right regexp to remove everything that is a
multi-line comment, in other words everything between /* and */. My
expression is:

/\*.*\*/

It doesn't work. Does anybody see anything wrong with it? Thanks.
 
Artur Borecki

. (dot) does not match the newline character by default.
Another problem is that .* will match as much as possible.
So if you have the text "/* comment */ code ... /* second comment */", your regex
will match the whole string from start to end.

/\*(.|\n)*?\*/ seems to work better.
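
The two problems described above can be reproduced in a few lines. This is a minimal sketch in Python, chosen only for illustration (the thread does not say which regex engine is in use); the sample string and variable names are made up for the demonstration.

```python
import re

# Hypothetical sample: one multi-line comment, some code, one single-line comment.
source = "/* first\n   comment */ code(); /* second comment */ more();"

# Original pattern: '.' stops at newlines, so the multi-line comment is missed
# and only the single-line comment is removed.
naive = re.sub(r"/\*.*\*/", "", source)

# With DOTALL, '.' crosses newlines, but the greedy '.*' runs to the LAST '*/'
# and swallows the code between the two comments.
greedy = re.sub(r"/\*.*\*/", "", source, flags=re.DOTALL)

# Lazy quantifier with an explicit newline alternative: each match stops at
# the first '*/' that follows its '/*'.
lazy = re.sub(r"/\*(.|\n)*?\*/", "", source)

print(repr(naive))
print(repr(greedy))
print(repr(lazy))
```

Only the lazy version removes both comments while leaving `code();` and `more();` in place.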
 
Göran Andersson

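Addition:

(.|\n) can also be written as a character class, though not as [.\n]:
inside brackets the dot is a literal period, not a wildcard.

A set that matches any character can be made by combining any two
complementary sets. I usually use [\w\W].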
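
One caveat worth checking in whatever engine is used: in most regex flavors a dot inside a character class is a literal period, so [.\n] is not a substitute for (.|\n), whereas [\w\W] does match any character. A small Python sketch (the language and the sample strings are assumptions for illustration only):

```python
import re

text = "a.b\nc"

# Inside brackets the dot is literal: this finds only the '.' and the newline.
literal_dot = re.findall(r"[.\n]", text)

# Two complementary classes together cover every character, newline included.
any_char = re.findall(r"[\w\W]", text)

# The same idea applied to comment removal, with a lazy quantifier.
comment = "/* line one\n   line two */ x = 1;"
stripped = re.sub(r"/\*[\w\W]*?\*/", "", comment)

print(literal_dot)
print(any_char)
print(repr(stripped))
```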
 
Bob

Hi Yoni,
Have a play with
(\\x2F\\x2A\\x20[\\w\\s]+\\s)(\\x2A\\x20*[\\w\\s]+\\s)+(\\x2A\\x2F)
It seemed to work in RegexBuddy with this test text:
/* Comment 1
* comment two
* comment three
*Comment 4
*/
Code
2*3=3;
int r = 2*3;

/* Comment A
* comment B
* comment C
*/

The regex
captures three different types of lines in three different groups:
1) Comment begin
2) Comment body
3) Comment end

However, while RegexBuddy listed all comment lines as matched when I
listed all matches, only the last body line of each comment showed up in the
individual listing for the comment body group.
So there may be a problem.
hth
Bob
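
The behaviour described above, where only the last body line shows up in the body group, is how repeated capturing groups generally work: each iteration overwrites the previous capture. The sketch below uses Python and a simplified stand-in for the three-group pattern (not the exact expression from the post) to show the effect, plus one possible workaround of wrapping the repetition in an outer group.

```python
import re

text = "/* Comment 1\n* comment two\n* comment three\n*/"

# Simplified stand-in: (begin)(body line)+(end).
# Group 2 is repeated, so it ends up holding only its final iteration.
repeated = re.search(r"(/\* [\w ]+\n)(\* ?[\w ]+\n)+(\*/)", text)
last_body_line = repeated.group(2)

# Workaround: make the repeated part non-capturing and capture the whole
# repetition with an outer group instead.
wrapped = re.search(r"(/\* [\w ]+\n)((?:\* ?[\w ]+\n)+)(\*/)", text)
all_body_lines = wrapped.group(2)

print(repr(last_body_line))
print(repr(all_body_lines))
```

Some engines keep every iteration in a separate captures collection (.NET exposes one on each group), so the individual lines may still be recoverable there without changing the pattern.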
 
Göran Andersson

I think that you are over-complicating it. Also you are assuming things
that are not at all required in a comment. There is no space required
after the start of the comment, and it doesn't have to contain lines
that begin with asterisks.

/*This is a perfectly legal comment*/
/*So
is
this*/
/*And the following too:*/
/**/
 
Bob

Hi Goran,
Yep,
it is not yet robust.
But the main thing IMO is that the overall comment structure is broken down
into its component parts, and each part is addressed by a group.
Of the extra examples you gave, only the single-line comment with text
failed.
This can be addressed by (Single Line Comment) | (Multiline Comment),
with Multiline Comment being the regex from my original post.
My single-line offering is (\\x2F\\x2A\\s*[\\w\\s]*\\*/)

I can't see how you can simplify it much further without running into
the problem you mentioned earlier, namely matching code as well as comments.

My expression fails if you have a string assignment in code that imitates a
comment, e.g. string s = "/* Some text */";
So a 'not quotes' check (negative lookahead?) should be put at the front of all
groups. I tried it but couldn't stop the match.
The trouble with this empirical approach is that you find holes and patch them,
but you can't be sure you have found all the holes,
unless you're a regex expert, which I am not.
If you can come up with a simpler, robust regex that picks out the comments
and leaves the 'code', I would like to see it.

The new test text now follows.
regards
Bob
/* Comment Single Line space at front abd*/
/*Comment Single Line spaceless abd*/
/* comment two
I am a plain line
So am I
* comment three
*Comment 4
*/
Code Begins
int r = 2*3;
x = 5/3;//Inline Comment fails but do we want to grab it?
y=2^6;
string s = "/* this is a failing test string*/";
/* Comment A
* comment B
* comment C
*/
/* */
/**/
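
One common way around the string-literal problem described above is to let a single pattern match either a string literal or a block comment, and to replace only the comments. The sketch below is in Python and assumes C-style double-quoted strings with backslash escapes. It does not handle character literals, verbatim strings, or // comments, and the name strip_block_comments is hypothetical.

```python
import re

# Alternation: a double-quoted string literal (group 1) OR a block comment.
# The regex scans left to right, so comment-like text inside a string is
# consumed as part of the string literal and never seen as a comment.
TOKEN = re.compile(r'("(?:\\.|[^"\\])*")|/\*[\w\W]*?\*/')

def strip_block_comments(source: str) -> str:
    # Put string literals back unchanged; replace comments with nothing.
    return TOKEN.sub(lambda m: m.group(1) or "", source)

sample = 'string s = "/* not a comment */"; /* real\ncomment */ int r = 2*3;'
result = strip_block_comments(sample)
print(result)
```

The string assignment survives intact while the real comment is removed. This is a patch for one known hole rather than a full tokenizer, so the caveat about undiscovered holes still applies.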

 
Artur Borecki

My regex /\*(.|\n)*?\*/ works fine for Bob's new test text.
 
sherifffruitfly

It should match the first /* and the first */, doesn't it?

Oops - yes it does - I had an incorrect concept of *failure*. I was
under the erroneous impression that the *outer* comment delimiters
would define a comment. In fact it's the first left-to-right
matching pair that constitutes a comment.

Shorter version: Never mind.

:)
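
The point above, that block comments in C-style languages do not nest and end at the first */, is exactly what the lazy pattern models. A short Python sketch with a made-up input string:

```python
import re

# Hypothetical input that looks like a nested comment.
nested = "/* outer /* inner */ still_code(); */"

match = re.search(r"/\*[\w\W]*?\*/", nested)
first_comment = match.group(0)      # ends at the FIRST '*/'
remainder = nested[match.end():]    # everything after it is treated as code

print(repr(first_comment))
print(repr(remainder))
```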
 
