Parsing C# - Syntax question

  • Thread starter: Immo Landwerth

Immo Landwerth

Hi there,

I am working on a project in which parsing C# is a major part.

While constructing an AST I was wondering about the syntax of the
for-loop statement.

If I write

for (int i = 0; i < 10; i)
                        ^
                        |
------------------------+

the compiler aborts compilation and says:

CS0201: Only assignment, call, increment, decrement, and new object
expressions can be used as a statement.

This also happens when I do the same thing in the loop initializer:

int i = 0;

for (i; i < 3; i++)
     ^
     |
-----+
{
}

But if I use an expression list instead of a simple expression

int i = 0;

for (1, 2, 3; i < 3; i++)
{
}

-- or --

for (int i = 0; i < 3; 1, 2, 3)
{
}

the source compiles just fine. I would have expected the
compiler to complain about the expression list "1, 2, 3" with CS0201,
because the initializer/iterator of a for-loop should be an
expression or an expression list in which every expression can be used
as a statement.

Is this a known/unknown issue or is there a good reason to allow any
expression list in the loop initializer/incrementer?
 
OK, fine, I see where you are coming from, but I don't think you are
comparing like for like. For instance:

//These first two examples both have the same item in initializer and
//iterator
for (int i = 0; i < 10; i) //methodA type1

int i = 0; //methodA type2
for (i; i < 3; i++)

//whereas here you don't
int i = 0; //methodB type1
for (1, 2, 3; i < 3; i++)

for (int i = 0; i < 3; 1, 2, 3) //methodB type2


Whilst I cannot see anything in the specification to suggest that both
methods are not valid, logically, when there is a relation between the
iterator and the initializer, it makes no sense to iterate like this (or,
should I say, not to iterate what has been initialized in the initializer).
Likewise, the other way around is just as nonsensical.
Agreed, the numeric methods are just as pointless, but it is probably less
obvious to the compiler that there is a problem here.

Also bear in mind that the specification states that the initializer,
condition and iterator are optional so that
for (int i = 0; i < 10;) //methodA type1

int i = 0; //methodA type2
for (; i < 3; i++)

will work.

Br,

Mark.
 
Looks like a compiler bug. The compiler incorrectly checks constants in for
statements. 1 or 1, 2, 3 is not a statement-expression or a
statement-expression-list. Semantically, a statement-expression
or statement-expression-list must have a side effect, i.e. something must
have changed after the statement executes. But in the examples described,
nothing changes.

Alex.
 
Mark said:
OK, fine, I see where you are coming from, but I don't think you are
comparing like for like. For instance:

//These first two examples both have the same item in initializer and
//iterator
for (int i = 0; i < 10; i) //methodA type1

int i = 0; //methodA type2
for (i; i < 3; i++)

//whereas here you don't
int i = 0; //methodB type1
for (1, 2, 3; i < 3; i++)

for (int i = 0; i < 3; 1, 2, 3) //methodB type2

Agreed. I am comparing simple expressions to expression lists.

Mark said:
Whilst I cannot see anything in the specification to suggest that both
methods are not valid,

Well I think there is :) Let's have a look at the C# language
specification [1]:

| B.2.5 Statements
|
| for-statement:
|     for ( for-initializer_opt ; for-condition_opt ; for-iterator_opt )
|         embedded-statement
|
| for-initializer:
|     local-variable-declaration
|     statement-expression-list
|
| for-condition:
|     boolean-expression
|
| for-iterator:
|     statement-expression-list
|
| statement-expression-list:
|     statement-expression
|     statement-expression-list , statement-expression
|
| statement-expression:
|     invocation-expression
|     object-creation-expression
|     assignment
|     post-increment-expression
|     post-decrement-expression
|     pre-increment-expression
|     pre-decrement-expression

To me, that implies that every expression in a for-initializer or
for-iterator must be valid as a statement.
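
As a minimal sketch of how a parser might enforce this reading of the grammar: assume a hypothetical ExprKind enum describing AST node kinds (the names below are illustrative only; they come neither from this thread nor from any real compiler). Each element of a statement-expression-list can then be checked against the seven permitted alternatives:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical node kinds for illustration; a real AST would have many more.
public enum ExprKind
{
    Literal, Identifier, Binary,
    Invocation, ObjectCreation, Assignment,
    PostIncrement, PostDecrement, PreIncrement, PreDecrement
}

public static class StatementExpressionCheck
{
    // True for exactly the seven alternatives of statement-expression.
    public static bool IsStatementExpression(ExprKind kind)
    {
        switch (kind)
        {
            case ExprKind.Invocation:
            case ExprKind.ObjectCreation:
            case ExprKind.Assignment:
            case ExprKind.PostIncrement:
            case ExprKind.PostDecrement:
            case ExprKind.PreIncrement:
            case ExprKind.PreDecrement:
                return true;
            default:
                return false;
        }
    }

    // Returns the indices of list elements that a checker following the
    // grammar would report (for example with CS0201).
    public static List<int> FindInvalid(IReadOnlyList<ExprKind> list)
    {
        var invalid = new List<int>();
        for (int i = 0; i < list.Count; i++)
        {
            if (!IsStatementExpression(list[i]))
                invalid.Add(i);
        }
        return invalid;
    }
}
```

Under this check, the list 1, 2, 3 (three literals) produces three invalid entries, while a list such as i = 0, j++ produces none.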

Mark said:
logically, when there is a relation between the
iterator and the initializer, it makes no sense to iterate like this
(or, should I say, not to iterate what has been initialized in the
initializer). Likewise, the other way around is just as nonsensical.

That is not the point. Normally, the compiler complains about useless
statements that consist only of an expression with no effect, e.g.

1;
1+ 2;
i;

etc.

I don't have any problem with the fact that the compiler does not warn
about the (somewhat pointless) for-loop

for (1, 2, 3; i < 2; i++)

but I have to build up a syntax tree for C#. Some time ago I decided to
parse for-loops like this:

Match(TokenType.FOR);

ParseExpression();   // initializer
ParseExpression();   // condition
ParseExpression();   // iterator

Match(TokenType.RightParentheses);

...

After this, I would enforce that all of the expressions are valid
statements.

But now I see that the compiler also accepts expression lists. From my
point of view, I don't see any reason to accept this. OK, one could say
"no problem, just accept it and everything is fine", but to do so I would
have to change my AST (and how to do that is not really obvious).

So I was just wondering whether this is a bug in the compiler or a
misunderstanding of C# on my part.
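
One way the AST change could look, as a rough sketch under stated assumptions: the initializer and iterator slots each hold a list of expressions rather than a single one. Everything here is hypothetical (ForHeader, ForHeaderParser, and tokens as plain strings); ParseExpression is only a stand-in that collects tokens, and a local-variable-declaration in the initializer is not handled properly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical AST node: both list-valued slots hold zero or more expressions.
public sealed class ForHeader
{
    public List<string> Initializers = new List<string>();
    public string Condition;                         // null when omitted
    public List<string> Iterators = new List<string>();
}

public sealed class ForHeaderParser
{
    private readonly string[] tokens;
    private int pos;

    public ForHeaderParser(params string[] tokens) { this.tokens = tokens; }

    private void Match(string expected)
    {
        if (pos >= tokens.Length || tokens[pos] != expected)
            throw new FormatException("Expected '" + expected + "' at token " + pos);
        pos++;
    }

    // Stand-in for a real expression parser: consumes tokens up to the next
    // ',', ';' or ')' that is not nested inside parentheses.
    private string ParseExpression()
    {
        var parts = new List<string>();
        int depth = 0;
        while (pos < tokens.Length)
        {
            string t = tokens[pos];
            if (depth == 0 && (t == "," || t == ";" || t == ")")) break;
            if (t == "(") depth++;
            if (t == ")") depth--;
            parts.Add(t);
            pos++;
        }
        if (parts.Count == 0)
            throw new FormatException("Expected expression at token " + pos);
        return string.Join(" ", parts);
    }

    // expression (',' expression)*, or an empty list if the slot is omitted.
    private List<string> ParseExpressionList(string terminator)
    {
        var list = new List<string>();
        if (pos < tokens.Length && tokens[pos] == terminator) return list;
        list.Add(ParseExpression());
        while (pos < tokens.Length && tokens[pos] == ",")
        {
            pos++;                                   // skip the comma
            list.Add(ParseExpression());
        }
        return list;
    }

    public ForHeader Parse()
    {
        var header = new ForHeader();
        Match("for");
        Match("(");
        header.Initializers = ParseExpressionList(";");
        Match(";");
        if (pos < tokens.Length && tokens[pos] != ";")
            header.Condition = ParseExpression();
        Match(";");
        header.Iterators = ParseExpressionList(")");
        Match(")");
        return header;
    }
}
```

Whether each list element is a legal statement-expression can then be validated in a separate pass over Initializers and Iterators, so the parser itself stays permissive.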

Mark said:
Agreed, the numeric methods are just as pointless, but it is probably less
obvious to the compiler that there is a problem here.

I have to disagree. From the compiler's point of view, this is as simple as
validating that the method "test" contains only valid
statements.

void test()
{
    i++;
    i + 2;   // CS0201: not a valid statement
    a(d);
}

Mark said:
Also bear in mind that the specification states that the initializer,
condition and iterator are optional so that
for (int i = 0; i < 10;) //methodA type1

int i = 0; //methodA type2
for (; i < 3; i++)

will work.

You are right, and I handle this correctly. Wherever a statement is
parsed, a single semicolon and an empty pair of braces are valid
statements. Whether the compiler warns about a possibly mistaken
empty statement (CS0642) depends on the context.
 
Alex said:
Looks like a compiler bug. The compiler incorrectly checks constants in for
statements. 1 or 1, 2, 3 is not a statement-expression or a
statement-expression-list. Semantically, a statement-expression
or statement-expression-list must have a side effect, i.e. something
must have changed after the statement executes. But in the examples
described, nothing changes.

OK, that is exactly what I assumed, but it's nice to hear it from someone
else as well :)
 
Immo Landwerth said:
Hi there,

I am working on a project in which parsing C# is a major part.

While constructing an AST I was wondering about the syntax of the
for-loop statement.

Why?
The C# language standard says (8.8.3):

"The for-initializer, if present, consists of either a
local-variable-declaration (Section 8.5.1) or a list of
statement-expressions (Section 8.6) separated by commas. The scope of a
local variable declared by a for-initializer starts at the
local-variable-declarator for the variable and extends to the end of the
embedded statement. The scope includes the for-condition and the
for-iterator."

Section 8.6 (statement-expressions) says:
"statement-expression:
    invocation-expression
    object-creation-expression
    assignment
    post-increment-expression
    post-decrement-expression
    pre-increment-expression
    pre-decrement-expression

Not all expressions are permitted as statements. In particular, expressions
such as x + y and x == 1 that merely compute a value (which will be
discarded), are not permitted as statements.

Execution of an expression-statement evaluates the contained expression and
then transfers control to the end point of the expression-statement. The end
point of an expression-statement is reachable if that expression-statement
is reachable."

Don't ask me why they allow "object-creation-expressions" here, but that's
apparently what makes things like these possible:
for (new String('1',6); i<10; i++,5) {}
for (1; i<10; i++,5) {}
for ("123"; i<10; i++,5) {}

Immo Landwerth said:
the source compiles just fine. I would have expected the
compiler to complain about the expression list "1, 2, 3" with CS0201,
because the initializer/iterator of a for-loop should be an
expression or an expression list in which every expression can be used
as a statement.

It is. 1, 2 and 3 are all "object creation expressions": they create integer
objects.

Immo Landwerth said:
Is this a known/unknown issue or is there a good reason to allow any
expression list in the loop initializer/incrementer?

It allows statements like:
for (i=0,j=10; i<10; i++,j+=5)

I can't really say why that also includes "object creation expressions". I
guess the compiler doesn't make any difference between these kinds of
statements.
Maybe it's also some kind of "C heritage", where the contents of a for
statement could contain pretty much anything...
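
For reference, a small runnable version of the two-counter loop above. The class and method names are made up for this example, and the tuple return type assumes C# 7 or later:

```csharp
using System;

public static class TwoCounterLoop
{
    // Returns the final values of both counters after the loop has finished.
    public static (int I, int J) Run()
    {
        int i, j;
        // Both the initializer and the iterator are statement-expression-lists:
        // two assignments, then a post-increment and a compound assignment.
        for (i = 0, j = 10; i < 10; i++, j += 5)
        {
            // Body intentionally empty; the header does all the work.
        }
        return (i, j);
    }
}
```

The loop runs ten times, so Run() returns (10, 60).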

Niki
 
Niki said:
Not all expressions are permitted as statements. In particular,
expressions such as x + y and x == 1 that merely compute a value
(which will be discarded), are not permitted as statements.
Execution of an expression-statement evaluates the contained
expression and then transfers control to the end point of the
expression-statement. The end point of an expression-statement is
reachable if that expression-statement is reachable."

Don't ask me why they allow "object-creation-expressions" here, but
that's apparently what makes things like these possible:
for (new String('1',6); i<10; i++,5) {}
for (1; i<10; i++,5) {}
for ("123"; i<10; i++,5) {}
It is. 1, 2 and 3 are all "object creation expressions": they create
integer objects.

I have to disagree. If 1, 2, 3 were all object creation expressions the
following source would be legal:

1;
2;
3;

(which it is not), because according to the grammar an object creation
expression is a valid statement.

No, object creation expressions are only constructor invocations using the
keyword new:

| object-creation-expression:
|     new type ( argument-list_opt )
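
One plausible reason for allowing an object-creation-expression as a statement is that a constructor may have side effects, whereas a literal never does. This is my own guess, not something the specification excerpts above state. A small sketch of the difference, using a hypothetical Registration class:

```csharp
using System;

public sealed class Registration
{
    public static int Count;                 // observable side effect of construction

    public Registration() { Count++; }
}

public static class ObjectCreationDemo
{
    public static int Run()
    {
        Registration.Count = 0;

        new Registration();                  // legal: an object-creation-expression used as a statement

        // Legal per the grammar: both the initializer and the iterator
        // are object-creation-expressions.
        for (new Registration(); Registration.Count < 5; new Registration())
        {
        }

        // 1;                                // a literal is not a statement-expression (CS0201)
        return Registration.Count;
    }
}
```

Here Run() returns 5: one construction by the standalone statement, one by the initializer, and three by the iterator before the condition fails.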
 
Niki said:
It is. 1, 2 and 3 are all "object creation expressions": they create
integer objects.

I don't agree. 1 or 1, 2, 3 is not an object-creation-expression.
An object-creation-expression creates an instance of a type dynamically
(see the grammar):

object-creation-expression:
    "new" type "(" [argument-list] ")"

Alex.
 
Sorry, I should have said
"whilst I cannot see anything in the specification to suggest that both
methods are valid".
As I said, both methods are nonsensical.

snippet{
I have to disagree. From the compiler's point of view this is as simple as
the validation that the method "test" contains only valid
statements.}
The fact is that one way is being compiled and the other is not, so
whether you believe both methods to be equally simple to validate is not
the issue. The issue is that it IS either being missed OR is valid syntax.
As I've mentioned, because those parts of the for statement are optional,
one can only assume that it is being missed, and if it is being missed
there is a reason (what that reason may be is an assumption).

Br,

Mark.

 
Yup, you're right.
I just took a look at the SSCLI implementation of the C# compiler, and it
seems as if they simply forgot to check that case.

Niki
 
Niki said:
Yup, you're right.
I just took a look at the SSCLI implementation of the C# compiler,
and it seems as if they simply forgot to check that case.

That is my impression, too.
 