self-confidence of compiler

  • Thread starter valentin tihomirov

valentin tihomirov

If the compiler had to recognize *all* cases of variables that are never
accessed uninitialized, it would need up to infinite processing power and
an infinite quantity of analysis code.

Are you sure about the infinite program size? If I remember Turing machines
correctly, even these most powerful computers have a finite (FSM) controller.
What they feature is infinite data storage.
 

Peter Duniho

Could you give an example? Here's one that *does* work:

Sure. Here's the one I posted to the VS2005 bug report database:

{
    int num;
    bool fValid;
    try
    {
        num = Convert.ToInt32("50");
        fValid = true;
    }
    catch (InvalidCastException)
    {
        fValid = false;
    }
    if (fValid)
    {
        Console.WriteLine("Value of number is " + num); // CS0165 here
    }
}

My bug report was rejected as "by design", with the argument that it would
be too complicated for the compiler to figure out that fValid is actually
initialized in all code paths leading to its use. Note that there's no
looping, nor any interdependency between variables. It's a strictly "code
goes through here, or it goes through here" sort of thing. The only way
for the code to not reach "fValid = true" is for an exception to occur, in
which case it will reach "fValid = false" before it reaches the "if()"
statement.
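For comparison, here is a minimal sketch of the same idea that the compiler accepts without a flag, assuming it's acceptable to swap the exception-based conversion for int.TryParse (a different technique, not a drop-in for the original: it returns false instead of throwing). TryParse always assigns its out parameter, so num is definitely assigned wherever it is read. The class and method names are hypothetical, and the `out int num` declaration needs C# 7.0 or later.

```csharp
using System;

// Hypothetical wrapper class, for illustration only.
public static class ParseDemo
{
    // Returns a message for valid input, or null if the text is not an int.
    public static string Describe(string text)
    {
        // int.TryParse assigns 'num' on every path (0 on failure), so the
        // definite-assignment rules are satisfied without a separate flag.
        if (int.TryParse(text, out int num))
        {
            return "Value of number is " + num;
        }
        return null;
    }
}
```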

Pete
 

Bruce Wood

Sure. Here's the one I posted to the VS2005 bug report database:

<snip>

My bug report was rejected as "by design", with the argument that it would
be too complicated for the compiler to figure out that fValid is actually
initialized in all code paths leading to its use. Note that there's no
looping, nor any interdependency between variables.

Umm... apart from the interdependency between fValid and num?
 

Peter Duniho

Umm... apart from the interdependency between fValid and num?

It's not any different from the "interdependency" between "args.Length"
and "x" in Jon's example. Given that, I presume that Jon's use of the
term "interdependency" refers to something more complex and I answered in
the same vein.

Pete
 

Jon Skeet [C# MVP]

valentin tihomirov said:
The rule "Only initialized variables are allowed to be used" seems natural
and ultimately simple. The rules describing where you cut the analysis
would have to be extremely complex, I believe.

The more complex the rules are, the harder they are to reason about and
understand. They also make it much harder to verify that a compiler is
correct.

I like languages where the rules are reasonably easy to understand,
even if that means that occasionally I have to spell things out a bit
more for the compiler's benefit.
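In the bug-report example, "spelling it out" would typically mean giving num an explicit initial value that is never actually read. A sketch of that, assuming the example is wrapped in a hypothetical Run method; it catches FormatException, which is what Convert.ToInt32(string) documents for non-numeric text, rather than the original InvalidCastException.

```csharp
using System;

// Hypothetical wrapper around the bug-report example.
public static class SpelledOut
{
    public static string Run(string text)
    {
        int num = 0; // explicit initial value: only read when parsing succeeded
        bool fValid;
        try
        {
            num = Convert.ToInt32(text);
            fValid = true;
        }
        catch (FormatException)
        {
            fValid = false;
        }
        // fValid is assigned on both exits from the try/catch, and num now
        // has a value on every path, so this compiles.
        return fValid ? "Value of number is " + num : "invalid";
    }
}
```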
 

Jon Skeet [C# MVP]

Peter Duniho said:
Sure. Here's the one I posted to the VS2005 bug report database:

<snip>

My bug report was rejected as "by design", with the argument that it would
be too complicated for the compiler to figure out that fValid is actually
initialized in all code paths leading to its use. Note that there's no
looping, nor any interdependency between variables.

There certainly *is* an interdependency between variables. The only
reason it will always actually work is that there is no way for fValid
to be true unless num has already been assigned.

It's a factor of the behaviour of the values assigned, not just flow
control - and the rules in the spec are almost entirely based on flow
control, IIRC.

A simple way to demonstrate that is to change the behaviour of the
catch block - change the false to true, and consider the case where
Convert.ToInt32 throws an InvalidCastException - it would try to read
the value of num without it being assigned.

For the assignment status of one variable in a block to depend on the
value used in an assignment statement involving a *different* variable
seems way too complicated to be able to specify in a sensible way, IMO.
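The flow-based nature of the rules can be seen in a small sketch. The spec does track an assignment made inside a condition (its "definitely assigned after true expression" state), so the first method compiles; the variant in the comment, which stores the same condition in a bool and tests that instead, is rejected with CS0165. Names are hypothetical.

```csharp
using System;

public static class FlowDemo
{
    // Compiles: if the whole && expression is true, the right operand ran,
    // so the compiler knows x was assigned inside the if body.
    public static int Direct(bool flag)
    {
        int x;
        if (flag && (x = 5) > 0)
        {
            return x;
        }
        return -1;
    }

    // The same logic with the condition stored in a local is rejected:
    //
    //     int x;
    //     bool ok = flag && (x = 5) > 0;
    //     if (ok) { return x; }   // CS0165: use of unassigned local variable 'x'
    //
    // Definite assignment does not remember which value 'ok' was given.
}
```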

I certainly wouldn't deem your example "simple" (in terms of the
reasoning the compiler would have to go through to verify that it will
always have been assigned) - nor does it agree with your description of
"code that has just two branches, both of which initialize the
variable". Both the branches (success and catching
InvalidCastException) initialize fValid, but that's not the variable
the compiler is complaining about.
 

Jon Skeet [C# MVP]

Peter Duniho said:
It's not any different from the "interdependency" between "args.Length"
and "x" in Jon's example. Given that, I presume that Jon's use of the
term "interdependency" refers to something more complex and I answered in
the same vein.

My example was one which matched your description of "code that has
just two branches, both of which initialize the variable". The x
variable is initialized regardless of the value of args.Length -
whereas in your example, num can only be regarded as being definitely
assigned if fValid is true.
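The args.Length example referred to here is not reproduced in this part of the thread. A hypothetical example of the shape described, where both branches assign x whatever the condition is, might look like this, and it compiles without complaint:

```csharp
using System;

public static class TwoBranches
{
    // Hypothetical reconstruction: x is assigned on both branches, so it is
    // definitely assigned after the if/else whatever args.Length is.
    public static int Pick(string[] args)
    {
        int x;
        if (args.Length > 0)
        {
            x = 1;
        }
        else
        {
            x = 2;
        }
        return x;
    }
}
```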
 

Peter Duniho

There certainly *is* an interdependency between variables. The only
reason it will always actually work is that there is no way for fValid
to be true unless num has already been assigned.

Ah. I see. I misinterpreted "interdependency" somehow. Hindsight being
20/20, I don't see how I did that. But I did. My mistake.

Still, I don't find this example overly complicated. After exiting the
try/catch block, there are a limited number of values that "fValid" can
have, one of which is tied to the initialization of "num" as well as the
use of "num". That is, at the point of evaluating the use of "num", the
compiler could easily detect that the block of code it's looking at only
occurs when "num" is initialized.
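The analysis described here can be modelled as a toy path-sensitive check: list the states in which control can leave the try/catch, and confirm that every state with fValid true also has num assigned. This is an illustration written for this discussion, not how the C# compiler works (its rules are based on flow of control, not values), and all names are hypothetical. The same model also captures the counterexample of making the catch block set fValid to true.

```csharp
using System;
using System.Linq;

public static class PathStates
{
    // 'num' is safe to read under 'if (fValid)' when every exit state with
    // FValid == true also has NumAssigned == true.
    public static bool SafeToRead((bool FValid, bool NumAssigned)[] exits) =>
        exits.All(s => !s.FValid || s.NumAssigned);

    // Original code: the try block completes (num assigned, flag true),
    // or the catch block runs (num unassigned, flag false).
    public static readonly (bool FValid, bool NumAssigned)[] Original =
    {
        (true, true),
        (false, false),
    };

    // Variant where the catch block sets the flag to true instead.
    public static readonly (bool FValid, bool NumAssigned)[] CatchSetsTrue =
    {
        (true, true),
        (true, false),
    };
}
```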

That said, I'm not really interested in debating whether the compiler
ought to or ought not to handle this. It doesn't today, it appears that
there is no chance it will ever handle it, and thus I find it pointless to
debate whether it ought to or not. The debate is moot and a waste of
time. Suffice to say I feel one way, others feel a different way.

Most particularly though, it's clear to me that in many cases, the current
behavior hides the very type of bug it's supposed to prevent. I find this
distasteful, if not unacceptable. In the interest of simplifying the
language (specification or compiler), the usefulness of the error has been
significantly reduced.

Pete
 

Jon Skeet [C# MVP]

Peter Duniho said:
Ah. I see. I misinterpreted "interdependency" somehow. Hindsight being
20/20, I don't see how I did that. But I did. My mistake.

Still, I don't find this example overly complicated. After exiting the
try/catch block, there are a limited number of values that "fValid" can
have, one of which is tied to the initialization of "num" as well as the
use of "num". That is, at the point of evaluating the use of "num", the
compiler could easily detect that the block of code it's looking at only
occurs when "num" is initialized.

I think your idea of "easily" is different to mine :) I'd be interested
to see how you'd express the difference between the case where fValid
is true and the case where it's false in a way that doesn't make for a
horrendously complicated spec.

That said, I'm not really interested in debating whether the compiler
ought to or ought not to handle this. It doesn't today, it appears that
there is no chance it will ever handle it, and thus I find it pointless to
debate whether it ought to or not. The debate is moot and a waste of
time. Suffice to say I feel one way, others feel a different way.

Fair enough.

Most particularly though, it's clear to me that in many cases, the current
behavior hides the very type of bug it's supposed to prevent. I find this
distasteful, if not unacceptable. In the interest of simplifying the
language (specification or compiler), the usefulness of the error has been
significantly reduced.

It's been reduced *occasionally*. For the vast majority of code, it
does the right thing, without making the spec particularly complicated.
As someone who likes to be able to reason about the correct behaviour
of the compiler without taking hours about it (and the spec already
falls down on this criterion in some places) I'm happy with the balance
that's been struck.
 

Peter Duniho

I think your idea of "easily" is different to mine :) I'd be interested
to see how you'd express the difference between the case where fValid
is true and the case where it's false in a way that doesn't make for a
horrendously complicated spec.

Well, that comes down to the question of language design and
specification. *Far* outside my area of expertise, and I admit I don't
have any good answers for how one would express this in a reasonable,
concise specification.

But the implementation details don't seem onerous to me. I guess the main
problem would be that even solving the simpler cases I'm complaining
about, one would still be left with situations where the variable is
initialized but not recognized as such (mainly where the controlling
variable is set externally somehow or isn't a simple integral type, that
sort of thing).

I guess one thing that would reduce my complaint somewhat would be if the
error message were changed to reflect the fact that the compiler may be
mistaken. Instead of saying there's a use of an unassigned variable, it
might say something like "cannot determine that the variable is
assigned". Basically, instead of stating that the programmer has made a
mistake, admit that the code might be correct but that the compiler isn't
capable of determining that.

I know that might sound silly, but sometimes it all has to do with how
something is said. :)

Pete
 

valentin tihomirov

The more complex the rules are, the harder they are to reason about and
understand. They also make it much harder to verify that a compiler is
correct.

And you are responding to my post where I highlight that the only rule
needed is "(1) never use unassigned vars". It is ultimately simple and
easy to understand.

I like languages where the rules are reasonably easy to understand,
even if that means that occasionally I have to spell things out a bit
more for the compiler's benefit.

Again, "spelling out more" here means shutting up the compiler checks by
garbage-initializing the vars. This is after you have artificially
overcomplicated your grammar by describing where our trivial rule (1) stops
working. You are happy doing nonsense for less security while claiming the
opposite.


I agree that the rare weird garbage initialization is a better trade-off
than no checking at all.
 

Jon Skeet [C# MVP]

I guess one thing that would reduce my complaint somewhat would be if the
error message were changed to reflect the fact that the compiler may be
mistaken. Instead of saying there's a use of an unassigned variable, it
might say something like "cannot determine that the variable is
assigned". Basically, instead of stating that the programmer has made a
mistake, admit that the code might be correct but that the compiler isn't
capable of determining that.

I know that might sound silly, but sometimes it all has to do with how
something is said. :)

I take your point - I would actually use the terminology from the spec,
and say that the variable is "not definitely assigned". For one thing,
that means that a google search would come up with appropriate bits of
the spec :)
 

Jon Skeet [C# MVP]

valentin tihomirov said:
And you are responding to my post where I highlight that the only rule
needed is "(1) never use unassigned vars". It is ultimately simple and
easy to understand.

It's not easy for the compiler to implement, nor for the specification
to make clear what should be allowed at compile time.

Note that there's a difference between "unassigned" and "not definitely
assigned according to the language specification". The compiler
prevents you from using the latter, which in turn always prevents the
former too.

I like languages where the rules are reasonably easy to understand,
even if that means that occasionally I have to spell things out a bit
more for the compiler's benefit.

Again, "spelling out more" here means shutting up the compiler checks by
garbage-initializing the vars. This is after you have artificially
overcomplicated your grammar by describing where our trivial rule (1) stops
working. You are happy doing nonsense for less security while claiming the
opposite.

In some cases, yes. It's far from an "every day" occurrence.

I agree that the rare weird garbage initialization is a better trade-off
than no checking at all.

But you would rather see a specification which is insanely complicated?

Exactly how far would you go, anyway? Consider the following program:

using System;

class Program
{
    static void Main(string[] args)
    {
        int x;

        if (ReturnTrue())
        {
            x = 5;
        }
        Console.WriteLine(x); // CS0165: x is not definitely assigned
    }

    static bool ReturnTrue()
    {
        return true;
    }
}

Should the compiler have to cope with that case as well, realising that
ReturnTrue() will always return true, and therefore x will always be
assigned a value?
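For contrast, the spec does treat constant expressions specially: when the condition is a compile-time constant true, the path that skips the assignment is unreachable, so a sketch like the following is accepted by current compilers (worth checking against the compiler version in use), whereas the ReturnTrue() version is rejected. Names are hypothetical.

```csharp
using System;

public static class ConstantCondition
{
    // A const field is a constant expression, unlike a call to ReturnTrue().
    private const bool AlwaysTrue = true;

    public static int Value()
    {
        int x;
        if (AlwaysTrue)
        {
            x = 5;
        }
        // Accepted: the "condition is false" path cannot be taken, so x is
        // definitely assigned here.
        return x;
    }
}
```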

How about:

using System;

class Program
{
    static void Main(string[] args)
    {
        int x;

        object o = SomeOtherClass.GetSomeReference();

        if (IsNull(o) || !IsNull(o))
        {
            x = 5;
        }
        Console.WriteLine(x); // CS0165: x is not definitely assigned
    }

    static bool IsNull(object o)
    {
        return o == null;
    }
}

Now, *we* can work out that IsNull won't change its return value
between calls, and so it ends up as if (someBool || !someBool) which is
always going to be true, but would you expect the compiler to?

Sooner or later there will always be a boundary which the compiler
can't cross, unless you expect it to effectively solve the halting
problem - at which point I don't even want to *imagine* the spec.

In other words, there's a balance to be made between
compiler/specification complexity and never having to introduce an
assignment just for the sake of the compiler. I think the C# designers
got the balance about right - the spec is reasonably readable, and it
still relatively rarely complains at the wrong time.

Can you list exactly which extra situations you'd include, and which
you'd consider too complicated?
 

Peter Duniho

[...]
Can you list exactly which extra situations you'd include, and which
you'd consider too complicated?

Well, I'm not valentin, but I would draw the line at local versus global.
That is, if it can be determined looking only at the current function,
then it should be. Conversely, it's fair to accept that the compiler
should not have to follow calls and analyze side-effects to do the
analysis.

As I mentioned elsewhere, I admit that this means there will still be
cases where the variable is assigned, but not detected as "definitely
assigned". But it would go a long way to addressing the "I can look at
just this code and know the variable is assigned" issue.

And of course, as mentioned before, rewording the error would be good
too. I agree with your suggestion to use "not definitely assigned" so
that a Google search can lead the programmer to a more relevant discussion.

Pete
 

Jon Skeet [C# MVP]

Peter Duniho said:
Well, I'm not valentin, but I would draw the line at local versus global.
That is, if it can be determined looking only at the current function,
then it should be. Conversely, it's fair to accept that the compiler
should not have to follow calls and analyze side-effects to do the
analysis.

<snip>

How far should it have to go with the local analysis though? Here's an
example with simple booleans, which I suspect you'd want to pass:

static void Foo(bool a, bool b)
{
    int c;
    if (a && b)
    {
        c = 5;
    }
    if (b && !a)
    {
        c = 10;
    }
    if (!b)
    {
        c = 15;
    }
    Console.WriteLine(c);
}

Now let's make it a *bit* harder:

static void Foo(int a, bool b)
{
    int c;
    if ((a == 10) && b)
    {
        c = 5;
    }
    if (b && (a != 10))
    {
        c = 10;
    }
    if (!b)
    {
        c = 15;
    }
    Console.WriteLine(c);
}

Here the compiler would have to realise that the value of a couldn't
be equal to 10 and not equal to 10 at the same time.


Now let's make it really hard (but still only using locals):

static void Foo(int a, int b)
{
    int c;

    if (a < 0 || a > 5 || b < 0 || b > 5)
    {
        c = 0;
    }
    if (a + b < 10)
    {
        c = 1;
    }
    Console.WriteLine(c);
}

My *guess* is that you'd allow that to be too hard for the compiler to
understand (even though it meets your "can be determined looking only
at the current function" criterion) - but what about the middle
example? Easy enough or not?
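A brute-force way to see what "looking only at the current function" involves: re-evaluate the branch conditions of the second and third Foo examples over every distinct case and record whether c would be assigned. In the second, a is only compared with 10, so 10 and one other value represent every int; in the third, any a or b outside 0 to 5 takes the first branch, so -1 to 6 covers everything. This is a hypothetical illustration of the reasoning, not something a compiler does. Run as written, it reports the second example as always assigned, while the third has exactly one gap, a = 5 and b = 5 (a + b is 10, not less than 10), so for that one the compiler's complaint is justified.

```csharp
using System;
using System.Collections.Generic;

public static class FooCheck
{
    // Mirrors the second Foo: true if at least one branch assigns c.
    static bool SecondAssigns(int a, bool b) =>
        (a == 10 && b) || (b && a != 10) || !b;

    // Mirrors the third Foo: true if at least one branch assigns c.
    static bool ThirdAssigns(int a, int b) =>
        (a < 0 || a > 5 || b < 0 || b > 5) || (a + b < 10);

    // a is only compared with 10, so 10 and 11 stand for every int.
    public static bool SecondAlwaysAssigned()
    {
        foreach (int a in new[] { 10, 11 })
        {
            foreach (bool b in new[] { false, true })
            {
                if (!SecondAssigns(a, b)) return false;
            }
        }
        return true;
    }

    // Values outside 0..5 all take the first branch, so -1..6 covers every case.
    public static List<(int A, int B)> ThirdGaps()
    {
        var gaps = new List<(int A, int B)>();
        for (int a = -1; a <= 6; a++)
        {
            for (int b = -1; b <= 6; b++)
            {
                if (!ThirdAssigns(a, b)) gaps.Add((a, b));
            }
        }
        return gaps;
    }
}
```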
 

Andy

My bug report was rejected as "by design", with the argument that it would
be too complicated for the compiler to figure out that fValid is actually
initialized in all code paths leading to its use. Note that there's no
looping, nor any interdependency between variables. It's a strictly "code
goes through here, or it goes through here" sort of thing. The only way
for the code to not reach "fValid = true" is for an exception to occur, in
which case it will reach "fValid = false" before it reaches the "if()"
statement.

While your code is simple and we can tell what you want, it's also bad
design IMO. You're using exceptions to control program flow by simply
trying something to see if it bombs or not, and continuing on. If you
really don't want the InvalidCastException (ICE) to stop your method from
returning a value, simply initializing fValid to false before the try
block and swallowing the ICE is a better design.

Catch is meant to attempt to recover from an error; in this case,
there's nothing we can do to recover. The method may still be allowed
to execute (which is the case), so we simply do nothing.
 

Joel Lucsy

Peter said:
Sure. Here's the one I posted to the VS2005 bug report database:

<snip>

To throw in my two cents, Convert.ToInt32( String ) is documented as
throwing two exceptions: FormatException and OverflowException. Neither
of which is InvalidCastException.
Unfortunately, this opens a path to the if that will leave it in an
uninitialized state.
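Following this point, a sketch that catches the two exceptions Convert.ToInt32(string) documents and reads num only inside the try block, where it is definitely assigned, which sidesteps the flag entirely. The class name, method name, and return strings are hypothetical.

```csharp
using System;

public static class DocumentedExceptions
{
    public static string Describe(string text)
    {
        try
        {
            // num is declared and assigned in one statement, so every later
            // read in this block is definitely assigned.
            int num = Convert.ToInt32(text);
            return "Value of number is " + num;
        }
        catch (FormatException)
        {
            return "not a number";
        }
        catch (OverflowException)
        {
            return "out of range for Int32";
        }
    }
}
```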
 

Jon Skeet [C# MVP]

To throw in my two cents, Convert.ToInt32( String ) is documented as
throwing two exceptions: FormatException and OverflowException. Neither
of which is InvalidCastException.
Unfortunately, this opens a path to the if that will leave it in an
uninitialized state.

No it doesn't - if either of those exceptions is thrown, the exception
will bubble up to the calling method without the "if" being evaluated
at all.
 

Peter Duniho

To throw in my two cents, Convert.ToInt32( String ) is documented as
throwing two exceptions: FormatException and OverflowException. Neither
of which is InvalidCastException.
Unfortunately, this opens a path to the if that will leave it in an
uninitialized state.

Well, a) when I wrote that code, I am sure that either the documentation I
was consulting said to do what I did, or I simply empirically determined
that the InvalidCastException would be thrown in the case I cared about,
and b) you are wrong that there is a path for num being uninitialized but
used, since if a different exception is thrown it won't be caught by my
try/catch block and no more statements in that function will execute.

More generally, the code is just an example of the situation I'm talking
about. I guess it's par for the course on Usenet, but people ought to
think twice about whether there's really a point to critiquing the exact
nature of code that is posted simply for the purpose of demonstrating some
point unrelated to their critique.

IMHO, of course.

Pete
 

Peter Duniho

How far should it have to go with the local analysis though? Here's an
example with simple booleans, which I suspect you'd want to pass:

<snip>

No, actually I wouldn't expect that to pass. I agree that you don't need
to know the actual input values of a and b to resolve the question, but it
goes beyond what I expect the compiler to do. I really am just talking
about the very simplest cases, where the controlling variable is itself
initialized locally and the compiler can generate a list of known possible
values (or ranges of values) it will take on during the execution of the
method.

That sort of analysis is very basic, IMHO, and would address most of the
obvious cases. As for the code you posted, while I can determine through
logical analysis that c is definitely assigned, that is not readily
apparent just by inspecting the code, whereas I feel that it is in the
example I posted.

(Now...all that said, bools are a special case, in that there are only two
possible values for them anyway, and so the compiler could easily examine
all possibilities without doing the actual logical analysis. But I'm
satisfied saying that we don't need the compiler to treat bools as
different from other variable types, and so it doesn't need to take
advantage of the fact that they only have two possible values).
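The "examine all possibilities" idea for bools can be sketched as an exhaustive check over every combination of n bool inputs. This is a hypothetical helper, not compiler behaviour; it also makes the cost visible, since the number of combinations is 2^n.

```csharp
using System;

public static class BoolEnumeration
{
    // Checks a predicate over every combination of n bool inputs by treating
    // the bits of 'mask' as the input values. Only practical for small n
    // (and n must be below 63 for the long mask).
    public static bool HoldsForAll(int n, Func<bool[], bool> predicate)
    {
        var inputs = new bool[n];
        for (long mask = 0; mask < (1L << n); mask++)
        {
            for (int i = 0; i < n; i++)
            {
                inputs[i] = (mask & (1L << i)) != 0;
            }
            if (!predicate(inputs))
            {
                return false;
            }
        }
        return true;
    }
}
```

Applied to the first Foo example, the predicate is "some branch assigns c", that is `v => (v[0] && v[1]) || (v[1] && !v[0]) || !v[1]` with v[0] as a and v[1] as b, and the check returns true.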

I suppose there's room for disagreement there too, but that's how I see it.

Pete
 