Finally which ORM tool?

Frans Bouma [C# MVP] · Oct 20, 2007

This gets long and time consuming.

We're having a discussion mainly on misunderstandings from both sides,
so it's not really that relevant to keep going on for long, I'll
address briefly (I hope) what you misinterpreted from my texts.

Well, two things here:

1) There will always be an expression tree if you implement IQueryable

and who will create that tree? If you implement all extension methods
of Queryable yourself, you can directly translate what you get passed
in. Mind you: after the first extension method, the IQueryable object
you get passed in is your own query object you will return as the
result of the query.

Of course, you will run into problems in corner cases, so it's
advisable to implement an expression tree parser, but it's not
required. I've seen extension-method-only LINQ providers.

2) Even if you're using LINQ to objects

I haven't looked into that, but it was my understanding the
IEnumerable extension methods handled the input directly.

Anyway, I think we have a misunderstanding here: IMHO the compiler
emits calls to extension methods of Queryable when you're writing the
query definition in code. This is then executed at runtime, and as the
extension methods of queryable are creating an expression tree, you'll
get an expression tree as result of that execution. As the enumerable
execution gets the expression tree (which is the query object itself,
however if your extension methods create something else, that's fine
too) and parses it, if necessary.

Though it's IMHO not a given that there is an expression tree.
Perhaps we're having a misunderstanding what an expression tree is, but
I don't find a single Expression<Func<...>> a tree.

No, they're not the inner workings of the abstraction. They're part
of the C# language spec in terms of how a query expression is
compiled.

Let me make this very clear: when I say "lambda expression" I don't
mean System.Linq.Expressions.LambdaExpression I mean an expression
such as
x => x+1
or
(int x, int y) => x+y

That's the meaning of "lambda expression" in the context of C#.
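
A minimal sketch of that distinction, with illustrative names that are
not from the thread: the same lambda expression x => x+1 can be
converted either to a delegate or to an expression tree, and only the
second involves System.Linq.Expressions at all.

```csharp
using System;
using System.Linq.Expressions;

public static class LambdaForms
{
    // Lambda expression converted to a delegate: compiled code you can call.
    public static readonly Func<int, int> AsDelegate = x => x + 1;

    // The same lambda expression converted to an expression tree: data you can inspect.
    public static readonly Expression<Func<int, int>> AsTree = x => x + 1;

    public static void Main()
    {
        Console.WriteLine(AsDelegate(41));        // 42
        Console.WriteLine(AsTree.Body.NodeType);  // Add
        Console.WriteLine(AsTree.Compile()(41));  // 42
    }
}
```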

Now, if you can provide a C# query expression which doesn't involve a
translation which uses lambda expressions, I'll be impressed.

1) I didn't mean what you said above. What I referred to was the where
c.CustomerID=="CHOPS" expression. That's a boolean predicate. It is
translated to a lambda and that's translated to a tree of expression
objects (memberaccess, constant etc. etc.) of various levels deep.

2) what I see is a where foo==constant predicate. If that gets
_translated_ to something else, I don't care, because I'm at the source
side, not at the side where the translated stuff is processed, that's
the compiler, why should I wonder how where foo==constant is translated
into IL and later on in WHERE [dbo].[T].[Foo]=@p, if I'm at the level
of where foo==constant?

THAT was what I was referring to. Not that there aren't lambda's.
That's a misunderstanding, of course there are lambdas. What I meant
was: in where foo==constant, there aren't lambdas IN YOUR FACE. There
might be lambdas in the translation along the way and what not, but I
don't care, I don't see them in THAT PARTICULAR statement. And because
I don't see them, I don't have to know about them, as they appear when
the compiler kicks in. But that's out of my sight, below the
abstraction level.

However, you're arguing that THAT lower abstraction level, namely the
level where: 'where foo==constant' is translated to something, is
important to be able to operate on the abstraction level of 'where
foo==constant'. I then say: IF so, the abstraction level of 'where
foo==constant' is leaky and therefore bogus.

From memory, I believe it's something along the lines of:

nw.Customers.SelectMany (c => nw.Orders,
(c, o) => c);

If there were a "where" clause or anything else between the "from"
and the "select" then it would be more complicated, with a
transparent identifier involved.

and where did you get that knowledge? You read that in a manual or did
you check how the tree gets translated? If the latter, do you expect
from developers to check every time how a query is translated to
extension methods and lambda's if they don't use extension methods and
lambda's DIRECTLY in their code? I surely hope not!

Now, that may not be exactly right, but:
a) I know which parts are used as lambda expressions

You could only know that from experiments. It's not obvious a
selectmany is introduced and a tempresult in an anonymous type.

b) If I want to know the exact details, I could look it up in the C#
spec (or a good book).

It's not in the C# spec. (At least not in the 3.0 spec doc I have
here. )

Again, expression trees are in some ways irrelevant to this - the
point is that it's translated into a lambda expression.

Ok.

I can go on and on here, but I think that's repeating myself. You seem
to have the firm stance that if you have sourcecode S in format F, and
a compiler translates that to format F', you have to know F'. I don't,
I find that irrelevant, because the compiler does that for me, I work
with F, not F', as I'm not doing the work for the compiler, the
compiler is much better in that.

The abstraction level is C#. The "query expression" to "method call
with lambda expressions" translation is at the C# level of
abstraction, which is why it's defined by the C# spec.

Then show me in the C# spec where the table is where I can find all
the translations. I tried to find it, but the doc isn't really helpful
in this. There's for example no specification in the C# 3.0 spec
document where it defines how
from o in nw.orders
join c in nw.customers on c.id equals o.cid into od

is translated, or how let x=... is translated. There are examples, but
not a full detailed list.

I didn't expect one, as it's not really relevant, because it would
mean that the abstraction level created by the C# query language is
actually moot.

No - IL isn't specified by the C# spec, but the translation of query
expressions is.

where? These translations aren't 1:1 replacements, mind you. So they
need to have a big table somewhere where everything is described, how
every code fragment is translated. But all there is is a C# language
grammar of query specifications for join, where etc. but no translation
specs.

It's like knowing that Nullable<int> and int? mean the same thing, or
that System.Int32 and int mean the same thing, or that foreach calls
GetEnumerator and the IEnumerable members (as well as Dispose at the
end).

You're mixing syntactical sugar with compiler translations. I find
that not the same thing. Substitutions of elements with other elements
like nullable<T> or System.Int32 isn't the same as translating 'where
c.foo == (c.bar * 10)' into a deep tree of Expression derived class
instances.

These are some of the rules which govern the language, and C#
developers should be aware of them.

I find synonyms between non-terminals OK to know, but I find it for
example irrelevant how foreach(){} is translated into IL. Why should I
care? If I wanted to operate on the abstraction level BELOW
foreach(){}, I'd write the code of getting the enumerator, moving along
etc. manually. We're also not arguing how a for() is rolled into code,
right?
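
For reference, a rough sketch of the two levels being compared here:
a foreach loop and approximately what it expands to by hand. The class
and method names are made up for the example, and the expansion is
simplified compared with the exact wording of the spec.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachExpansion
{
    // The abstraction: foreach hides the enumerator entirely.
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int total = 0;
        foreach (int item in source)
        {
            total += item;
        }
        return total;
    }

    // Roughly the same loop written by hand: get the enumerator,
    // move along it, and dispose it at the end.
    public static int SumByHand(IEnumerable<int> source)
    {
        int total = 0;
        IEnumerator<int> e = source.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int item = e.Current;
                total += item;
            }
        }
        finally
        {
            e.Dispose();
        }
        return total;
    }

    public static void Main()
    {
        var numbers = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumWithForeach(numbers));  // 6
        Console.WriteLine(SumByHand(numbers));       // 6
    }
}
```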

It certainly is relevant, as it's part of the specified behaviour of
C# and it affects behaviour. How can it not be relevant?

It's common sense: something at abstraction level n+1 shouldn't be
affected by details of abstraction level n; if it were, abstraction
level n+1 would be useless.

Or are you going to say that every abstraction level should be used
with the knowledge how the underlying abstraction levels work? I don't
think so.

Btw, I wasn't saying lambda's don't exist, see my remark earlier in
this post.

The exact nature of how lambda expressions are converted into
expression trees is not covered in the spec, but the translation from
query expression into non-query expression form certainly is.

Show me how join identifier in expression on expression equals
expression into identifier is translated, in the C# spec.

No, individual expression trees are built by the compiler (or rather,
the compiler inserts code to build the expression trees) and then the
Queryable extension methods combine the expression trees together.

Then what's a tree in this case? An individual Expression<Func<>>? The
tree is built by the extension methods of Queryable. If you implement
them yourself, you can do whatever you want and insert the data you get
passed in directly into the result object which is your Query object
(which implements IQueryable) so after

var q = ...;
you have q which is an IQueryable but not a tree, just a single
object.

Look at the extension methods in Queryable: they all take expression
trees as their parameters. How are you going to call them if you
don't create expression trees yourself? You could pass in null, I
suppose, but that wouldn't be terribly useful

You don't NEED to call them. They're there provided for you to build
the tree for you. But if you don't want to parse a tree, you can just
implement all the extension methods yourself, and you get the input
passed in and you can handle it directly instead of building a tree for
evaluation later on.
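
A minimal sketch of what such a provider could look like, assuming a
made-up RecordingQuery<T> type that is not IQueryable and only
implements Where and Select. The query expression binds to these
methods instead of Queryable's, and each method handles its argument
immediately. Note that the arguments still arrive as
Expression<Func<...>> objects built at the call site, because that is
the parameter type chosen here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical query type: just enough of the query pattern for where/select.
public class RecordingQuery<T>
{
    public readonly List<string> Steps = new List<string>();

    // Handles the predicate right away instead of composing a larger tree.
    public RecordingQuery<T> Where(Expression<Func<T, bool>> predicate)
    {
        Steps.Add("Where: " + predicate.Body);
        return this;
    }

    public RecordingQuery<R> Select<R>(Expression<Func<T, R>> selector)
    {
        var next = new RecordingQuery<R>();
        next.Steps.AddRange(Steps);
        next.Steps.Add("Select: " + selector.Body);
        return next;
    }
}

// Hypothetical entity for the example.
public class DemoCustomer
{
    public string CustomerID;
    public string Name;
}

public static class CustomProviderDemo
{
    public static RecordingQuery<string> Build()
    {
        var customers = new RecordingQuery<DemoCustomer>();

        // Binds to RecordingQuery<T>.Where and RecordingQuery<T>.Select.
        return from c in customers
               where c.CustomerID == "CHOPS"
               select c.Name;
    }

    public static void Main()
    {
        foreach (string step in Build().Steps)
        {
            Console.WriteLine(step);
        }
    }
}
```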

Then I don't see why you're getting some of the details wrong above.
You keep referring to lambda expressions as if they only exist in
expression trees, whereas I'm talking about the lambda expressions
which are part of the specified translation of query expressions.

You misunderstood me.

Yes - one is part of a query expression, in a clause which is
translated into a lambda expression, and one is just a parameter in a
method call.

where does the translation take place? Right, during compilation. Do I
have to be part of that process? no, I'm the developer feeding the
translator.

It's exactly the same as this sort of situation:

List<int> list = new List<int> { 1, 2, 3 };

int i = 0;

int index = list.FindFirst (i,
delegate (int y) { return y==i; });

Again, the variable "i" is used twice - in one place it's evaluated
as part of calling the method, whereas in the delegate it's evaluated
within the method itself.

I also find that inconsistent. Let's agree on disagreeing if this is
consistent or not.

That's only the same type of inconsistency as passing one parameter
by ref and another by value, however. The behaviour is well defined
at the C# abstraction level.

there I specify what's done. In the code examples you and I posted,
it's implicit, not explicit.

If the developer works at the C# level, they should understand C# as
a language. The fact that one of your variables is being used as part
of a lambda expression (not necessarily part of an expression tree)
is clearly part of the spec.

Somewhere translation takes place, but how exactly is unclear and IMHO
also irrelevant as it happens AT COMPILETIME.

I'm afraid I'll continue to disagree with you.

that's ok

C# is the level of abstraction they're operating on, and the
translation is part of that level.

how can translation be part of the level of the language? Because if
it would, the translation result is also part of the level of the
language, and thus: n+1 AND n are both relevant abstraction levels.
Which means n+1 is irrelevant, because n is important.

You know, what if the C# compiler does things differently in C# 6.0 ?
Then I still have my where foo==constant, but how it's translated under
the hood is irrelevant, it can be completely different from how it's
done today. THAT's why there's an abstraction level above the
translation results.

It's defined in the C# spec, and therefore I believe it's at the
right abstraction level.

where? Just some words that there is a translation and some examples,
but no clear 'this becomes that', which ARE necessary as the
translations aren't 1:1.

In particular, if a developer doesn't know about the translation,
what do you expect them to know about?

grammar.
if the developer knows what where foo==constant does, why does the
developer have to know what that becomes in the end? It has various
stages!

Obviously you can't just type
in any combination of words and get a valid C# program - how much of
C#'s rules do you expect people to know? Would you expect them to
know, for instance, that in the expression "a ? b : c" either b or c
is evaluated, but not both?

isn't that what the grammar says? If it's required to know what the
compiler bakes from a piece of sourcecode, the language is hard to use.

If you do expect people to know that, why should they also not have
to know about what query expressions are about? If they don't need to
know the translation, how should people understand query expressions?
Or do you think they should just type keywords at random until
something compiles?

If you expect them to know the translation, do you also expect them
to know the graph it ends up in? As that's tied to it. And why does it
stop there for you?

So you expect people to know how a for loop is constructed in the
translated code? I don't. I have for, the grammar of for, the
description of the for statement, that's it. How it's translated is
irrelevant, that's up to the compiler. Otherwise, I'm babysitting the
compiler.

All abstractions are leaky, IMO. All ORMs are leaky - are they
automatically flawed too?

why are all O/R mappers leaky? An abstraction which doesn't force you
to dig into lower layers because the details are required isn't leaky.

I'd hope so, but it doesn't seem that way from how you use them.
Lambda expressions aren't "created at runtime" - they're present in
the *compile-time* translation of query expression to non-query
expression.

yes I know. But what's created at compile time is a lower abstraction.
I don't know a single IL statement, I hardly ever look at what the IL
looks like from C# compiled code, do I have to? I don't think so, as
I'm using a compiler to do that work for me.

You need to know what the compiler must do in terms of the C# spec.
You don't need to know how it achieves that goal.

I need to write code which complies to the C# spec so the code I've
written is doing what it is expected to do. What the compiler bakes of
it, how many passes it has, how it translates where foo == constant
into whatever, and how that influences what happens next, why should I
care?

It's not at the level of the output of the compiler: that is beyond
the remit of the C# spec. The query expression translation is within
the remit of the C# spec.

translation results are translation results, so you're saying it's
important to know translation results to be able to write code in the
format BEFORE the translation results?

then tell me, where's the spec for join how it's translated? Or let ?

Hmm... what happens, out of interest? I don't have SQL server 2000
available here.

it dies at runtime with an exception.

And did you run the SQL, or the LINQ query? I don't
know whether LINQ to SQL knows what kind of database it's talking to
at all.

the linq query. it knows, it creates the specific provider for the db
it connects to.

I don't know whether we've crossed wires or not. I'm aware that it's
distinct at a "whole result" level, but SQL Server should (IMO) be
smart enough to know that it can work out the distinctness of the
whole result just from the EmployeeID.
The EmployeeID is unique, and all the results are based on the
Employee table. Therefore if it sees an EmployeeID it's already
returning, it knows that all the rest of the results for that row
will match the row it's already returning. Likewise it knows if it
sees an EmployeeID that it hasn't already seen, that's a "new row to
return" by definition.

distinct is executed on the resultset created by the projection. It
doesn't know if employee is unique, as the last column in the set can
be a function output which differs... so it has to evaluate every field.

Does that make my meaning any clearer? Or did you understand me the
first time and I just didn't understand your reply?

I did understand you

(pfew!)

Isn't that tantamount to saying, "there's no point in letting people
write queries in anything other than SQL"? I think it's absolutely
vital that they're easily able to see the SQL generated, of course.

Most people think in sql, as that's what they know. Moving towards an
o/r query system then requires them to drop the sql completely and use
the system provided. But that's very hard as the system often mimics
sql with some things different.

Yes, it is a bit surprising. What's the SQL Server 2005 trick in
question?

Convert the image to varchar(max). That avoids the distinct error on
text for example, as text isn't a column you can use DISTINCT on, but
varchar(max) is

I thought DISTINCT worked fine in SQL Server 2000 - or is
that only for single-value-select queries? (i.e. a sequence of single
values).

it does, but DISTINCT has limitations: you can't use it on image,
text, ntext fields. so if you have one of these in your query resultset
(projection) and use DISTINCT, you're getting an error.

I don't see how you could expect that to happen without people even
knowing the C# language well enough to know which parts of query
expressions are translated into lambda expressions, and which are
evaluated immediately.

it's sad that that's required indeed, that's my whole point in this
long thread. I don't WANT TO KNOW what it ends up in. That's the point.
My query system is deterministic: people write X and get X.

Well, again I believe that some things which you regard as beneath
the abstraction layer of C# are well within that layer, but even so I
agree that the abstraction will leak.

ok, let's agree on that

However, I'd be very surprised to ever see any ORM where that's not
the case. You've said how you've got customers who have to change
their queries in order to meet the SQL specified by the DBAs - isn't
that the same kind of thing?

No, what I meant was: they wrote a query which uses joins, using
relation objects. The DBA says: "that's slow, use a subquery", so they
remove the relation objects and use FieldCompareSetPredicate objects
which are subqueries.

Just to clarify your language - do you mean a delicate matter?

oops, yes I mean delicate. I should have looked it up. delegate matter
indeed looks like something else :!

It's a
somewhat important distinction given that "delegates" mean something
rather different in this context. I'm not trying to criticise your
English for the sake of it - just trying to make sure we don't talk
at cross-purposes.

I would personally have been really unhappy if it had been more
biased towards SQL at the cost of being natural for in-memory
objects. I believe the importance of LINQ to Objects has been
underplayed - I perform queries, orderings, projections etc on
in-memory collections just as often as I do against databases.
Everyone seems focused on LINQ to SQL, but I'm looking forward to
more readable code for in-memory objects.

I think it would have been better if they had used 3 DSLs, so
you could use the best syntax for the target at hand. Now, a
limitation of one keeps the other one also down. (or introduces silly
stuff like 'group joins')

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Jon Skeet [C# MVP] · Oct 20, 2007

<snip>

I'm going to reply in more detail later on, but for the moment these
are the most important bits:

Though it's IMHO not a given that there is an expression tree.
Perhaps we're having a misunderstanding what an expression tree is, but
I don't find a single Expression<Func<...>> a tree.

Well, it is by every definition of expression tree I've ever seen.
Here's the description of Expression<TDelegate>:

<quote>
Represents a strongly typed lambda expression as a data structure in
the form of an expression tree.
</quote>

Note the "in the form of an expression tree". Even something like
foo==constant is going to be a tree with more than one node, although
even if it had just one node I'd still call it an expression tree.
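
A small sketch of that, assuming a made-up CustomerRow class: the
predicate is converted to an Expression<Func<CustomerRow, bool>>, and
the nodes of the resulting tree can be inspected directly.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical entity for the example.
public class CustomerRow
{
    public string CustomerID;
}

public static class TreeShape
{
    // foo==constant as an expression tree rather than a delegate.
    public static readonly Expression<Func<CustomerRow, bool>> Predicate =
        c => c.CustomerID == "CHOPS";

    public static void Main()
    {
        var body = (BinaryExpression)Predicate.Body;
        Console.WriteLine(body.NodeType);        // Equal
        Console.WriteLine(body.Left.NodeType);   // MemberAccess
        Console.WriteLine(body.Right.NodeType);  // Constant
    }
}
```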

Likewise that's what the C# spec calls an expression tree, too:

<quote section="4.6">
Expression trees are values of expression tree types of the form
System.Linq.Expressions.Expression<D>, where D is any delegate type.
For the remainder of this specification we will refer to these types
using the shorthand Expression<D>.
</quote>

Those seem pretty clear to me. Can you give a clear definition of what
you understand an expression tree to be, with evidence that it's a
definition which is accepted elsewhere?

It's not in the C# spec. (At least not in the 3.0 spec doc I have
here. )

You must be using a very out of date spec then.
Download the Unified C# 3.0 spec from here:

http://download.microsoft.com/download/3/8/8/388e7205-bc10-4226-b2a8-75351c669b09/csharp%20language%20specification.doc

Then look in section 7.15. In this particular instance (multiple from
clauses) the translation to a SelectMany call is detailed in 7.15.2.4.

As I said before, it doesn't detail how a lambda expression is
converted into an expression tree, but it *does* give details of how a
query expression is translated into a "non-query" expression.

Right - more detailed reply coming later on

Jon Skeet [C# MVP] · Oct 20, 2007

You must be using a very out of date spec then.
Download the Unified C# 3.0 spec from here:

I've just checked the May 2006 version, and that had query expression
translation specified in section 26.7.1.

I can't say I remember ever seeing a version without query expression
translation - which version did you have, Frans?

Jon Skeet [C# MVP] · Oct 20, 2007

Frans Bouma said:
This gets long and time consuming.

We're having a discussion mainly on misunderstandings from both sides,
so it's not really that relevant to keep going on for long, I'll
address briefly (I hope) what you misinterpreted from my texts.

I agree there have been a lot of misunderstandings, but I believe it's
a very important topic. I'd personally be happy to leave the LINQ to
SQL flexibility side of the discussion alone, but the "what should a
developer know about query expressions" is quite fundamental, IMO.

I've addressed almost all your points in this post, but a lot of them
boil down to the same thing:

The behaviour of query expressions is well-defined, and is specified
in a C# to C# (*not* C# to IL) translation which should be
understood by developers using query expressions.

and who will create that tree? If you implement all extension methods
of Queryable yourself, you can directly translate what you get passed
in. Mind you: after the first extension method, the IQueryable object
you get passed in is your own query object you will return as the
result of the query.

If you write *any* C# query expression which uses an IQueryable as its
source (and doesn't also have the Select, Where methods etc taking

Of course, you will run into problems in corner cases, so it's
advisable to implement an expression tree parser, but it's not
required. I've seen extension-method-only LINQ providers.

I haven't looked into that, but it was my understanding the
IEnumerable extension methods handled the input directly.

The IEnumerable extension methods accept delegate instances instead of
expression trees, so no expression trees are built.

Anyway, I think we have a misunderstanding here: IMHO the compiler
emits calls to extension methods of Queryable when you're writing the
query definition in code.

Yes.

This is then executed at runtime, and as the
extension methods of queryable are creating an expression tree, you'll
get an expression tree as result of that execution.

While that much is true, the calls to Queryable in the first place
*also* create expression trees to represent the expressions used in the
query expression.

As the enumerable
execution gets the expression tree (which is the query object itself,
however if your extension methods create something else, that's fine
too) and parses it, if necessary.

Yes.

Though it's IMHO not a given that there is an expression tree.
Perhaps we're having a misunderstanding what an expression tree is, but
I don't find a single Expression<Func<...>> a tree.

As I said in my brief post earlier, using the definitions from the
standard and MSDN, they certainly *are* expression trees.

1) I didn't mean what you said above. What I referred to was the where
c.CustomerID=="CHOPS" expression. That's a boolean predicate. It is
translated to a lambda and that's translated to a tree of expression
objects (memberaccess, constant etc. etc.) of various levels deep.

Yes - and that translation isn't performed in Queryable, it's performed
by the compiler and the code to build the expression tree is in the
code which *calls* Queryable.

However, the fact that the compiler converts the where clause into a
lambda expression tree most certainly *is* relevant, and it's not the
inner workings of anything. It's not an implementation detail - it's
part of the specification.

2) what I see is a where foo==constant predicate. If that gets
_translated_ to something else, I dont care, because I'm at the source
side, not at the side where the translated stuff is processed, that's
the compiler, why should I wonder how where foo==constant is translated
into IL and later on in WHERE [dbo].[T].[Foo]=@p, if I'm at the level
of where foo==constant?

You shouldn't care about the translation into IL. You *should* care
about the translation from a query expression into a non-query
expression. The IL side of things is outside the remit of the C# spec,
but the fact that it works as a lambda expression is specified
behaviour.

Why would you not care how your code is specified to behave?

THAT was what I was referring to. Not that there aren't lambda's.
That's a misunderstanding, of course there are lambdas. What I meant
was: in where foo==constant, there aren't lambdas IN YOUR FACE. There
might be lambdas in the translation along the way and what not, but I
don't care, I don't see them in THAT PARTICULAR statement.

Just because you don't see them doesn't mean you shouldn't be aware of
them. You should be aware that anything in a where clause (or the other
relevant bits of a query expression) is converted into a lambda
expression, and that therefore it has the same behaviour with regards
to captured variables, deferred execution etc that other anonymous
functions have.
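
A short sketch of why that matters, using LINQ to Objects and made-up
data: the local variable in the where clause is captured by the lambda
expression, so changing it before the query is enumerated changes the
results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredCapture
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        int threshold = 3;

        // The where clause becomes n => n > threshold;
        // threshold is captured, not copied.
        IEnumerable<int> query = from n in numbers
                                 where n > threshold
                                 select n;

        // Changed after the query is defined but before it is enumerated.
        threshold = 1;

        // Enumeration happens here, so the new value of threshold is used.
        return query.ToList();
    }

    public static void Main()
    {
        foreach (int n in Run())
        {
            Console.WriteLine(n);  // prints 2, 3, 4, 5 on separate lines
        }
    }
}
```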

And because I don't see them, I don't have to know about them, as
they appear when the compiler kicks in. But that's out of my sight,
below the abstraction level.

If I do:

int x = 10;
long y = x;

then I don't *see* that there's a conversion involved, because it's
implicit - but it's still important to know about it, and that the
value of y really *is* a long, and not somehow an int.

However, you're arguing that THAT lower abstraction level, namely the
level where: 'where foo==constant' is translated to something, is
important to be able to operate on the abstraction level of 'where
foo==constant'. I then say: IF so, the abstraction level of 'where
foo==constant' is leaky and therefore bogus.

To me, the abstraction level is "the specified behaviour of C#". In
other words, anything within the specification is fair game as far as
I'm concerned.

I'm not saying that developers should know the spec inside out - but
they shouldn't complain when their code doesn't behave as they expect
it to, but behaves exactly as the spec says it should.

How would you describe the behaviour of query expressions? There's only
one authoritative description of the behaviour IMO, and that's the
language specification. Does the language specification define filters,
projections etc? No - it just gives a mechanical means of translating a
query expression into other known concepts: method calls and lambda
expressions.

and where did you get that knowledge? You read that in a manual or did
you check how the tree gets translated?

I read it in the C# specification.

If the latter, do you expect
from developers to check every time how a query is translated to
extension methods and lambda's if they don't use extension methods and
lambda's DIRECTLY in their code? I surely hope not!

No, checking the tree would certainly be the wrong way of going. What a
good job the specification lays it all out for us.

You could only know that from experiments. It's not obvious a
selectmany is introduced and a tempresult in an anonymous type.

I know it from reading the specification.

It's not in the C# spec. (At least not in the 3.0 spec doc I have
here. )

It's in 7.15.2.4. To quote:

<quote>
A query expression with a second from clause followed by a select
clause

from x1 in e1
from x2 in e2
select v

is translated into
( e1 ) . SelectMany( x1 => e2 , ( x1 , x2 ) => v )
</quote>

That's *exactly* as per your example (and I even got the translation
right by the looks of it).
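
A minimal sketch of that rule in action, using LINQ to Objects and
made-up data rather than the Northwind classes: the query expression
and the hand-written SelectMany call produce the same sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SelectManyTranslation
{
    // Illustrative data only.
    static readonly string[] Customers = { "ALFKI", "CHOPS" };
    static readonly int[] Orders = { 1, 2 };

    // Query expression with a second from clause followed by a select.
    public static List<string> QueryForm()
    {
        return (from c in Customers
                from o in Orders
                select c + "#" + o).ToList();
    }

    // The method call that the quoted rule turns it into.
    public static List<string> MethodForm()
    {
        return Customers.SelectMany(c => Orders, (c, o) => c + "#" + o).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm().ToArray()));
        Console.WriteLine(QueryForm().SequenceEqual(MethodForm()));  // True
    }
}
```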

Ok.

I can go on and on here, but I think that's repeating myself. You seem
to have the firm stance that if you have sourcecode S in format F, and
a compiler translates that to format F', you have to know F'. I don't,
I find that irrelevant, because the compiler does that for me, I work
with F, not F', as I'm not doing the work for the compiler, the
compiler is much better in that.

F and F' are both C# in this case, however. F is "C# 3 including query
expressions" and F' is "C# 3 without query expressions". In other
words, if you know F then you know F'. It's converting one form of C#
to a simpler one.

I assert that that is *entirely* relevant, as it is the only way in
which the behaviour of query expressions is specified.

Then show me in the C# spec where the table is where I can find all
the translations.

The whole of section 7.15.2. It's not a table - it's a sequence of
rules.

I tried to find it, but the doc isn't really helpful
in this. There's for example no specification in the C# 3.0 spec
document where it defines how
from o in nw.orders
join c in nw.customers on c.id equals o.cid into od
is translated

From 7.15.2.4 again:

<quote>
A query expression with a join clause with an into followed by a select
clause

from x1 in e1
join x2 in e2 on k1 equals k2 into g
select v

is translated into

( e1 ) . GroupJoin( e2 , x1 => k1 , x2 => k2 , ( x1 , g ) => v )
</quote>

Note that you have to know the clause following the join - the spec
doesn't deal with the "from / join" in isolation, which is fine as it's
never valid in isolation.
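
A minimal sketch of the join ... into rule, again with LINQ to Objects
and made-up data (the Order class and its fields are assumptions for
the example): both forms give the same results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupJoinTranslation
{
    // Hypothetical order type for the example.
    public class Order
    {
        public string CustomerId;
        public int OrderId;
    }

    static readonly string[] Customers = { "ALFKI", "CHOPS" };
    static readonly Order[] Orders =
    {
        new Order { CustomerId = "CHOPS", OrderId = 1 },
        new Order { CustomerId = "CHOPS", OrderId = 2 },
        new Order { CustomerId = "ALFKI", OrderId = 3 }
    };

    // from / join ... into / select as a query expression.
    public static List<string> QueryForm()
    {
        return (from c in Customers
                join o in Orders on c equals o.CustomerId into g
                select c + ":" + g.Count()).ToList();
    }

    // The GroupJoin call that the quoted rule turns it into.
    public static List<string> MethodForm()
    {
        return Customers.GroupJoin(Orders,
                                   c => c,
                                   o => o.CustomerId,
                                   (c, g) => c + ":" + g.Count()).ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm().ToArray()));
        Console.WriteLine(QueryForm().SequenceEqual(MethodForm()));  // True
    }
}
```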

or how let x=... is translated.

Again from 7.15.2.4 (it covers a lot of cases):

<quote>
A query expression with a let clause

from x in e
let y = f

is translated into
from * in ( e ) . Select ( x => new { x , y = f } )
</quote>

There are examples, but not a full detailed list.

It *is* a full detailed list of rules. There are more "real world"
examples with "Country" etc, but the list of translations is all that
is required.
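
As a sketch of the let rule, with made-up data and an ordinary lambda
parameter t standing in for the spec's transparent identifier: the
let clause becomes a Select into an anonymous type, and the later
clauses work on that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LetTranslation
{
    // Illustrative data only.
    static readonly string[] Words = { "orm", "linq", "query" };

    // Query expression with a let clause.
    public static List<string> QueryForm()
    {
        return (from w in Words
                let len = w.Length
                where len > 3
                select w + ":" + len).ToList();
    }

    // Hand-written equivalent: the let introduces an anonymous type
    // carrying both w and len through the rest of the query.
    public static List<string> MethodForm()
    {
        return Words.Select(w => new { w, len = w.Length })
                    .Where(t => t.len > 3)
                    .Select(t => t.w + ":" + t.len)
                    .ToList();
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", QueryForm().ToArray()));
        Console.WriteLine(QueryForm().SequenceEqual(MethodForm()));  // True
    }
}
```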

From the start of 7.15.2:

<quote>
A query expression is processed by repeatedly applying the following
translations until no further reductions are possible. The translations
are listed in order of application: each section assumes that the
translations in the preceding sections have been performed
exhaustively, and once exhausted, a section will not later be revisited
in the processing of the same query expression.
</quote>

I didn't expect one, as it's not really relevant, because it would
mean that the abstraction level created by the C# query language is
actually moot.

Without the list of translations, the behaviour of query expressions
would be completely unspecified. They could do *anything*. The
specification would be incomplete without it. No feature should be
unspecified. Certain aspects of implementation may be unspecified
(there are a few in C#, but not many - exactly how expression trees are
created is one of them) but the translation from a query expression to
a non-query expression is absolutely rock solid.

Put it this way - if you believe there's a query expression whose
translation isn't exactly specified in the spec, I'd like to see it.

where? These translations aren't 1:1 replacements, mind you.

Again, 7.15.2 completely specifies the translation. Every query
expression is replaced with an expression which doesn't involve a query
expression, by the end of the translation process.

So they need to have a big table somewhere where everything is described,
how every code fragment is translated. But all there is is a C# language
grammar of query specifications for join, where etc. but no translation
specs.

Both the grammar and the translation specs are provided, otherwise the
spec would be sorely lacking.

Just because it isn't in the form of a table doesn't mean it's not
there.

You're mixing syntactical sugar with compiler translations. I find
that not the same thing. Substitutions of elements with other elements
like nullable<T> or System.Int32 isn't the same as translating 'where
c.foo == (c.bar * 10)' into a deep tree of Expression derived class
instances.

Be very careful about what you think I'm saying. I didn't say that the
translation resulted in an expression tree. I said it resulted in a
lambda expression. That lambda expression is then processed entirely
within the normal rules of C#, including potentially a conversion to an
expression tree. It's important to understand this two-phase
compilation - first the query expression is translated into a "normal"
C# expression (using lambda expressions and member accesses which may
or may not end up being extension method calls).

It really is syntactic sugar. To quote 7.15.2 again:

<quote>
The translation from query expressions to method invocations is a
syntactic mapping that occurs before any type binding or overload
resolution has been performed.
</quote>
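
One way to see that the mapping really is syntactic: a query expression compiles against any type that happens to have suitably shaped Where and Select methods, with no IEnumerable and no System.Linq in sight. The sketch below uses a made-up Box type; its behaviour (returning Box(0) when the predicate fails) is an arbitrary choice for the example.

```csharp
using System;

// Hypothetical type: it is not IEnumerable, and this file never imports System.Linq.
public class Box
{
    public readonly int Value;
    public Box(int value) { Value = value; }

    // The compiler only looks for methods with these names and compatible shapes.
    public Box Where(Func<int, bool> predicate)
    {
        return predicate(Value) ? this : new Box(0);
    }

    public Box Select(Func<int, int> projection)
    {
        return new Box(projection(Value));
    }
}

public static class SyntacticMappingDemo
{
    // Query expression over a Box.
    public static int QueryForm()
    {
        Box result = from v in new Box(21)
                     where v > 10
                     select v * 2;
        return result.Value;
    }

    // What the query expression is rewritten into before binding.
    public static int MethodForm()
    {
        return new Box(21).Where(v => v > 10).Select(v => v * 2).Value;
    }
}
```

Both methods return 42. Only after the rewrite does normal overload resolution decide which Where and Select are being called.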

In what way is "a syntactic mapping" the same as "syntactic sugar"?

I find synonyms between non-terminals OK to know, but I find it for
example irrelevant how foreach(){} is translated into IL. Why should I
care?

I don't care about how it's translated into IL - but I *do* care that
it's converted into calls to GetEnumerator(), MoveNext(), Current, etc.
Method calls are at the level of abstraction I'm comfortable with - how
they're represented in IL isn't.

In particular, the way that foreach is specified is precisely in terms
of GetEnumerator() etc. That's how the behaviour is defined.
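
For comparison, here is roughly what that expansion looks like when written by hand. This is a simplified sketch for the IEnumerable<int> case only; the spec covers more cases (pattern-based enumerators, conversions), and ForeachDemo is just an illustrative name.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachDemo
{
    // The sugared form.
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int item in source)
        {
            sum += item;
        }
        return sum;
    }

    // Approximately what foreach is specified to do for IEnumerable<int>:
    // get an enumerator, loop on MoveNext(), read Current, dispose at the end.
    public static int SumLonghand(IEnumerable<int> source)
    {
        int sum = 0;
        using (IEnumerator<int> e = source.GetEnumerator())
        {
            while (e.MoveNext())
            {
                int item = e.Current;
                sum += item;
            }
        }
        return sum;
    }
}
```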

If I wanted to operate on the abstraction level BELOW
foreach(){}, I'd write the code of getting the enumerator, moving along
etc. manually.

And it's nice to have the syntactic sugar of foreach to make it easier
- but it *is* just syntactic sugar.

We're also not arguing how a for() is rolled into code, right?

The specification of a "for" loop doesn't implicitly involve extra
method calls etc, unlike "foreach", "lock" and "using".

It's common sense: something at abstraction level n+1 shouldn't be
affected by details of abstraction level n; if it were, abstraction
level n+1 would be useless.

They're both at the same abstraction level: C#. You don't need to go
below the level of the C# spec to know what the query expression is
translated into - it's all there. That means, apart from anything else,
that when Mono implements a C# 3 compiler (if they haven't already done
so - I haven't checked) we can expect the results to be the same in
terms of query expression translation. The expression trees in the
compiled code may be different, but that's a separate phase from query
expression translation.

Or are you going to say that every abstraction level should be used
with the knowledge how the underlying abstraction levels work? I don't
think so.

No, certainly not. I'm not claiming that everyone should be looking at
the generated IL (instructive as it can sometimes be). I'm claiming
that in order to understand how query expressions behave, you need to
know how they're translated, and that the translation involves lambda
expressions. That's the only way in which the behaviour is defined,
which makes it fundamental to understanding C# 3. If you don't know the
translation involved, you can't claim to understand C# 3.

Btw, I wasn't saying lambda's don't exist, see my remark earlier in
this post.

Well, you wrote things like "It's not a given there are lambda
expressions involved." When query expressions are involved, lambda
expressions are always involved, by the query expression translation
given in the spec.

Show me how join identifier in expression on expression equals
expression into identifier is translated, in the C# spec.

It's in 7.15.2.4. There are slight variations in translation based on
whether the "join" clause is followed by "into" and whether it's then
followed by "select" or something else, but all the paths are catered
for.

Then what's a tree in this case? An individual Expression<Func<>>?

Yes.

The tree is built by the extension methods of Queryable.

Those *combine* expression trees, but the C# compiler is already
creating expression trees (or more accurately, it creates code which
builds expression trees at execution time) in order to pass them as
arguments to the Queryable methods.

Here's a simple example, using LINQ to SQL (which in turn uses
Queryable). I have a single method:

public static void CreateQuery(NorthwindDataContext context)
{
    var query = from employee in context.Employees
                select employee;
}

That's about as simple as I could make it.

This is translated to:

var query = context.Employees.Select(employee => employee);

The parameter to Select is then compiled into code which generates an
expression tree (in C# terms, there's an implicit conversion from the
lambda expression to an expression tree).

Now, we can use the handy fact that reflector doesn't really "do"
expression trees yet. Here's identical C# to the above:

public static void CreateQuery(NorthwindDataContext context)
{
    ParameterExpression CS$0$0000;
    IQueryable<Employee> query =
        context.Employees.Select<Employee, Employee>
            (Expression.Lambda<Func<Employee, Employee>>
                (CS$0$0000 = Expression.Parameter(typeof(Employee),
                                                  "employee"),
                 new ParameterExpression[] { CS$0$0000 }));
}

Now, it's pretty nasty, but I hope it's clear that the code in
CreateQuery is creating an expression tree to pass to Select. The
Queryable members combine expression trees together to make new ones,
but that's hard to do if you haven't got one to begin with!
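
The decompiled output above isn't valid C# as it stands (CS$0$0000 isn't a legal identifier), so here is a compilable sketch of the same idea. It uses int in place of Employee so that it needs no data context; IdentityTreeDemo and its method names are invented for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class IdentityTreeDemo
{
    // The compiler converts the lambda into code that builds a tree.
    public static Expression<Func<int, int>> FromLambda()
    {
        return employee => employee;
    }

    // Roughly the same tree built by hand, mirroring the decompiled code:
    // one parameter node used both as the body and in the parameter list.
    public static Expression<Func<int, int>> ByHand()
    {
        ParameterExpression p = Expression.Parameter(typeof(int), "employee");
        return Expression.Lambda<Func<int, int>>(p, new ParameterExpression[] { p });
    }
}
```

In both cases the lambda's Body is a parameter node, and compiling either tree gives a delegate that returns its argument.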

If you believed that the Queryable methods were the only ones creating
expression trees, how did you think the details of the expressions
within the query expressions being passed to those methods?

If you implement
them yourself, you can do whatever you want and insert the data you get
passed in directly into the result object which is your Query object
(which implements IQueryable) so after

var q = ...;
you have q which is an IQueryable but not a tree, just a single
object.

IQueryable has a property "Expression" of type
"System.Linq.Expressions.Expression" - in other words, an expression
tree.
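
A quick way to look at that property, sketched here with an in-memory array wrapped by AsQueryable() rather than a real database provider (so the provider is an assumption of the example, as is the name QueryableExpressionDemo):

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public static class QueryableExpressionDemo
{
    // Builds a query over an IQueryable<int> and returns its expression tree.
    public static Expression Build()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();
        IQueryable<int> query = from n in source
                                where n > 1
                                select n;
        return query.Expression;
    }

    // The root of the tree is a call to Queryable.Where; nothing has been
    // executed against the data at this point.
    public static string RootMethodName()
    {
        return ((MethodCallExpression)Build()).Method.Name;
    }
}
```

RootMethodName() returns "Where": the degenerate "select n" is dropped by the translation because the query has another clause.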

You don't NEED to call them. They're there provided for you to build
the tree for you. But if you don't want to parse a tree, you can just
implement all the extension methods yourself, and you get the input
passed in and you can handle it directly instead of building a tree for
evaluation later on.

Any time you use query expressions, the subexpressions will either be
converted into code to generate delegate instances, or code to generate
expression trees. There's no other way round it.

You misunderstood me.

Well, given the rest of the post I'm replying to I'm still not
convinced that you know what's going on, I'm afraid. I'm willing to
*be* convinced that I've just kept on misunderstanding you, but you
don't seem willing to accept statements which are directly from the C#
specification.

where does the translation take place? Right, during compilation. Do I
have to be part of that process? no, I'm the developer feeding the
translator.

I think I've handled this elsewhere. Yes, you need to understand that
process as otherwise you can't understand C# 3 - the only way that the
behaviour is defined by the spec is in terms of the translation
process.

I also find that inconsistent. Let's agree on disagreeing if this is
consistent or not.

Okay.

there I specify what's done. In the code examples you and I posted,
it's implicit, not explicit.

Well, you're explicitly using a query expression. If you choose not to
know how expressions are handled when they're part of anonymous
functions, then you're choosing to disregard the spec. I see no benefit
in doing that.

Somewhere translation takes place, but how exactly is unclear and IMHO
also irrelevant as it happens AT COMPILETIME.

It's neither unclear (the spec is very precise) nor irrelevant
(without that knowledge you can't claim to understand C# 3).

The translation of a "using" statement into a call to Dispose()
effectively in a finally block happens at compilation time too - but do
you believe that's irrelevant?
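
For reference, a minimal sketch of that expansion for a reference-type resource. Tracker is a made-up IDisposable that records when it is disposed; the longhand version follows the general shape the spec gives, not the exact code a compiler emits.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that logs its own disposal.
public sealed class Tracker : IDisposable
{
    private readonly List<string> log;
    public Tracker(List<string> log) { this.log = log; }
    public void Dispose() { log.Add("disposed"); }
}

public static class UsingDemo
{
    // The sugared form.
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        using (var t = new Tracker(log))
        {
            log.Add("body");
        }
        return log;
    }

    // Approximately what the using statement means: try/finally with Dispose().
    public static List<string> Longhand()
    {
        var log = new List<string>();
        Tracker t = new Tracker(log);
        try
        {
            log.Add("body");
        }
        finally
        {
            if (t != null) t.Dispose();
        }
        return log;
    }
}
```

Both methods produce the log "body", "disposed".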

how can translation be part of the level of the language? Because if
it would, the translation result is also part of the level of the
language, and thus: n+1 AND n are both relevant abstraction levels.
Which means n+1 is irrelevant, because n is important.

Your mistake is in assuming that there are two abstraction levels. The
only way of viewing things that way is to view "C# 3 with query
expressions" and "C# 3 without query expressions" as two different
levels of abstraction. They're both C# though, and they're both
specified in the same document.

You know, what if the C# compiler does things differently in C# 6.0 ?
Then I still have my where foo==constant, but how it's translated under
the hood is irrelevant, it can be completely different from how it's
done today. THAT's why there's an abstraction level above the
translation results.

That would violate the C# 3 specification, in that case, because C# 3
is quite clear how query expressions are translated.

Now, C# 6.0 *could* define that "using" statements should just be
completely ignored - but that doesn't make their behaviour in C# 3 any
less relevant to developers.

where? Just some words that there is a translation and some examples,
but no clear 'this becomes that', which ARE necessary as the
translations aren't 1:1.

There's a complete set of "this becomes that". If you want to claim
that there's any ambiguity, you should provide a complete query
expression which you believe isn't unambiguously translated by the
spec.

If you find one, that would be important as it would be a bug in the
spec. I sincerely doubt that you *will* find one though.

grammar.
if the developer knows what where foo==constant does, why does the
developer have to know what that becomes in the end? It has various
stages!

Grammar only defines whether an expression is valid or not; it doesn't
define how it behaves.

A tool could know C# 3 grammar and thus be able to say, "This is a
syntactically valid piece of C#" but not be able to compile any code.

isn't that what the grammar says?

No, the grammar just says what tokens are valid. It doesn't define the
behaviour. There could be two languages with an identical grammar but
which behave differently - one could be like C#, and another could
always evaluate all the expressions in "a ? b : c".

If it's required to know what the
compiler bakes from a piece of sourcecode, the language is hard to use.

The developer doesn't need to know what the generated IL is, but they
do need to know what the specified behaviour is. There's a big
difference between the two.

If you expect them to know the translation, do you also expect them
to know the graph it ends up in? As that's tied to it. And why does it
stop there for you?

No, I don't expect them to know the graph - and the two are *not* tied
together. They're separate phases of compilation. The translation from
query expressions into non-query expression happens before the compiler
has even decided what the types involved are. It's a source code to
source code translation.

The difference is that the query expression to non-query expression is
defined by the specification. The compilation from lambda expression to
expression tree is explicitly *not* defined by the specification. From
section 4.6:

<quote>
The exact definition of the generic type Expression<D> as well as the
precise rules for constructing an expression tree when an anonymous
function is converted to an expression tree type, are both outside the
scope of this specification, and are described elsewhere.
</quote>

So you expect people to know how a for loop is constructed in the
translated code? I don't.

No, and the specification doesn't define it. Spot the difference
between that and query expressions.

I have for, the grammar of for, the
description of the for statement, that's it. How it's translated is
irrelevant, that's up to the compiler. Otherwise, I'm babysitting the
compiler.

But the difference is that with query expressions you have the grammar
of them, and the translation of them into non-query expressions. That's
it. You need to know the translation in order to have any hope of
understanding the behaviour.

Why are all O/R mappers leaky? An abstraction which doesn't force you to
dig into lower layers because the details are required isn't leaky.

With all ORMs, you need to be able to see the generated SQL and
potentially change your query if the one that the ORM has selected for
you is unsatisfactory in some fashion. You've already acknowledged that
elsewhere in the thread.

SQL is a lower abstraction than the query language of the ORM, so any
need to see or modify the SQL indicates a leaky abstraction.

yes I know. But what's created at compile time is a lower abstraction.
I don't know a single IL statement, I hardly ever look at how the IL
looks like from C# compiled code, do I have to? I don't think so, as
I'm using a compiler to do that work for me.

What I've suggested you *should* know isn't IL. It's a C# to C#
translation.

I need to write code which complies to the C# spec so the code I've
written is doing what it is expected to do. What the compiler bakes of
it, how many passes it has, how it translates where foo == constant
into whatever, and how that influences what happens next, why should I
care?

You should care because the only way that you can know how a query
expression should behave according to the spec is to know the
translations it specifies.

translation results are translation results, so you're saying it's
important to know translation results to be able to write code in the
format BEFORE the translation results?

There are two translations: C# to C#, and C# to IL. You don't need to
know the latter. You *do* need to know the former.

then tell me, where's the spec for join how it's translated? Or let ?

I think I've answered that above a few times.

it dies at runtime with an exception.

Shame. Still, at least there are the alternatives.

the linq query. it knows, it creates the specific provider for the db
it connects to.

Right. I wonder how that works for compiled queries...

distinct is executed on the resultset created by the projection. It
doesn't know if employee is unique, as the last column in the set can
be a function output which differs... so it has to evaluate every field.

But in this case it can see that the projection *doesn't* have a
function with output which differs: CONVERT is deterministic in this
case, and it should know that. The server has all the information it
should need to work out that all it needs to look at is the ID.

Most people think in sql, as that's what they know. Moving towards an
o/r query system then requires them to drop the sql completely and use
the system provided. But that's very hard as the system often mimics
sql with some things different.

Right - but that doesn't mean that the SQL isn't important, *or* that
there isn't benefit in thinking at the O/R level.

it's sad that that's required indeed, that's my whole point in this
long thread. I don't WANT TO KNOW what it ends up in. That's the point.
My query system is deterministic: people write X and get X.

C# query expression translations are deterministic too. You write a
particular query expression, you'll get the same result as if you wrote
it out "longhand" using the translated method calls etc.

Now, you claim that "people write X and get X". How many of your
customers would be able to look at *any* LLBLGen query expression you
give them, and tell you exactly what SQL will be produced? I'd be very
surprised if it's more than a very small proportion. Now, even if they
do, they've had to learn that somehow.

Why do you object to people learning the translation rules specified in
C# 3, but think it's important that they should know them for LLBLGen?

I think it would have been better if they would have used 3 DSLs, so
you could use the best syntax for the target at hand. Now, a
limitation of one keeps the other one also down. (or introduces silly
stuff like 'group joins')

And at that point, you've got people having to learn three things
instead. LINQ would be relatively pointless in that case, IMO. I really
*like* the consistency aspect of it.

In my book I've got the same data in five different sources: vanilla
objects, SQL, an untyped data set, a typed data set, and XML. I make a
point of showing how consistent the queries can be (within limitations,
of course - but there are query correspondences). I think that's a
very impressive demonstration of LINQ (not my input to it, but the
consistency).

Frans Bouma [C# MVP] · Oct 21, 2007

Jon said:
I've just checked the May 2006 version, and that had query expression
translation specified in section 26.7.1.

I can't say I remember ever seeing a version without query expression
translation - which version did you have, Frans?

Hmm, I then have an outdated document.

I have the C# 3.0 language specification, which has 18 sections. I've
downloaded it when it came out, August 30. Is there yet another,
better, specification? That would be GREAT, because I can't find info I
need in this spec (the stuff you could find in your version of the spec
but I couldn't find in the one I had)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Frans Bouma [C# MVP] · Oct 21, 2007

Jon said:
I'm going to reply in more detail later on, but for the moment these
are the most important bits:

Well, it is by every definition of expression tree I've ever seen.
Here's the description of Expression<TDelegate>:

<quote>
Represents a strongly typed lambda expression as a data structure in
the form of an expression tree.
</quote>

Note the "in the form of an expression tree". Even something like
foo==constant is going to be a tree with more than one node, although
even if it had just one node I'd still call it an expression tree.
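
To show what "more than one node" means for something like foo==constant, here is a small sketch that inspects the tree the compiler builds. Item, Foo and NodeDemo are invented names for the example.

```csharp
using System;
using System.Linq.Expressions;

public static class NodeDemo
{
    // Hypothetical entity with a Foo property.
    public class Item
    {
        public int Foo { get; set; }
    }

    // A foo == constant predicate, converted to an expression tree.
    public static Expression<Func<Item, bool>> Predicate()
    {
        return c => c.Foo == 42;
    }

    // Describes the body: an Equal node with a member access on the left
    // and a constant on the right.
    public static string Describe()
    {
        var body = (BinaryExpression)Predicate().Body;
        return body.NodeType + "(" + body.Left.NodeType + ", " + body.Right.NodeType + ")";
    }
}
```

Describe() returns "Equal(MemberAccess, Constant)", and that is before counting the lambda node itself and the parameter node under the member access.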

Ok, I see I'm yet again not clear enough.

Expression<Func<Foo,Bar>>

is a single expression. When I said 'I don't find that a tree', I meant:
a tree with 1 node isn't a tree.

If the expression object itself contains multiple nodes in the
arguments etc. etc. then you have a tree. I'll try to be more clear
next time, so we can cut down the post sizes

Likewise that's what the C# spec calls an expression tree, too:

<quote section="4.6">
Expression trees are values of expression tree types of the form
System.Linq.Expressions.Expression<D>, where D is any delegate type.
For the remainder of this specification we will refer to these types
using the shorthand Expression<D>.
</quote>

Those seem pretty clear to me. Can you give a clear definition of
what you understand an expression tree to be, with evidence that it's
a definition which is accepted elsewhere?

We're on the same page, I wasn't clear enough. I was referring to a
single node, not a tree structure built with elements inside the node.
My remark was unclear.

You must be using a very out of date spec then.
Download the Unified C# 3.0 spec from here:

http://download.microsoft.com/download/3/8/8/388e7205-bc10-4226-b2a8-75351c669b09/csharp%20language%20specification.doc

Then look in section 7.15. In this particular instance (multiple from
clauses) the translation to a SelectMany call is detailed in 7.15.2.4.

I'm indeed working with an outdated doc, I do have section 7.15.2.4,
which doesn't describe everything though (join with anonymous types to
join between compound fk/pks), and I do have 18 sections but not more.
I'll check the doc you pointed to, as I do miss info.

FB

Jon Skeet [C# MVP] · Oct 21, 2007

Frans Bouma said:
Hmm, I then have an outdated document.

I have the C# 3.0 language specification, which has 18 sections. I've
downloaded it when it came out, August 30.

That's the same one I've got. The version with section 26.7.1 was from
May 2006.

Is there yet another,
better, specification? That would be GREAT, because I can't find info I
need in this spec (the stuff you could find in your version of the spec
but I couldn't find in the one I had)

It sounds like you *have* got the same one as me. Have a look at the
sections I specified by number.

Jon Skeet [C# MVP] · Oct 21, 2007

Frans Bouma said:
Ok, I see I'm yet again not clear enough.

Expression<Func<Foo,Bar>>

is a single expression. When I said 'I don't find that a tree', I meant:
a tree with 1 node isn't a tree.

Expression<Func<Foo,Bar>> is an expression tree by the definition of
"expression tree" in both the spec and MSDN.

When someone (using spec terminology) talks about a lambda expression
being converted into an expression tree, they mean converting a lambda
expression into a runtime instance of Expression<TDelegate>.

If the expression object itself contains multiple nodes in the
arguments etc. etc. then you have a tree. I'll try to be more clear
next time, so we can cut down the post sizes

Suppose you had a data structure of "RedBlackTree" or something
similar, would you expect it to throw an exception if you only
populated it with one node?

Here's a wikipedia article on "tree" as used in computer science. Which
part of that disqualifies trees with only a root node?

We're on the same page, I wasn't clear enough. I was referring to a
single node, not a tree structure built with elements inside the node.
My remark was unclear.

So do you agree that a single node still counts as an "expression
tree" in spec terminology? Because that's what I intend to use for
clarity.

In fact, I'm not sure whether any lambda expression will ever end up as
an expression with a single node - there will always be the

I'm indeed working with an outdated doc, I do have section 7.15.2.4,
which doesn't describe everything though (join with anonymous types to
join between compound fk/pks), and I do have 18 sections but not more.
I'll check the doc you pointed to, as I do miss info.

No, I think you've got exactly the same document as I have. Whether the
expression uses anonymous types or not is irrelevant - the translation
only deals at the level of the expression as a whole.

Compound fk/pks are just dealt with by more complicated expressions
within the join - they're mapped in exactly the same way though.

If you post an example of something you believe isn't catered for, I'll
try to detail the rules applied in the translation.

Ironically, I think the actual translation is *simpler* than you
imagine it to be, because it really doesn't do a lot - it just applies
blocks of fairly unthinking logic to the query expression, breaking it
down one bit at a time.

The only really tricky bit is transparent identifiers, IMO.

Frans Bouma [C# MVP] · Oct 21, 2007

Jon, I have to leave the discussion after this post. It's so time
consuming I can't spend more time on this. I've spent over 4 hours
replying to this one, and as it's my only free day this week, I can't
afford more time on this.

I agree there have been a lot of misunderstandings, but I believe
it's a very important topic. I'd personally be happy to leave the
LINQ to SQL flexibility side of the discussion alone, but the "what
should a developer know about query expressions" is quite
fundamental, IMO.

Ok, though I have the feeling we're not going to agree on that one.

I've addressed almost all your points in this post, but a lot of them
boil down to the same thing:

The behaviour of query expressions is well-defined, and is specified
in a C# to C# (*not* C# to IL) translation which should be
understood by developers using query expressions.

I object to the 'should' part. For example the join keyword, is
'translated' to an extension method call which has as 4th parameter an
anonymous type. An identifier is introduced to REFER to that anonymous
type in subsequent extension method calls on the IQueryable returned
by the join extension method call.

Now, I personally find it too far to force people to understand that
that happens when they type in C# code with 'join' and not with a Join
extension method call, for the simple reason that it's IMHO unnecessary
info to be able to write the join code successfully.

Software development is already complex enough. If developers also
have to know the inner workings of the compiler used, it will become
more complex than IMHO necessary as there are languages which don't
force the developer to know this kind of detailed material.

If you write any C# query expression which uses an IQueryable as its
source (and doesn't also have the Select, Where methods etc taking
delegate instances instead of Expression<D>), the resulting code will
always build an expression tree, in the calling code.

A tree with a single node, not a tree with multiple nodes, as the
methods called, Where, Select etc. are YOUR extension methods. So
instead of creating method call expressions and effectively building
the tree, there's no tree. They fill in the data in the source object
(arg 1).

Yes - and that translation isn't performed in Queryable, it's
performed by the compiler and the code to build the expression tree
is in the code which calls Queryable.

though, as the compiler performs the translation, it's IMHO not info a
normal developer should know.

However, the fact that the compiler converts the where clause into a
lambda expression tree most certainly is relevant, and it's not the
inner workings of anything. It's not an implementation detail - it's
part of the specification.

I've read your other posts first, and thought about it some more, and
indeed you're right: the (c)=>(c.Orders.Count>10); expression is a
tree, as 'Orders' references etc. are added as Expression nodes in the
tree referenced from the LambdaExpression.Body.

2) what I see is a where foo==constant predicate. If that gets
translated to something else, I don't care, because I'm at the source
side, not at the side where the translated stuff is processed,
that's the compiler, why should I wonder how where foo==constant is
translated into IL and later on in WHERE [dbo].[T].[Foo]=@p, if I'm
at the level of where foo==constant?

You shouldn't care about the translation into IL. You should care
about the translation from a query expression into a non-query
expression.

Though, why should I care about that? I haven't seen an argument about
why I should care about that as a normal developer.

The IL side of things is outside the remit of the C#
spec, but the fact that it works as a lambda expression is specified
behaviour.

Why would you not care how your code is specified to behave?

because it's expressed in the code at abstraction level n+1, which has
a described behavior, so why should the developer, writing code at n+1,
with described behavior (as in: this keyword does that), care about how
it's translated and into what it is translated at level n? Simply
because what's written at n+1 is expressive enough (with 'join' the
only way) to specify what the developer wanted.

Just because you don't see them doesn't mean you shouldn't be aware
of them. You should be aware that anything in a where clause (or the
other relevant bits of a query expression) is converted into a lambda
expression, and that therefore it has the same behaviour with regards
to captured variables, deferred execution etc that other anonymous
functions have.
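
A small sketch of why this matters in practice, assuming LINQ to Objects: the where clause becomes a lambda that captures the variable itself, and nothing runs until the query is enumerated. DeferredDemo and the sample numbers are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<int> Run()
    {
        int[] numbers = { 1, 5, 10, 15 };
        int threshold = 3;

        // The where clause becomes the lambda n => n > threshold, which
        // captures the variable, not its current value. Nothing is filtered yet.
        IEnumerable<int> query = from n in numbers
                                 where n > threshold
                                 select n;

        // Changing the captured variable before enumeration changes the result.
        threshold = 9;

        // Execution happens here, using threshold == 9.
        return query.ToList();
    }
}
```

Run() returns 10 and 15, not 5, 10 and 15, which is hard to predict without knowing that a lambda expression is involved.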

I know I have to be aware of that BECAUSE the effects of the
translation are something to worry about when writing C# code, but that
doesn't describe WHY that isn't abstracted well enough: a compiler
result (be it phase one: translating into extension method calls and
lambda creation etc.) is never to be considered really useful for code
development, as it will mean the code development is done in a language
which requires you to be aware of what it ends up in, i.e. an
abstraction level that's not totally abstract.

If I do:

int x = 10;
long y = x;

then I don't see that there's a conversion involved, because it's
implicit - but it's still important to know about it, and that the
value of y really is a long, and not somehow an int.

If the compiler doesn't cry about a conversion, I should assume that y
is a long and x is an int and both have the semantic value of 10, in
their own types, as THAT's what the code says. It doesn't describe any
conversion, because the conversion is in the abstraction level below
the code's level (otherwise I would have been forced to call the
conversion explicitly!).

To me, the abstraction level is "the specified behaviour of C#". In
other words, anything within the specification is fair game as far as
I'm concerned.

Ok, fair enough. I think with a spec as detailed as the C# spec you can
go very deep in what is 'within the spec'. Though I look at what a
developer is required to know to be able to write code in C# when the
grammar is understood and the keywords are known. If a developer is
required to learn the grammar and a 500 page book about the spec,
before the developer is ABLE to write a program without errors
(otherwise, due to lack of knowledge of the spec, the developer can
make mistakes!), I think something is skewed.

I'm not saying that developers should know the spec inside out - but
they shouldn't complain when their code doesn't behave as they expect
it to, but behaves exactly as the spec says it should.

If they write code at the level as described in the grammar in section
7.15 (the grammar is explained there), then it should work as that
grammar describes it to. They then shouldn't be forced to learn the
specifics of the inner extension method calls or which extension
methods are called, why should they? They specified a join keyword and
the rest what the grammar describes and it therefore should work
properly. If not, why bother with that grammar anyway, if what you
should be working on (the abstraction level) is the level of extension
method calls (and passing solely lambda's instead of foo==constant. )

How would you describe the behaviour of query expressions? There's
only one authoritative description of the behaviour IMO, and that's
the language specification. Does the language specification define
filters, projections etc? No - it just gives a mechanical means of
translating a query expression into other known concepts: method
calls and lambda expressions.

That's the description of the mechanism. I find the spec particularly
lacking any info about what is to be done when a

join-clause:
    join type(opt) identifier in expression on expression equals expression

non-terminal is specified in the code.

The DEVELOPER wants ABC, so s/he looks for constructs to ACHIEVE ABC
in the language and framework at hand (C# and .NET). The only thing MS
has to do is to describe the connection between ABC and C#/.NET.

How that join non-terminal is translated into extension method calls
and lambdas... is that specifying what's to be expected?

I don't see how that defines what's to be expected. Yes, it's hard,
because the provider/target for the query is the one which decides what
happens, but that's not the problem of the reader of the spec nor the
developer using the language: it's the problem of the provider of the
language.

F and F' are both C# in this case, however. F is "C# 3 including
query expressions" and F' is "C# 3 without query expressions". In
other words, if you know F then you know F'. It's converting one form
of C# to a simpler one.

I assert that that is entirely relevant, as it is the only way in
which the behaviour of query expressions is specified.

no, it's the only way in which the MECHANISM is specified, but not the
behavior. What's the result of the code is undefined unless the target
used defines the behavior. (so it's leaky beyond 1 level deeper)

Without the list of translations, the behaviour of query expressions
would be completely unspecified.

isn't it now still undefined? I mean: what the behavior is in the end
is undefined. The developer might now know the translation rules (all
10 pages of them) by the letter, but still has no clue what will be
done when 'join' is typed into the code.

They could do anything. The
specification would be incomplete without it. No feature should be
unspecified. Certain aspects of implementation may be unspecified
(there are a few in C#, but not many - exactly how expression trees
are created is one of them) but the translation from a query
expression to a non-query expression is absolutely rock solid.

Put it this way - if you believe there's a query expression whose
translation isn't exactly specified in the spec, I'd like to see it.

2 joins, one with into and a DefaultIfEmpty clause. A join with a
where following it, a join with a compound fk/pk check, requiring an
anonymous type.

One could sit down and puzzle them together with the rules they
provided, but that will take experience and time and it's not trivial
IMHO.

Also, it's complex work. Far more developers understand a left join in
SQL than the syntax with anonymous types which produce a hierarchical
'group' join in Linq. Do I learn that from the spec, how it works? No.
However, it is apparently required to be able to know what to write.
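
For readers who haven't seen these shapes, here is a minimal sketch of the two constructs mentioned above, written against LINQ to Objects. The Customer and Order classes and their members are hypothetical stand-ins, not Northwind's real mapping, and what a database provider does with the same queries is up to that provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for mapped entities.
class Customer { public string Id; public string Country; }
class Order { public int OrderNo; public string CustomerId; public string ShipCountry; }

static class JoinShapesSketch
{
    // Left join: a group join ("into") followed by DefaultIfEmpty().
    public static List<string> LeftJoin(List<Customer> customers, List<Order> orders)
    {
        var q = from c in customers
                join o in orders on c.Id equals o.CustomerId into customerOrders
                from maybeOrder in customerOrders.DefaultIfEmpty()
                select c.Id + ":" + (maybeOrder == null ? "none" : maybeOrder.OrderNo.ToString());
        return q.ToList();
    }

    // Compound key: both sides of "equals" build the same anonymous type.
    public static List<int> CompoundKeyJoin(List<Customer> customers, List<Order> orders)
    {
        var q = from c in customers
                join o in orders
                    on new { Id = c.Id, Country = c.Country }
                    equals new { Id = o.CustomerId, Country = o.ShipCountry }
                select o.OrderNo;
        return q.ToList();
    }

    static void Main()
    {
        var customers = new List<Customer>
        {
            new Customer { Id = "A", Country = "NL" },
            new Customer { Id = "B", Country = "US" }
        };
        var orders = new List<Order>
        {
            new Order { OrderNo = 1, CustomerId = "A", ShipCountry = "NL" },
            new Order { OrderNo = 2, CustomerId = "A", ShipCountry = "DE" }
        };
        Console.WriteLine(string.Join(", ", LeftJoin(customers, orders)));        // A:1, A:2, B:none
        Console.WriteLine(string.Join(", ", CompoundKeyJoin(customers, orders))); // 1
    }
}
```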

I find that a bit strange, considering:

var q = from c in nw.Customers
        join o in nw.Orders on c.CustomerID equals o.CustomerID
        where o.EmployeeID==3
        select c.CompanyName;

isn't writable in C# with extension methods without a separate class as
you can't specify in the lambda passed to Where() the reference to the
anonymous type which is the result of the join.
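
For comparison, here is a sketch of the method-call form that the translation rules in 7.15.2 give for a query of this shape, run against LINQ to Objects. Customer and Order are hypothetical in-memory stand-ins for the Northwind entities, and the anonymous type new { c, o } plays the role of the compiler-generated transparent identifier.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Northwind entities.
class Customer { public string CustomerID; public string CompanyName; }
class Order { public string CustomerID; public int EmployeeID; }

static class JoinTranslationSketch
{
    // Query-expression form.
    public static List<string> QueryForm(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        var q = from c in customers
                join o in orders on c.CustomerID equals o.CustomerID
                where o.EmployeeID == 3
                select c.CompanyName;
        return q.ToList();
    }

    // Method-call form: Join projects each matching pair into an anonymous
    // type, which Where and Select then read from.
    public static List<string> MethodForm(IEnumerable<Customer> customers, IEnumerable<Order> orders)
    {
        return customers
            .Join(orders, c => c.CustomerID, o => o.CustomerID, (c, o) => new { c, o })
            .Where(t => t.o.EmployeeID == 3)
            .Select(t => t.c.CompanyName)
            .ToList();
    }

    static void Main()
    {
        var customers = new[]
        {
            new Customer { CustomerID = "ALFKI", CompanyName = "Alfreds" },
            new Customer { CustomerID = "BONAP", CompanyName = "Bon app" }
        };
        var orders = new[]
        {
            new Order { CustomerID = "ALFKI", EmployeeID = 3 },
            new Order { CustomerID = "BONAP", EmployeeID = 5 }
        };
        Console.WriteLine(string.Join(", ", QueryForm(customers, orders)));  // Alfreds
        Console.WriteLine(string.Join(", ", MethodForm(customers, orders))); // Alfreds
    }
}
```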

So one has to learn n+1, but also n, however n+1 is sometimes needed,
but n is required as well.

Am I the only one here who finds this a bit of a big challenge for
novice developers and developers who have to write a linq query here
and there?

Be very careful about what you think I'm saying. I didn't say that
the translation resulted in an expression tree. I said it resulted in
a lambda expression. That lambda expression is then processed
entirely within the normal rules of C#, including potentially a
conversion to an expression tree.

It's important to understand this
two-phase compilation - first the query expression is translated into
a "normal" C# expression (using lambda expressions and member
accesses which may or may not end up being extension method calls).

It really is syntactic sugar. To quote 7.15.2 again:

<quote>
The translation from query expressions to method invocations is a
syntactic mapping that occurs before any type binding or overload
resolution has been performed.
</quote>
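
A small sketch of what "before any type binding" means in practice: the query expression below compiles against a type that is not a collection and has nothing to do with System.Linq, simply because it has a suitable Select method. Box<T> is a made-up type for illustration only.

```csharp
using System;

// A hypothetical type that is not a collection; it only offers Select.
class Box<T>
{
    public readonly T Value;
    public Box(T value) { Value = value; }

    public Box<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        return new Box<TResult>(selector(Value));
    }
}

static class SyntacticMappingSketch
{
    public static Box<int> Demo()
    {
        // Rewritten to new Box<string>("hello").Select(s => s.Length)
        // and only then bound like any other method call.
        return from s in new Box<string>("hello")
               select s.Length;
    }

    static void Main()
    {
        Console.WriteLine(Demo().Value); // 5
    }
}
```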

In what way is "a syntactic mapping" the same as "syntactic sugar"?

I said it was different, so I don't know when they're the same?

I don't care about how it's translated into IL - but I do care that
it's converted into calls to GetEnumerator(), MoveNext(), Current,
etc. Method calls are at the level of abstraction I'm comfortable
with - how they're represented in IL isn't.

In particular, the way that foreach is specified is precisely in
terms of GetEnumerator() etc. That's how the behaviour is defined.
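
As a rough sketch of the expansion being discussed, the two methods below do the same thing for an IEnumerable<int>. The second is approximately what the spec describes for the first. The method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;

static class ForeachSketch
{
    public static int SumWithForeach(IEnumerable<int> source)
    {
        int sum = 0;
        foreach (int item in source)
        {
            sum += item;
        }
        return sum;
    }

    // Approximately what the loop above means in terms of method calls.
    public static int SumExpanded(IEnumerable<int> source)
    {
        int sum = 0;
        IEnumerator<int> e = source.GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int item = e.Current;
                sum += item;
            }
        }
        finally
        {
            e.Dispose();
        }
        return sum;
    }

    static void Main()
    {
        int[] data = { 1, 2, 3 };
        Console.WriteLine(SumWithForeach(data) == SumExpanded(data)); // True
    }
}
```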

So, it's necessary to know that the functionality under the hood is
ABC?

How do you explain to a developer who sees the foreach
keyword/statement for the first time what it does?
a) the whole getenumerator() explanation, which can go very deep
b) foreach(T a in IEnumerableT)
{
// gets executed for each a of type T in IEnumerableT
}

10 to 1 option b is grasped within seconds and a program can be
written. Sure, "in what order are the elements pulled out?" "Well, the
order in which the IEnumerableT gives them to you." "ok".

We can go all formal and how it works inside, but that's a technical
burden for developers which goes way too deep IMHO. The spec is 519
pages. You can't require from someone to know every detail in there to
be able to write software in C#. I certainly don't know many details in
there, nor do I want to know these details. I've spent enough time of
my life writing compilers and assembler, please let me live above these.

(hmmm. I'm not even halfway through the post... )

And it's nice to have the syntactic sugar of foreach to make it
easier - but it is just syntactic sugar.

I call foreach a language element. How it works underneath, I don't
care. Example: at the moment GetEnumerator() is called. But if they
change it in C# 6.0 to a different construct, I don't care as I operate
on the level of foreach, not on the level of GetEnumerator, MoveNext()
etc., because if I wanted to live at that level, I'd call
GetEnumerator() myself.

The specification of a "for" loop doesn't implicitly involve extra
method calls etc, unlike "foreach", "lock" and "using".

though if the inner workings of the compiler change so these
methodcalls aren't necessary anymore, would the semantic behavior of
foreach change? No.

No, certainly not. I'm not claiming that everyone should be looking
at the generated IL (instructive as it can sometimes be). I'm
claiming that in order to understand how query expressions behave,
you need to know how they're translated, and that the translation
involves lambda expressions. That's the only way in which the
behaviour is defined, which makes it fundamental to understanding C#
3. If you don't know the translation involved, you can't claim to
understand C# 3.

Jon, isn't that a little far fetched? I mean: if you know the grammar
at page 223, you know how to write the queries, but you don't NEED to
know how for example the join keyword and its operands are translated
into lambdas and a method call, and how the compiler uses transparent
identifiers (any reader here who knows what these are yet? They're
essential to understand how the Join() extension method works inside a
chain of method calls, for example. They're at 7.15.2.7).

I find it, sorry to say it, way too much to require to know section
7.15 to the letter to be able to understand C# 3.0.

It's in 7.15.2.4. There are slight variations in translation based on
whether the "join" clause is followed by "into" and whether it's then
followed by "select" or something else, but all the paths are catered
for.

you have to combine 7.15.2.7 with it, to understand the complex goo
you sometimes get when you add multiple joins, compound key joins etc.

As I said earlier: it's manageable to puzzle it all back to the
original rules, but if one is required to know all this to write linq
queries properly... Because, all this just describes a mechanism to get
to the door of the provider, which is half the equation.

The tree is built by the extension methods of Queryable.

Those combine expression trees, but the C# compiler is already
creating expression trees (or more accurately, it creates code which
builds expression trees at execution time) in order to pass them as
arguments to the Queryable methods.

Here's a simple example, using LINQ to SQL (which in turn uses
Queryable). I have a single method:

public static void CreateQuery(NorthwindDataContext context)
{
    var query = from employee in context.Employees
                select employee;
}

That's about as simple as I could make it.

This is translated to:

var query = context.Employees.Select(employee => employee);

The parameter to Select is then compiled into code which generates an
expression tree (in C# terms, there's an implicit conversion from the
lambda expression to an expression tree).
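
A minimal sketch of that implicit conversion, independent of LINQ to SQL: the same lambda text becomes either a delegate or an expression tree, depending only on the type it is assigned to. The class and method names here are made up for the example.

```csharp
using System;
using System.Linq.Expressions;

static class LambdaConversionSketch
{
    public static string Describe()
    {
        // Same lambda text, two different conversions.
        Func<int, bool> asDelegate = x => x > 5;
        Expression<Func<int, bool>> asTree = x => x > 5;

        // The tree is data that can be inspected...
        ExpressionType bodyKind = asTree.Body.NodeType;
        // ...or compiled into a delegate later on.
        Func<int, bool> compiled = asTree.Compile();

        return bodyKind + ":" + asDelegate(7) + ":" + compiled(7);
    }

    static void Main()
    {
        Console.WriteLine(Describe()); // GreaterThan:True:True
    }
}
```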

Now, we can use the handy fact that reflector doesn't really "do"
expression trees yet. Here's identical C# to the above:

public static void CreateQuery(NorthwindDataContext context)
{
    ParameterExpression CS$0$0000;
    IQueryable<Employee> query =
        context.Employees.Select<Employee, Employee>
            (Expression.Lambda<Func<Employee, Employee>>
                (CS$0$0000 = Expression.Parameter(typeof(Employee),
                    "employee"), new ParameterExpression[]
                    { CS$0$0000 }));
}

Now, it's pretty nasty, but I hope it's clear that the code in
CreateQuery is creating an expression tree to pass to Select. The
Queryable members combine expression trees together to make new ones,
but that's hard to do if you haven't got one to begin with!

True.

If you believed that the Queryable methods were the only ones
creating expression trees, how did you think the details of the
expressions within the query expressions were being passed to those
methods?

I didn't think of that, see my remark 200 pages above

I think I've handled this elsewhere. Yes, you need to understand that
process as otherwise you can't understand C# 3 - the only way that
the behaviour is defined by the spec is in terms of the translation
process.

I can only conclude that what's described in 7.15 is describing the
mechanism, but not behavior. This is essential, as the behavior is what
the developer is after, not the mechanism.

It's neither unclear (the spec is very precise) nor irrelevant
(without that knowledge you can't claim to understand C# 3).

The translation of a "using" statement into a call to Dispose()
effectively in a finally block happens at compilation time too - but
do you believe that's irrelevant?
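
For reference, a sketch of the translation in question. Resource is a hypothetical disposable type that records what happens to it, and the second method is approximately the try/finally form the spec gives for the first.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical disposable type that logs its own lifetime.
class Resource : IDisposable
{
    private readonly List<string> log;
    public Resource(List<string> log) { this.log = log; log.Add("created"); }
    public void Dispose() { log.Add("disposed"); }
}

static class UsingSketch
{
    public static List<string> WithUsing()
    {
        var log = new List<string>();
        using (Resource r = new Resource(log))
        {
            log.Add("body");
        }
        return log;
    }

    // Approximately what the using statement above expands to.
    public static List<string> Expanded()
    {
        var log = new List<string>();
        Resource r = new Resource(log);
        try
        {
            log.Add("body");
        }
        finally
        {
            if (r != null) r.Dispose();
        }
        return log;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", WithUsing())); // created,body,disposed
        Console.WriteLine(string.Join(",", Expanded()));  // created,body,disposed
    }
}
```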

I find it irrelevant it's in a finally block. I find it relevant to
know that 'using' calls IDisposable.Dispose() on the object created in
the operand of using at the end of the code block. that's it, as that's
what functionality wise is what using does. I don't care where the
construction of the object is placed inside the try or that there is a
try, why should I care about that? It's code I'll never see. Life's too
short to care about that kind of details, simply because the KEYWORD
'using' and its operands provide me with the functionality I'm looking
for. So when I apply that to the code I'm writing, I can expect what
using provides at that spot where I applied it. That's all I want to
know. HOW it's done underneath, that's great for lull moments when I've
nothing better to do (which are rare) and I definitely don't expect the
details to be first class material to know what the statement does.

Your mistake is in assuming that there are two abstraction levels.

I don't think it's a mistake, there are 2 IMHO: one without extension
methods (or only the ones not in the C# grammar for query expressions)
and one WITH extension methods, the translated versions, which are
often getting obscure because of transparent identifiers, which are
compiler generated. I never expect myself to be forced to operate on
that abstraction level, and I never expect someone else to be forced to
be working on that level as well.

The only way of viewing things that way is to view "C# 3 with query
expressions" and "C# 3 without query expressions" as two different
levels of abstraction. They're both C# though, and they're both
specified in the same document.

Doesn't that extend the essential knowledge of how C# works to more
than 500 pages of text and details? That's a truckload of info for a
language which has such a small set of keywords.

That would violate the C# 3 specification, in that case, because C# 3
is quite clear how query expressions are translated.

Now, C# 6.0 could define that "using" statements should just be
completely ignored - but that doesn't make their behaviour in C# 3
any less relevant to developers.

I was trying to describe that the behavior of using, the functionality
the statement provides doesn't change when the inner workings change,
so that the inner workings are irrelevant for understanding what the
statement does.

Grammar only defines whether an expression is valid or not; it
doesn't define how it behaves.

A tool could know C# 3 grammar and thus be able to say, "This is a
syntactically valid piece of C#" but not be able to compile any code.

I was referring to the grammar on page 223. What else is there? A 10
page list of rules which are describing a mechanism which doesn't lead
me to any more understanding what join will do, semantically. So if I
know what the statements in the grammar do (semantically) I can write
software. How they work in the compiler space, it's not of my concern.

Grammar comes with descriptions what the terminals and non-terminals
do/mean. That should be enough to learn a language. That's why I said
'grammar'.

No, the grammar just says what tokens are valid. It doesn't define
the behaviour.

neither does section 7.15.2.*. That describes a mechanism, but doesn't
describe any meaning what join does.

There could be two languages with an identical grammar
but which behave differently - one could be like C#, and another
could always evaluate all the expressions in "a ? b : c".

I know, Jon. That's not what I meant. If I have to learn a new
programming language, I look at the grammar, look at the description of
the grammar and the language should be clear. If there are fine grained
details in the language, thus with some statements, it's a cumbersome
language.

So, if you have to work with a new database, which has obviously its
own SQL dialect, how do you learn about the SELECT statement's details?
Do you look at some specs where the select statement is cut to pieces
with translations to inner algebra? Or are you looking at the EBNF and
the description per terminal/non-terminal?

No, I don't expect them to know the graph - and the two are not tied
together. They're separate phases of compilation.

Oh, so one phase is required but the other one isn't? Why? Because one
is in the doc and the other one isn't?

The translation
from query expressions into non-query expression happens before the
compiler has even decided what the types involved are. It's a source
code to source code translation.

perhaps the C# compiler has 10 phases, I don't know. Often these kind
of compilers have more than 1 pass. So do I have to know which pass is
giving what to the rest? Why? Why do I have to know what the compiler
spits out with transparent identifiers and to what they refer to?

But the difference is that with query expressions you have the
grammar of them, and the translation of them into non-query
expressions.

... which often results in a puzzle, what gets translated into what,
what becomes an anonymous type, where are which transparent identifiers
introduced...

if _THAT_ is required knowledge to write queries, I find it the most
stupid query language on the planet. (and I've seen many) Luckily, it's
not required to write queries.

That's it. You need to know the translation in order to
have any hope of understanding the behaviour.

You don't know anything about the behavior without the detailed specs
of the provider used. The mechanism described in 7.15 doesn't teach you
anything about behavior of the code at runtime.

With all ORMs, you need to be able to see the generated SQL and
potentially change your query if the one that the ORM has selected
for you is unsatisfactory in some fashion. You've already
acknowledged that elsewhere in the thread.

no I didn't acknowledge that in this thread. I said: a DBA, looking at
the performance of a query, can advise, based on statistics and data
size, to use a _different_ query construct. If the developer isn't able
to do so, the abstraction is leaky. If the developer IS able to do so
(which they can with some o/r mappers including ours) the abstraction
isn't leaky IMHO, as it comes down to choosing a different algorithm
based on profiling data.

SQL is a lower abstraction than the query language of the ORM, so any
need to see or modify the SQL indicates a leaky abstraction.

semantically that's not true. SQL is a set based language and o/r
mappers are the glue between entities in-memory and entities in the
relational model, but nothing more. So it should be perfectly fine to
use SQL as an o/r mapper language. eSQL proves that.

Shame. Still, at least there are the alternatives.

it's actually a shame indeed, because DISTINCT is sometimes implicitly
emitted for TOP queries: ('*' added for simplicity)

SELECT DISTINCT TOP 4 E.*
FROM Employees E INNER JOIN Orders O
ON E.EmployeeID = O.EmployeeID
WHERE O.CustomerID = @customerID

this query doesn't work if DISTINCT isn't there, because of the 1:n
relation between employee and order. (it runs, but you'll get 4
duplicates, likely) However, in Linq, if I state .Take(4), I don't
explicitly specify .Distinct(), but it's implicit. A good abstraction
then would switch to a mode where the TOP is performed properly, i.e.
fetch the first n different entities. As they already should have a
mechanism which only fetches unique entities in a collection (haven't
checked, but I assume they have) it's a no-brainer. Oh well..
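
To illustrate the 1:n duplication with in-memory data only (LINQ to Objects, so it says nothing about the SQL any provider emits), here is a sketch with hypothetical Employee and Order classes. Take(4) alone returns repeated employees, and an explicit Distinct() is needed to get unique ones.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory stand-ins for the Northwind entities.
class Employee { public int EmployeeID; public string Name; }
class Order { public int EmployeeID; public string CustomerID; }

static class TakeDistinctSketch
{
    static IEnumerable<string> EmployeesFor(string customerId, List<Employee> employees, List<Order> orders)
    {
        return from e in employees
               join o in orders on e.EmployeeID equals o.EmployeeID
               where o.CustomerID == customerId
               select e.Name;
    }

    public static List<string> FirstFour(string customerId, List<Employee> employees, List<Order> orders)
    {
        return EmployeesFor(customerId, employees, orders).Take(4).ToList();
    }

    public static List<string> FirstFourDistinct(string customerId, List<Employee> employees, List<Order> orders)
    {
        return EmployeesFor(customerId, employees, orders).Distinct().Take(4).ToList();
    }

    static void Main()
    {
        var employees = new List<Employee>
        {
            new Employee { EmployeeID = 1, Name = "Nancy" },
            new Employee { EmployeeID = 2, Name = "Andrew" }
        };
        var orders = new List<Order>();
        foreach (int id in new[] { 1, 1, 1, 2, 2 })
        {
            orders.Add(new Order { EmployeeID = id, CustomerID = "ALFKI" });
        }
        Console.WriteLine(string.Join(", ", FirstFour("ALFKI", employees, orders)));         // Nancy, Nancy, Nancy, Andrew
        Console.WriteLine(string.Join(", ", FirstFourDistinct("ALFKI", employees, orders))); // Nancy, Andrew
    }
}
```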

But in this case it can see that the projection doesn't have a
function with output which differs: CONVERT is deterministic in this
case, and it should know that. The server has all the information it
should need to work out that all it needs to look at is the ID.

The order in which elements are executed is pretty well described in
the SQL spec.

Here, you assume something is expected behavior. But it's not. You
wonder why, and it's described in some SQL spec. Another gem:

select distinct top 5 country
from customers
order by customerid

gives an error, while:
select country
from customers
order by customerid

works.

strange? Sure. however, the sql spec says that order by is executed
after the projection. It works in the second query, because the db
inserts a hidden column to sort on. With DISTINCT, this can give
problems as the hidden column COULD give non-duplicate values but not
for the columns in the resultset.

What the order is of distinct filtering and optimization of the select,
that's not defined in the SQL syntax, but it also isn't required
knowledge IMHO, just that DISTINCT filters duplicate rows in the
resultset.

The example I gave above is a thing I'd like to see changed in the
mechanism as it requires inner workings specifications where syntax and
element descriptions should be enough, not some ISO standard which
isn't even free to access.

I hope you see my point now that it's IMHO not doable to require to
know inner workings of the C# compiler if you know the statement
syntax and what it does.

C# query expression translations are deterministic too. You write a
particular query expression, you'll get the same result as if you
wrote it out "longhand" using the translated method calls etc.

Now, you claim that "people write X and get X". How many of your
customers would be able to look at any LLBLGen query expression you
give them, and tell you exactly what SQL will be produced? I'd be
very surprised if it's more than a very small proportion. Now, even
if they do, they've had to learn that somehow.

actually, our query system is very close to SQL's elements. There are
just a few exceptions (for example, requesting case insensitive comparisons
inserts UPPER() calls, but these are minor). So it's easy to
determine what the sql will look like. That's also a goal of the
system, because any magic layer in between will give problems with
tweaking the output.

We do have a translation mechanism with operator overloads. So:
IPredicate p = CustomerFields.CompanyName == "Foo Inc";

will create a FieldCompareValuePredicate() instance similar to:
IPredicate p = new
FieldCompareValuePredicate(CustomerFields.CompanyName,
ComparisonOperator.Equals, "Foo Inc");
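
To show the general technique (not the actual LLBLGen Pro classes), here is a minimal sketch of an == overload that returns a predicate object instead of a bool. Field, ComparePredicate, CompareOp and ToSql are all hypothetical names invented for this example.

```csharp
using System;

// Hypothetical stand-ins; these are not LLBLGen Pro's real classes.
enum CompareOp { Equal, NotEqual }

class ComparePredicate
{
    public readonly string FieldName;
    public readonly CompareOp Op;
    public readonly object Value;

    public ComparePredicate(string fieldName, CompareOp op, object value)
    {
        FieldName = fieldName;
        Op = op;
        Value = value;
    }

    // Renders a parameterized SQL fragment for this predicate.
    public string ToSql()
    {
        return FieldName + (Op == CompareOp.Equal ? " = " : " <> ") + "@p";
    }
}

class Field
{
    public readonly string Name;
    public Field(string name) { Name = name; }

    // The operators are shortcuts: each one just calls the predicate constructor.
    public static ComparePredicate operator ==(Field field, object value)
    {
        return new ComparePredicate(field.Name, CompareOp.Equal, value);
    }

    public static ComparePredicate operator !=(Field field, object value)
    {
        return new ComparePredicate(field.Name, CompareOp.NotEqual, value);
    }

    // Overridden only to avoid the compiler warnings that come with == and !=.
    public override bool Equals(object obj) { return ReferenceEquals(this, obj); }
    public override int GetHashCode() { return Name.GetHashCode(); }
}

static class PredicateSketch
{
    public static ComparePredicate ViaOperator()
    {
        Field companyName = new Field("CompanyName");
        return companyName == "Foo Inc";
    }

    static void Main()
    {
        Console.WriteLine(ViaOperator().ToSql()); // CompanyName = @p
    }
}
```

The mapping is one operator to one constructor call, which is the 1:1 property described in this post.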

I do take the point that the operator overloads somewhat form the n+1
layer of 'where c.CompanyName=="Foo Inc"', but there's a difference:
the filtering system is taught with the predicate classes, and after
that the operator overloads are explained as shortcuts, but not as THE
system. There are also no transparent identifiers introduced, the
conversion is mainly 1:1.

However at the level of predicate classes, they're close to SQL
predicates. Relation objects are close to join clauses. Sort expression
are close to order by clauses. Field expressions are close (equal to)
field expressions in sql.

Why do you object to people learning the translation rules specified
in C# 3, but think it's important that they should know them for
LLBLGen?

I object to them because you aren't any wiser when you know them (as a
linq USER) as no sql constructs are known so the behavior is unknown,
and the abstraction is also unclear IMHO in some cases as it takes a
couple of puzzles to get from specification form A to B. I find it a
requirement that if a person operates on a given abstraction level,
things should be doable at that abstraction level.

And at that point, you've got people having to learn three things
instead. LINQ would be relatively pointless in that case, IMO. I
really like the consistency aspect of it.

In my book I've got the same data in five different sources: vanilla
objects, SQL, an untyped data set, a typed data set, and XML. I make
a point of showing how consistent the queries can be (within
limitations, of course - but there are query correspondences). I
think that's a very impressive demonstration of LINQ (not my input to
it, but the consistency).

Sure, there are cases where one select statement can be run on many
database types, oracle, sqlserver etc. However there are other cases
where it's not the case and those are the ones which cause problems.

But I have to quit here. You corrected me a couple of times and I
thank you for that. I looked up a new spec doc and I now see the pages
I overlooked, so I'm a little wiser now. I also hope I've described why
I think it's undoable to force a developer to learn all these specifics
as that's actually a bridge too far for many (not because they can't
handle it, but because it's too time consuming and complex so it will
make Linq MORE complex than what they already have, inline SQL or
procs.).

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------

Jon Skeet [C# MVP] · Oct 21, 2007

Frans Bouma said:
Jon, I have to leave the discussion after this post. It's so time
consuming I can't spend more time on this. I've spent over 4 hours
replying to this one, and as it's my only free day this week, I can't
afford more time on this.

One of the twins has just killed the message I was half way through,
which had taken rather a long time already, so I'm calling it quits
too. My most important point in the thread is this one:

A developer wishing to use query expressions in C# 3 should *at least*
understand that there *is* a translation process involved, and that it
will translate many of their expressions into lambda expressions.
Without that knowledge, C# 3 will indeed look inconsistent. However,
it's easy to gain that much knowledge without learning every detail of
the spec, and with that knowledge the whole of LINQ will make a lot
more sense.

If people don't want to learn that, they should stick to using LINQ by
calling methods in the normal way (including extension methods):

dataSource.Where(user => user.Name.StartsWith("Fred"))
.Select(user => user.Age);

(or whatever).
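
For completeness, a sketch of the same query written both ways against an in-memory list. The User class and the sample data are assumptions made up for the example. Both forms give the same result because the query expression is translated into exactly those method calls.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical element type for the example.
class User { public string Name; public int Age; }

static class EquivalenceSketch
{
    public static List<int> MethodForm(IEnumerable<User> dataSource)
    {
        return dataSource.Where(user => user.Name.StartsWith("Fred"))
                         .Select(user => user.Age)
                         .ToList();
    }

    public static List<int> QueryForm(IEnumerable<User> dataSource)
    {
        var q = from user in dataSource
                where user.Name.StartsWith("Fred")
                select user.Age;
        return q.ToList();
    }

    static void Main()
    {
        var users = new List<User>
        {
            new User { Name = "Fred Smith", Age = 40 },
            new User { Name = "Jane Doe", Age = 35 },
            new User { Name = "Freddy Jones", Age = 28 }
        };
        Console.WriteLine(string.Join(", ", MethodForm(users))); // 40, 28
        Console.WriteLine(string.Join(", ", QueryForm(users)));  // 40, 28
    }
}
```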

Personally I think it's worth spending the little bit of time involved
to learn what's going on with query expressions, as they're more
readable when you know what's going on. If someone doesn't want to do
that, that's fine. What I would object to is someone using query
expressions without having the faintest idea how they're meant to
behave, and then complaining when they don't do what they're expected
to. It's like starting up a chainsaw, holding it against your legs and
then complaining when you get hurt.

JimDaGeek · Dec 6, 2007

AliRezaGoogle said:
Hi
Currently there are some good ORM tools like NHibernate and LLBLGen.
Microsoft is working on its own one (I think its name is
EntitySpace) but it is not released yet.
I want to know members' ideas about selecting one of these ORMs:

--Should we wait until Microsoft releases its own ORM? (and maybe
after two or three years releases another one again, and we have to
throw our experience away and start to learn a new one?)
--Is choosing one of the current ORMs wise? (Does, for example, NHibernate remain
with .NET forever with support? What about future releases of .NET? Is
there any guarantee that future features of .NET remain compatible with
current ORM frameworks?)

Regards

NHibernate would be the way to go. Why wait for MS? If/When MS releases
something, the first version will be buggy and most certainly MS-ONLY.

Sorry, but for me I want choice and don't want to be just locked into some
MS-ONLY implementation.

You have a good question, but I think you should look more internal for your
answers. Stop waiting for MS to do something. Just code.

Code to standards, not what MS wants you to. What is the point of an ORM if
it only works with SQL Server and is MS-ONLY? Seriously, that would suck.

SQL Server is a good DB, but there are others that are better for
some things. Don't limit yourself or your code to MS-ONLY. That sucks.

So my answer is, why wait on MS? What will you gain? Nothing. What will
you lose? Good, cross-platform code. MS doesn't make "solutions" that are
global and cross-platform. Look at the history of MS.

MS says .Net is "Cross-platform". Uhh, how cross-platform is it really? I
tried, it is not very cross-platform.

Sun made Java cross-platform, MS didn't make .Net cross-platform. .Net sucks
under OS X and it sucks under Linux. Compare that to Java which works great
under both.

I stopped waiting or hoping for MS to do anything that benefits
cross-platform or developer needs. All of the tools from MS will have ONE
GOAL. To force you to use MS Windows.

Ignacio Machin ( .NET/ C# MVP ) · Dec 6, 2007

Hi,

Have you tried SubSonic?
