Frans Bouma [C# MVP]
This gets long and time consuming.
We're having a discussion based mainly on misunderstandings on both
sides, so it's not really that relevant to keep going on for long. I'll
briefly (I hope) address what you misinterpreted from my texts.
Well, two things here:
1) There will always be an expression tree if you implement IQueryable
And who will create that tree? If you implement all extension methods
of Queryable yourself, you can directly translate what you get passed
in. Mind you: after the first extension method, the IQueryable object
you get passed in is your own query object, which you will return as
the result of the query.
Of course, you will run into problems in corner cases, so it's
advisable to implement an expression tree parser, but it's not
required. I've seen extension-method-only LINQ providers.
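A minimal sketch of what such an extension-method-only approach could look like. The names DirectQuery, Fragments and DirectQueryDemo are hypothetical and not taken from any real provider. Each lambda still arrives as an expression tree here, but no combined query tree is built; every operator is handled as soon as it is called.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

public class Customer
{
    public string CustomerID { get; set; }
}

// Hypothetical query object: it does not implement IQueryable and keeps no
// tree of operator calls; each operator is handled the moment it is invoked.
public class DirectQuery<T>
{
    // Stand-in for whatever a real provider would emit (SQL fragments, etc.).
    public readonly List<string> Fragments = new List<string>();

    public DirectQuery<T> Where(Expression<Func<T, bool>> predicate)
    {
        Fragments.Add("WHERE " + predicate.Body);
        return this; // the same query object flows into the next operator
    }

    public DirectQuery<TResult> Select<TResult>(Expression<Func<T, TResult>> selector)
    {
        var next = new DirectQuery<TResult>();
        next.Fragments.AddRange(Fragments);
        next.Fragments.Add("SELECT " + selector.Body);
        return next;
    }
}

public static class DirectQueryDemo
{
    public static DirectQuery<string> Build()
    {
        var customers = new DirectQuery<Customer>();
        // The compiler binds 'where' and 'select' to the instance methods
        // above; System.Linq.Queryable is never involved.
        return from c in customers
               where c.CustomerID == "CHOPS"
               select c.CustomerID;
    }
}
```

Calling DirectQueryDemo.Build() returns a query object whose Fragments list holds one entry per operator, in call order.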
2) Even if you're using LINQ to objects
I haven't looked into that, but it was my understanding the
IEnumerable extension methods handled the input directly.
Anyway, I think we have a misunderstanding here: IMHO the compiler
emits calls to extension methods of Queryable when you're writing the
query definition in code. This is then executed at runtime, and as the
extension methods of Queryable are creating an expression tree, you'll
get an expression tree as result of that execution. The enumeration
then gets the expression tree (which is the query object itself;
if your extension methods create something else, that's fine
too) and parses it, if necessary.
Though it's IMHO not a given that there is an expression tree.
Perhaps we have a misunderstanding about what an expression tree is, but
I don't find a single Expression
No, they're not the inner workings of the abstraction. They're part
of the C# language spec in terms of how a query expression is
compiled.
Let me make this very clear: when I say "lambda expression" I don't
mean System.Linq.Expressions.LambdaExpression I mean an expression
such as
x => x+1
or
(int x, int y) => x+y
That's the meaning of "lambda expression" in the context of C#.
Now, if you can provide a C# query expression which doesn't involve a
translation which uses lambda expressions, I'll be impressed.
1) I didn't mean what you said above. What I referred to was the where
c.CustomerID=="CHOPS" expression. That's a boolean predicate. It is
translated to a lambda and that's translated to a tree of expression
objects (MemberAccess, Constant etc.) of various levels deep.
2) What I see is a where foo==constant predicate. If that gets
_translated_ to something else, I don't care, because I'm at the source
side, not at the side where the translated stuff is processed; that's
the compiler. Why should I wonder how where foo==constant is translated
into IL and later on into WHERE [dbo].[T].[Foo]=@p, if I'm at the level
of where foo==constant?
THAT was what I was referring to. Not that there aren't lambdas.
That's a misunderstanding, of course there are lambdas. What I meant
was: in where foo==constant, there aren't lambdas IN YOUR FACE. There
might be lambdas in the translation along the way and what not, but I
don't care, I don't see them in THAT PARTICULAR statement. And because
I don't see them, I don't have to know about them, as they appear when
the compiler kicks in. But that's out of my sight, below the
abstraction level.
However, you're arguing that THAT lower abstraction level, namely the
level where: 'where foo==constant' is translated to something, is
important to be able to operate on the abstraction level of 'where
foo==constant'. I then say: IF so, the abstraction level of 'where
foo==constant' is leaky and therefore bogus.
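For reference, the stages mentioned above for the c.CustomerID=="CHOPS" predicate can be made visible with a short sketch. The Customer class and the PredicateTreeDemo name are assumptions made for illustration; the node types come from System.Linq.Expressions.

```csharp
using System;
using System.Linq.Expressions;

public class Customer
{
    public string CustomerID { get; set; }
}

public static class PredicateTreeDemo
{
    public static string[] Describe()
    {
        // Assigning the lambda to Expression<Func<...>> makes the compiler
        // emit code that builds a tree of expression objects instead of IL
        // for the comparison itself.
        Expression<Func<Customer, bool>> predicate = c => c.CustomerID == "CHOPS";

        var body = (BinaryExpression)predicate.Body;
        return new[]
        {
            body.NodeType.ToString(),       // Equal
            body.Left.NodeType.ToString(),  // MemberAccess (c.CustomerID)
            body.Right.NodeType.ToString()  // Constant ("CHOPS")
        };
    }
}
```

None of these objects appear in the source of the where clause itself; they only exist once the compiler has done its translation.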
From memory, I believe it's something along the lines of:
nw.Customers.SelectMany (c => nw.Orders,
(c, o) => c);
If there were a "where" clause or anything else between the "from"
and the "select" then it would be more complicated, with a
transparent identifier involved.
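A small sketch using LINQ to Objects that compares the query-expression form with the SelectMany form quoted above. The sample data and the SelectManyDemo name are made up for illustration.

```csharp
using System.Linq;

public static class SelectManyDemo
{
    public static bool SameResults()
    {
        var customers = new[] { "ALFKI", "CHOPS" };
        var orders = new[] { 1, 2, 3 };

        // Query-expression form: two 'from' clauses followed by 'select'.
        var viaQuery = from c in customers
                       from o in orders
                       select c;

        // Method-call form along the lines quoted above.
        var viaMethods = customers.SelectMany(c => orders, (c, o) => c);

        // Both produce each customer once per order, in the same sequence.
        return viaQuery.SequenceEqual(viaMethods);
    }
}
```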
And where did you get that knowledge? Did you read that in a manual, or
did you check how the tree gets translated? If the latter, do you expect
developers to check every time how a query is translated to
extension methods and lambdas if they don't use extension methods and
lambdas DIRECTLY in their code? I surely hope not!
Now, that may not be exactly right, but:
a) I know which parts are used as lambda expressions
You could only know that from experiments. It's not obvious that a
SelectMany is introduced, with a temp result in an anonymous type.
b) If I want to know the exact details, I could look it up in the C#
spec (or a good book
It's not in the C# spec. (At least not in the 3.0 spec doc I have
here. )
Again, expression trees are in some ways irrelevant to this - the
point is that it's translated into a lambda expression.
Ok.
I can go on and on here, but I think that's repeating myself. You seem
to have the firm stance that if you have source code S in format F, and
a compiler translates that to format F', you have to know F'. I don't.
I find that irrelevant, because the compiler does that for me. I work
with F, not F', as I'm not doing the work for the compiler; the
compiler is much better at that.
The abstraction level is C#. The "query expression" to "method call
with lambda expressions" translation is at the C# level of
abstraction, which is why it's defined by the C# spec.
Then show me where in the C# spec the table is that lists all
the translations. I tried to find it, but the doc isn't really helpful
in this. For example, the C# 3.0 spec document has no specification
that defines how
from o in nw.orders
join c in nw.customers on o.cid equals c.id into od
is translated, or how let x=... is translated. There are examples, but
not a full detailed list.
I didn't expect one, as it's not really relevant, because it would
mean that the abstraction level created by the C# query language is
actually moot.
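For the two clauses in question, a sketch with LINQ to Objects of method-call forms that give the same results as the query expressions. The Customer and Order types, the sample data and the JoinLetDemo name are assumptions for illustration; the anonymous type in the let case stands in for whatever the compiler generates and is not claimed to be its exact output.

```csharp
using System.Linq;

public static class JoinLetDemo
{
    class Customer { public int Id; }
    class Order { public int Id; public int Cid; }

    public static bool JoinIntoMatchesGroupJoin()
    {
        var customers = new[] { new Customer { Id = 1 }, new Customer { Id = 2 } };
        var orders = new[]
        {
            new Order { Id = 10, Cid = 1 },
            new Order { Id = 11, Cid = 1 },
            new Order { Id = 12, Cid = 3 }
        };

        // 'join ... into' followed directly by 'select'.
        var viaQuery = from o in orders
                       join c in customers on o.Cid equals c.Id into od
                       select o.Id + ":" + od.Count();

        // The same thing written as a single GroupJoin call.
        var viaMethods = orders.GroupJoin(
            customers, o => o.Cid, c => c.Id,
            (o, od) => o.Id + ":" + od.Count());

        return viaQuery.SequenceEqual(viaMethods);
    }

    public static bool LetMatchesSelect()
    {
        var numbers = new[] { 1, 2, 3 };

        var viaQuery = from x in numbers
                       let y = x * 2
                       where y > 2
                       select x + y;

        // 'let' carries both x and y forward in an intermediate object.
        var viaMethods = numbers
            .Select(x => new { x, y = x * 2 })
            .Where(t => t.y > 2)
            .Select(t => t.x + t.y);

        return viaQuery.SequenceEqual(viaMethods);
    }
}
```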
No - IL isn't specified by the C# spec, but the translation of query
expressions is.
Where? These translations aren't 1:1 replacements, mind you. So they
need to have a big table somewhere where everything is described: how
every code fragment is translated. But all there is is a C# language
grammar of query specifications for join, where etc., but no translation
specs.
It's like knowing that Nullable<int> and int? mean the same thing, or
that System.Int32 and int mean the same thing, or that foreach calls
GetEnumerator and the IEnumerable members (as well as Dispose at the
end).
You're mixing syntactical sugar with compiler translations. I find
that not the same thing. Substitutions of elements with other elements
like Nullable<T> or System.Int32 aren't the same as translating 'where
c.foo == (c.bar * 10)' into a deep tree of Expression derived class
instances.
These are some of the rules which govern the language, and C#
developers should be aware of them.
I find synonyms between non-terminals OK to know, but I find it for
example irrelevant how foreach(){} is translated into IL. Why should I
care? If I wanted to operate on the abstraction level BELOW
foreach(){}, I'd write the code of getting the enumerator, moving along
etc. manually. We're also not arguing how a for() is rolled into code,
right?
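To make the two levels concrete, a minimal sketch of the same loop written once with foreach and once at the level below it, getting the enumerator and moving along by hand. The ForeachDemo name and its methods are made up for illustration.

```csharp
using System.Collections.Generic;

public static class ForeachDemo
{
    // The level most code is written at.
    public static int SumWithForeach(IEnumerable<int> items)
    {
        int sum = 0;
        foreach (int item in items)
        {
            sum += item;
        }
        return sum;
    }

    // One level down: get the enumerator, move along, dispose at the end.
    public static int SumByHand(IEnumerable<int> items)
    {
        int sum = 0;
        using (IEnumerator<int> e = items.GetEnumerator())
        {
            while (e.MoveNext())
            {
                sum += e.Current;
            }
        }
        return sum;
    }
}
```

Both methods return the same total for any sequence.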
It certainly is relevant, as it's part of the specified behaviour of
C# and it affects behaviour. How can it not be relevant?
It's common sense: something at abstraction level n+1 shouldn't be
affected by details of abstraction level n. If it were, abstraction
level n+1 would be useless.
Or are you going to say that every abstraction level should be used
with the knowledge of how the underlying abstraction levels work? I
don't think so.
Btw, I wasn't saying lambdas don't exist, see my remark earlier in
this post.
The exact nature of how lambda expressions are converted into
expression trees is not covered in the spec, but the translation from
query expression into non-query expression form certainly is.
Show me how join identifier in expression on expression equals
expression into identifier is translated, in the C# spec.
No, individual expression trees are built by the compiler (or rather,
the compiler inserts code to build the expression trees) and then the
Queryable extension methods combine the expression trees together.
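A small sketch of that combination step, using the built-in AsQueryable() wrapper over an array. The QueryableTreeDemo name is made up for illustration.

```csharp
using System.Linq;
using System.Linq.Expressions;

public static class QueryableTreeDemo
{
    public static string[] Describe()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();

        // The compiler builds the tree for 'x => x > 1'; Queryable.Where then
        // wraps the source expression and that lambda into one method-call node.
        IQueryable<int> query = source.Where(x => x > 1);

        var call = (MethodCallExpression)query.Expression;
        return new[]
        {
            call.Method.Name,                      // Where
            call.Arguments[0].NodeType.ToString(), // Constant (the source)
            call.Arguments[1].NodeType.ToString()  // Quote (wraps the lambda)
        };
    }
}
```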
Then what's a tree in this case? An individual Expression<Func<>>? The
tree is built by the extension methods of Queryable. If you implement
them yourself, you can do whatever you want and insert the data you get
passed in directly into the result object, which is your Query object
(which implements IQueryable), so after
var q = ...;
you have q, which is an IQueryable but not a tree, just a single
object.
Look at the extension methods in Queryable: they all take expression
trees as their parameters. How are you going to call them if you
don't create expression trees yourself? You could pass in null, I
suppose, but that wouldn't be terribly useful.
You don't NEED to call them. They're provided to build the tree for
you. But if you don't want to parse a tree, you can just
implement all the extension methods yourself, and you get the input
passed in and you can handle it directly instead of building a tree for
evaluation later on.
Then I don't see why you're getting some of the details wrong above.
You keep referring to lambda expressions as if they only exist in
expression trees, whereas I'm talking about the lambda expressions
which are part of the specified translation of query expressions.
You misunderstood me.
Yes - one is part of a query expression, in a clause which is
translated into a lambda expression, and one is just a parameter in a
method call.
where does the translation take place? Right, during compilation. Do I
have to be part of that process? no, I'm the developer feeding the
translator.
It's exactly the same as this sort of situation:
List<int> list = new List<int> { 1, 2, 3 };
int i = 0;
int index = list.FindIndex (i,
delegate (int y) { return y==i; });
Again, the variable "i" is used twice - in one place it's evaluated
as part of calling the method, whereas in the delegate it's evaluated
within the method itself.
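The same difference shows up inside a query expression. In this sketch (the sample data and the CaptureDemo name are made up), the variable i is read immediately where it is a method argument and read later where it sits inside the where clause:

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CaptureDemo
{
    public static List<int> Run()
    {
        var source = new List<int> { 1, 2, 3 };
        int i = 1;

        // Skip(i) is an ordinary method argument: i is read right here (1).
        // The 'where' clause becomes a lambda: i is read when the query runs.
        IEnumerable<int> query = from x in source.Skip(i)
                                 where x > i
                                 select x;

        i = 2;                 // changes what the lambda sees, not Skip(1)
        return query.ToList(); // elements after the first that are > 2: [3]
    }
}
```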
I also find that inconsistent. Let's agree to disagree on whether this
is consistent or not.
That's only the same type of inconsistency as passing one parameter
by ref and another by value, however. The behaviour is well defined
at the C# abstraction level.
There I specify what's done. In the code examples you and I posted,
it's implicit, not explicit.
If the developer works at the C# level, they should understand C# as
a language. The fact that one of your variables is being used as part
of a lambda expression (not necessarily part of an expression tree)
is clearly part of the spec.
Somewhere translation takes place, but how exactly is unclear and IMHO
also irrelevant, as it happens AT COMPILE TIME.
I'm afraid I'll continue to disagree with you.
that's ok
C# is the level of abstraction they're operating on, and the
translation is part of that level.
How can translation be part of the level of the language? If it were,
the translation result would also be part of the level of the
language, and thus n+1 AND n would both be relevant abstraction levels.
Which means n+1 is irrelevant, because n is important.
You know, what if the C# compiler does things differently in C# 6.0?
Then I still have my where foo==constant, but how it's translated under
the hood is irrelevant; it can be completely different from how it's
done today. THAT's why there's an abstraction level above the
translation results.
It's defined in the C# spec, and therefore I believe it's at the
right abstraction level.
Where? There are just some words saying that there is a translation,
and some examples, but no clear 'this becomes that', which IS necessary
as the translations aren't 1:1.
In particular, if a developer doesn't know about the translation,
what do you expect them to know about?
Grammar.
If the developer knows what where foo==constant does, why does the
developer have to know what that becomes in the end? It has various
stages!
Obviously you can't just type
in any combination of words and get a valid C# program - how much of
C#'s rules do you expect people to know? Would you expect them to
know, for instance, that in the expression "a ? b : c" either b or c
is evaluated, but not both?
Isn't that what the grammar says? If it's required to know what the
compiler bakes from a piece of source code, the language is hard to use.
If you do expect people to know that, why should they also not have
to know about what query expressions are about? If they don't need to
know the translation, how should people understand query expressions?
Or do you think they should just type keywords at random until
something compiles?
If you expect them to know the translation, do you also expect them
to know the graph it ends up in? As that's tied to it. And why does it
stop there for you?
So you expect people to know how a for loop is constructed in the
translated code? I don't. I have for, the grammar of for, the
description of the for statement, that's it. How it's translated is
irrelevant, that's up to the compiler. Otherwise, I'm babysitting the
compiler.
All abstractions are leaky, IMO. All ORMs are leaky - are they
automatically flawed too?
Why are all O/R mappers leaky? An abstraction which doesn't force you
to dig into lower layers because the details are required isn't leaky.
I'd hope so, but it doesn't seem that way from how you use them.
Lambda expressions aren't "created at runtime" - they're present in
the *compile-time* translation of query expression to non-query
expression.
Yes, I know. But what's created at compile time is a lower abstraction.
I don't know a single IL statement, and I hardly ever look at what the
IL of compiled C# code looks like. Do I have to? I don't think so, as
I'm using a compiler to do that work for me.
You need to know what the compiler must do in terms of the C# spec.
You don't need to know how it achieves that goal.
I need to write code which complies with the C# spec so the code I've
written does what it is expected to do. What the compiler bakes of
it, how many passes it has, how it translates where foo == constant
into whatever, and how that influences what happens next: why should I
care?
It's not at the level of the output of the compiler: that is beyond
the remit of the C# spec. The query expression translation is within
the remit of the C# spec.
Translation results are translation results, so you're saying it's
important to know translation results to be able to write code in the
format BEFORE the translation results?
Then tell me, where's the spec for how join is translated? Or let?
Hmm... what happens, out of interest? I don't have SQL server 2000
available here.
It dies at runtime with an exception.
And did you run the SQL, or the LINQ query? I don't
know whether LINQ to SQL knows what kind of database it's talking to
at all.
The LINQ query. It knows; it creates the specific provider for the DB
it connects to.
I don't know whether we've crossed wires or not. I'm aware that it's
distinct at a "whole result" level, but SQL Server should (IMO) be
smart enough to know that it can work out the distinctness of the
whole result just from the EmployeeID.
The EmployeeID is unique, and all the results are based on the
Employee table. Therefore if it sees an EmployeeID it's already
returning, it knows that all the rest of the results for that row
will match the row it's already returning. Likewise it knows if it
sees an EmployeeID that it hasn't already seen, that's a "new row to
return" by definition.
DISTINCT is executed on the resultset created by the projection. It
doesn't know if employee is unique, as the last column in the set can
be a function output which differs... so it has to evaluate every field.
Does that make my meaning any clearer? Or did you understand me the
first time and I just didn't understand your reply?
I did understand you (phew!)
Isn't that tantamount to saying, "there's no point in letting people
write queries in anything other than SQL"? I think it's absolutely
vital that they're easily able to see the SQL generated, of course.
Most people think in SQL, as that's what they know. Moving towards an
O/R query system then requires them to drop the SQL completely and use
the system provided. But that's very hard, as the system often mimics
SQL with some things different.
Yes, it is a bit surprising. What's the SQL Server 2005 trick in
question?
Convert the image to varchar(max). That avoids the DISTINCT error on
text for example, as text isn't a column you can use DISTINCT on, but
varchar(max) is.
I thought DISTINCT worked fine in SQL Server 2000 - or is
that only for single-value-select queries? (i.e. a sequence of single
values).
It does, but DISTINCT has limitations: you can't use it on image,
text, ntext fields. So if you have one of these in your query resultset
(projection) and use DISTINCT, you get an error.
I don't see how you could expect that to happen without people even
knowing the C# language well enough to know which parts of query
expressions are translated into lambda expressions, and which are
evaluated immediately.
It's sad that that's required indeed; that's my whole point in this
long thread. I don't WANT TO KNOW what it ends up in. That's the point.
My query system is deterministic: people write X and get X.
Well, again I believe that some things which you regard as beneath
the abstraction layer of C# are well within that layer, but even so I
agree that the abstraction will leak.
ok, let's agree on that
However, I'd be very surprised to ever see any ORM where that's not
the case. You've said how you've got customers who have to change
their queries in order to meet the SQL specified by the DBAs - isn't
that the same kind of thing?
No, what I meant was: they wrote a query which uses joins, using
relation objects. The DBA says: "that's slow, use a subquery", so they
remove the relation objects and use FieldCompareSetPredicate objects
which are subqueries.
Just to clarify your language - do you mean a delicate matter?
oops, yes I mean delicate. I should have looked it up. delegate matter
indeed looks like something else :!
It's a
somewhat important distinction given that "delegates" mean something
rather different in this context. I'm not trying to criticise your
English for the sake of it - just trying to make sure we don't talk
at cross-purposes.
I would personally have been really unhappy if it had been more
biased towards SQL at the cost of being natural for in-memory
objects. I believe the importance of LINQ to Objects has been
underplayed - I perform queries, orderings, projections etc on
in-memory collections just as often as I do against databases.
Everyone seems focused on LINQ to SQL, but I'm looking forward to
more readable code for in-memory objects.
I think it would have been better if they had used 3 DSLs, so
you could use the best syntax for the target at hand. Now, a
limitation of one keeps the other one down as well (or introduces silly
stuff like 'group joins').
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------