Frans Bouma said:
It's not a given there are lambda expressions involved. If the
provider is implemented as extension methods on IQueryable, you don't
have an expression tree. (some linq providers do this)
Well, two things here:
1) There will always be an expression tree if you implement IQueryable
2) Even if you're using LINQ to Objects, lambda expressions are still
involved
Also, the lambda expressions and the expression tree are a conversion
of what's written, so they're the inner workings of the abstraction and
therefore NOT RELEVANT. If these ARE relevant, the abstraction is
useless.
No, they're not the inner workings of the abstraction. They're part of
the C# language spec in terms of how a query expression is compiled.
Let me make this very clear: when I say "lambda expression" I *don't*
mean System.Linq.Expressions.LambdaExpression; I mean an expression
such as
x => x+1
or
(int x, int y) => x+y
That's the meaning of "lambda expression" in the context of C#.
Now, if you can provide a C# query expression which doesn't involve a
translation which uses lambda expressions, I'll be impressed.
Where did I say it's trivial?
var q = from c in nw.Customers
        from o in nw.Orders
        select c;
into what is this translated? Do you know?
From memory, I believe it's something along the lines of:
nw.Customers.SelectMany (c => nw.Orders,
                         (c, o) => c);
If there were a "where" clause or anything else between the "from" and
the "select" then it would be more complicated, with a transparent
identifier involved.
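As a rough sketch of that translation against in-memory arrays (the `customers` and `orders` arrays are made-up stand-ins for nw.Customers and nw.Orders, not anything from the original query), the two forms below should produce the same sequence:

```csharp
using System;
using System.Linq;

class TranslationSketch
{
    static void Main()
    {
        // Hypothetical stand-ins for nw.Customers and nw.Orders.
        string[] customers = { "ALFKI", "ANATR" };
        int[] orders = { 1, 2, 3 };

        // Query expression form.
        var q1 = from c in customers
                 from o in orders
                 select c;

        // Method-call form with two lambda expressions.
        var q2 = customers.SelectMany(c => orders, (c, o) => c);

        // Both yield each customer once per order.
        Console.WriteLine(q1.SequenceEqual(q2)); // True
    }
}
```

The point is only that the "from ... from ... select" clauses end up as the bodies of lambda expressions; the exact overload chosen is whatever the spec's translation rules give.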
Now, that may not be *exactly* right, but:
a) I know which parts are used as lambda expressions
b) If I want to know the exact details, I could look it up in the C#
spec (or a good book).
If not, why is it necessary
to know how it is constructed under the hood, as the code above is an
abstraction level ABOVE the expression trees!
Again, expression trees are in some ways irrelevant to this - the point
is that it's translated into a *lambda* expression.
(btw you can't write a join with extension methods if you don't define
the intermediate classes up front, as you need anonymous classes you
have to REFER to later on)
Not sure what you mean there.
but that's an abstraction level BELOW where the code is written at. If
that info from abstraction level n is REQUIRED to work at abstraction
level n+1, then abstraction level n+1 is leaky, and useless.
The abstraction level is C#. The "query expression" to "method call
with lambda expressions" translation is at the C# level of abstraction,
which is why it's defined by the C# spec.
I find that pretty bold, if you ask me. It's like you have to know to
what IL statements a C# construct is compiled to to be able to write
proper C#. If that's required, the C# abstraction level is bogus.
No - IL isn't specified by the C# spec, but the translation of query
expressions is.
It's like knowing that Nullable<int> and int? mean the same thing, or
that System.Int32 and int mean the same thing, or that foreach calls
GetEnumerator and then the IEnumerator members, MoveNext and Current
(as well as Dispose at the end).
These are some of the rules which govern the language, and C#
developers should be aware of them.
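To make the foreach rule concrete, here is a minimal sketch of roughly what a foreach loop over a List<int> means. It is an approximation: the spec's real expansion differs in details (for instance, it would use List<T>'s own enumerator type rather than going through the interface as done here).

```csharp
using System;
using System.Collections.Generic;

class ForeachSketch
{
    static void Main()
    {
        List<int> list = new List<int> { 1, 2, 3 };

        // What you write.
        foreach (int x in list)
        {
            Console.WriteLine(x);
        }

        // Roughly what it means: GetEnumerator, then MoveNext/Current,
        // with Dispose called at the end by the using statement.
        using (IEnumerator<int> e = ((IEnumerable<int>)list).GetEnumerator())
        {
            while (e.MoveNext())
            {
                int x = e.Current;
                Console.WriteLine(x);
            }
        }
    }
}
```

Both loops print 1, 2 and 3 in order.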
sure, but which parts of a query are translated to which expressions
in a tree (as I said: I didn't use any lambda at all!) and which
fragments become lambdas is NOT relevant, as the code in the example
is an abstraction level above the lambdas.
It certainly *is* relevant, as it's part of the specified behaviour of
C# and it affects behaviour. How can it *not* be relevant?
The exact nature of how lambda expressions are converted into
expression trees is not covered in the spec, but the translation from
query expression into non-query expression form certainly is.
As I said above: that's not always the case: if you implement
IQueryable extension methods, no tree is created, as the tree creation
is done by extension methods on Queryable.
No, individual expression trees are built by the compiler (or rather,
the compiler inserts code to build the expression trees) and then the
Queryable extension methods combine the expression trees together.
Look at the extension methods in Queryable: they all take expression
trees as their parameters. How are you going to call them if you don't
create expression trees yourself? You could pass in null, I suppose,
but that wouldn't be terribly useful.
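A small sketch of that distinction, using an in-memory array turned into an IQueryable via AsQueryable (the array and variable names are just for illustration). The same lambda expression is converted either to a delegate or to an expression tree depending on the target type, and Queryable.Where only accepts the latter:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

class ExpressionTreeSketch
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4 };

        // Same lambda expression, two different conversions.
        Func<int, bool> asDelegate = x => x > 2;
        Expression<Func<int, bool>> asTree = x => x > 2;

        // Enumerable.Where takes the delegate.
        var inMemory = numbers.Where(asDelegate);

        // Queryable.Where takes the expression tree and wraps it in a
        // larger tree representing the call to Where.
        IQueryable<int> queryable = numbers.AsQueryable().Where(asTree);

        Console.WriteLine(asTree.Body.NodeType);              // GreaterThan
        Console.WriteLine(queryable.Expression.NodeType);     // Call
        Console.WriteLine(inMemory.SequenceEqual(queryable)); // True
    }
}
```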
The conversion to Expression<Func<... >> is done by the compiler but
that's below the abstraction level of the code, and thus irrelevant.
The details of that conversion are below the abstraction level of C#,
but the fact that the conversion is performed *is* part of the
abstraction level of C#.
Btw, I simply said the lambda ends up in the expression tree. I know
the order, the inner workings aren't new to me as writing a linq
provider requires that knowledge, unfortunately.
Then I don't see why you're getting some of the details wrong above.
You keep referring to lambda expressions as if they only exist in
expression trees, whereas I'm talking about the lambda expressions
which are part of the specified translation of query expressions.
Erm... the inconsistency is in the fact that the parameter is used at
two different spots in the same query, however one spot gets evaluated
at execution time (where x==var) and the other one at query creation
time (the .Take(var)).
Yes - one is part of a query expression, in a clause which is
translated into a lambda expression, and one is just a parameter in a
method call.
It's exactly the same as this sort of situation:
List<int> list = new List<int> { 1, 2, 3 };
int i = 0;
int index = list.FindIndex (i,
                            delegate (int y) { return y==i; });
Again, the variable "i" is used twice - in one place it's evaluated as
part of calling the method, whereas in the delegate it's evaluated
within the method itself.
this creates inconsistency, as changing var changes the query at 1
spot, not at two spots.
That's only the same type of inconsistency as passing one parameter by
ref and another by value, however. The behaviour is well defined at the
C# abstraction level.
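Here's a minimal sketch of the behaviour under discussion, using LINQ to Objects and a made-up array (the names `source` and `n` are mine, not from the original query). The where clause becomes a lambda expression which captures the variable, whereas Take just receives the variable's current value as an argument:

```csharp
using System;
using System.Linq;

class CaptureSketch
{
    static void Main()
    {
        int[] source = { 1, 2, 2, 3, 3, 3 };
        int n = 2;

        // "x == n" is turned into a lambda which captures the variable n;
        // Take(n) is an ordinary method call, so n is evaluated right now.
        var query = (from x in source
                     where x == n
                     select x).Take(n);

        // Changed after the query is built, but before it is enumerated.
        n = 3;

        // The filter sees n == 3, but Take still uses the original 2.
        Console.WriteLine(string.Join(",", query)); // 3,3
    }
}
```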
As I said earlier: there's a technical explanation, but that doesn't
explain WHY this is changing only at one spot, semantically speaking,
i.e.: from the user (developer)'s pov: s/he works at the C# code level.
The stuff beneath that level is irrelevant, due to the abstraction.
However with this, it WILL be relevant.
If the developer works at the C# level, they should understand C# as a
language. The fact that one of your variables is being used as part of
a lambda expression (not necessarily part of an expression tree) is
clearly part of the spec.
People can jump up and down and say I'm stupid, but this IS
inconsistent.
I'm afraid I'll continue to disagree with you.
erm... why do they need to know about translations being performed
under the level of the abstraction they're operating on?
C# *is* the level of abstraction they're operating on, and the
translation is part of that level.
It's critical that people operating at abstraction level n+1 don't
need to know the details of abstraction level n. If I write:
where c.Foo == bar
select c;
I don't need to know what gets translated into what. Not only is this
NOT trivial, it's also irrelevant, as it's not at the abstraction level
I'm operating on.
It's defined in the C# spec, and therefore I believe it's at the right
abstraction level.
In particular, if a developer doesn't know about the translation, what
*do* you expect them to know about? Obviously you can't just type in
any combination of words and get a valid C# program - how much of C#'s
rules do you expect people to know? Would you expect them to know, for
instance, that in the expression "a ? b : c" either b or c is
evaluated, but not both?
If you *do* expect people to know that, why should they also not have
to know about what query expressions are about? If they don't need to
know the translation, how *should* people understand query expressions?
Or do you think they should just type keywords at random until
something compiles?
If I would have written:
o.Where(c=>c.Foo==bar).Select(c=>c);
it's a different thing, as I formulate a different statement: I use a
lambda, and thus operate on a different abstraction level.
Well, it's a different thing because your query expression wouldn't
include the Select(c=>c) part, but other than that they're *not*
different, and I believe it's important that people understand that.
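As a sketch of that equivalence (with a hypothetical int array `o` and variable `bar` standing in for the real source and property comparison), the query expression and the explicit Where call give the same results, and no trailing Select(c=>c) is needed for a non-trivial query like this:

```csharp
using System;
using System.Linq;

class DegenerateSelectSketch
{
    static void Main()
    {
        // Hypothetical data; the original compares c.Foo against bar.
        int[] o = { 1, 2, 3, 2 };
        int bar = 2;

        // Query expression form.
        var viaQuery = from c in o
                       where c == bar
                       select c;

        // Method-call form: just Where, with the clause as a lambda.
        var viaMethods = o.Where(c => c == bar);

        Console.WriteLine(viaQuery.SequenceEqual(viaMethods)); // True
    }
}
```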
Make no mistake: creating a query language is damn hard, and people
shouldn't take things for granted easily. I definitely find that Linq
is flawed
in this case, as it simply provides an abstraction level which isn't
there, as it's leaky.
All abstractions are leaky, IMO. All ORMs are leaky - are they
automatically flawed too?
Jon, please, I do know the difference.
I'd hope so, but it doesn't seem that way from how you use them. Lambda
expressions aren't "created at runtime" - they're present in the
*compile-time* translation of query expression to non-query expression.
Yes I know, but as it's stuff created by the compiler, I don't care,
as stuff created by the compiler is out of my reach: why should I care
about stuff created by the compiler, if I operate on a level above a
compiler?
You need to know what the compiler *must* do in terms of the C# spec.
You don't need to know how it achieves that goal.
If the output of a compiler is relevant for me on the
abstraction level above the output of the compiler, something is
SERIOUSLY wrong here.
It's not at the level of the output of the compiler: *that* is beyond
the remit of the C# spec. The query expression translation *is* within
the remit of the C# spec.
why?
Aren't they writing C# code? They're not taking shortcuts, they're
writing C# code. I fail to see why this is relevant.
They're taking shortcuts in the same way that writing int? instead of
System.Nullable<System.Int32> is taking a shortcut - and I'd certainly
expect people to know that.
They're writing expressions which are clearly defined in the spec to be
used as part of lambda expressions. Whether those lambda expressions
are then converted into expression trees or simple delegate instances
depends on what you're querying, but the fact that they're used as part
of lambda expressions is undeniable; it's not part of a C# compiler's
implementation details, it's part of the spec it has to follow.
ah, and now on sqlserver 2000.
*poof*.
Hmm... what happens, out of interest? I don't have SQL server 2000
available here. And did you run the SQL, or the LINQ query? I don't
know whether LINQ to SQL knows what kind of database it's talking to at
all.
no
DISTINCT is a row-wide DISTINCT. It's not Oracle, which can do
per-column distinct operations.
I don't know whether we've crossed wires or not. I'm aware that it's
distinct at a "whole result" level, but SQL Server *should* (IMO) be
smart enough to know that it can work out the distinctness of the whole
result just from the EmployeeID.
The EmployeeID is unique, and all the results are based on the Employee
table. Therefore if it sees an EmployeeID it's already returning, it
knows that all the rest of the results for that row will match the row
it's already returning. Likewise it knows if it sees an EmployeeID that
it *hasn't* already seen, that's a "new row to return" by definition.
Does that make my meaning any clearer? Or did you understand me the
first time and I just didn't understand your reply?
it depends on the depth of the tree, if you have 1 subquery the
execution plans are the same here on sqlserver 2000, but it can differ,
which is why it's important you have the choice.
Sure.
Interesting! The exists query isn't optimal, but it's better than
nothing. Hopefully they'll document this properly.
I'm not sure whether I'd expect them to document all the potential
twists and turns involved in converting a LINQ query into a SQL query.
I suspect it would be unreadable anyway, and people will do what I did:
experiment and look at the SQL.
It would be good to give some hints though.
It didn't occur to me to write it that way. I didn't think of
.Contains().
Take it from me, who has written a query system which appears logical
to me, but not always to others: it's not straightforward unless it's
plain sql.
Isn't that tantamount to saying, "there's no point in letting people
write queries in anything other than SQL"? I think it's absolutely
*vital* that they're easily able to see the SQL generated, of course.
So this is an element to document in full otherwise the
masses will ask this question time and time again. The .Contains()
isn't obvious IMHO but once you know it exists, it is a way to optimize
the query indeed. Good point.
I think it's reasonably obvious if you come at it from the "what LINQ
query operators have I got available?" side of things rather than the
"here's the SQL I want to create".
One point: if you do a "Contains" on something like an integer array,
it translates that into IN.
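For illustration, this is the shape of query I mean, run here against an in-memory array rather than a database (the Employee class and its data are invented for the sketch). Against a LINQ to SQL table, the Contains call is the part that ends up as IN; in memory it's just an ordinary method call:

```csharp
using System;
using System.Linq;

class ContainsSketch
{
    // Hypothetical entity, loosely modelled on an Employee table.
    class Employee
    {
        public int EmployeeID;
        public string Name;
    }

    static void Main()
    {
        Employee[] employees =
        {
            new Employee { EmployeeID = 1, Name = "Ann" },
            new Employee { EmployeeID = 2, Name = "Bob" },
            new Employee { EmployeeID = 3, Name = "Cy" }
        };
        int[] ids = { 1, 3 };

        // Contains on an integer array inside the where clause.
        var query = from e in employees
                    where ids.Contains(e.EmployeeID)
                    select e.Name;

        Console.WriteLine(string.Join(",", query)); // Ann,Cy
    }
}
```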
distinct doesn't work on sqlserver 2000, as it uses a sqlserver 2005
trick. It's not a rant against linq to sql, it was an example how the
provider can throw things at you at runtime which are unexpected and
differ from provider to provider. Distinct did throw exceptions on me
with sqlserver 2000. Unexpected, btw, as I did expect they had solved
it for that db as well (it's not a db which isn't used a lot ... )
Yes, it is a bit surprising. What's the SQL Server 2005 trick in
question? I thought DISTINCT worked fine in SQL Server 2000 - or is
that only for single-value-select queries? (i.e. a sequence of single
values).
Sure, it makes things less cumbersome. I don't particularly mind Linq to
Sql btw, I used that as an example, it's how the developer is able to
specify a query and KNOW the outcome, or in other words: is not running
into abstraction layer misery. So deterministically write X and get X.
I don't see how you could expect that to happen without people even
knowing the C# language well enough to know which parts of query
expressions are translated into lambda expressions, and which are
evaluated immediately.
I would say that you can have a leaky abstraction which doesn't
necessarily imply misery. More work than a perfectly leak-free
abstraction, of course, but they're imaginary IMO.
EXISTS often requires making the inner query a correlated subquery,
but that's up to the o/r mapper.
But at the server side, should the query optimiser not be able to work
out that they mean the same thing (in cases such as this)?
Ok. I didn't see the Contains link (pun intended) but it's now
clear the flexibility to some extent is there.
Cool.
It's a delegate matter though, and my remarks shouldn't be seen as a
bash of linq to sql as it's the only publicly known full
implementation of linq at the moment, and all I wanted is to illustrate
that the material the developer has to work with is at abstraction
layer n+1 and it might be problematic when problems occur at
abstraction layer n.
Well, again I believe that some things which you regard as beneath the
abstraction layer of C# are well within that layer, but even so I agree
that the abstraction will leak.
However, I'd be very surprised to *ever* see *any* ORM where that's not
the case. You've said how you've got customers who have to change their
queries in order to meet the SQL specified by the DBAs - isn't that the
same kind of thing?
I call it a delegate matter because it's a balancing act:
Just to clarify your language - do you mean a *delicate* matter? It's a
somewhat important distinction given that "delegates" mean something
rather different in this context. I'm not trying to criticise your
English for the sake of it - just trying to make sure we don't talk at
cross-purposes.
on one hand you want to abstract away the low-level messing with
databases and simply decide that for the developer and create the query
for the developer so s/he doesn't have to think about details. However
on the other hand you don't want to decide too much so the fine details
aren't reachable as in THIS case, the fine details might be of utmost
importance.
Agreed.
Personally I find the query language too far away from the database
mechanism, but it's a tradeoff, as it's a generic language also to be
used on in-memory collections and xml, where other things are more
important.
I would personally have been really unhappy if it had been more biased
towards SQL at the cost of being natural for in-memory objects. I
believe the importance of LINQ to Objects has been underplayed - I
perform queries, orderings, projections etc on in-memory collections
just as often as I do against databases. Everyone seems focused on LINQ
to SQL, but I'm looking forward to more readable code for in-memory
objects.
that's to be seen. A lot of people will simply follow and do what MS
tells them. Some will wonder what is gained by using an abstraction
layer that doesn't really make things less cumbersome to use. I mean:
there are still issues with runtime exceptions based on queries which
can't be created. Those aren't caught at compile time. What's different
(exaggerating) with a system where, to put it bluntly, sql is placed
in strings inside code or in a config file? Then things break at
runtime too.
Because it catches *more* stuff at compile-time, and uses language
which is *more* familiar to developers. It doesn't need to be perfect -
just better than the alternative.
I've written a lot of LINQ to SQL queries recently, and far, far more
of them have worked first time after getting rid of compiler errors
than would have been the case if I'd just been writing raw SQL.
It's not really an easy sell, as it's a step forward in many areas but
also a step back in others (or not really a step forward)
I don't accept that it's a step back - it's not preventing you from
doing anything you might have done before. It won't help everywhere,
but I believe it will help in enough places to make it a big win,
especially when you consider LINQ to Objects (and XML etc) as well as
LINQ to SQL.