Finally, which ORM tool?

Frans Bouma [C# MVP]

Jon said:
Let me rephrase then:

It is easier to write an ORM system which has a rich session system
if you force sessions to be used everywhere: it means that every
layer of the ORM framework code can rely on sessions being available.

Using a session makes an o/r mapper easier to write, as you can just
pick Scott Ambler's design and start typing code. The thing is that as
soon as you create a distributed system, you run into problems which
have to be solved by the user of the framework and not the framework
itself. Re-attaching entity objects from a previous session is 'ok' but
you won't know which fields have changed.

Also, if you use a distributed system, and you want on the _client_ to
have unique objects, you're in for a lot of pain, as you don't have
that option, you have to write your own uniquing code on the client (as
the session lives on the server!).

With in-memory context objects you solve this, and with a session-less
design in the o/r mapper you solve the re-attach crap.
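The client-side "uniquing code" mentioned above could be sketched roughly like this; the class and method names are hypothetical, not taken from any particular framework:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a client-side identity map, keyed on entity type plus
// primary key, so the same database row always maps to the same instance.
public class ClientIdentityMap
{
    private readonly Dictionary<string, object> _instances = new Dictionary<string, object>();

    // Hand back the instance already known for this key, or register the
    // freshly fetched one. Callers route every fetch result through here.
    public T GetOrRegister<T>(object primaryKey, T freshlyFetched) where T : class
    {
        string key = typeof(T).FullName + "|" + primaryKey;
        object existing;
        if (_instances.TryGetValue(key, out existing))
        {
            return (T)existing;   // same row seen before: return the unique instance
        }
        _instances[key] = freshlyFetched;
        return freshlyFetched;
    }
}
```

Routing every service-call result through something like GetOrRegister gives the client the uniqueness that a server-side session can't provide across the wire.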

So why people still find it necessary to design an o/r mapper around a
session is beyond me. Sure, it's more work to do it session-less, but
the point of a framework isn't the joy it gives to the author of it,
but the joy it gives to the user of it.
If you don't want to use sessions, I would suggest you use an ORM
which makes them optional, understanding that there may well be
session-based features which aren't available because the sessions
aren't pervasive in that system.

I can't name one feature which isn't possible in these kinds of
frameworks, to be honest.
I mean a session which is potentially split over multiple
client/server requests.

That means the service is stateful. Isn't that a problem waiting to
happen, sooner or later?

Never mind services; let's use a simple example of ASP.NET and
postbacks. Exactly the same problem.
When one framework forces you into a particular and irritating
mindset, it can be seen as a mistake. When several (including the
most commonly used ones) do so, it's worth taking a step back and
looking for the underlying reasons. The experts are best placed to do
that, and I don't claim to be an expert in ORM in general.

(As another couple of data points, LINQ to SQL has the same session
bias, and I believe that the ADO.NET Entity Framework does too.)

That's also why they have the add/detach hell already looming on the
horizon (LINQ to SQL will throw exceptions in these cases when
attaching graphs), and together with deferred execution of LINQ
queries, it will give a lot of problems for users who expect A but run
into snag B.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

Frans Bouma said:
Using a session makes an o/r mapper easier to write, as you can just
pick Scott Ambler's design and start typing code. The thing is that as
soon as you create a distributed system, you run into problems which
have to be solved by the user of the framework and not the framework
itself. Re-attaching entity objects from a previous session is 'ok' but
you won't know which fields have changed.

You won't know which fields have changed unless you requery the
database whatever you do - sessions don't make the situation any worse
here.
Also, if you use a distributed system, and you want on the _client_ to
have unique objects, you're in for a lot of pain, as you don't have
that option, you have to write your own uniquing code on the client (as
the session lives on the server!).
Indeed.

With in-memory context objects you solve this, and with a session-less
design in the o/r mapper you solve the re-attach crap.

I suspect (although I haven't tried it) that using the ORM system on
the client would let you effectively have a "fake" session that you can
attach the objects to in the same way as using "context objects".
So why people still find it necessary to design an o/r mapper around a
session is beyond me. Sure, it's more work to do it session-less, but
the point of a framework isn't the joy it gives to the author of it,
but the joy it gives to the user of it.

Well, given how many frameworks *do* use it, including organisations
which certainly have enough time and money to avoid using sessions
gratuitously, I suspect there are advantages which either you don't
appreciate the importance of or are unaware of.

Certainly when I've used Hibernate I haven't found the session stuff to
be a pain, and it's been nice to be able to treat it as a context for
that request.
I can't name one feature which isn't possible in these kinds of
frameworks, to be honest.

I can't off the top of my head either - but that doesn't mean such
features don't exist. I hope you won't take it as an insult if I
suggest that you might be somewhat biased towards session-less ORMs,
too :)
That means the service is stateful. Isn't that a problem waiting to
happen, sooner or later?

Never mind services; let's use a simple example of ASP.NET and
postbacks. Exactly the same problem.

Either the service is stateful (which can be distributed state,
potentially - I believe EJB 3 containers provide this option) or you
create a new session, reattach any data you need (whether passed by the
client, or whatever) and don't need any server state.

If you need server state, you need server state: having a session based
or sessionless ORM isn't likely to change that as far as I can see.
That's also why they have the add/detach hell already looming on the
horizon (LINQ to SQL will throw exceptions in these cases when
attaching graphs), and together with deferred execution of LINQ
queries, it will give a lot of problems for users who expect A but run
into snag B.

You call it "add/detach hell" but I honestly can't say I ever had any
problems when working with Hibernate in this way.

Deferred execution is a fundamental (and very useful) part of LINQ
which people will just have to understand. It's not mind-bogglingly
new, either - using the Criteria API in Hibernate does exactly the same
kind of thing, building up a query which can be executed whenever you
want it to be.
 

Frans Bouma [C# MVP]

Jon said:
You won't know which fields have changed unless you requery the
database whatever you do - sessions don't make the situation any
worse here.

If you track changes inside the entity, you don't need to. If you
don't have a session-oriented design, you have to track changes
elsewhere, so you don't have to put the burden onto the developer to do
things for the framework, the framework can decide what to do by itself.

For example, in LLBLGen Pro, changes are tracked inside the entity, so
you can just pass the entity back and forth across tiers and services
and save it in one line of code, without telling the o/r core whether
it's a new entity or not; nor does the o/r core have to refetch the
entity from the db or keep a copy around. :)
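The in-entity change tracking described here can be sketched in a few lines; the member names below are illustrative, not LLBLGen Pro's actual API:

```csharp
using System.Collections.Generic;

// Sketch of a self-tracking entity: the dirty flags travel with the object,
// so they survive serialization across tiers without any session.
public class CustomerEntity
{
    private string _companyName;
    private readonly HashSet<string> _changedFields = new HashSet<string>();

    public bool IsNew { get; set; }                        // set by fetch/instantiation logic
    public bool IsDirty { get { return _changedFields.Count > 0; } }
    public IEnumerable<string> ChangedFields { get { return _changedFields; } }

    public string CompanyName
    {
        get { return _companyName; }
        set
        {
            if (_companyName == value) return;
            _companyName = value;
            _changedFields.Add("CompanyName");             // tracked inside the entity itself
        }
    }
}
```

Because IsNew and the changed-field set are part of the entity's own state, persistence code on the other side of a service boundary can decide between INSERT and UPDATE without refetching or keeping a copy.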
I suspect (although I haven't tried it) that using the ORM system on
the client would let you effectively have a "fake" session that you
can attach the objects to in the same way as using "context objects".

That won't help you. The session on the service doesn't know which
fields have changed, which entities are new, etc.

Graph/entity management, one of the services provided by a good o/r
mapper, is something often ignored by people who start with o/r
mappers, but it's one of those things which makes an o/r mapper
framework more than just a save/load layer for data.
Well, given how many frameworks do use it, including organisations
which certainly have enough time and money to avoid using sessions
gratuitously, I suspect there are advantages which either you don't
appreciate the importance of or are unaware of.

Name one advantage of session-oriented design over non-session
oriented design for the USER of the framework. There are none. Sure,
for the developer of the framework, it's easier, much easier, as every
meta-data element and other data element to work with is centrally
available and what to do is actually told to you.
Certainly when I've used Hibernate I haven't found the session stuff
to be a pain, and it's been nice to be able to treat it as a context
for that request.

2-way databinding in webapps, passing entities across service
boundaries... 2 examples where you have to tell the framework what to
do while it's the job of the framework to find out what to do.

Sure, if you don't mind doing the extra work, it's not a problem.
Also, with Hibernate-based frameworks, you don't get a lot of entity
management anyway. I mean:
myOrder.Customer = myCustomer;
doesn't make this true:
myCustomer.Orders.Contains(myOrder);
I can't off the top of my head either - but that doesn't mean such
features don't exist. I hope you won't take it as an insult if I
suggest that you might be somewhat biased towards session-less ORMs,
too :)

Heh, of course it looks like I'm biased. Though let it be clear that I
deliberately took the decision not to create a session-oriented
framework because of the problems that come with it: I wanted a
framework which would behave as naturally as possible (read: as
transparently as possible), with the least amount of work for the user
(the developer). And trust me, I've cursed that decision many, many
times, because inside the framework it's often a challenge to get
things done when there's no central info store with all the info for
you. :)
Either the service is stateful (which can be distributed state,
potentially - I believe EJB 3 containers provide this option) or you
create a new session, reattach any data you need (whether passed by
the client, or whatever) and don't need any server state.

Doesn't Java have cross-system object awareness? On .NET, distributed
state is really a red herring: you still have to serialize/deserialize
the state, which implies a copy.

Though back to the small space of a webpage which posts back: the
state of the page has to be pulled from somewhere. That's of course ok,
there are facilities for that. The problem is though, if you track
changes in an object outside the entities, you either have to keep that
in memory as well (bad) or you have to rely on the developer to tell
you what the state of the entities is. (bad too, as it implies
babysitting by the developer).
If you need server state, you need server state: having a session
based or sessionless ORM isn't likely to change that as far as I can
see.

Changes in the entity are the concern of the entity, not of some
outside object. If they WERE, the outside object tracking the changes
would be tied to the entity till the entity dies.

Having a session tracking changes is precisely tying the session to
the entity: as soon as you pass the entity to a place where the session
isn't available, the changes are lost. As soon as that happens, the
developer using the framework has to tell the new instance of the
session what happened: is the entity instance a new entity or an
existing one. This is significant.

So as this is a pain, people work around this by changing their system
design to avoid this. Which in general implies having stateful
repositories in memory so changes/sessions aren't lost. Though that's
not always possible or not always desired.
You call it "add/detach hell" but I honestly can't say I ever had any
problems when working with Hibernate in this way.

So you never had to tell the session that the entity object you
attached was new or not new?
Deferred execution is a fundamental (and very useful) part of LINQ
which people will just have to understand. It's not mind-bogglingly
new, either - using the Criteria API in Hibernate does exactly the
same kind of thing, building up a query which can be executed
whenever you want it to be.

That's not the same thing! A Criteria object doesn't contain a
session! The object created by the compiler DOES contain a
QueryProvider, which has to have access to the persistence core to be
able to execute the query by itself.

That's the fundamental difference and that difference is going to
cause problems, simply because the linq query object contains a session.

Also, if you pass a variable to the query, the value the variable has
at EXECUTION time is used, not at CREATION time:

string customerID = "CHOPS";

var q = from o in nw.Order
        where o.CustomerId == customerID
        select o;

// .. some other code
customerID = "BLONP";

foreach(Order o in q)
{
// which orders are read? BLONP's!
}

So the query isn't a query definition alone, it's also the resultset.
That's a combination of concerns which will cause problems, and IMHO
unnecessary. Just because some people thought it would be 'easier' for
people to have deferred execution, it's now the main way to execute a
query. In fact I can't do:
q.Execute();

I can do:
List<Order> orders = q.ToList();

though that will create a duplicate list. However, I have to if I want
to get the number of orders. If it were a query specification ALONE, I
could have used:
mySession.GetCount(q);

and the results:
IList results = mySession.Execute(q);

Would that have been such a problem? IMHO not at all. It's very
natural. Also, because in that last example there's no deferred
execution, the query is created when the var q statement is executed
in code, so the changed-variable problem isn't there.

Also, if I have a process routine, which has to process 100,000 rows
in a complex set of routines, it's natural to page through the rows,
process the page read, and persist them in a transaction, to avoid
having all data in memory.

Now, how would you do that, with a session object INSIDE the query?
You can only do that if you can either use System.Transactions, or if
you can place the fetch AND save logic in the same routine. Otherwise
you have to pass sessions around to the query creation routine, because
the query creation routine REQUIRES the session object, and you need to
pass the session object you're using at that moment, otherwise you'll
get deadlocks in the db.

If the query would just be that, a query specification, you wouldn't
have had this problem.

Now, call me stupid, but why on earth would anyone with a little bit
of knowledge of o/r mapping design the system DELIBERATELY so that
deferred execution was there? What problem does it solve? IMHO none.

FB

 

Jon Skeet [C# MVP]

Frans Bouma said:
If you track changes inside the entity, you don't need to. If you
don't have a session-oriented design, you have to track changes
elsewhere, so you don't have to put the burden onto the developer to do
things for the framework, the framework can decide what to do by itself.

I think we were talking about a different type of change tracking.
However, I seem to remember that Hibernate performs bytecode trickery
to keep changes within the object itself as well.
That won't help you. The session on the service doesn't know which
fields have changed, which entities are new, etc.

To be honest, I can't remember enough of the details to remember how/if
Hibernate handles knowing about new entities.
Graph/entity management, one of the services provided by a good o/r
mapper, is something often ignored by people who start with o/r
mappers, but it's one of those things which makes an o/r mapper
framework more than just a save/load layer for data.

Sure - but graph management in terms of uniqueness is more pervasive
when it's part of a session system which is *also* pervasive. IMO,
anyway.
Name one advantage of session-oriented design over non-session
oriented design for the USER of the framework. There are none. Sure,
for the developer of the framework, it's easier, much easier, as every
meta-data element and other data element to work with is centrally
available and what to do is actually told to you.

You're still sounding *very* black and white, and I simply don't
believe that there isn't a single advantage (to the user) of keeping
track of things in sessions. I have to say that when I was using
Hibernate it seemed a very natural way of working, partly because it
was very similar to using a transaction. Keeping a session open for the
length of a request just seemed to fit as an easy way of doing
something. I never experienced the "hell" of attach/detach, and having
opened the session and used that for everything else, I didn't need to
learn any "extras" to get uniqueness.
2-way databinding in webapps, passing entities across service
boundaries... 2 examples where you have to tell the framework what to
do while it's the job of the framework to find out what to do.

Ah, databinding - I've never been a fan of that to start with, to be
honest. Too much black magic which only works if you happen to want to
work in *exactly* the way that was anticipated. At least, that's been
my experience of it in every scenario I've seen. It does work pretty
well for displaying stuff - but making it 2-way just has *so* many
problems. With any luck WPF improves matters somewhat as far as thick
clients are concerned...
Sure, if you don't mind doing the extra work, it's not a problem.
Also, with Hibernate-based frameworks, you don't get a lot of entity
management anyway. I mean:
myOrder.Customer = myCustomer;
doesn't make this true:
myCustomer.Orders.Contains(myOrder);

Not unless you implement the recommended patterns (which would keep the
association consistent). I admit it's a disadvantage to have to do all
this manually.
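Those recommended patterns boil down to fixing up both ends of the association in the setter; a minimal sketch with hypothetical Customer/Order classes:

```csharp
using System.Collections.Generic;

// Sketch: keeping a bidirectional Customer/Order association consistent
// by hand, the way the recommended patterns suggest.
public class Customer
{
    private readonly List<Order> _orders = new List<Order>();
    public IList<Order> Orders { get { return _orders; } }
}

public class Order
{
    private Customer _customer;

    public Customer Customer
    {
        get { return _customer; }
        set
        {
            if (_customer == value) return;
            if (_customer != null)
                _customer.Orders.Remove(this);     // detach from the old owner's list
            _customer = value;
            if (value != null && !value.Orders.Contains(this))
                value.Orders.Add(this);            // attach to the new owner's list
        }
    }
}
```

With this in place, myOrder.Customer = myCustomer does make myCustomer.Orders.Contains(myOrder) true, at the cost of writing the fix-up code yourself.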
Heh, of course it looks like I'm biased. Though let it be clear that I
deliberately took the decision not to create a session-oriented
framework because of the problems that come with it: I wanted a
framework which would behave as naturally as possible (read: as
transparently as possible), with the least amount of work for the user
(the developer). And trust me, I've cursed that decision many, many
times, because inside the framework it's often a challenge to get
things done when there's no central info store with all the info for
you. :)

Well, as I said earlier: I suspect it's more of a balancing act than
you're making it out to be. I'm certainly *not* trying to say that's a
deliberate act on your part, just to be clear - but it tends to be
easier to see the advantages of your own system and the disadvantages
of others rather than the other way round.

I'm sure that if we had a Hibernate expert here (Ayende, for example!)
he'd be much more persuasive on the "sessions are okay" side of the
fence.
Doesn't Java have cross-system object awareness? On .NET, distributed
state is really a red herring: you still have to serialize/deserialize
the state, which implies a copy.

I don't think Java "natively" has any cross-system object awareness
specified - although I know there have been distributed JVMs built, at
least as research projects.
Though back to the small space of a webpage which posts back: the
state of the page has to be pulled from somewhere. That's of course ok,
there are facilities for that. The problem is though, if you track
changes in an object outside the entities, you either have to keep that
in memory as well (bad) or you have to rely on the developer to tell
you what the state of the entities is. (bad too, as it implies
babysitting by the developer).

As I say, I thought Hibernate kept track of what had changed within the
entity itself too. I don't know about NHibernate though, or how
serialization affects this.
Changes in the entity are the concern of the entity, not of some
outside object. If they WERE, the outside object tracking the changes
would be tied to the entity till the entity dies.

Having a session tracking changes is precisely tying the session to
the entity: as soon as you pass the entity to a place where the session
isn't available, the changes are lost. As soon as that happens, the
developer using the framework has to tell the new instance of the
session what happened: is the entity instance a new entity or an
existing one. This is significant.

So as this is a pain, people work around this by changing their system
design to avoid this. Which in general implies having stateful
repositories in memory so changes/sessions aren't lost. Though that's
not always possible or not always desired.

Well, all I can say is that the main projects I worked on which used
Hibernate never ran into this as an issue. It would take quite a lot of
description and conversation between us to figure out exactly *why* it
wasn't an issue, but please believe me when I say that it wasn't :)
So you never had to tell the session that the entity object you
attached was new or not new?

I can't say I remember ever actually needing to attach/detach. I know
that it's *available*, but I can't remember needing to use it. If I
*did* need to use it, it clearly didn't cause me enough pain to make it
memorable.
That's not the same thing! A Criteria object doesn't contain a
session! The object created by the compiler DOES contain a
QueryProvider, which has to have access to the persistence core to be
able to execute the query by itself.

A Criteria object does contain the session (or at least has a reference
to it).

See
http://www.hibernate.org/hib_docs/v3/api/org/hibernate/impl/CriteriaImpl.html
That's the fundamental difference and that difference is going to
cause problems, simply because the linq query object contains a session.

Also, if you pass a variable to the query, the value the variable has
at EXECUTION time is used, not at CREATION time:

That certainly needs to be clearly understood - but I'd say it's
useful, too, in cases where you need to do the same query multiple
times but with different parameters.

Captured variables always need to be handled with care, and I can see
that it could well trip up novices, but I don't think it's
unreasonable.
string customerID = "CHOPS";

var q = from o in nw.Order
        where o.CustomerId == customerID
        select o;

// .. some other code
customerID = "BLONP";

foreach(Order o in q)
{
// which orders are read? BLONP's!
}

So the query isn't a query definition alone, it's also the resultset.

No, the query itself is a query definition alone. It's only the
enumerator you get when you call GetEnumerator() which relates to the
result set.
That's a combination of concerns which will cause problems, and IMHO
unnecessary. Just because some people thought it would be 'easier' for
people to have deferred execution, it's now the main way to execute a
query. In fact I can't do:
q.Execute();

I can do:
List<Order> orders = q.ToList();

though that will create a duplicate list. However, I have to if I want
to get the number of orders. If it were a query specification ALONE, I
could have used:
mySession.GetCount(q);

You can just call Count(). That will execute immediately, and return
the count - but without having to fetch all the data.
and the results:
IList results = mySession.Execute(q);

Would that have been such a problem? IMHO not at all.

I don't see that there's a problem anyway, given the ability to call
Count().
It's very natural. Also, because in that last example it's not deferred executed,
the query is created when the var q statement is executed in code, so
the changed variable problem isn't there.

But equally the ability to reuse the query very simply by changing the
variables isn't there either. If you want to avoid the meaning of the
query being changed, either copy the variable values or just don't
change the variables.
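Copying the captured variable into a fresh local is enough to freeze the query's meaning at creation time; a small LINQ to Objects sketch (no database needed to show the capture behaviour):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class CaptureDemo
{
    static void Main()
    {
        List<string> customerIds = new List<string> { "CHOPS", "BLONP" };

        string customerId = "CHOPS";
        string frozenId = customerId;                     // copy made at creation time

        var q = customerIds.Where(id => id == frozenId);  // lambda captures frozenId

        customerId = "BLONP";                             // later change: no effect on q

        // Enumerating now still filters on "CHOPS", because the query
        // captured the copy, not the variable that was reassigned.
        foreach (string id in q)
        {
            Console.WriteLine(id);                        // prints "CHOPS"
        }
    }
}
```

Conversely, capturing the variable directly (no copy) gives the reuse-by-reassignment behaviour Jon describes.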
Also, if I have a process routine, which has to process 100,000 rows
in a complex set of routines, it's natural to page through the rows,
process the page read, and persist them in a transaction, to avoid
having all data in memory.

Now, how would you do that, with a session object INSIDE the query?
You can only do that if you can either use System.Transactions, or if
you can place the fetch AND save logic in the same routine. Otherwise
you have to pass sessions around to the query creation routine, because
the query creation routine REQUIRES the session object, and you need to
pass the session object you're using at that moment, otherwise you'll
get deadlocks in the db.

I'd usually use the "a context has a current session" idea at that
point, where a context might or might not be "the current thread"
depending on the situation. (You may need to be careful of thread
agility.)
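The "context has a current session" approach could look roughly like this; ISession and the class names are placeholders, and a [ThreadStatic] field is the simplest (thread-agility-sensitive) backing store:

```csharp
using System;

// Placeholder for whatever the ORM's session type is.
public interface ISession : IDisposable { }

// Sketch: an ambient, thread-bound session, so query-creation code can find
// the session without it being passed as a parameter everywhere.
public static class SessionContext
{
    [ThreadStatic]
    private static ISession _current;

    public static ISession Current
    {
        get
        {
            if (_current == null)
                throw new InvalidOperationException("No session bound to the current thread.");
            return _current;
        }
    }

    public static void Bind(ISession session) { _current = session; }
    public static void Unbind() { _current = null; }
}
```

In ASP.NET, where a request can migrate between threads, the backing store would need to be the request context rather than a thread-static field; that's the thread-agility caveat.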
If the query would just be that, a query specification, you wouldn't
have had this problem.

Now, call me stupid, but why on earth would anyone with a little bit
of knowledge of o/r mapping design the system DELIBERATELY so that
deferred execution was there? What problem does it solve? IMHO none.

I've always found the deferred execution very natural, both in LINQ and
Hibernate - I create a query, do other things if necessary, and then
use the query when I'm ready. Why should creating the query execute it
immediately?

The equivalent of Execute() *is* either starting to enumerate the
results, or calling ToList(), or calling one of the aggregates such as
Count(). If you want to execute the query as soon as you've defined it,
that's easy enough to do - whereas if you want deferred execution in a
situation where defining the query *does* execute it immediately,
that's harder.

Now, if you want to argue that it would be nice to be able to defer
execution further, allowing it to be unrelated to a session, I can
certainly see the benefit of that - commonly used queries could be
created at start-up and then just executed against arbitrary sessions.
I'd have no problem with that whatsoever - a definite benefit. That's
still deferred execution though.
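That "define once, execute against arbitrary sessions" shape can be approximated by keeping the query as a function of its source; a sketch using plain IQueryable (the helper and entity names are hypothetical):

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Sketch: a query kept as pure specification. No session is captured at
// definition time; the IQueryable source (i.e. a session's entity set) is
// supplied only when the query is actually executed.
public static class ReusableQuery
{
    public static Func<IQueryable<T>, IQueryable<T>> Define<T>(
        Expression<Func<T, bool>> predicate)
    {
        // The predicate expression is inert data until applied to a source.
        return source => source.Where(predicate);
    }
}
```

A query built at start-up, say var recent = ReusableQuery.Define&lt;Order&gt;(o => o.Year == 2007), could later be applied as recent(session1.Orders) or recent(session2.Orders), which is the session-independent deferral described above.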
 

James Crosswell

By the way, Hibernate uses the "lock" method of a session to perform
the "reattaching" of objects that might have been disconnected (for
example when writing ASP.NET applications)... which NHibernate supports
as well.

So basically, that's the work around which avoids the need for a
CopyTo/CopyFrom method and the pseudocode I provided WAY up the chain of
messages in this thread.

Other than that, for SQL Server work I'm really starting to like
NHibernate. It doesn't support other database backends as well as
something like XPO, but it gives much better support for SQL Server
specific features (things like ROW_NUMBER...OVER, for example, get used
internally if you specify page size and first page on query or criteria
objects when calling the List() and List<T>() methods).

XPO has kind of been written to the lowest common denominator and,
because not all DBs have an inherent way of supporting paging (like
MySQL has had for ages), they basically don't provide any support for
paging (so you don't get any out-of-the-box paging support even if the
DB that you're using does support it). I guess it's these kinds of
design choices that are the reason DevExpress' framework offers better
database agnosticism.

All in all, I think I'm leaning towards the NHibernate camp for most
situations... although Linq for Entities, or whatever they're calling
it these days, seems like it will do much the same thing, so I wonder
what the future will hold for the other ORMs - certainly it'll make
them a harder sell if Linq comes with Visual Studio. Lots of folks will
start to wonder "why bother with anything else?"

Best Regards,

James Crosswell
Microforge.net LLC
http://www.microforge.net
 

Frans Bouma [C# MVP]

Jon said:
I think we were talking about a different type of change tracking.

I think there's just one: an entity instance gets loaded into an
entity class instance, and the in-memory entity instance gets changed.
Which parts? That's tracked, by the change tracking mechanism.
However, I seem to remember that Hibernate performs bytecode trickery
to keep changes within the object itself as well.

Don't compare Hibernate with NHibernate, as Hibernate 3 is different
from NHibernate (which is based on Hibernate 2). It's my understanding
that NHibernate uses the 'keep a copy of the original values around in
the session' method.
You're still sounding very black and white, and I simply don't
believe that there isn't a single advantage (to the user) of keeping
track of things in sessions. I have to say that when I was using
Hibernate it seemed a very natural way of working, partly because it
was very similar to using a transaction. Keeping a session open for
the length of a request just seemed to fit as an easy way of doing
something. I never experienced the "hell" of attach/detach, and
having opened the session and used that for everything else, I didn't
need to learn any "extras" to get uniqueness.

Also in distributed scenarios, where entities got fetched in session
instance S1, distributed to some client, altered there, and then came
back and have to be saved with session instance S2?

I'm not saying that having a class which manages the activity with the
DB is bad; on the contrary. What I'm saying is that the object used to
load the entities should be in effect stateless, so it shouldn't be
tied to an entity object.
Ah, databinding - I've never been a fan of that to start with, to be
honest.

Me neither, but the vast majority of people out there uses it. :)
Too much black magic which only works if you happen to want
to work in exactly the way that was anticipated. At least, that's
been my experience of it in every scenario I've seen. It does work
pretty well for displaying stuff - but making it 2-way just has so
many problems. With any luck WPF improves matters somewhat as far as
thick clients are concerned...

It's not that bad. Avoiding it means you have to write the glue
between control and object yourself, which can be painful and buggy as
well, simply because it's boring code.
Not unless you implement the recommended patterns (which would keep
the association consistent). I admit it's a disadvantage to have to
do all this manually.

Exactly. If you have to write all the code to keep things going
manually, sure, then there's no problem. But the point of using a
framework is that you DON'T have to write the code manually: it's been
done for you.
As I say, I thought Hibernate kept track of what had changed within
the entity itself too. I don't know about NHibernate though, or how
serialization affects this.

If I'm not mistaken, they use the same mechanism as LINQ to SQL does:
keep the original values in the session, and compare original with
current to see what the changes are.
I can't say I remember ever actually needing to attach/detach. I know
that it's available, but I can't remember needing to use it. If I
did need to use it, it clearly didn't cause me enough pain to make it
memorable.

If Hibernate 3 has in-entity change management through bytecode
manipulation (or post-compilation, as used by some .NET o/r mappers),
then you don't have attach/detach problems, as there's no need: the
session doesn't keep track of the original values; they're inside the
entity object.
A Criteria object does contain the session (or at least has a
reference to it).

See
http://www.hibernate.org/hib_docs/v3/api/org/hibernate/impl/CriteriaImpl.html

Hmm, indeed, I didn't know that. Strange decision IMHO.
That certainly needs to be clearly understood - but I'd say it's
useful, too, in cases where you need to do the same query multiple
times but with different parameters.

Isn't that going into the red zone of 'magic programming'? I mean, you
have a local variable, even if it's a value-typed variable, and you
have a linq query, and by changing the variable's value you manipulate
the linq query IF you're executing it at that moment.
Captured variables always need to be handled with care, and I can see
that it could well trip up novices, but I don't think it's
unreasonable.

Sure, but the code LOOKS the same as:
int foo = GetFoo();
bool b = (bar == foo);

b isn't suddenly affected if foo is changed after this. Though if I do:
var q = from x in metadata.SomeEntity
where x.Foo == foo
select x;

q is affected if I change foo AFTER this query and BEFORE execution.
Why is this comparison expression suddenly different from the
expression of b? The first is executed at that moment, the latter is
executed somewhere later, but you have to follow q to find out when.
That SHOULDN'T be important, simply because q LOOKS like a declaration,
constructed at the spot where q is declared.
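The behavior Frans describes isn't unique to LINQ; Python generator
expressions show the same late binding of free variables, which makes
for a handy analogy (a Python sketch of the C# situation above, not
LINQ itself):

```python
data = [1, 2, 3, 4]

foo = 2
b = (3 == foo)        # evaluated immediately: b is False, permanently

q = (x for x in data if x == foo)   # deferred: the filter runs later

foo = 3               # changing foo AFTER the declaration...
result = list(q)      # ...affects the query at execution time
print(result)         # [3], not [2]
```

The comparison in b is fixed at the line where it appears, while the
condition inside q is only evaluated when the query is iterated -
exactly the asymmetry being complained about.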
No, the query itself is a query definition alone. It's only the
enumerator you get when you call GetEnumerator() which relates to the
result set.

It's an IEnumerable<T>, and therefore a resultset. If it were a
definition alone, the queries of CHOPS would have been fetched, simply
because the declaration construction was with 'CHOPS'.
But equally the ability to reuse the query very simply by changing
the variables isn't there either. If you want to avoid the meaning of
the query being changed, either copy the variable values or just
don't change the variables.

Sure, though I don't like the similarity of code statements which
tend to behave completely differently at runtime. As this is a runtime
issue, it can add to the testing burden.

They for example could have opted for a system where you could get
access to the parameters in the query to alter them for each run.
I've always found the deferred execution very natural, both in LINQ
and Hibernate - I create a query, do other things if necessary, and
then use the query when I'm ready. Why should creating the query
execute it immediately?

No, that's not what I meant. What I meant was: you declare a query
somewhere, e.g. in a method you call to formulate the query, then
execute it somewhere else; however, the query declaration is already
fixed, so you can alter it only by changing a parameter on a
predicate, not by changing a local variable's value.

What advantage does this have? Well, plenty. For example you can write
generic code for constructing filters which is executable on multiple
databases, e.g. your system targets multiple databases (Oracle,
SQL Server, doesn't matter) at runtime, and you can achieve that.

With the 'executor is embedded inside the query' approach, you can't
do that, or at least not easy: you have to swap out the provider inside
the query object, and also: you need one when creating the query, which
is IMHO absurd, as you're constructing the query with meta-data, not
with a db-specific provider, because the mapping info is of no
relevance when creating the query.

so I can do:

EntityCollection<CustomerEntity> customers = new EntityCollection<CustomerEntity>();

// call a method to specify the filter for the query.
RelationPredicateBucket filter = CreateFilter();

// then execute it on the adapter for the db currently used:
using(IDataAccessAdapter adapter = AdapterFactory.GetAdapter())
{
    adapter.FetchEntityCollection(customers, filter);
}

generic code, not tied to a db-specific context or meta-data. The
connection between db-specific information (the mapping data) and the
generic, entity-based query specification is made inside the adapter,
out of sight, but it still allows me to flexibly write code which
targets whatever db is supported at runtime.

That's IMHO why the deferred execution system, with its ties to the
db-specific elements, isn't great, added to the fuzziness of the
parameters.
The equivalent of Execute() is either starting to enumerate the
results, or calling ToList(), or calling one of the aggregates such
as Count(). If you want to execute the query as soon as you've
defined it, that's easy enough to do - whereas if you want deferred
execution in a situation where defining the query does execute it
immediately, that's harder.

I'm not saying the query should be executed where it's defined. What
I'm saying is that the query should be a query, not a resultset as
well. To obtain the resultset, one has to ask an object to get the
result, defined by the query. This automatically has two advantages:

1) it avoids parameter changes because a value changed: it's very
clear that the query is the query defined when you actually defined it
2) it allows you to formulate the query with generic code and avoids
all ties with db-specific code whatsoever, as these are defined in the
object you ask to execute the query.
Now, if you want to argue that it would be nice to be able to defer
execution further, allowing it to be unrelated to a session, I can
certainly see the benefit of that - commonly used queries could be
created at start-up and then just executed against arbitrary
sessions. I'd have no problem with that whatsoever - a definite
benefit. That's still deferred execution though.

I think the difference lies within the concept of what a declaration
means to the developer. When a developer writes:
string a = foo + bar;

then a is foo+bar right after that line.

if the developer writes:
var q = from c in nw.Customers
where c.CompanyName == a
select c;

then q isn't the set of customers. It's a definition of a query,
however with ties to the code it is created in, as it relies on the
value of a, because the query CONSTRUCTION actually happens when q is
executed. It should have obtained the values to construct the query
when it's declared, so that changing a wouldn't affect q afterwards.

Declaring a query using HQL or whatever query system the o/r mapper
uses, is declaring the query and nothing else. Passing it on to some
other method to execute it there with an adapter, session, context or
whatever is used is simply executing what's declared elsewhere WHEN it
was declared, with the state at that moment of declaration, not with
the state at the moment of execution.

I think my main problem is with that last difference: query execution
should be done with the state at declaration time, while in linq it's
query execution with the state at execution time. This is wrong IMHO,
as it's vague: the developer can easily make a mistake here, and even
though 's/he then simply shouldn't do that' applies, if we all avoided
what we shouldn't do, we wouldn't have bugs; we all have bugs in our
code no matter what, and the more clarity is built into the language,
the fewer bugs will appear IMHO.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

James Crosswell said:
By the way, Hibernate uses the "lock" method of a session to perform the
"reattaching" of objects that might have been disconnected (for example
when writing ASP.NET applications)... which nHibernate supports as well.

So basically, that's the work around which avoids the need for a
CopyTo/CopyFrom method and the pseudocode I provided WAY up the chain of
messages in this thread.
Cool.

Other than that, for SQL Server work I'm really starting to like
nHibernate. It doesn't support other database backends as well as
something like XPO, but it gives much better support for the SQL Server
specific features (things like ROW_NUMBER...OVER for example, get used
internally if you specify page size and first page on query or criteria
objects when calling the List() and List<T>() methods).

Do you have examples of what it doesn't handle on different backends?
nHibernate certainly supports a fair number of databases - which is not
the same as saying it's *as well* supported on all the different
backends :)
All in all, I think I'm leaning towards the nHibernate camp for most
situations... although Linq for Entities or whatever they're calling it
these days seems like it will do much the same thing, so I wonder what
the future will hold for the other ORMs - certainly it'll make them a
harder sell if Linq comes with Visual Studio. Lots of folks will start
to wonder "why bother with anything else?"

Indeed - LINQ to SQL isn't much of a threat to nHibernate as it only
supports SQL Server (and is missing various other features) but I think
ADO.NET Entities will be much more of a competitor.

On the other hand, open source frameworks can adapt more easily to
customer demand, which will always be in nHibernate's favour... and
it's got a head start. We'll have to see...
 

James Crosswell

Jon said:
Do you have examples of what it doesn't handle on different backends?
nHibernate certainly supports a fair number of databases - which is not
the same as saying it's *as well* supported on all the different
backends :)

It seems to do OK on Firebird but falls over on really simple stuff in
MS Access (e.g. create a Guid column in your Access table and watch the
schema generation fall over when it tries to create the table). There
are a number of documented issues with nHibernate and MS Access (due to
limited support in MS Access for stuff like subqueries) - whereas XPO
works flawlessly with MS Access. Like I say, there are good technical
reasons why XPO provides better support for these wimpy lesser databases.

Most of the time I wouldn't care but Access is still a nice choice if
you want to create a simple desktop application that you plan to roll
out to the masses. Embedded Firebird is about the only other option that
I've found to do the same - and indeed Firebird is WAY more powerful
than Access (I'd say it's got almost as much punch as MS SQL Server in
fact) so technically it's a superior choice... but very few people know
what an fdb file is or would have any idea how to read it using tools
other than my apps, so MS Access is quite a nice choice for end users
who may want to access the data the app produces and generate their own
reports using tools they're already familiar with.

XPO also has an option to use an "In Memory" persistence layer, which is
really nice in unit testing scenarios.

As usual you kinda have to pick the tool for the job and nHibernate
isn't always a clear winner - XPO still wins out in quite a few areas,
but generally speaking nHibernate is pretty solid.

Best Regards,

James Crosswell
Microforge.net LLC
http://www.microforge.net
 

James Crosswell

Frans said:
Hmm, indeed, I didn't know that. Strange decision IMHO.

They have a DetachedCriteria class as well, which you can use in
situations where you're not directly connected to your persistence layer
(e.g. Windows/Web clients for n-tier apps).

Best Regards,

James Crosswell
Microforge.net LLC
http://www.microforge.net
 

Jon Skeet [C# MVP]

Frans Bouma said:
I think there's just one: entity instance gets loaded into entity
class instance, entity instance in-memory gets changed. Which parts?
that's tracked, by the change tracking mechanism.

I thought we'd briefly gone into the realms of noticing that the
database copy had changed. My bad.
Don't compare hibernate with nhibernate, as hibernate 3 is different
from nhibernate (which is based on hibernate2). It's my understanding
that nhibernate uses the 'keep a copy around of the original values in
the session' method.

Ah... my bad again - it's Hibernate 3 which I'm most familiar with by a
long shot.
Also in distributed scenarios where entities got fetched in session
instance S1, distributed to some client, altered there and then they
came back and have to be saved with session instance S2?

Nope, I never had to do that. Now, it's possible that my situation is a
very rare one - but I'm not so sure. I'm not saying that your scenario
is rare either, just that there are pros and cons. For what I had to
do, the session was a natural way of achieving it.
I'm not saying that having a class which manages the activity with the
DB is bad, on the contrary. What I'm saying is that the object used to
load the entities should be in effect stateless. So shouldn't be tied
to an entity object.

Except you then need to introduce a new concept when you want database
identity preserved. When a session is already part of your mindset, and
that's what you flush at the end when you want to save (rather than
telling each individual object or tree to save itself) it's already
straightforward.
Me neither, but the vast majority of people out there uses it. :)
:)


It's not that bad. Avoiding it has the implication that you have to
write the glue between control and object yourself, which can be
painful and buggy as well, simply because it's boring code.

Oh absolutely. The difference is that if you then need to fix something
to behave in a particular way, you can do so. I suspect there's a nice
solution waiting in the wings somewhere, but I haven't used it yet. As
I say, I have hopes for WPF...
exactly. If you have to write manually all the code to keep things
going, sure, then there's no problem. But the point of using a
framework is that you DON'T have to write the code manually: it's been
done for you.

Preferably, yes - but given the rest of what Hibernate does for me,
I'm happy enough to do this myself. Alternatively, I can still use a
code generator to create entities to start with; a hybrid between the
"pure code generation" and "no code generation" solutions.

I've never had the need to do that with Hibernate, but I've always
known it's an option if I really wanted it :)
If I'm not mistaken, they use the same mechanism as Linq to Sql does:
keep the original values in the session, and compare original with
current to see what the changes are.

If, as you say, NHibernate is based on Hibernate2 instead of
Hibernate3, you're probably right.
If Hibernate 3 has in-entity change management through bytecode
manipulation (or post-compilation, as it's used by some .net o/r
mappers), then you don't have attach/detach problems as you don't need
to: the session doesn't keep track of the original values, they're
inside the entity object.

You may still want to detach or re-attach though, for other reasons -
to make the re-attached entity part of a new session and have it saved
when that session is saved, participate in uniqueness etc.

Isn't that going into the red zone of 'magic programming' ? I mean,
you have a local variable, even if it's a value typed variable, and you
have a linq query and by changing the variable's value, you manipulate
the linq query IF you're executing it at that moment.

It's probably slightly "magic" at the moment. I don't think it will be
for long. Sooner or later, devs are going to have to understand what
closures really are, and how with captured variables it really is the
variable which is captured, not its value.

Don't forget that it's not just LINQ to SQL we're talking about here -
this is true for LINQ to Objects as well as anywhere else you might use
an anonymous function.
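The "it's the variable which is captured, not its value" rule Jon
mentions shows up in any language with closures; the classic Python
version of the trap (a sketch, not C# closure semantics verbatim):

```python
# each lambda closes over the VARIABLE i, not its value at creation time
late = []
for i in range(3):
    late.append(lambda: i)
late_results = [f() for f in late]
print(late_results)              # [2, 2, 2] -- all see the final i

# capturing the VALUE needs an explicit step, e.g. a default argument
early = [lambda i=i: i for i in range(3)]
early_results = [f() for f in early]
print(early_results)             # [0, 1, 2]
```

The same distinction is what makes a LINQ query definition react to
later changes of the variables it mentions.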
Sure, but the code LOOKS the same as:
int foo = GetFoo();
bool b = (bar == foo);

b isn't suddenly affected if foo is changed after this. Though if I do:
var q = from x in metadata.SomeEntity
where x.Foo == foo
select x;

q is affected if I change foo AFTER this query and BEFORE execution.
Why is this comparison expression suddenly different from the
expression of b? The first is executed at that moment, the latter is
executed somewhere later, but you have to follow q to find out when.
That SHOULDN'T be important, simply because q LOOKS like a declaration,
constructed at the spot where q is declared.

And it *is* a declaration, but one which captures the variables. I
really don't think it's that bad, once you get used to it - it's very
similar to the whole "passing a reference type argument by value
doesn't mean all the data is copied, just the reference". Once you
accept where changes will be reflected, it's quite easy to work with
that.
It's an IEnumerable<T>, and therefore a resultset. If it would be a
definition alone, the queries of CHOPS would have been fetched, simply
because the declaration construction was with 'CHOPS'.

I don't see the logic in your first definition. Suppose it didn't
implement IEnumerable<T> but had an Execute() method which gave back an
IEnumerator<T> instead - in other words, we *just* changed a method
name and in doing so removed an interface implementation. How can that
change whether the object is itself a resultset or not? Or would it
still be a resultset in your view?

The query itself doesn't contain the data, it simply contains all the
information required to fetch the data. That doesn't make it a
resultset in my view. It's all a matter of definition though.
Sure, though I don't like the similarity of the code statements which
tend to behave completely different at runtime. As this is a runtime
issue, it can lead to test-burdens.

They for example could have opted for a system where you could get
access to the parameters in the query to alter them for each run.

You can do that for LINQ to SQL, but it's rather harder to do it for
LINQ in general, unless you want *everything* to go via expression
trees instead of lambda expressions, or make lambda expressions
significantly less powerful.

Variable capture is an important part of the power of LINQ IMO, but it
certainly does need to be understood - and I'm sure it'll bite people
while they get the hang of it.
No, that's not what I meant. What I meant was: you declare a query
somewhere, e.g. in a method you call to formulate the query, then
execute it somewhere else, however the query declaration is already
fixed, so you can alter it, by changing a parameter on a predicate, but
not by changing a local variable's value.

The fact that the *execution* doesn't happen at the *declaration* means
it's deferred execution by every other use of the phrase that I've ever
seen.

You're using "deferred execution" to mean what I'd probably describe as
"deferred parameter evaluation". It's an interesting topic to talk
about, but I'd prefer it if we didn't keep calling it "deferred
execution".

From MSDN:

<quote>
As stated previously, when the query is designed to produce a sequence
of values, the query variable itself only stores the query commands.
The actual execution of the query is deferred until you iterate over
the query variable in a foreach loop. This concept is referred to in
the documentation as deferred execution.
</quote>

Now, you also bring up another topic: separating the query from its
data connection. You can't easily separate it from its whole data
context in terms of the types involved, because at that point you lose
a lot of the benefits of LINQ - but I could certainly envisage
separating it from a "live" context. I don't know what LINQ to Entities
will have, but I wouldn't be at all surprised to see that in there.

I think in LINQ to SQL you can do this with CompiledQuery.Compile, but
I haven't tried it myself. I don't know whether this *also* keeps the
values of variables at the point of compilation, but if it does then
it's a fairly simple way of keeping people out of trouble.
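Whatever CompiledQuery.Compile actually does with variable values
(unverified here), the general idea of binding parameter values at
declaration time rather than execution time can be sketched in Python
with functools.partial (all names below are hypothetical, not the
LINQ to SQL API):

```python
from functools import partial

def find_by_country(data, country):
    # hypothetical query function standing in for a compiled query
    return [row for row in data if row["country"] == country]

customers = [{"name": "A", "country": "USA"},
             {"name": "B", "country": "NL"}]

country = "USA"
# bind the CURRENT value of country at declaration time
query = partial(find_by_country, country=country)

country = "NL"           # later changes don't leak into the bound query
print(query(customers))  # [{'name': 'A', 'country': 'USA'}]
```

Early binding like this gives exactly the "state at declaration time"
semantics argued for earlier in the thread.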

I think the rest of your post pretty much falls into the bits I've
talked about above, so I've snipped rather than repeating myself :)
 

Frans Bouma [C# MVP]

Jon said:
Oh absolutely. The difference is that if you then need to fix
something to behave in a particular way, you can do so. I suspect
there's a nice solution waiting in the wings somewhere, but I haven't
used it yet. As I say, I have hopes for WPF...

What I've seen is that they more or less adopted the asp.net way of
databinding. So nothing really helpful, but it allows you to do
'declarative programming'... yay! :/
It's probably slightly "magic" at the moment. I don't think it will
be for long. Sooner or later, devs are going to have to understand
what closures really are, and how with captured variables it really
is the variable which is captured, not its value.

'Closures', now there's a word that's overloaded too many times :).
Are we talking about sets or graph paths? :)
Don't forget that it's not just LINQ to SQL we're talking about here
- this is true for LINQ to Objects as well as anywhere else you might
use an anonymous function.

True. I saw your blogpost today with the query which looked a bit
obscure to me. Sure, it's understandable if you look closely, but
that's not the point. The point is that one easily overlooks the real
specifics of the query: a human in general has a hard time grasping any
form of computer language, simply because the human has to interpret
the code before understanding what it REALLY does.

Making it more obscure than clear doesn't really help the human
reading the code, which directly implies the chance for more bugs.
And it is a declaration, but one which captures the variables. I
really don't think it's that bad, once you get used to it - it's very
similar to the whole "passing a reference type argument by value
doesn't mean all the data is copied, just the reference". Once you
accept where changes will be reflected, it's quite easy to work with
that.

Though it's a way to make things really complicated without a lot of
effort, or better: you somewhat have to PUT IN effort to make it very
clear. Perhaps not for the person who wrote the code and who has spent
weeks designing and writing it, but for the poor sods who have to read
the code after the genius left for another job.
I don't see the logic in your first definition. Suppose it didn't
implement IEnumerable<T> but had an Execute() method which gave back
an IEnumerator<T> instead - in other words, we just changed a method
name and in doing so removed an interface implementation. How can
that change whether the object is itself a resultset or not? Or would
it still be a resultset in your view?

Because it implements IEnumerable, it by itself is an enumerable
resource. This IMHO implies that it's a set.

If it had an Execute method (some o/r mappers add that method), the
thing still is that it's not a declaration alone. You can't execute a
declaration without the executor who interprets the query so it gets
executed.

With Linq they placed the executor inside the declaration. Why, I have
no idea, the only guess I have is that it's 'easier' for some people
who have no clue what they're doing anyway, but then again, these
people will have a hard time with some areas of linq anyway so why
bother introducing this 'feature' for them.

The analogy is a simple SQL string: you can't execute it without a SQL
engine. To use it as a metaphor here, this is the same as linq:
string query = "SELECT * FROM Customers WHERE Country='USA'";

foreach(var customer in query)
{
// do something with customer
}

Everyone will say: "you can't execute a string". No of course you
can't, as it contains the declaration of a query. You need an execution
engine to execute the query. That would have been better IMHO, because
it separates declaration from execution, which are combined in a linq
query.
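The preferred shape described here - a pure query declaration handed
to a separate execution engine - can be sketched in a few lines of
Python (all class and method names are hypothetical, invented for the
illustration):

```python
class Query:
    """A pure declaration: a field/value predicate, no executor inside."""
    def __init__(self, field, value):
        self.field = field
        self.value = value

class InMemoryEngine:
    """The executor lives here, not in the query object."""
    def __init__(self, rows):
        self.rows = rows

    def execute(self, query):
        # interpret the declaration against this engine's data
        return [r for r in self.rows if r[query.field] == query.value]

q = Query("country", "USA")   # declared anywhere, engine-free
engine = InMemoryEngine([{"country": "USA"}, {"country": "NL"}])
print(engine.execute(q))      # [{'country': 'USA'}]
```

Because q carries no executor, the same declaration could be handed to
a different engine (another database, an in-memory store) unchanged.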
The query itself doesn't contain the data, it simply contains all the
information required to fetch the data. That doesn't make it a
resultset in my view. It's all a matter of definition though.

Something which is enumerable, isn't that semantically a sequence, a
set for you?

Because the linq query isn't a declaration alone, you can't do things
with it like pass it to an execution engine of choice, you have to
execute it with the engine inside the query, or you have to be lucky to
be using a linq provider which gives you this flexibility ;).
The fact that the execution doesn't happen at the declaration means
it's deferred execution by every other use of the phrase that I've
ever seen.

You're using "deferred execution" to mean what I'd probably describe
as "deferred parameter evaluation". It's an interesting topic to talk
about, but I'd prefer it if we didn't keep calling it "deferred
execution".

Sure, I don't like word-games, so I don't mind. The thing is though
that I find it very important that people understand that the place
where the executor is located in a linq query is an important issue,
and not something you can simply wave away as 'not that important'. The
thing is that it severely limits the user of the framework in how s/he
wants to use the query, while it doesn't give the user any real
benefits, except perhaps the 1 line of code they don't have to type now.
Now, you also bring up another topic: separating the query from its
data connection. You can't easily separate it from its whole data
context in terms of the types involved, because at that point you
lose a lot of the benefits of LINQ - but I could certainly envisage
separating it from a "live" context. I don't know what LINQ to
Entities will have, but I wouldn't be at all surprised to see that in
there.

Depends on if they still move ahead towards a multi-db design. I have
the feeling they won't, as it could give their competitors in the DB
market the same advantage SqlServer will have now (as IBM and Oracle
have already made their DB engines capable of running these kind of
engines in-process so it won't be hard for them to move an EDM layer
into their databases, IMHO).

FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

Frans Bouma said:
What I've seen is that they more or less adopted the asp.net way of
databinding. So nothing really helpful, but it allows you to do
'declarative programming'... yay! :/

I have no problem with the declarative side of things, and the big
advantage from my point of view is that we can do things in the XAML
directly when the designer doesn't support them, and use the designer
for the rest of the time. It's the best designer/hand-crafting amalgam
I've seen yet.
'Closures', now there's a word that's overloaded too many times :).
Are we talking about sets or graph paths? :)

Um, closures the fairly standard computer science term:
http://en.wikipedia.org/wiki/Closure_(computer_science)
True. I saw your blogpost today with the query which looked a bit
obscure to me. Sure, it's understandable if you look closely, but
that's not the point. The point is that one easily overlooks the real
specifics of the query: a human in general has a hard time grasping any
form of computer language, simply because the human has to interpret
the code before understanding what it REALLY does.

That was deliberately nasty code. It would be fairly hard to
accidentally do that sort of thing, and I would certainly discourage
its use in real code.
Though it's a way to make things really complicated without a lot of
effort, or better: you somewhat have to PUT IN effort to make it very
clear. Perhaps not for the person who wrote the code and who has spent
weeks designing and writing it, but for the poor sods who have to read
the code after the genius left for another job.

Well, I think query expressions are clearer than text - as well as
being more easily verifiable by the compiler, of course (within the
bounds of being valid expressions - the compiler can't determine
whether or not there will be a valid SQL translation).

As for the captured variable aspect of it - I still believe it's a
matter of education and becoming used to them, as well as not abusing
them. You can certainly get yourself into trouble pretty easily, but
then again you can also *avoid* getting yourself into trouble fairly
easily.
Because it implements IEnumerable, it by itself is an enumerable
resource. This IMHO implies that it's a set.

To me it implies it's the source of a sequence.
If it had an Execute method (some o/r mappers add that method), the
thing still is that it's not a declaration alone. You can't execute a
declaration without the executor who interprets the query so it gets
executed.

Well, it's a few things:
1) It relates to a schema - crucial for keeping type safety etc
2) It knows about its current session
3) It's the query itself
4) It's the means of executing the query

I would have no objection to the idea of it not implementing
IEnumerable<T> directly but instead having an Execute method taking the
session. It would make a few bits of code a bit more long-winded, but
that's all.

Having said that, I also don't have much problem with the way it's been
done.
With Linq they placed the executor inside the declaration. Why, I have
no idea, the only guess I have is that it's 'easier' for some people
who have no clue what they're doing anyway, but then again, these
people will have a hard time with some areas of linq anyway so why
bother introducing this 'feature' for them.

The analogy is a simple SQL string: you can't execute it without a SQL
engine. To use it as a metaphor here, this is the same as linq:
string query = "SELECT * FROM Customers WHERE Country='USA'";

foreach(var customer in query)
{
// do something with customer
}

Everyone will say: "you can't execute a string". No of course you
can't, as it contains the declaration of a query. You need an execution
engine to execute the query. That would have been better IMHO, because
it separates declaration from execution, which are combined in a linq
query.

Well, you need *context* - and that's what the DataContext (and the
tables off it) really provides.

I really don't think it's nearly as ugly as you seem to be making it
out to be though.
Something which is enumerable, isn't that semantically a sequence, a
set for you?

Sequence and set certainly aren't the same thing, but I'd say that
something which is enumerable is either a sequence in itself or is a
way of getting at a sequence. Think of it in terms of the method:
GetEnumerator returns an enumerator for the data. There's nothing
inconsistent in that being applied to a data source rather than
something which already contains the data.
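Jon's distinction - an enumerable as the *source* of a sequence rather
than the data itself - maps directly onto Python's iterable/iterator
split (a sketch; the class name is invented):

```python
class NumberSource:
    """An iterable: each iteration asks the source for a fresh sequence."""
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        # analogous to GetEnumerator(): hand back a new enumerator;
        # the source itself holds no result data
        return iter(range(self.n))

src = NumberSource(3)
print(list(src))   # [0, 1, 2]
print(list(src))   # [0, 1, 2] -- re-iterable; nothing was "stored"
```

The source can be enumerated repeatedly without ever having contained
a resultset, which is the reading of IEnumerable<T> being defended.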
Because the linq query isn't a declaration alone, you can't do things
with it like pass it to an execution engine of choice, you have to
execute it with the engine inside the query, or you have to be lucky to
be using a linq provider which gives you this flexibility ;).

Sure - but the beautiful thing about LINQ (instead of LINQ to SQL) is
that different providers can choose their own way to go on this and
people can still use broadly the same query syntax. If you choose to
implement LINQ in a way which requires a context to be provided to it,
that's fine - and the query will still be easily recognisable to
someone who has used LINQ to NHibernate, or LINQ to SQL.
Sure, I don't like word-games, so I don't mind.

It's not a word game - it's a matter of accepting the common
understanding of a term. If you use the phrase "deferred execution" to
mean "deferred parameter evaluation" you *will* confuse people.
The thing is though
that I find it very important that people understand that the place
where the executor is located in a linq query is an important issue,
and not something you can simply wave away as 'not that important'. The
thing is that it severely limits the user of the framework in how s/he
wants to use the query, while it doesn't give the user any real
benefits, except perhaps the 1 line of code they don't have to type now.

How does it "severely limit" the user of the framework? If you don't
want to change the parameters, don't change the values of the variables
- I don't think it's something that people are likely to do
accidentally anyway, to be honest.
Depends on if they still move ahead towards a multi-db design. I have
the feeling they won't, as it could give their competitors in the DB
market the same advantage SqlServer will have now (as IBM and Oracle
have already made their DB engines capable of running these kind of
engines in-process so it won't be hard for them to move an EDM layer
into their databases, IMHO).

I will be very disappointed if they don't go for a multi-db design.
It's not going to stop other projects from going multi-db (such as LINQ
to NHibernate) - it would just limit the usefulness of ADO.NET
Entities. Put it this way: people aren't likely to change their
database choice based on what ADO.NET 3 supports, but they may well
change their framework based on their choice of database. MS would be
foolish not to understand that.
 

Frans Bouma [C# MVP]

Jon said:
Um, closures the fairly standard computer science term:
http://en.wikipedia.org/wiki/Closure_(computer_science)

Yet, I find it a confusing term, as the closures from logic and the
closures from math are often used in our field as well; take for
example the mathematics-specific closure definition, especially in the
context of a query ;), hence my question.
That was deliberately nasty code. It would be fairly hard to
accidentally do that sort of thing, and I would certainly discourage
its use in real code.

I don't know whether your particular example will pop up regularly,
but I do know that most developers think that a query is an imperative
piece of code, and often fall into the trap of wanting to re-use
elements of the result of the query as input to the query, which is
what you illustrated.
Well, I think query expressions are clearer than text - as well as
being more easily verifiable by the compiler, of course (within the
bounds of being valid expressions - the compiler can't determine
whether or not there will be a valid SQL translation).

As for the captured variable aspect of it - I still believe it's a
matter of education and becoming used to them, as well as not abusing
them. You can certainly get yourself into trouble pretty easily, but
then again you can also avoid getting yourself into trouble fairly
easily.

However it's not consistent: a variable passed to an extension method
used INSIDE the query is passed as its value immediately, but the same
variable passed to another extension method in a lambda is passed as a
member-access expression and not passed as its value.

I find that inconsistent behavior, because the elements in the query
are all elements of that query, so why is one more special than the
other?

var q = (from c in customers
where c.Orders.Count == amount
select c).Take(amount);

amount is a variable. The first amount is passed as a member-access
expression because it ends up inside a lambda (which isn't shown here,
mind you; all I see is a boolean expression in THIS code), and the
second amount is passed as a value.

When I now change amount, before execution of q, it will change the
where lambda, but not the Take method.

Now, and how is this consistent? Sure, there's always an explanation,
however calling THIS consistent makes a lot of other stuff look
consistent all of a sudden as well...
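The effect is easy to reproduce with plain LINQ to Objects (a minimal sketch; the array and values are invented for illustration):

```csharp
using System;
using System.Linq;

class CaptureDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 };
        int amount = 2;

        // The Where predicate is a lambda: 'amount' is captured and read
        // only when the query is enumerated. Take's argument is a plain
        // int: 'amount' is evaluated once, when the query is composed.
        var q = numbers.Where(n => n >= amount).Take(amount);

        amount = 4; // change the variable before execution

        // Where sees the new value (4), Take still uses the old one (2):
        Console.WriteLine(string.Join(",", q)); // prints "4,5"
    }
}
```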
To me it implies it's the source of a sequence.

List<T> also implements IEnumerable. Or most other collections for
that matter. Does IEnumerable on this object imply an outside resource?
Or does it imply you enumerate over the data INSIDE the object?

To me it implies the latter, and I fail to see why it's all of a
sudden completely different with a Queryable object.
Well, it's a few things:
1) It relates to a schema - crucial for keeping type safety etc

no, it relates to a model. The relational schema is totally not
relevant here. Example: I can create a linq query using entity meta
data and execute it on oracle or on sqlserver, with different mappings.
Does it matter? No. Not for the query. For example, say I have entity
Customer mapped onto a view in oracle and on a table in sqlserver...
2) It knows about its current session

but why?
3) It's the query itself
4) It's the means of executing the query

Same as with 2): why is this? What big problem does this solve? IMHO
it only creates problems.
I would have no objection to the idea of it not implementing
IEnumerable<T> directly but instead having an Execute method taking
the session. It would make a few bits of code a bit more long-winded,
but that's all.

that would already be better.
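A sketch of that shape (a purely hypothetical API, not anything shipped; ISession, IDetachedQuery, and Execute are invented names):

```csharp
using System.Collections.Generic;

// Hypothetical: the query object no longer implements IEnumerable<T>
// itself; it only yields results when handed an explicit session.
public interface ISession { /* connection/context details elided */ }

public interface IDetachedQuery<T>
{
    // The query stays a pure specification until a session is supplied,
    // so it can be built in a layer that has no database access at all.
    IEnumerable<T> Execute(ISession session);
}
```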
Having said that, I also don't have much problem with the way it's
been done.

so you're comfortable with creating a query q in method A, passing it
to method B, and therefore requiring a session in method A, which is
for example not possible. Say I want to formulate what I want in method
A, but as I'm not allowed to use database access code directly, I have
to pass the specification to a layer where it IS possible to use data
access code. I now can't formulate the query in method A; I have to
pass what I want via a DIFFERENT specification method. Also, where I
specify it, I have to decide which DB to use if I have a multi-db
design.

For Linq to sql that's not of their concern, so you also don't see a
solution for that in their design, but it IS a problem.
Well, you need context - and that's what the DataContext (and the
tables off it) really provides.

I really don't think it's nearly as ugly as you seem to be making it
out to be though.

If you think I'm alone in this, you're mistaken. ;). I just find it
rather odd, that a lot of people spend hours and hours a day to
separate concerns in their design, yet this in-your-face combination of
concerns is apparently acceptable.
Sequence and set certainly aren't the same thing, but I'd say that
something which is enumerable is either a sequence in itself or is a
way of getting at a sequence. Think of it in terms of the method:
GetEnumerator returns an enumerator for the data. There's nothing
inconsistent in that being applied to a data source rather than
something which already contains the data.

Sequence and set aren't equal, true, but in this case, where linq
queries are executed on the db, the difference IS a bit artificial, as
the resultset IS fetched first, so the query object contains the whole
resultset requested, over which the enumerator is created.

It's not as if the enumerator is represented by a live cursor on a
resultset in the db.

A forward-only cursor on a resultset which has already been read in
full isn't that helpful in a lot of cases either: you want the set to
work with. ToList() creates a copy, as it's already in a set. Little
things like that show that:

IList result = q.Execute<IList>();

would have been better (not ideal, as the query still executes itself).

I mean: IF the user is interested in a forward-only cursor on the
resultset, give the user a forward-only cursor on the resultset.
However, the set is already fetched, in full, so in that case, simply
return the set and be done with it: the user wants the set, not a
cursor, as the set is what the query defines and what the user wanted.
Sure - but the beautiful thing about LINQ (instead of LINQ to SQL) is
that different providers can choose their own way to go on this and
people can still use broadly the same query syntax.

That's only true on paper. Every Linq provider will implement
extension methods, which are specific for that linq provider. For
example we have extension methods for paging (as skip/take isn't going
to cut it, in this case) and adding prefetch paths to the query, and
likely more when we're completely done programming. The thing is that
others have different extension methods, and it's precisely THOSE
extension methods which make things interesting.

Sure, they can all use the simple syntax of selecting a set of
entities from a set, using a simple filter, but it quickly gets out of
hand. Take for example a silly method like .Distinct(), which fails on
linq to sql when a distinct-violating type is detected.

As Linq relies on extension methods, it simply depends on what kind of
extension methods are implemented for the provider used. For example,
DefaultIfEmpty(), the stupid method which signals a left/right join. Is
it possible to rely on this method to get a left/right join, something
FUNDAMENTAL to SQL? No.

So, it looks good on paper, but the common denominator is pretty small
in this case.
If you choose to
implement LINQ in a way which requires a context to be provided to
it, that's fine - and the query will still be easily recognisable to
someone who has used LINQ to NHibernate, or LINQ to SQL.

To some extent.

It's always easy to explain things using boring simple queries. Those
aren't the problem. The problems arise when things get more complicated
than, for example, a single value list from a single entity set: at
that moment, the core C#/VB.NET syntax for queries isn't enough, or
will differ a lot from the expression tree generated, so the o/r mapper
will likely require the use of extension methods on the one hand, and
will have to decide what caused a particular subtree to be there on the
other (as that's not always obvious, like with DefaultIfEmpty, or
multiple from clauses which result in nested SelectMany calls)

It gets really different when things like tweakability are added to
the equation. With linq, people have less control over what the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join and
vice versa. This can be solved with extension methods, but this ties
the query to the provider used.

I'm not saying this is a bad thing per se. Linq offers extension
methods, which are ideal for solving these problems; however it has a
price, and that price is giving up provider independence.
How does it "severely limit" the user of the framework? If you don't
want to change the parameters, don't change the values of the
variables - I don't think it's something that people are likely to do
accidentally anyway, to be honest.

Not only the parameter stuff, also the ability to specify on which
context/session/adapter they want to execute the query is a thing which
is hard/not possible to do. It also implies that when creating the
query you NEED a session/context/adapter, which is in a lot of cases
not possible, simply because the context/session isn't known at that
point, or not available because at that spot it's not allowed to cut
corners and access the db for example.
I will be very disappointed if they don't go for a multi-db design.

One reason I think they'll move it towards an approach which might
offer a multi-db design, but leaves that totally in the hands of 3rd
parties, is that their original design, in which the ado.net provider
had little to do to get things done, has been changed into one where
the ado.net provider has a lot of work to do, which means that a 3rd
party ado.net provider has to implement a lot of code to work with the
EDM.
It's not going to stop other projects from going multi-db (such as
LINQ to NHibernate) - it would just limit the usefulness of ADO.NET
Entities. Put it this way: people aren't likely to change their
database choice based on what ADO.NET 3 supports, but they may well
change their framework based on their choice of database. MS would be
foolish not to understand that.

If they had understood it, they wouldn't have made IProvider internal
for linq to sql, so linq to sql (which had a multi-db design at first)
could be used on multiple db's as well.

EDM is a core part of sqlserver 2008. Any db vendor not having an EDM
provider undermines the success of EDM: if only MS releases a provider,
which only works for sqlserver 2008, will it succeed? Unlikely, because
it's a separate download for developers, it's not part of the .net
framework. If I was oracle or IBM, I would create the provider, but not
release it until I really would have to (read: when EDM turns out to be
a big data-access success so developers in general start to look for
databases which support it). Looking at the reluctance of Oracle to
release 11g for windows, I wouldn't be surprised if they're not that
enthusiastic about releasing a provider for EDM.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

Frans Bouma said:
Yet, I find it a confusing term, as the closures from logic and the
closures from math are often used in our field as well; take for
example the mathematics-specific closure definition, especially in the
context of a query ;), hence my question.

I thought it was reasonably unambiguous in this context - given that we
were talking about lambda expressions, the computer science use seemed
fairly obvious to me. Not to worry.
I don't know whether your particular example will pop up regularly,
but I do know that most developers think that a query is an imperative
piece of code, and often fall into the trap of wanting to re-use
elements of the result of the query as input to the query, which is
what you illustrated.

Well, only time will tell - but I'd be surprised to see code like that
written with any significant expectations of "simple" behaviour.
However it's not consistent: a variable passed to an extension method
used INSIDE the query is passed as the value immediately, but the same
variable passed to another extension method in a lambda is passed as a
member-access expression and not passed as its value.

Firstly, it's not to do with extension methods at all. It's to do with
whether parameter is a lambda expression or not, and that's *all* it
has to do with.

Now, as for consistency - it's consistent once you understand which
parts of a query expression are actually a shorthand for lambda
expressions. Is someone who doesn't want to learn the basics of query
expressions going to find that confusing? Yes. Should someone who
doesn't want to learn the basics of query expressions be using them in
production code? Absolutely not.

There are all kinds of areas where if you have no idea what you're
doing, you can go wrong - there's nothing new in that. Lambda
expressions and query expressions aren't that hard, and education is
the key IMO.

Apply your consistency test to a mutable struct vs a mutable class,
with a value being passed to a method and then changed - you'll see
exactly the same "inconsistency". Does that mean we shouldn't have the
distinction between value types and reference types? No - it just means
that people need to know about the difference between them.
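That analogy can be made concrete (a minimal sketch; the types and method names are invented for illustration):

```csharp
using System;

struct MutableStruct { public int X; }
class MutableClass { public int X; }

class Demo
{
    // The struct is copied on the way in: the caller's copy is untouched.
    static void Bump(MutableStruct s) { s.X++; }
    // The class instance is passed by reference: the caller sees the change.
    static void Bump(MutableClass c) { c.X++; }

    static void Main()
    {
        var s = new MutableStruct();
        var c = new MutableClass();
        Bump(s);
        Bump(c);
        Console.WriteLine(s.X); // prints 0
        Console.WriteLine(c.X); // prints 1
    }
}
```

The same "inconsistency" appears: one call site observes the mutation, the other doesn't, and the explanation lies in understanding the semantics, not in the syntax.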
List<T> also implements IEnumerable. Or most other collections for
that matter. Does IEnumerable on this object imply an outside resource? Or
does it imply you enumerate over the data INSIDE the object?

To me it implies the latter, and I fail to see why it's all of a
sudden completely different with a Queryable object.

There's nothing to say it *has* to be an outside resource, but there's
nothing inherently *preventing* it from being an outside resource. You
are, after all, calling GetEnumerator() - doesn't that suggest that it
could be taking some action?
no, it relates to a model. The relational schema is totally not
relevant here.

Well, I view the model as a schema - just not the *relational* schema.
However, I'm very happy to use the term "model" here instead. Either
way, it's important for the LINQ query.

Yes, I know you dislike it - I was merely clarifying what the query
knows about.
Same as with 2): why is this? What big problem does this solve? IMHO
it only creates problems.
Ditto.


that would already be better.

I still say it's less convenient in many situations, but I wouldn't
mind much either way.
so you're comfortable with creating a query q in method A, passing it
to method B, and therefore requiring a session in method A, which is
for example not possible. Say I want to formulate what I want in method
A, but as I'm not allowed to use database access code directly, I have
to pass the specification to a layer where it IS possible to use data
access code. I now can't formulate the query in method A; I have to
pass what I want via a DIFFERENT specification method. Also, where I
specify it, I have to decide which DB to use if I have a multi-db
design.

You can use CompiledQuery for that sort of thing.
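A sketch of that approach, assuming a generated LINQ to SQL context (NorthwindDataContext and Customer are illustrative names, not from the thread):

```csharp
using System;
using System.Data.Linq;
using System.Linq;

static class Queries
{
    // The query is declared once, with no live context in sight; only
    // the *shape* of the context type is needed at declaration time.
    public static readonly Func<NorthwindDataContext, string, IQueryable<Customer>>
        ByCountry = CompiledQuery.Compile(
            (NorthwindDataContext db, string country) =>
                from c in db.Customers
                where c.Country == country
                select c);

    // Later, a lower layer supplies whichever context it wants:
    //   var uk = Queries.ByCountry(context, "UK").ToList();
}
```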
For Linq to sql that's not of their concern, so you also don't see a
solution for that in their design, but it IS a problem.

But it's not a problem with LINQ itself. Providers can choose to
implement things that way if they want to.
If you think I'm alone in this, you're mistaken. ;). I just find it
rather odd, that a lot of people spend hours and hours a day to
separate concerns in their design, yet this in-your-face combination of
concerns is apparently acceptable.

Well, you can work round it with CompiledQuery when you want to, and it
makes things slightly simpler when you *do* just want to execute a
query in a particular session.

As I've said, it's the same with Hibernate: you normally create a
Criteria against a Session. You *can* used a DetachedCriteria, but the
more normal operation is to create a Criteria when you need to, against
a session. I've used that model with no problems, and the sky didn't
fall down.
Sequence and set aren't equal, true, but in this case, where linq
queries are executed on the db, the difference IS a bit artificial, as
the resultset IS fetched first, so the query object contains the whole
resultset requested, over which the enumerator is created.

It's not as if the enumerator is represented by a live cursor on a
resultset in the db.

Is that definitely true in all cases? I can see situations where it
would be very handy to effectively get a DataReader back turning things
into anonymous types on the fly. (Using full entities would require
remembering them all for uniqueness purposes, which would negate a lot
of the point of it, of course.)

Even if it's not true for LINQ to SQL, it could be true for other LINQ
providers in the future.
A forward only cursor on the resultset which is already read in full,
isn't that helpful in a lot of cases either, you want the set to work
with.

Sometimes you do, sometimes you just want to process a record at a
time. Even if you're batching things, you may well not want to read the
whole batch in one go.
ToList() creates a copy, as it's already in a set. Little things
which show that:

IList result = q.Execute<IList>();

would have been better (not ideal, the query still executes itself).

I mean: IF the user is interested in a forward only cursor on the
resultset, give the user a forward only cursor on the resultset.
However the set is already fetched, in full, so in that case, simply
return the set and be done with it, as the user wants the set, not a
cursor, as the set is what the query defines and what the user wanted.

Sure - it's trivial to turn a cursor into a fully loaded result set.
The reverse isn't feasible, of course.
That's only true on paper. Every Linq provider will implement
extension methods, which are specific for that linq provider. For
example we have extension methods for paging (as skip/take isn't going
to cut it, in this case) and adding prefetch paths to the query, and
likely more when we're completely done programming. The thing is that
others have different extension methods, and it's precisely THOSE
extension methods which make things interesting.

In *some* queries it is - but in many cases a simple "from x where y
select z" will be perfectly fine.
Sure, they can all use the simple syntax of selecting a set of
entities from a set, using a simple filter, but it quickly gets out of
hand. Take for example a silly method like .Distinct(), which fails on
linq to sql when a distinct-violating type is detected.

If you're asking for distinct values and the type violates
distinctness, why shouldn't it fail? Perhaps an example would help.
As Linq relies on extension methods, it simply depends on what kind of
extension methods are implemented for the provider used. For example,
DefaultIfEmpty(), the stupid method which signals left/right join. Is
it possible to rely on this method to get a left/right join, something
FUNDAMENTAL to SQL? No.

So, it looks good on paper, but the common denominator is pretty small
in this case.

Many, many queries are simple ones in my experience. It's nice to have
the ability to use the full power of the specific database or LINQ
provider when you need to, but it's also nice to have consistency of
querying when that's feasible.
To some extent.

It's always easy to explain things using boring simple queries. Those
aren't the problem. The problems arise when things get more complicated
than, for example, a single value list from a single entity set: at
that moment, the core C#/VB.NET syntax for queries isn't enough, or
will differ a lot from the expression tree generated, so the o/r mapper
will likely require the use of extension methods on the one hand, and
will have to decide what caused a particular subtree to be there on the
other (as that's not always obvious, like with DefaultIfEmpty, or
multiple from clauses which result in nested SelectMany calls)

If the provider has been well designed, the query should still be
readable, even if parts of it would not be available in other
providers.
It gets really different when things like tweakability are added to
the equation. With linq, people have less control over what the SQL
looks like. This is actually pretty bad in the long run as the SQL
might for example use a subquery where it should have used a join and
vice versa. This can be solved with extension methods, but this ties
the query to the provider used.

Yes, if you absolutely have to tweak things, then that's fine - and I
fully believe that you ought to closely examine the SQL generated by
your LINQ provider - but there are many simple queries which don't need
tweaking.
I'm not saying this is a bad thing per se. Linq offers extension
methods, which are ideal for solving these problems; however it has a
price, and that price is giving up provider independence.

Yup - so you pay that price when you need to, and when you don't need
to you've still got independence.
Not only the parameter stuff, also the ability to specify on which
context/session/adapter they want to execute the query is a thing which
is hard/not possible to do.

CompiledQuery makes it pretty easy to execute a query against an
arbitrary context.
It also implies that when creating the
query you NEED a session/context/adapter, which is in a lot of cases
not possible, simply because the context/session isn't known at that
point, or not available because at that spot it's not allowed to cut
corners and access the db for example.

Again, CompiledQuery doesn't require this.

[ADO.NET Entities]
One reason I think they'll move it towards an approach which might
offer a multi-db design, but leaves that totally in the hands of 3rd
parties, is that their original design, in which the ado.net provider
had little to do to get things done, has been changed into one where
the ado.net provider has a lot of work to do, which means that a 3rd
party ado.net provider has to implement a lot of code to work with the
EDM.

That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.
If they had understood it, they wouldn't have made IProvider internal
for linq to sql, so linq to sql (which had a multi-db design at first)
could be used on multiple db's as well.

Well, don't forget that LINQ to SQL is (as I understand it) a very
different team to the ADO.NET side of things.
EDM is a core part of sqlserver 2008. Any db vendor not having an EDM
provider undermines the success of EDM: if only MS releases a provider,
which only works for sqlserver 2008, will it succeed? Unlikely, because
it's a separate download for developers, it's not part of the .net
framework.

So is the Oracle data provider, but that's pretty well used. Ditto
NUnit :)
If I was oracle or IBM, I would create the provider, but not
release it until I really would have to (read: when EDM turns out to be
a big data-access success so developers in general start to look for
databases which support it). Looking at the reluctance of Oracle to
release 11g for windows, I wouldn't be surprised if they're not that
enthusiastic about releasing a provider for EDM.

Hmm... I guess we'll have to wait and see. Still, there'll always be
other providers around. If MS really wants to lose ORM customers to
NHibernate etc, they will...
 

Frans Bouma [C# MVP]

(snipped a lot away, as a lot has been said already in this thread :))
Firstly, it's not to do with extension methods at all. It's to do
with whether parameter is a lambda expression or not, and that's all
it has to do with.

sure, but you have to realize that. It's not always obvious. One could
argue that you have to know what everything ends up as, but I find that
an excuse: the expression trees created are rather big sometimes, and
sometimes different from what you'd expect. A developer of code lives
at the level of C#, not at the level of expression trees and lambdas,
if s/he uses native C# code. It's the same as knowing which IL is being
produced, or which x86 code is produced by the JIT. I don't care,
and I also don't want to NEED to care, because if I need to care, the
abstraction level I'm living at is a facade, and there's no difference
with C++ with inline asm.
Now, as for consistency - it's consistent once you understand which
parts of a query expression are actually a shorthand for lambda
expressions. Is someone who doesn't want to learn the basics of query
expressions going to find that confusing? Yes. Should someone who
doesn't want to learn the basics of query expressions be using them
in production code? Absolutely not.

I don't see why one has to understand which parts are lambdas in the
expression trees (!) and which aren't, if NO statements written use
real lambdas: all code is written using C# code, no lambdas in sight.

Also, my example of using the same variable in the where and in the
Take method doesn't show me why one is updated at execution time and
the other one isn't: why aren't BOTH updated? Because one is
translated into a lambda expression in an expression tree under the
hood? Why do I even need to know that it is translated into an
expression tree? If I have to, it's a leaky abstraction.
There are all kinds of areas where if you have no idea what you're
doing, you can go wrong - there's nothing new in that. Lambda
expressions and query expressions aren't that hard, and education is
the key IMO.

query expressions aren't hard in general, but the details will kill a
lot of dreams, but that's MS' problem.

The thing is though that you can teach a C# developer how queries
work, and you don't have to educate them on how expression trees
work. That's also info not NEEDED to write correct queries at the
abstraction level of C#.
Apply your consistency test to a mutable struct vs a mutable class,
with a value being passed to a method and then changed - you'll see
exactly the same "inconsistency". Does that mean we shouldn't have
the distinction between value types and reference types? No - it just
means that people need to know about the difference between them.

I used the SAME variable in two different places in the query. One
gets updated, the other one isn't when the variable changes value.
Sorry, but that's inconsistent behavior, and the reason is actually
irrelevant, because at the level of abstraction where the code is
written, there ARE NO expression trees, and the query didn't contain
any lambdas; these are only created at runtime when the query is
executed and an expression tree is created. That tree is often
different from what you've written in code. Relevant info? Why? Why
does someone writing linq queries have to think about expression trees?
If that's required info, why is this abstraction leaking into C#'s
level of abstraction?
You can use CompiledQuery for that sort of thing.

I looked up the (almost non-existent) docs about CompiledQuery but they
didn't tell me a lot. For example: is this compiled query always
usable, no matter what the provider is? No idea.
Is that definitely true in all cases? I can see situations where it
would be very handy to effectively get a DataReader back turning
things into anonymous types on the fly. (Using full entities would
require remembering them all for uniqueness purposes, which would
negate a lot of the point of it, of course.)

If you want to kill your DB's performance, you should do that. :)
Keeping a cursor open means you keep a resultset open on the server.
That takes resources. If your resultset is pretty big, it can eat more
resources than you want to give up for a longer period of time.

That's also why processing on the client is better off using batch
processing, i.e. paging through a resultset.
Even if it's not true for LINQ to SQL, it could be true for other
LINQ providers in the future.

I doubt it.
Sometimes you do, sometimes you just want to process a record at a
time. Even if you're batching things, you may well not want to read
the whole batch in one go.

then you page through the batch. What if processing a row costs 2
seconds and you have 1000 rows? That's 2000 seconds before the
resultset is closed.
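The paging alternative can be sketched like this (LINQ to Objects is shown for simplicity; against a database provider each page would become a separate, short-lived query instead of one long-held cursor, and Process/source are invented names):

```csharp
using System;
using System.Linq;

class PagingDemo
{
    static void Process(int row) => Console.WriteLine(row);

    static void Main()
    {
        var source = Enumerable.Range(1, 250); // stand-in for a resultset
        const int pageSize = 100;

        // Each iteration fetches one page and releases it before the
        // (potentially slow) per-row processing of the next page starts.
        for (int page = 0; ; page++)
        {
            var batch = source.Skip(page * pageSize).Take(pageSize).ToList();
            if (batch.Count == 0) break;
            foreach (var row in batch) Process(row);
        }
    }
}
```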
If you're asking for distinct values and the type violates
distinctness, why shouldn't it fail? Perhaps an example would help.

Because entity identity is verifiable when the data is read from the
db (PK). So you can filter on the client if you have to, by reading
enough rows till you're done. It's slightly slower, but it's a way to
solve it.

Northwind: employee 1:n order. If you want all employees who have an
order filed for customers from the UK, you could do: (I use '*' for
simplicity here)

select e.*
from employees e
    inner join orders o
        on e.EmployeeID = o.EmployeeID
    inner join customers c
        on o.CustomerID = c.CustomerID
where c.Country = 'UK'

Though, you'll get a lot of duplicates. So you apply distinct. But
that's not possible. So you have to filter on the client. You can,
because the PK identity of the entity instance (== the data!) is
available. No O/R mapper should give up on such a silly query.

What's better though is that you can also use subqueries to avoid the
duplicates:

select *
from employees
where employeeID in
(
    select employeeID
    from orders
    where CustomerID in
    (
        select customerID
        from customers
        where country = 'UK'
    )
)

No duplicates, no distinct needed. If you look closely, the execution
plan is the same on most db's.
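For comparison, the nested form can be written in linq with Contains (a sketch against a hypothetical Northwind context; whether a provider emits IN subqueries for it or rewrites it into joins is up to the provider, which is exactly the control at issue):

```csharp
// db is a hypothetical LINQ to SQL Northwind context.
var q = from e in db.Employees
        where (from o in db.Orders
               where (from c in db.Customers
                      where c.Country == "UK"
                      select c.CustomerID).Contains(o.CustomerID)
               select o.EmployeeID).Contains(e.EmployeeID)
        select e;
```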

This is what I meant with tweakability below. In Linq I can't specify
to use the subquery variant; I have to use joins, or rely on the
provider to do it for me. However, that last bit isn't possible: the
o/r mapper would then have to know the db statistics about the sizes
of the data in the DB.

A good query system allows this kind of simple tweakability.
Many, many queries are simple ones in my experience. It's nice to
have the ability to use the full power of the specific database or
LINQ provider when you need to, but it's also nice to have
consistency of querying when that's feasible.

Still, I'd have liked it if they had spent more time on this. They
could have added more standard elements but decided not to.
Yes, if you absolutely have to tweak things, then that's fine - and I
fully believe that you ought to closely examine the SQL generated by
your LINQ provider - but there are many simple queries which don't
need tweaking.

You tell that to the team of DBAs who refuse to run your software
on their many-TB databases because the queries are too slow.

If you want to use an O/R mapper as a developer, and you find a team
of DBAs on the other side of the table and they refuse to accept the
fact that the queries are now generated by a program, you really have
to have your act together and prove that your queries are fast and
flexible with respect to the schema and the size of the table data;
otherwise you're off hammering out stored proc call code all day.

This isn't something I cooked up just to be cocky. Many times we've
received emails from developers who wanted to use an o/r mapper and
they had to convince their boss and DBAs that the SQL produced is fast,
that the queries are tunable/tweakable so the DBAs will be happy.

If the O/R mapper doesn't allow flexibility in that area, the o/r
mapper isn't going to be used a lot in the enterprise area where tables
can have millions of rows and each join has to be done with care.

So, how is Linq supporting tweakability here? Does it for example
offer a simple IN subquery element, so the DEVELOPER can tweak the
code, based on the DBA's advice? Not really. So the DEVELOPER, when
asked by the DBA if a costly join operation can be changed to a
subquery, or vice versa (each has its sweet spots), can
only answer: No, I can't do that. 10 to 1 the DBA will then say a
stored proc will be used instead.

That leaves the smaller simple stuff for the o/r mapper, and keeps in
place the myth that stored procs are the way to go when it comes to
serious data-access. Which is unnecessary: a query system should be
flexible enough to offer these kinds of tweaks.
Yup - so you pay that price when you need to, and when you don't need
to you've still got independence.

if you have to branch out to custom code, independence is 100% gone.

That said, I don't think it's possible to create a 100% independent
query system. It just has to be clear, for the people who think that
Linq WILL bring them that independent system, that it's a facade: there
is no such thing as an independent system. You will ALWAYS have o/r
mapper specific code in your application, unless you abstract
everything away, which also carries a (sometimes big) price tag.
[ADO.NET Entities]
One reason I think they'll move it towards an approach which might
offer multi-db design but that's totally in the hands of 3rd
parties is that their original design, where the ado.net provider
had little to do to get things done has been changed to make it a
lot of work to get things done for the ado.net provider, which
means that the 3rd party ado.net provider has to implement a lot of
code to work with the EDM.

That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.

though if it takes a lot of work, it will take a long time before open
source databases for example have implemented a provider. These things
aren't simple.
Well, don't forget that LINQ to SQL is (as I understand it) a very
different team to the ADO.NET side of things.

Though it wouldn't have taken any more effort. Now they apparently
didn't design it in (otherwise the design would be open and anyone
would be able to write a provider), so the design is targeted towards
one db, which is IMHO odd as it doesn't take that much effort in a
system which is already largely designed around providers anyway.
So is the Oracle data provider, but that's pretty well used. Ditto
NUnit :)

That's different. If a .NET developer wants to use Oracle, chances are
s/he won't use the MS oracle provider, simply because of its
limitations. So there's a necessity. The EDM, as an additional
download, isn't a core citizen inside vs.net 2008, nor in .NET 3.5.
Therefore it won't build the momentum it would have had if it had been
released WITH vs.net 2008 and .net 3.5.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

(I hope I've actually finished all the sentences in here. I've been
called away *loads* of times when writing this.)

Frans Bouma said:
sure, but you have to realize that. It's not always obvious. One could
argue that you have to know what everything ends up as, but I find that
an excuse: the expression trees created are sometimes rather big, and
sometimes different from what you'd expect. A developer is living at
the level of C#, not at the level of expression trees and lambdas, if
s/he uses native C# code. It's the same as knowing which IL is being
produced, or which x86 code is being produced by the jit. I don't care,
and I also don't want to NEED to care, because if I need to care, the
abstraction level I'm living at is a facade, and there's no difference
with C++ with inline asm.

I don't think the developer should need to know the details of which
expression trees are generated, but I *do* think they need to know that
query expressions implicitly use lambda expressions. I also think they
need to know that lambda expressions are converted to either delegate
instances or expression trees depending on the situation. In other
words, I'd expect someone to know that:

from data in SomeSource
where data.SomeCondition
select data.SomeProjection

is translated into

SomeSource.Where(data => data.SomeCondition)
.Select(data => data.SomeProject)

If they know that much, then they know where the lambda expressions
are, and therefore which variables will be captured.
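The "delegate instances or expression trees" distinction mentioned above can be shown in a couple of lines; this is a minimal, self-contained sketch (the class and variable names are illustrative, not from the thread):

```csharp
using System;
using System.Linq.Expressions;

class ConversionDemo
{
    static void Main()
    {
        // The same lambda text converts differently depending on the
        // compile-time target type:
        Func<int, bool> asDelegate = x => x > 2;          // compiled to executable code
        Expression<Func<int, bool>> asTree = x => x > 2;  // compiled to a data structure

        Console.WriteLine(asDelegate(3));        // True
        Console.WriteLine(asTree.Body);          // (x > 2)
        Console.WriteLine(asTree.Compile()(3));  // True

        // This is exactly why Enumerable.Where (which takes Func<T,bool>)
        // receives a delegate, while Queryable.Where (which takes
        // Expression<Func<T,bool>>) receives an expression tree a LINQ
        // provider can inspect and translate to SQL.
    }
}
```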

If they don't know that much, they shouldn't be using LINQ except to
find out what it's all about - in other words, to learn the above.
I don't see why one has to understand which parts are lambda's in the
expression trees (!) and which aren't if NO statements written use real
lambda's, all code is written using C# code, no lambdas in sight.

I didn't say which parts are lambdas in the expression trees. I said
which parts are lambdas in the query expressions. There's a *huge*
difference between query expressions and expression trees.

And lambdas *are* part of C# code, as are query expressions - so
someone using C# 3 and LINQ ought to know about both.
Also, my example of using the same variable in the where and in the
Take method doesn't show me why one is updated at execution time and
the other one isn't updated: why aren't BOTH updated? Because one is
translated into a lamdba expression in an expression tree under the
hood? Why do I even need to know that it is translated into an
expression tree? If I have to, it's a leaky abstraction.

It's not translated into a lambda expression in an expression tree -
it's translated into a lambda expression which is then converted into
an expression tree. It's very important to understand that the
translation is going on, because otherwise you won't understand
*anything* about what's going on.

Now, notice that the conditions/projections etc themselves aren't being
evaluated at the point of the query declaration - so why should the
values of variables within those conditions/projections be evaluated?
*That's* what I would find inconsistent.
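A minimal LINQ to Objects sketch of the behaviour being debated (assuming a plain int array as the source; IQueryable composition shows the same split, with the Take argument baked into the expression tree as a constant):

```csharp
using System;
using System.Linq;

class CaptureDemo
{
    static void Main()
    {
        int[] source = { 1, 2, 3, 4, 5, 6 };
        int limit = 2;

        // 'limit' in the Where clause lives inside a lambda: the variable
        // is captured and read each time the query executes (deferred).
        // 'limit' passed to Take() is an ordinary method argument: its
        // current value (2) is read right here, at query composition time.
        var query = source.Where(x => x > limit).Take(limit);

        limit = 4;

        // Executed now: Where sees limit == 4, Take still uses 2.
        Console.WriteLine(string.Join(",", query)); // 5,6
    }
}
```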
query expressions aren't hard in general, but the details will kill a
lot of dreams; that's MS' problem though.

The thing is though that you can teach a C# developer how queries
work, and you don't have to educate them with how expression trees
work. That's also info not NEEDED to write correct queries on the
abstraction level of C#.

Which is why I didn't claim that they should know all about expression
trees. They need to know the broad concept, but not the details. They
*do* need to know about query expressions though, and the translations
which are performed.
I used the SAME variable in two different places in the query. One
gets updated, the other one isn't when the variable changes value.
Sorry, but that's inconsistent behavior, and the reason is actually
irrelevant, because at the level of abstraction where the code is
written, there ARE NO expression trees and the query didn't contain any
lambdas, these are only created at runtime when the query is executed
and an expression tree is created.

Again I think you're confusing lambda expressions with expression
trees. The expression trees are created at runtime, but the lambda
expressions are there at compile time, after the compiler has
translated the query expression as shown earlier.
That tree is often different from what you've written in code.
Relevant info? Why? Why does someone
writing linq queries have to think about expression trees? If that's
required info, why is this abstraction leaking into the level of C#'s
abstraction?

They don't need to think too deeply about the details of the expression
tree - but they *do* need to know that they're basically taking a
shortcut to writing lambda expressions.
I looked up the (almost non-existent) docs about CompiledQuery but it
didn't tell me a lot of info. For example: is this compiled query
always usable, no matter what the provider is? No idea.

It's a LINQ-to-SQL specific feature, as I'd expect it to be. The
details of when a session is used, when the query is translated into
SQL etc is, and should be IMO, provider-specific.
If you want to kill your DB's performance, you should do that. :)
Keeping a cursor open means you keep a resultset open on the server.
That takes resources. If your resultset is pretty big, it can eat more
resources than you want to give up for a longer period of time.

I don't think the database should have to keep the whole evaluated
result set - or even match all the rows - unless you've asked it for
the ordering. After all, that's why you can't ask a result set for its
count without either explicitly using COUNT or basically fetching it
all.

However, I'm happy to accept your word for it being a bad idea.

Because entity identity is verifiable when the data is read from the
db (PK). So you can limit on the client if you have to, by reading
enough rows till you're done. It's slightly slower but it's a way to
solve it.

Northwind: employee 1:n order. If you want all employees who have an
order filed for customers from the UK, you could do: (I use '*' for
simplicity here)

select e.*
from employees e inner join orders o on
e.EmployeeID = o.EmployeeID
inner join customers c
on o.CustomerID = c.CustomerID
where c.Country = 'UK'

Though, you'll get a lot of duplicates. So you apply distinct. But
that's not possible. So you have to filter on the client. You can,
because the PK identity of the entity instance (== the data!) is
available. No O/R mapper should give up with such a silly query.

Here's my LINQ query:

var query = from employee in context.Employees
join order in context.Orders
on employee equals order.Employee
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select employee;

query = query.Distinct();


and here's the generated SQL:

SELECT DISTINCT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate], [t0].
[HireDate], [t0].[Address], [t0].[City], [t0].[Region], [t0].
[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
CONVERT(VarBinary(MAX),[t0].[Photo]) AS [Photo], CONVERT(NVarChar(MAX),
[t0].[Notes]) AS [Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
INNER JOIN [dbo].[Orders] AS [t1]
ON [t0].[EmployeeID] = [t1].[EmployeeID]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID] WHERE [t2].[Country] = @p0

So LINQ to SQL didn't give up on it at all.

It's doing the DISTINCT on the database, and that wasn't even *trying*
to tweak it. Now, I wouldn't like to swear that SQL server will realise
that EmployeeID is distinct for distinct employees, so it only needs to
look at that part, but I'd certainly hope so.
What's better though is that you can also use subqueries to avoid the
duplicates:

select *
from employees where employeeID IN
(
select employeeID from
orders where CustomerID in
(
select customerID from customers
where country = 'UK'
)
)

No duplicates, no distinct needed. If you look closely, the execution
plan is the same on most db's.

The estimated execution plan from the join version is very different, but I
don't know which would actually take longer.

However, let's see how close we can get to your query. This has the
same kind of feeling:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.Employee).Contains(employee)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
[t0].[Photo], [t0].[Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
LEFT OUTER JOIN [dbo].[Employees] AS [t3]
ON [t3].[EmployeeID] = [t1].[EmployeeID]
WHERE ([t3].[EmployeeID] = [t0].[EmployeeID])
AND ([t2].[Country] = @p0)
)

Alternatively, we can explicitly use the employee ID:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.EmployeeID)
.Contains(employee.EmployeeID)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
[t0].[Photo], [t0].[Notes], [t0].[ReportsTo], [t0].[PhotoPath]
FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
WHERE ([t1].[EmployeeID] = ([t0].[EmployeeID]))
AND ([t2].[Country] = @p0)
)

That latter one looks pretty reasonable to me.
This is what I meant by tweakability below. In Linq I can't specify
that the subquery route should be used; I have to use joins, or rely on
the provider to make that choice for me. However, that last bit isn't
possible: the o/r mapper would then have to know the db statistics
about the sizes of the data in the DB.

In what way have I not tweaked the query to use the subquery way of
doing things with the LINQ queries above?
A good query system allows this kind of simple tweakability.

Which I've just proved LINQ to SQL has. I should point out that those
were just the first three ways of doing it that I thought of. I haven't
actually tried to optimise it particularly.
Still, I'd have liked it if they had spent more time on this. They
could have added more standard elements but decided not to.

Well, I've shown that they've got DISTINCT in there. I know about the
LEFT OUTER JOIN issue, and I agree it's a shame. What else do you miss?
You tell that to that team of DBAs who refuse to run your software
on their multi-terabyte databases because the queries are too slow.

I have no problem with tweaking queries which need tweaking. If they
start claiming that *every* query, even the simplest "SELECT (fields)
FROM (table) WHERE (some simple condition)" needs extra tweaking, I'd
certainly need some evidence.
If you want to use an O/R mapper as a developer, and you find a team
of DBAs on the other side of the table and they refuse to accept the
fact that the queries are now generated by a program, you really have
to have your act together and prove that your queries are fast and
adapt to the schema and the size of the table data; otherwise you're
off hammering out stored proc call code all day.

Indeed. Good job that LINQ to SQL is reasonably tweakable then, isn't
it? And also that it lets you execute custom SQL when you need to.
This isn't something I cooked up just to be cocky. Many times we've
received emails from developers who wanted to use an o/r mapper and
they had to convince their boss and DBAs that the SQL produced is fast,
that the queries are tunable/tweakable so the DBAs will be happy.

Yes, I've no problem with that.
If the O/R mapper doesn't allow flexibility in that area, the o/r
mapper isn't going to be used a lot in the enterprise area where tables
can have millions of rows and each join has to be done with care.

Yup, and I reckon LINQ to SQL is *reasonably* flexible.
So, how is Linq supporting tweakability here? Does it for example
offer a simple IN subquery element, so the DEVELOPER can tweak the
code, based on the DBA's advice? Not really.

Well, the "Contains" is the broad equivalent of "IN" here. I don't know
how much difference there is between "WHERE EXISTS" and "IN",
admittedly - but I'd hope not a lot.
So the DEVELOPER, when
asked by the DBA if a costly join operation can be changed to a
subquery, or vice versa (each has its sweet spots), can
only answer: No, I can't do that. 10 to 1 the DBA will then say a
stored proc will be used instead.

Except that I've shown that the developer can offer a range of options
- and that was without trying hard.
That leaves the smaller simple stuff for the o/r mapper, and keeps in
place the myth that stored procs are the way to go when it comes to
serious data-access. Which is unnecessary: a query system should be
flexible enough to offer these kinds of tweaks.
Absolutely.


if you have to branch out to custom code, independence is 100% gone.

Not at all. If I can do almost all my work using the same query
expressions for everything, but need just a few places where I tweak
the SQL or the query expression to satisfy a particular database, the
independence of the bulk of the work is still very useful. Likewise if
I can apply the same query expressions in LINQ to SQL as in LINQ to
Objects for a large proportion of my work, but tweak some of the LINQ
to SQL where necessary - I'm still gaining broad readability and
consistency, IMO.
That said, I don't think it's possible to create a 100% independent
query system. It just has to be clear, for the people who think that
Linq WILL bring them that independent system, that it's a facade: there
is no such thing as an independent system. You will ALWAYS have o/r
mapper specific code in your application, unless you abstract
everything away, which also carries a (sometimes big) price tag.

Yup, totally agreed. The abstraction will *always* be leaky - otherwise
we wouldn't need to look at the SQL, after all.

That doesn't mean there isn't a big benefit, of course.
[ADO.NET Entities]
I will be very disappointed if they don't go for a multi-db
design.

One reason I think they'll move it towards an approach which might
offer multi-db design but that's totally in the hands of 3rd
parties is that their original design, where the ado.net provider
had little to do to get things done has been changed to make it a
lot of work to get things done for the ado.net provider, which
means that the 3rd party ado.net provider has to implement a lot of
code to work with the EDM.

That's fairly reasonable - it's good to let the third parties make
their own providers work as well as possible.

though if it takes a lot of work, it will take a long time before open
source databases for example have implemented a provider. These things
aren't simple.

Possibly - I guess it depends if they see a big benefit. Mono had a
working implementation of C# 2 features long before .NET 2.0 was
actually released :) I know that's a simpler scenario, but it shows
that open source folks can certainly work quickly when they see the
benefit.
Though it wouldn't have taken any more effort. Now they apparently
didn't design it in (otherwise the design would be open and anyone
would be able to write a provider), so the design is targeted towards
one db, which is IMHO odd as it doesn't take that much effort in a
system which is already largely designed around providers anyway.

It would have taken more effort, in the same way that there are all the
different dialects in NHibernate. You've got to work out how to do
paging for each database, "LIKE" queries for each database (Hibernate 3
used to get this wrong, btw - it didn't escape '%' when it was part of
a LIKE query; I don't know if this has been fixed) etc.

It would certainly have been nice though.
That's different. If a .NET developer wants to use Oracle, chances are
s/he won't use the MS oracle provider, simply because of its
limitations. So there's a necessity. The EDM, as an additional
download, isn't a core citizen inside vs.net 2008, nor in .NET 3.5.
Therefore it won't build the momentum it would have had if it had been
released WITH vs.net 2008 and .net 3.5.

It won't have as much momentum, no - but it could still have easily
enough to make it viable and compelling for third parties.
 

Frans Bouma [C# MVP]

Jon said:
I don't think the developer should need to know the details of which
expression trees are generated, but I do think they need to know that
query expressions implicitly use lambda expressions.

It's not a given that there are lambda expressions involved. If the
provider is implemented as extension methods on IQueryable, you don't
have an expression tree. (Some linq providers do this.)

Also, the lambda expressions and the expression tree are a conversion
of what's written, so they're the inner workings of the abstraction and
therefore NOT RELEVANT. If these ARE relevant, the abstraction is
useless.
I also think
they need to know that lambda expressions are converted to either
delegate instances or expression trees depending on the situation. In
other words, I'd expect someone to know that:

from data in SomeSource
where data.SomeCondition
select data.SomeProjection

is translated into

SomeSource.Where(data => data.SomeCondition)
.Select(data => data.SomeProject)

that's not trivial.

var q = from c in nw.Customers
from o in nw.Orders
select c;

into what is this translated? Do you know? If not, why is it necessary
to know how it is constructed under the hood, as the code above is an
abstraction level ABOVE the expression trees!

(btw you can't write a join with extension methods if you don't define
the intermediate classes up front, as you need anonymous classes you
have to REFER to later on ;) )
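For reference, the C# 3.0 spec does answer that one: a second from clause is translated into a SelectMany call. A self-contained sketch of the translation, using plain in-memory arrays in place of the nw.Customers/nw.Orders tables from the question:

```csharp
using System;
using System.Linq;

class SelectManyDemo
{
    static void Main()
    {
        string[] customers = { "A", "B" };
        int[] orders = { 1, 2, 3 };

        // from c in customers from o in orders select c
        var q1 = from c in customers
                 from o in orders
                 select c;

        // ...is translated by the compiler into a SelectMany call:
        var q2 = customers.SelectMany(
            c => orders,     // collection selector: one inner sequence per c
            (c, o) => c);    // result selector: the 'select' projection

        Console.WriteLine(string.Join(",", q1)); // A,A,A,B,B,B
        Console.WriteLine(string.Join(",", q2)); // A,A,A,B,B,B
    }
}
```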

If they know that much, then they know where the lambda expressions
are, and therefore which variables will be captured.

but that's an abstraction level BELOW the one the code is written at.
If that info from abstraction level n is REQUIRED to work at
abstraction level n+1, then abstraction level n+1 is leaky, and useless.
If they don't know that much, they shouldn't be using LINQ except to
find out what it's all about - in other words, to learn the above.

I find that pretty bold, if you ask me. It's like having to know which
IL statements a C# construct is compiled to in order to write proper
C#. If that's required, the C# abstraction level is bogus.
I didn't say which parts are lambdas in the expression trees. I said
which parts are lambdas in the query expressions. There's a huge
difference between query expressions and expression trees.

And lambdas are part of C# code, as are query expressions - so
someone using C# 3 and LINQ ought to know about both.

sure, but which parts of a query are translated to which expressions
in a tree (as I said: I didn't use any lambdas at all!) and which
fragments become lambdas is NOT relevant, as the code in the example
is an abstraction level above the lambdas.
It's not translated into a lambda expression in an expression tree -
it's translated into a lambda expression which is then converted into
an expression tree. It's very important to understand that the
translation is going on, because otherwise you won't understand
anything about what's going on.

As I said above, that's not always the case: if you implement the
IQueryable extension methods yourself, no tree is created, as the tree
creation is otherwise done by the extension methods on Queryable.

The conversion to Expression<Func<... >> is done by the compiler but
that's below the abstraction level of the code, and thus irrelevant.

Btw, I simply said the lambda ends up in the expression tree. I know
the order; the inner workings aren't new to me, as writing a linq
provider requires that knowledge, unfortunately.
Now, notice that the conditions/projections etc themselves aren't
being evaluated at the point of the query declaration - so why should
the values of variables within those conditions/projections be
evaluated? That's what I would find inconsistent.

Erm... the inconsistency is in the fact that the parameter is used at
two different spots in the same query; however, one spot gets evaluated
at execution time (the where x==var) and the other one at query
creation time (the .Take(var)).

This creates inconsistency, as changing var changes the query at one
spot, not at both.

As I said earlier: there's a technical explanation, but that doesn't
explain WHY this changes at only one spot, semantically speaking, i.e.
from the user (developer)'s pov: s/he works at the C# code level. The
stuff beneath that level should be irrelevant, due to the abstraction.
However with this, it IS relevant.

People can jump up and down and say I'm stupid, but this IS
inconsistent.
Which is why I didn't claim that they should know all about
expression trees. They need to know the broad concept, but not the
details. They do need to know about query expressions though, and the
translations which are performed.

erm... why do they need to know about translations being performed
beneath the level of the abstraction they're operating on?

It's critical that people operating at abstraction level n+1 don't
need to know the details of abstraction level n. If I write:
where c.Foo == bar
select c;

I don't need to know what gets translated into what. Not only is this
NOT trivial, it's also irrelevant, as it's not at the abstraction level
I'm operating on.

If I would have written:
o.Where(c=>c.Foo==bar).Select(c=>c);

it's a different thing, as I formulate a different statement: I use a
lambda, and thus operate on a different abstraction level. Make no
mistake: creating a query language is damn hard, and people shouldn't
take things for granted easily. I definitely find that Linq is flawed
in this case, as it simply provides an abstraction level which isn't
there, as it's leaky.
Again I think you're confusing lambda expressions with expression
trees.

Jon, please, I do know the difference.
The expression trees are created at runtime, but the lambda
expressions are there at compile time, after the compiler has
translated the query expression as shown earlier.

Yes I know, but as it's stuff created by the compiler, it's out of my
reach: why should I care about stuff created by the compiler if I
operate at a level above the compiler? If the output of the compiler
is relevant for me at the abstraction level above it, something is
SERIOUSLY wrong here.
They don't need to think too deeply about the details of the
expression tree - but they do need to know that they're basically
taking a shortcut to writing lambda expressions.

why?
Aren't they writing C# code? They're not taking shortcuts, they're
writing C# code. I fail to see why this is relevant.
Because entity identity is verifiable when the data is read from
the db (PK). So you can limit on the client if you have to, by
reading enough rows till you're done. It's slightly slower but it's
a way to solve it.

Northwind: employee 1:n order. If you want all employees who have
an order filed for customers from the UK, you could do: (I use '*'
for simplicity here)

select e.*
from employees e inner join orders o on
e.EmployeeID = o.EmployeeID
inner join customers c
on o.CustomerID = c.CustomerID
where c.Country = 'UK'

Though, you'll get a lot of duplicates. So you apply distinct. But
that's not possible. So you have to filter on the client. You can,
because the PK identity of the entity instance (== the data!) is
available. No O/R mapper should give up on such a silly query.

Here's my LINQ query:

var query = from employee in context.Employees
join order in context.Orders
on employee equals order.Employee
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select employee;

query = query.Distinct();


and here's the generated SQL:

SELECT DISTINCT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate], [t0].
[HireDate], [t0].[Address], [t0].[City], [t0].[Region], [t0].
[PostalCode], [t0].[Country], [t0].[HomePhone], [t0].[Extension],
CONVERT(VarBinary(MAX),[t0].[Photo]) AS [Photo],
CONVERT(NVarChar(MAX), [t0].[Notes]) AS [Notes], [t0].[ReportsTo],
[t0].[PhotoPath] FROM [dbo].[Employees] AS [t0]
INNER JOIN [dbo].[Orders] AS [t1]
ON [t0].[EmployeeID] = [t1].[EmployeeID]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID] WHERE [t2].[Country] = @p0

So LINQ to SQL didn't give up on it at all.

ah, and now on sqlserver 2000.

*poof*.
It's doing the DISTINCT on the database, and that wasn't even trying
to tweak it. Now, I wouldn't like to swear that SQL server will
realise that EmployeeID is distinct for distinct employees, so it
only needs to look at that part, but I'd certainly hope so.

no :) DISTINCT is a row-wide DISTINCT. It's not Oracle, which can do
per-column distinct operations ;)
The estimated execution plan from the join version is very different, but
I don't know which would actually take longer.

it depends on the depth of the tree. If you have one subquery, the
execution plans are the same here on sqlserver 2000, but it can differ,
which is why it's important that you have the choice.
However, let's see how close we can get to your query. This has the
same kind of feeling:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.Employee).Contains(employee)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone],
[t0].[Extension], [t0].[Photo], [t0].[Notes], [t0].[ReportsTo],
[t0].[PhotoPath] FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
LEFT OUTER JOIN [dbo].[Employees] AS [t3]
ON [t3].[EmployeeID] = [t1].[EmployeeID]
WHERE ([t3].[EmployeeID] = [t0].[EmployeeID])
AND ([t2].[Country] = @p0)
)

Alternatively, we can explicitly use the employee ID:

var query = from employee in context.Employees
where (from order in context.Orders
join customer in context.Customers
on order.Customer equals customer
where customer.Country=="UK"
select order.EmployeeID)
.Contains(employee.EmployeeID)
select employee;

which generates this:

SELECT [t0].[EmployeeID], [t0].[LastName], [t0].[FirstName],
[t0].[Title], [t0].[TitleOfCourtesy], [t0].[BirthDate],
[t0].[HireDate], [t0].[Address], [t0].[City], [t0].[Region],
[t0].[PostalCode], [t0].[Country], [t0].[HomePhone],
[t0].[Extension], [t0].[Photo], [t0].[Notes], [t0].[ReportsTo],
[t0].[PhotoPath] FROM [dbo].[Employees] AS [t0]
WHERE EXISTS(
SELECT NULL AS [EMPTY]
FROM [dbo].[Orders] AS [t1]
INNER JOIN [dbo].[Customers] AS [t2]
ON [t1].[CustomerID] = [t2].[CustomerID]
WHERE ([t1].[EmployeeID] = ([t0].[EmployeeID]))
AND ([t2].[Country] = @p0)
)

That latter one looks pretty reasonable to me.

Interesting! The exists query isn't optimal, but it's better than
nothing. Hopefully they'll document this properly.
In what way have I not tweaked the query to use the subquery way of
doing things with the LINQ queries above?

It didn't occur to me to write it that way. I didn't think of
.Contains().

Take it from me, as someone who has written a query system which
appears logical to me but not always to others: it's not
straightforward unless it's plain sql. So this is an element to
document in full, otherwise the masses will ask this question time and
time again. The .Contains() isn't obvious IMHO, but once you know it
exists, it is indeed a way to optimize the query. Good point. :)
Well, I've shown that they've got DISTINCT in there. I know about the
LEFT OUTER JOIN issue, and I agree it's a shame. What else do you
miss?

Distinct doesn't work on SQL Server 2000, as it uses a SQL Server 2005
trick. It's not a rant against LINQ to SQL; it was an example of how the
provider can throw things at you at runtime which are unexpected and
differ from provider to provider. Distinct did throw exceptions on me
with SQL Server 2000. Unexpected, btw, as I did expect they had solved
it for that db as well (it's hardly a rarely used database ... )
Indeed. Good job that LINQ to SQL is reasonably tweakable then, isn't
it? And also that it lets you execute custom SQL when you need to.

Sure, it makes things less cumbersome. I don't particularly mind LINQ to
SQL btw; I used it as an example. The point is whether the developer is
able to specify a query and KNOW the outcome, or in other words, not run
into abstraction-layer misery: deterministically write X and get X.
Well, the "Contains" is the broad equivalent of "IN" here. I don't
know how much difference there is between "WHERE EXISTS" and "IN",
admittedly - but I'd hope not a lot.

EXISTS often requires making the inner query a correlated subquery,
but that's up to the o/r mapper.
Except that I've shown that the developer can offer a range of
options - and that was without trying hard.

Ok. I didn't see the Contains link (pun intended ;)) but it's now
clear the flexibility, to some extent, is there.

It's a delegate matter though, and my remarks shouldn't be seen as a
bash of LINQ to SQL, as it's the only publicly known full
implementation of LINQ at the moment; all I wanted was to illustrate
that the material the developer has to work with is at abstraction
layer n+1 and it might be problematic when problems occur at
abstraction layer n. I call it a delegate matter because it's a
balancing act: on one hand you want to abstract away the low-level
messing with databases, simply decide that for the developer and create
the query for the developer so s/he doesn't have to think about
details. On the other hand you don't want to decide too much, making
the fine details unreachable, as in THIS case the fine details might be
of utmost importance.

Personally I find the query language too far away from the database
mechanism, but it's a tradeoff, as it's a generic language also to be
used on in-memory collections and xml, where other things are more
important.
Yup, totally agreed. The abstraction will always be leaky - otherwise
we wouldn't need to look at the SQL, after all.

That doesn't mean there isn't a big benefit, of course.

That's to be seen. A lot of people will simply follow and do what MS
tells them. Some will wonder what is gained by using an abstraction
layer that doesn't really make things less cumbersome to use. I mean:
there are still issues with runtime exceptions from queries which can't
be created; those aren't caught at compile time. How is that different
(to exaggerate a little) from a system where, to put it bluntly, SQL is
placed in strings inside code or in a config file? Then things break at
runtime too.

It's not really an easy sell, as it's a step forward in many areas but
also a step back in others (or not really a step forward)

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 

Jon Skeet [C# MVP]

Frans Bouma said:
It's not a given there are lambda expressions involved. If the
provider is implemented as extension methods on IQueryable, you don't
have an expression tree. (Some LINQ providers do this.)

Well, two things here:

1) There will always be an expression tree if you implement IQueryable
2) Even if you're using LINQ to Objects, lambda expressions are still
involved
Also, the lambda expressions and the expression tree are a conversion
of what's written, so the inner workings of the abstraction and
therefore NOT RELEVANT. If these ARE relevant, the abstraction is
useless.

No, they're not the inner workings of the abstraction. They're part of
the C# language spec in terms of how a query expression is compiled.

Let me make this very clear: when I say "lambda expression" I *don't*
mean System.Linq.Expressions.LambdaExpression I mean an expression such
as
x => x+1
or
(int x, int y) => x+y

That's the meaning of "lambda expression" in the context of C#.

Now, if you can provide a C# query expression which doesn't involve a
translation which uses lambda expressions, I'll be impressed.
that's not trivial.

Where did I say it's trivial?
var q = from c in nw.Customers
from o in nw.Orders
select c;

into what is this translated? Do you know?

From memory, I believe it's something along the lines of:

nw.Customers.SelectMany (c => nw.Orders,
(c, o) => c);

If there were a "where" clause or anything else between the "from" and
the "select" then it would be more complicated, with a transparent
identifier involved.

Now, that may not be *exactly* right, but:
a) I know which parts are used as lambda expressions
b) If I want to know the exact details, I could look it up in the C#
spec (or a good book :)
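For what it's worth, here's a sketch of the same translation once a "where" clause sits between the two "from" clauses. I've written the transparent identifier as an explicit anonymous type, which is close in spirit to (though not character-for-character identical with) what the spec mandates; "nw" is the data context from the example above:

```csharp
// Query expression with a clause between the "from"s and the "select":
var q = from c in nw.Customers
        from o in nw.Orders
        where c.Country == "UK"
        select c;

// Translates along these lines; the anonymous pair plays the role of
// the compiler's transparent identifier:
var q2 = nw.Customers
           .SelectMany(c => nw.Orders, (c, o) => new { c, o })
           .Where(t => t.c.Country == "UK")
           .Select(t => t.c);
```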
If not, why is it necessary
to know how it is constructed under the hood, as the code above is an
abstraction level ABOVE the expression trees!

Again, expression trees are in some ways irrelevant to this - the point
is that it's translated into a *lambda* expression.
(btw you can't write a join with extension methods if you don't define
the intermediate classes up front, as you need anonymous classes you
have to REFER to later on ;) )

Not sure what you mean there.
but that's an abstraction level BELOW where the code is written at. If
that info from abstraction level n is REQUIRED to work at abstraction
level n+1, then abstraction level n+1 is leaky, and useless.

The abstraction level is C#. The "query expression" to "method call
with lambda expressions" translation is at the C# level of abstraction,
which is why it's defined by the C# spec.
I find that pretty bold, if you ask me. It's like you have to know to
what IL statements a C# construct is compiled to to be able to write
proper C#. If that's required, the C# abstraction level is bogus

No - IL isn't specified by the C# spec, but the translation of query
expressions is.

It's like knowing that Nullable<int> and int? mean the same thing, or
that System.Int32 and int mean the same thing, or that foreach calls
GetEnumerator and the IEnumerable members (as well as Dispose at the
end).

These are some of the rules which govern the language, and C#
developers should be aware of them.
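The foreach point can be sketched concretely. This is the general expansion (simplified: for List&lt;int&gt; the compiler actually binds to the struct enumerator directly, but the shape, including the Dispose call, is the same):

```csharp
using System;
using System.Collections.Generic;

class ForeachExpansion
{
    static void Main()
    {
        List<int> list = new List<int> { 1, 2, 3 };

        // foreach (int x in list) Console.WriteLine(x);
        // expands, roughly, to:
        IEnumerator<int> e = ((IEnumerable<int>)list).GetEnumerator();
        try
        {
            while (e.MoveNext())
            {
                int x = e.Current;
                Console.WriteLine(x);
            }
        }
        finally
        {
            e.Dispose(); // the Dispose call mentioned above
        }
    }
}
```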
sure, but which parts of a query are translated to which expressions
in a tree (as I said: I didn't use any lambda at all! )and which
fragments become lambda's is NOT relevant, as the code in the example
is an abstraction level above the lambdas.

It certainly *is* relevant, as it's part of the specified behaviour of
C# and it affects behaviour. How can it *not* be relevant?

The exact nature of how lambda expressions are converted into
expression trees is not covered in the spec, but the translation from
query expression into non-query expression form certainly is.
As I said above: that's not always the case: if you implement
IQueryable extension methods, no tree is created, as the tree creation
is done by extension methods on Queryable.

No, individual expression trees are built by the compiler (or rather,
the compiler inserts code to build the expression trees) and then the
Queryable extension methods combine the expression trees together.

Look at the extension methods in Queryable: they all take expression
trees as their parameters. How are you going to call them if you don't
create expression trees yourself? You could pass in null, I suppose,
but that wouldn't be terribly useful.
The conversion to Expression<Func<... >> is done by the compiler but
that's below the abstraction level of the code, and thus irrelevant.

The details of that conversion are below the abstraction level of C#,
but the fact that the conversion is performed *is* part of the
abstraction level of C#.
Btw, I simply said the lambda ends up in the expression tree. I know
the order; the inner workings aren't new to me, as writing a LINQ
provider requires that knowledge, unfortunately.

Then I don't see why you're getting some of the details wrong above.
You keep referring to lambda expressions as if they only exist in
expression trees, whereas I'm talking about the lambda expressions
which are part of the specified translation of query expressions.
Erm... the inconsistency is in the fact that the parameter is used at
two different spots in the same query, however one spot gets evaluated
at execution time (where x==var) and the other one at query creation
time (the .Take(var) )

Yes - one is part of a query expression, in a clause which is
translated into a lambda expression, and one is just a parameter in a
method call.

It's exactly the same as this sort of situation:

List<int> list = new List<int> { 1, 2, 3 };

int i = 0;

int index = list.FindIndex (i,
delegate (int y) { return y==i; });

Again, the variable "i" is used twice - in one place it's evaluated as
part of calling the method, whereas in the delegate it's evaluated
within the method itself.
this creates inconsistency, as changing var changes the query at one
spot, not at two.

That's only the same type of inconsistency as passing one parameter by
ref and another by value, however. The behaviour is well defined at the
C# abstraction level.
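The Take scenario under discussion can be sketched with LINQ to Objects (variable names are mine, not from the original query). The lambda passed to Where captures the variable and reads it when the query is enumerated; the argument to Take is a plain int parameter, evaluated when the query is built:

```csharp
using System;
using System.Linq;

class CaptureDemo
{
    static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };
        int n = 2;

        // n appears twice: inside a lambda (deferred) and as an
        // ordinary argument (evaluated immediately).
        var query = numbers.Where(x => x >= n).Take(n);

        n = 4; // changes what Where sees; Take(2) is already fixed

        Console.WriteLine(string.Join(",",
            query.Select(i => i.ToString()).ToArray())); // prints "4,5"
    }
}
```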
As I said earlier: there's a technical explanation, but that doesn't
explain WHY this is changing only at one spot, semantically speaking,
i.e.: from the user (developer)'s pov: s/he works at the C# code level.
The stuff beneath that level is irrelevant, due to the abstraction.
However with this, it WILL be relevant.

If the developer works at the C# level, they should understand C# as a
language. The fact that one of your variables is being used as part of
a lambda expression (not necessarily part of an expression tree) is
clearly part of the spec.
People can jump up and down and say I'm stupid, but this IS
inconsistent.

I'm afraid I'll continue to disagree with you.
erm... why do they need to know about translations being performed
below the level of the abstraction they're operating on?

C# *is* the level of abstraction they're operating on, and the
translation is part of that level.
It's critical that people operating at abstraction level n+1 don't
need to know the details of abstraction level n. If I write:
from c in o
where c.Foo == bar
select c;

I don't need to know what gets translated into what. Not only is this
NOT trivial, it's also irrelevant, as it's not at the abstraction level
I'm operating on.

It's defined in the C# spec, and therefore I believe it's at the right
abstraction level.

In particular, if a developer doesn't know about the translation, what
*do* you expect them to know about? Obviously you can't just type in
any combination of words and get a valid C# program - how much of C#'s
rules do you expect people to know? Would you expect them to know, for
instance, that in the expression "a ? b : c" either b or c is
evaluated, but not both?

If you *do* expect people to know that, why should they also not have
to know about what query expressions are about? If they don't need to
know the translation, how *should* people understand query expressions?
Or do you think they should just type keywords at random until
something compiles?
If I had written:
o.Where(c=>c.Foo==bar).Select(c=>c);

it's a different thing, as I formulate a different statement: I use a
lambda, and thus operate on a different abstraction level.

Well, it's a different thing because your query expression wouldn't
include the Select(c=>c) part, but other than that they're *not*
different, and I believe it's important that people understand that.
Make no
mistake: creating a query language is damn hard, and people shouldn't
take things for granted easily. I definitely find that Linq is flawed
in this case, as it simply provides an abstraction level which isn't
there, as it's leaky.

All abstractions are leaky, IMO. All ORMs are leaky - are they
automatically flawed too?
Jon, please, I do know the difference.

I'd hope so, but it doesn't seem that way from how you use them. Lambda
expressions aren't "created at runtime" - they're present in the
*compile-time* translation of query expression to non-query expression.
Yes I know, but as it's stuff created by the compiler, I don't care,
as stuff created by the compiler is out of my reach: why should I care
about stuff created by the compiler, if I operate on a level above a
compiler?

You need to know what the compiler *must* do in terms of the C# spec.
You don't need to know how it achieves that goal.
If the output of a compiler is relevant for me on the
abstraction level above the output of the compiler, something is
SERIOUSLY wrong here.

It's not at the level of the output of the compiler: *that* is beyond
the remit of the C# spec. The query expression translation *is* within
the remit of the C# spec.
why?
Aren't they writing C# code? They're not taking shortcuts, they're
writing C# code. I fail to see why this is relevant.

They're taking shortcuts in the same way that writing int? instead of
System.Nullable<System.Int32> is taking a shortcut - and I'd certainly
expect people to know that.

They're writing expressions which are clearly defined in the spec to be
used as part of lambda expressions. Whether those lambda expressions
are then converted into expression trees or simple delegate instances
depends on what you're querying, but the fact that they're used as part
of lambda expressions is undeniable; it's not part of a C# compiler's
implementation details, it's part of the spec it has to follow.


ah, and now on sqlserver 2000.

*poof*.

Hmm... what happens, out of interest? I don't have SQL server 2000
available here. And did you run the SQL, or the LINQ query? I don't
know whether LINQ to SQL knows what kind of database it's talking to at
all.
No :) DISTINCT is a row-wide DISTINCT. It's not Oracle, which can do
per-column distinct operations ;)

I don't know whether we've crossed wires or not. I'm aware that it's
distinct at a "whole result" level, but SQL Server *should* (IMO) be
smart enough to know that it can work out the distinctness of the whole
result just from the EmployeeID.

The EmployeeID is unique, and all the results are based on the Employee
table. Therefore if it sees an EmployeeID it's already returning, it
knows that all the rest of the results for that row will match the row
it's already returning. Likewise it knows if it sees an EmployeeID that
it *hasn't* already seen, that's a "new row to return" by definition.

Does that make my meaning any clearer? Or did you understand me the
first time and I just didn't understand your reply? :)
It depends on the depth of the tree: if you have one subquery, the
execution plans are the same here on SQL Server 2000, but it can differ,
which is why it's important you have the choice.
Sure.



Interesting! The exists query isn't optimal, but it's better than
nothing. Hopefully they'll document this properly.

I'm not sure whether I'd expect them to document all the potential
twists and turns involved in converting a LINQ query into a SQL query.
I suspect it would be unreadable anyway, and people will do what I did:
experiment and look at the SQL.

It would be good to give some hints though :)
It didn't occur to me to write it that way. I didn't think of
.Contains().

Take it from me, as someone who has written a query system which appears
logical to me but not always to others: it's not straightforward unless
it's plain SQL.

Isn't that tantamount to saying, "there's no point in letting people
write queries in anything other than SQL"? I think it's absolutely
*vital* that they're easily able to see the SQL generated, of course.
So this is an element to document in full, otherwise the
masses will ask this question time and time again. The .Contains()
isn't obvious IMHO, but once you know it exists, it is indeed a way to
optimize the query. Good point. :)

I think it's reasonably obvious if you come at it from the "what LINQ
query operators have I got available?" side of things rather than the
"here's the SQL I want to create".

One point: if you do a "Contains" on something like an integer array,
it translates that into IN.
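A quick sketch of that, assuming a data context like the one in the earlier examples (the parameter names in the generated SQL are from memory):

```csharp
// A local array used with Contains becomes an IN clause:
int[] ids = { 2, 4, 7 };
var query = from employee in context.Employees
            where ids.Contains(employee.EmployeeID)
            select employee;

// Generates something along the lines of:
// SELECT ... FROM [dbo].[Employees] AS [t0]
// WHERE [t0].[EmployeeID] IN (@p0, @p1, @p2)
```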
Distinct doesn't work on SQL Server 2000, as it uses a SQL Server 2005
trick. It's not a rant against LINQ to SQL; it was an example of how the
provider can throw things at you at runtime which are unexpected and
differ from provider to provider. Distinct did throw exceptions on me
with SQL Server 2000. Unexpected, btw, as I did expect they had solved
it for that db as well (it's hardly a rarely used database ... )

Yes, it is a bit surprising. What's the SQL Server 2005 trick in
question? I thought DISTINCT worked fine in SQL Server 2000 - or is
that only for single-value-select queries? (i.e. a sequence of single
values).
Sure, it makes things less cumbersome. I don't particularly mind LINQ to
SQL btw; I used it as an example. The point is whether the developer is
able to specify a query and KNOW the outcome, or in other words, not run
into abstraction-layer misery: deterministically write X and get X.

I don't see how you could expect that to happen without people even
knowing the C# language well enough to know which parts of query
expressions are translated into lambda expressions, and which are
evaluated immediately.

I would say that you can have a leaky abstraction which doesn't
necessarily imply misery. More work than a perfectly leak-free
abstraction, of course, but they're imaginary IMO :)
EXISTS often requires making the inner query a correlated subquery,
but that's up to the o/r mapper.

But at the server side, should the query optimiser not be able to work
out that they mean the same thing (in cases such as this)?
Ok. I didn't see the Contains link (pun intended ;)) but it's now
clear the flexibility, to some extent, is there.
Cool.

It's a delegate matter though, and my remarks shouldn't be seen as a
bash of LINQ to SQL, as it's the only publicly known full
implementation of LINQ at the moment; all I wanted was to illustrate
that the material the developer has to work with is at abstraction
layer n+1 and it might be problematic when problems occur at
abstraction layer n.

Well, again I believe that some things which you regard as beneath the
abstraction layer of C# are well within that layer, but even so I agree
that the abstraction will leak.

However, I'd be very surprised to *ever* see *any* ORM where that's not
the case. You've said how you've got customers who have to change their
queries in order to meet the SQL specified by the DBAs - isn't that the
same kind of thing?
I call it a delegate matter because it's a balancing
act:

Just to clarify your language - do you mean a *delicate* matter? It's a
somewhat important distinction given that "delegates" mean something
rather different in this context :) I'm not trying to criticise your
English for the sake of it - just trying to make sure we don't talk at
cross-purposes.
on one hand you want to abstract away the low-level messing with
databases, simply decide that for the developer and create the query
for the developer so s/he doesn't have to think about details. On the
other hand you don't want to decide too much, making the fine details
unreachable, as in THIS case the fine details might be of utmost
importance.
Agreed.

Personally I find the query language too far away from the database
mechanism, but it's a tradeoff, as it's a generic language also to be
used on in-memory collections and xml, where other things are more
important.

I would personally have been really unhappy if it had been more biased
towards SQL at the cost of being natural for in-memory objects. I
believe the importance of LINQ to Objects has been underplayed - I
perform queries, orderings, projections etc on in-memory collections
just as often as I do against databases. Everyone seems focused on LINQ
to SQL, but I'm looking forward to more readable code for in-memory
objects.
That's to be seen. A lot of people will simply follow and do what MS
tells them. Some will wonder what is gained by using an abstraction
layer that doesn't really make things less cumbersome to use. I mean:
there are still issues with runtime exceptions from queries which can't
be created; those aren't caught at compile time. How is that different
(to exaggerate a little) from a system where, to put it bluntly, SQL is
placed in strings inside code or in a config file? Then things break at
runtime too.

Because it catches *more* stuff at compile-time, and uses language
which is *more* familiar to developers. It doesn't need to be perfect -
just better than the alternative.

I've written a lot of LINQ to SQL queries recently, and far, far more
of them have worked first time after getting rid of compiler errors
than would have been the case if I'd just been writing raw SQL.
It's not really an easy sell, as it's a step forward in many areas but
also a step back in others (or not really a step forward)

I don't accept that it's a step back - it's not preventing you from
doing anything you might have done before. It won't help everywhere,
but I believe it will help in enough places to make it a big win,
especially when you consider LINQ to Objects (and XML etc) as well as
LINQ to SQL.
 

Jon Skeet [C# MVP]

<snip>

Quick question to the crowd - is anyone else reading this thread any
more?
 
