Jon said:
I'd be very surprised to see such a database, personally, as least
outside the embedded or "hobby" market.
In multi-branch joins with expressions other than simple field
compares, it can become very awkward to optimize the query, especially
if one side (or both!) is for example a derived table (subquery). You'll
then see that it's not as optimal as you'd have hoped it to be.
The main thing is: if you want the RDBMS to perform an equ-join or a
left join, specify it as such and it will do so. Specifying a query with
different constructs and then relying on the optimizer to make the best
of it isn't optimal and will likely fail in some (and least acceptable)
situations.
Sqlserver's optimizer is more advanced than the rest out there,
although Oracle is catching up. On Oracle and especially DB2 you'll
still see many different execution plans with the different join types
especially with multi-branch joins and the more advanced where clauses.
Though there are still differences in performance. Take these two:
select c.*
from customers c cross join orders o
cross join employees e cross join [order details] od
where
c.CustomerId = o.CustomerId
and
e.EmployeeId = o.EmployeeId
and o.OrderId = od.OrderId
vs.
select c.*
from customers c inner join orders o
on c.CustomerId = o.CustomerId
inner join employees e on e.EmployeeId = o.EmployeeId
inner join [order details] od on o.OrderId = od.OrderId
THe server trace shows that the duration of the query first query is
much longer than the second query.
It also can be that with more explicit joins and
explicit where clauses, the optimizer gives up and the execution plans
will differ. For example, in some queries, moving the where clause to
the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid
The second query has less reads than the first query.
Again, I'd expect a decent optimiser to be able to work out the
equivalent between them. That's the nice thing about set operations -
there's so much that an optimiser can do.
till the optimizer can't decide what to do because the # of options
grows too large.
Well you can easily add where clauses to each effective side of a join
in LINQ:
from c in customers
where c.Country='USA'
inner join (from o in OrderDetails where o.TotalCost > 100m)
on c.CustomerId = o.CustomerId
'inner join' ?
I was more thinking about 'on c.CustomerId = o.CustomerId &&
o.OrderDate > myOrderDate'
in left/right join situations (which are utterly stupidly formulated in
linq) it's sometimes essential that the predicate is formulated with the
join statement, to prevent row exclusion in some edge queries
(combinations of inner and left operations in a chain)
I thought it was worth pointing out as your earlier post could easily
give people the wrong impression:
<quote>
The cross-join execution path is often less ideal as the cross-join
action by itself is first handled, and THEN the where clause is handled
to filter out rows you don't want.
</quote>
That made it sound (to me at least) like the query engine *couldn't*
take any notice of the where clause until after the cross join had been
handled, due to the way LINQ was doing things - which just isn't true.
As it is, it's a matter of how good the query optimiser is.
The way how Linq is designed forces more cross-joins upon the database
than with equivalent more flexible query languages, as the developer
using Linq often has no other option but to use nested from clauses and
a subsequential where clause. This then results in suboptimal queries in
some cases in the DATABASE. I was talking about that database level, not
in linq providers
However, in a linq provider, it's undoable to optimize away a cross
join with a where into inner joins, so the developer has to hope the
optimizer produces a solid optimized query, which isn't something the
developer should hope for, if the alternative, inner/left joins, is
already available though not through linq.
We've had this debate elsewhere before, and I think we'll have to agree
to differ.
A sequence is a set with an order. As a sequence is a set, and the
different join operations are operations for sets, it's logical to
assume that a more specialized form of a set inherits these operations,
otherwise a sequence wouldn't be a set. Limiting operations on the
sequence therefore means limiting the freedom a developer has, which is
IMHO unnecessary and solely done to make things less complicated for
linq to objects.
I don't see that as a problem if it's generated by a provider which
knows it's talking to a SQL Server database. I'd expect any good
provider to use any useful vendor-specific extensions when it can.
It shows you never wrote a linq provider otherwise you'd know what you
wrote isn't doable in the case of cross apply: there's no alternative
for databases which don't support cross apply. So if you have a linq
query with two nested froms and a where with a count filter on one side
where the count is on the other side for example, cross apply is
necessary otherwise the query can't be formulated unless it's rewritten.
So if the user has to run the query on oracle, it can't do anything else
than simply give up.
FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website:
http://www.llblgen.com
My .NET blog:
http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------