SQL to Linq - Left, Right and Inner Joins

M

Michel Walsh

In fact, you don't do SQL-joins at all, not directly. What you do is using a
for-each based representation of the universe which will be TRANSLATED to an
expression tree which will be TRANSLATED to an SQL statement. At least, for
me, that helps to see the process as being a double translation, and to STOP
thinking in SQL sets, while using LINQ, but to then think in terms of
for-each sequences.

Side note about the supplied URL: an SQL join is not represented correctly
by a Venn diagram since, like an electronic Op-Amp, an SQL join can
selectively 'amplify' the number of rows from the original sets. If you make
an inner (equi) join between { a, a, b} and { a, a, c}, you end up with 5
rows, something the Venn diagram misses. The Venn diagrams are only
applicable if there is no 'duplicated' elements involved in common to both
sets ( { a, a, b} and { c, d, d} are ok for a Venn diagram, since no dup
appears in common to both sets).

So, for the sequence thing:
-------------
LINQ: from a in b

is to be compare to

foreach( a in b)

and will generate SQL like "FROM b AS a"
------------------

The sequence:

from a in b
from c in d

is a kind of imbricated:

foreach( a in b)
{
foreach ( c in d)

}

and could generate the SQL part:

FROM b AS a, d AS c
---------------------------



While SQL joins explanation (can) start from a cross join, LINQ joins start
form a group join.

from a in b
join
c in d
on
a.field equals c.otherField
into newSet

The second 'table' is introduced with the keyword join rather than with the
keyword from, as previously, (probably to allow) the on keyword introduces
the relation, always an equality (equals is a keyword).

There is no direct SQL equivalence for that. Also note that "a" variable
must be at the left of the word equals while "c" variable must be at the
right of that keyword. Literature sometimes speaks of left and right side
(of equals), but that has nothing to do with SQL LEFT JOIN or SQL RIGHT
JOIN.

The result is somehow described as: for each value from b, **link** to it
all the values from d satisfying the relation field = otherField. There is
no direct SQL statement for that since it is not targeting a 'rectangular'
representation of the data.



-------------------------------
To get the SQL INNER JOIN, removes the

into newSet

-------------------------------
To get the SQL LEFT outer join, we can expand the LINQ group join, but for
"a" values having no match, we introduce nulls on the unpreserved side, like
this:

from a in b
join
c in d
on
a.field equals c.otherField
into newSet
from x in newSet.DefaultIfEmpty()

newSet can be any name you want which do not collide with keyword or other
variables.

That is a little bit as saying: if a row in b has disappeared, re-introduce
it supplying null (default) for the unpreserved side, and now, the 'alias'
to use to refer to the unpreserved side is x.


------------------------------

There is no LINQ-Right outer join, or LINQ-full outer join.


--------------------------
You can use sub-query, here, in the SQL-select clause:


LINQ:
from a in b
select new { ..., someName = { from c in d ... select something} }

SQL:
SELECT... (SELECT something FROM d AS c ... ) AS someName

-------------------------

We can cascade the queries:

var query1 = from ... ;
var query2 = from query1 ... ;

which can be used too if you need complex outer join, for your SQL
statement.
------------------------------

For non-equi join, I suggest you start with a cross join and add the
required where clause. That won't do much for outer non equi join, though.

-------------------------------


You can use direct to SQL with ExecuteQuery<T> method accepting the SQL
statement as a string.

-------------------------------

There is much more to it.

-------------------------------


Note that I didn't include a SELECT clause in many of the examples: you have
to, in order to make a valid statement.



Hoping it may help,
Vanderghast, Access MVP
 
S

shapper

In fact, you don't do SQL-joins at all, not directly. What you do is using a
for-each based representation of the universe which will be TRANSLATED toan
expression tree which will be TRANSLATED to an SQL statement. At least, for
me, that helps to see the process as being a double translation, and to STOP
thinking in SQL sets, while using LINQ, but to then think in terms of
for-each sequences.

Side  note about the supplied URL: an SQL join is not represented correctly
by a Venn diagram since, like an electronic Op-Amp, an SQL join can
selectively 'amplify' the number of rows from the original sets. If you make
an inner (equi) join between { a, a, b} and { a, a, c}, you end up with 5
rows, something the Venn diagram misses. The Venn diagrams are only
applicable if there is no 'duplicated' elements involved in common to both
sets ( { a, a, b}  and { c, d, d}  are ok for a Venn diagram, since no dup
appears in common to both sets).

So, for the sequence thing:
-------------
LINQ:  from a in b

is to be compare to

foreach( a in b)

and will generate SQL  like   "FROM b AS a"
------------------

The sequence:

from a in b
    from c in d

is a kind of imbricated:

     foreach( a in b)
        {
        foreach ( c in d)

        }

and could generate the SQL part:

        FROM b AS a, d AS c
---------------------------

While SQL joins explanation (can) start from a cross join, LINQ joins start
form a group join.

from a in b
join
 c in d
on
 a.field equals c.otherField
into newSet

The second 'table' is introduced with the keyword join rather than with the
keyword from, as previously, (probably to allow)  the on keyword introduces
the relation, always an equality (equals is a keyword).

There is no direct SQL equivalence for that. Also note that "a" variable
must be at the left of the word equals while "c" variable must be at the
right of that keyword. Literature sometimes speaks of left and right side
(of equals), but that has nothing to do with SQL LEFT JOIN or SQL RIGHT
JOIN.

The result is somehow described as: for each value from b, **link** to it
all the values from d satisfying the relation field = otherField. Thereis
no direct SQL statement for that since it is not targeting a 'rectangular'
representation of the data.

-------------------------------
To get the SQL INNER JOIN, removes the

    into newSet

-------------------------------
To get the SQL LEFT outer join, we can expand the LINQ group join, but for
"a" values having no match, we introduce nulls on the unpreserved side, like
this:

from a in b
join
 c in d
on
 a.field equals c.otherField
into newSet
from x in newSet.DefaultIfEmpty()

newSet can be any name you want which do not collide with keyword or other
variables.

That is a little bit as saying:  if a row in b has disappeared, re-introduce
it supplying null (default) for the unpreserved side, and now, the 'alias'
to use to refer to the unpreserved side is x.

------------------------------

There is no LINQ-Right outer join, or LINQ-full outer join.

--------------------------
You can use sub-query, here, in the SQL-select clause:

LINQ:
from a in b
select new {  ..., someName = { from c in d ... select something}  }

SQL:
SELECT... (SELECT something FROM d AS c ... ) AS someName

-------------------------

We can cascade the queries:

var query1 = from ... ;
var query2 = from query1 ... ;

which can be used too if you need complex outer join, for your SQL
statement.
------------------------------

For non-equi join, I suggest you start with a cross join and add the
required where clause. That won't do much for outer non equi join, though..

-------------------------------

You can use direct to SQL with  ExecuteQuery<T> method accepting the SQL
statement as a string.

-------------------------------

There is much more to it.

-------------------------------

Note that I didn't include a SELECT clause in many of the examples: you have
to, in order to make a valid statement.

Hoping it may help,
Vanderghast, Access MVP

Michel,

It was a great help. I will try now ...

Thanks,
Miguel
 
F

Frans Bouma [C# MVP]

Keep in mind that:
var q = from a in ctx.A
from b in ctx.B
...

results in a cross-join which could be very inefficient.

FB

Michel said:
In fact, you don't do SQL-joins at all, not directly. What you do is using a
for-each based representation of the universe which will be TRANSLATED to an
expression tree which will be TRANSLATED to an SQL statement. At least, for
me, that helps to see the process as being a double translation, and to STOP
thinking in SQL sets, while using LINQ, but to then think in terms of
for-each sequences.

Side note about the supplied URL: an SQL join is not represented correctly
by a Venn diagram since, like an electronic Op-Amp, an SQL join can
selectively 'amplify' the number of rows from the original sets. If you make
an inner (equi) join between { a, a, b} and { a, a, c}, you end up with 5
rows, something the Venn diagram misses. The Venn diagrams are only
applicable if there is no 'duplicated' elements involved in common to both
sets ( { a, a, b} and { c, d, d} are ok for a Venn diagram, since no dup
appears in common to both sets).

So, for the sequence thing:
-------------
LINQ: from a in b

is to be compare to

foreach( a in b)

and will generate SQL like "FROM b AS a"
------------------

The sequence:

from a in b
from c in d

is a kind of imbricated:

foreach( a in b)
{
foreach ( c in d)

}

and could generate the SQL part:

FROM b AS a, d AS c
---------------------------



While SQL joins explanation (can) start from a cross join, LINQ joins start
form a group join.

from a in b
join
c in d
on
a.field equals c.otherField
into newSet

The second 'table' is introduced with the keyword join rather than with the
keyword from, as previously, (probably to allow) the on keyword introduces
the relation, always an equality (equals is a keyword).

There is no direct SQL equivalence for that. Also note that "a" variable
must be at the left of the word equals while "c" variable must be at the
right of that keyword. Literature sometimes speaks of left and right side
(of equals), but that has nothing to do with SQL LEFT JOIN or SQL RIGHT
JOIN.

The result is somehow described as: for each value from b, **link** to it
all the values from d satisfying the relation field = otherField. There is
no direct SQL statement for that since it is not targeting a 'rectangular'
representation of the data.



-------------------------------
To get the SQL INNER JOIN, removes the

into newSet

-------------------------------
To get the SQL LEFT outer join, we can expand the LINQ group join, but for
"a" values having no match, we introduce nulls on the unpreserved side, like
this:

from a in b
join
c in d
on
a.field equals c.otherField
into newSet
from x in newSet.DefaultIfEmpty()

newSet can be any name you want which do not collide with keyword or other
variables.

That is a little bit as saying: if a row in b has disappeared, re-introduce
it supplying null (default) for the unpreserved side, and now, the 'alias'
to use to refer to the unpreserved side is x.


------------------------------

There is no LINQ-Right outer join, or LINQ-full outer join.


--------------------------
You can use sub-query, here, in the SQL-select clause:


LINQ:
from a in b
select new { ..., someName = { from c in d ... select something} }

SQL:
SELECT... (SELECT something FROM d AS c ... ) AS someName

-------------------------

We can cascade the queries:

var query1 = from ... ;
var query2 = from query1 ... ;

which can be used too if you need complex outer join, for your SQL
statement.
------------------------------

For non-equi join, I suggest you start with a cross join and add the
required where clause. That won't do much for outer non equi join, though.

-------------------------------


You can use direct to SQL with ExecuteQuery<T> method accepting the SQL
statement as a string.

-------------------------------

There is much more to it.

-------------------------------


Note that I didn't include a SELECT clause in many of the examples: you have
to, in order to make a valid statement.



Hoping it may help,
Vanderghast, Access MVP


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
M

Michel Walsh

If used alone, yes, but with a WHERE clause, MS SQL Server optimizer should
recognize an (implicit) inner join and thus, limit the effective amount of
data transferred through the wire.

Vanderghast, Access MVP



By the way, a natural inner join between {a, a, b} and { a, a, c} returns
4 rows, not 5. My mistake.
 
F

Frans Bouma [C# MVP]

Michel said:
If used alone, yes, but with a WHERE clause, MS SQL Server optimizer should
recognize an (implicit) inner join and thus, limit the effective amount of
data transferred through the wire.

That's not what makes it inefficient. The cross-join execution path is
often less ideal as the cross-join action by itself is first handled,
and THEN the where clause is handled to filter out rows you don't want.
As a cross-join can spawn millions of tuples, which are often not
needed, the where clause will run slower than an ON clause in an
inner-join set operation.

It's not the data that's sent over the wire, it's the query execution
itself which makes it less ideal. That's also why I find it a big design
flaw in linq that the only operator in the linq's join clause is an
'equals' operator, which makes the production of left/right joins
awkward, yet these operations are often required. The alternative is a
nested from but that leads to lesser ideal queries than a left-join +
accompanying on clause.

FB
Vanderghast, Access MVP



By the way, a natural inner join between {a, a, b} and { a, a, c} returns
4 rows, not 5. My mistake.


--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
J

Jon Skeet [C# MVP]

Frans Bouma said:
That's not what makes it inefficient. The cross-join execution path is
often less ideal as the cross-join action by itself is first handled,
and THEN the where clause is handled to filter out rows you don't want.

I'd certainly hope that the query optimiser would do the job there
though. For instance, if the where clause is on an indexed column I'd
be *hugely* surprised if the inner join were fully evaluated by SQL
server before looking at the index.

It's not like LINQ passes the join in one SQL statement and then
applies the where clause separately: the query optimiser has the full
query to work with.
 
A

Andrus

To get the SQL LEFT outer join, we can expand the LINQ group join, but for
"a" values having no match, we introduce nulls on the unpreserved side,
like this:

from a in b
join
c in d
on
a.field equals c.otherField
into newSet
from x in newSet.DefaultIfEmpty()

newSet can be any name you want which do not collide with keyword or other
variables.

That is a little bit as saying: if a row in b has disappeared,
re-introduce it supplying null (default) for the unpreserved side, and
now, the 'alias' to use to refer to the unpreserved side is x.

This looks to complicated. A better and simpler way for existing realtions
is to use chained properties

Northwind db = CreateDB();
var query = from e in db.Employees
select new {
Name = e.FirstName,
ReportsTo = e.ParentEmployee.FirstName
};
var list = query.ToList();

which generates left join clause automatically.

Andrus.
 
F

Frans Bouma [C# MVP]

Jon said:
I'd certainly hope that the query optimiser would do the job there
though. For instance, if the where clause is on an indexed column I'd
be *hugely* surprised if the inner join were fully evaluated by SQL
server before looking at the index.

In database land, 'hope' isn't a tool in your toolbox. In simple
queries like:
select distinct c.* from customers c cross join orders o
where c.CustomerId = o.Customerid

vs.
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId

the execution plan is the same. It could be that in database ABC this
isn't the case however. It also can be that with more explicit joins and
explicit where clauses, the optimizer gives up and the execution plans
will differ. For example, in some queries, moving the where clause to
the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'

select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid

The second query has less reads than the first query. The main trick is
that with more complicated joins, every next table to join with the set
is joined more efficiently if the set it has to be joined with is
smaller. So if you can already limit that set with the filter you'll
apply anyway, it will bring down the overall query burden. With simple
queries this might not be noticable, but these are fast anyway. It's the
large queries which are the point of interest as these will also take
the majority of the performance eaten by the application.

It's also the case that you can't mimic all queries with crossjoins +
wheres. For example, some LEFT join constructs are hard or impossible to
formulate with solely cross-joins and where clauses.
It's not like LINQ passes the join in one SQL statement and then
applies the where clause separately: the query optimiser has the full
query to work with.

That's not the point. The point is that Linq contains a set-theory
operator 'join' but has just 1 argument for it and has no freedom how to
formulate the predicate to use in the join of the two sets. I know linq
works with sequences and not sets, but it would have been way better if
'join' in Linq would have had normal predicate support and a join
argument, it wouldn't have hurt Linq in any way shape or form.

Btw, The cross joins in MS code often rely on a T-SQL specific
statement 'CROSS APPLY', which isn't a known statement in many other
databases.

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
J

Jon Skeet [C# MVP]

Frans Bouma said:
In database land, 'hope' isn't a tool in your toolbox.

It's a reasonable one to suggest what you should then check. I happened
not to be in a situation where I could easily verify it, and I didn't
want to give the impression that I'd done so.

In a production environment, I *would* check it of course.
In simple
queries like:
select distinct c.* from customers c cross join orders o
where c.CustomerId = o.Customerid

vs.
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId

the execution plan is the same. It could be that in database ABC this
isn't the case however.

I'd be very surprised to see such a database, personally, as least
outside the embedded or "hobby" market.
It also can be that with more explicit joins and
explicit where clauses, the optimizer gives up and the execution plans
will differ. For example, in some queries, moving the where clause to
the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'

select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid

The second query has less reads than the first query.

Again, I'd expect a decent optimiser to be able to work out the
equivalent between them. That's the nice thing about set operations -
there's so much that an optimiser can do.
The main trick is that with more complicated joins, every next table
to join with the set is joined more efficiently if the set it has to
be joined with is smaller. So if you can already limit that set with
the filter you'll apply anyway, it will bring down the overall query
burden. With simple queries this might not be noticable, but these
are fast anyway. It's the large queries which are the point of
interest as these will also take the majority of the performance
eaten by the application.

Well you can easily add where clauses to each effective side of a join
in LINQ:

from c in customers
where c.Country='USA'
inner join (from o in OrderDetails where o.TotalCost > 100m)
on c.CustomerId = o.CustomerId
It's also the case that you can't mimic all queries with crossjoins +
wheres. For example, some LEFT join constructs are hard or impossible to
formulate with solely cross-joins and where clauses.
Agreed.


That's not the point.

I thought it was worth pointing out as your earlier post could easily
give people the wrong impression:

<quote>
The cross-join execution path is often less ideal as the cross-join
action by itself is first handled, and THEN the where clause is handled
to filter out rows you don't want.
</quote>

That made it sound (to me at least) like the query engine *couldn't*
take any notice of the where clause until after the cross join had been
handled, due to the way LINQ was doing things - which just isn't true.
As it is, it's a matter of how good the query optimiser is.
The point is that Linq contains a set-theory operator 'join' but has
just 1 argument for it and has no freedom how to formulate the
predicate to use in the join of the two sets. I know linq works with
sequences and not sets, but it would have been way better if 'join'
in Linq would have had normal predicate support and a join argument,
it wouldn't have hurt Linq in any way shape or form.

We've had this debate elsewhere before, and I think we'll have to agree
to differ.
Btw, The cross joins in MS code often rely on a T-SQL specific
statement 'CROSS APPLY', which isn't a known statement in many other
databases.

I don't see that as a problem if it's generated by a provider which
knows it's talking to a SQL Server database. I'd expect any good
provider to use any useful vendor-specific extensions when it can.
 
F

Frans Bouma [C# MVP]

Jon said:
I'd be very surprised to see such a database, personally, as least
outside the embedded or "hobby" market.

In multi-branch joins with expressions other than simple field
compares, it can become very awkward to optimize the query, especially
if one side (or both!) is for example a derived table (subquery). You'll
then see that it's not as optimal as you'd have hoped it to be.

The main thing is: if you want the RDBMS to perform an equ-join or a
left join, specify it as such and it will do so. Specifying a query with
different constructs and then relying on the optimizer to make the best
of it isn't optimal and will likely fail in some (and least acceptable)
situations.

Sqlserver's optimizer is more advanced than the rest out there,
although Oracle is catching up. On Oracle and especially DB2 you'll
still see many different execution plans with the different join types
especially with multi-branch joins and the more advanced where clauses.

Though there are still differences in performance. Take these two:

select c.*
from customers c cross join orders o
cross join employees e cross join [order details] od
where
c.CustomerId = o.CustomerId
and
e.EmployeeId = o.EmployeeId
and o.OrderId = od.OrderId

vs.
select c.*
from customers c inner join orders o
on c.CustomerId = o.CustomerId
inner join employees e on e.EmployeeId = o.EmployeeId
inner join [order details] od on o.OrderId = od.OrderId

THe server trace shows that the duration of the query first query is
much longer than the second query.
It also can be that with more explicit joins and
explicit where clauses, the optimizer gives up and the execution plans
will differ. For example, in some queries, moving the where clause to
the ON clause of a left join can increase performance. For example:
select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
inner join [order details] od on
o.OrderId = od.Orderid
where c.Country='USA'

select distinct c.* from customers c inner join orders o
on c.Customerid = o.CustomerId
and c.Country='USA'
inner join [order details] od on
o.OrderId = od.Orderid

The second query has less reads than the first query.

Again, I'd expect a decent optimiser to be able to work out the
equivalent between them. That's the nice thing about set operations -
there's so much that an optimiser can do.

till the optimizer can't decide what to do because the # of options
grows too large.
Well you can easily add where clauses to each effective side of a join
in LINQ:

from c in customers
where c.Country='USA'
inner join (from o in OrderDetails where o.TotalCost > 100m)
on c.CustomerId = o.CustomerId

'inner join' ? ;)

I was more thinking about 'on c.CustomerId = o.CustomerId &&
o.OrderDate > myOrderDate'

in left/right join situations (which are utterly stupidly formulated in
linq) it's sometimes essential that the predicate is formulated with the
join statement, to prevent row exclusion in some edge queries
(combinations of inner and left operations in a chain)
I thought it was worth pointing out as your earlier post could easily
give people the wrong impression:

<quote>
The cross-join execution path is often less ideal as the cross-join
action by itself is first handled, and THEN the where clause is handled
to filter out rows you don't want.
</quote>

That made it sound (to me at least) like the query engine *couldn't*
take any notice of the where clause until after the cross join had been
handled, due to the way LINQ was doing things - which just isn't true.
As it is, it's a matter of how good the query optimiser is.

The way how Linq is designed forces more cross-joins upon the database
than with equivalent more flexible query languages, as the developer
using Linq often has no other option but to use nested from clauses and
a subsequential where clause. This then results in suboptimal queries in
some cases in the DATABASE. I was talking about that database level, not
in linq providers

However, in a linq provider, it's undoable to optimize away a cross
join with a where into inner joins, so the developer has to hope the
optimizer produces a solid optimized query, which isn't something the
developer should hope for, if the alternative, inner/left joins, is
already available though not through linq.
We've had this debate elsewhere before, and I think we'll have to agree
to differ.

A sequence is a set with an order. As a sequence is a set, and the
different join operations are operations for sets, it's logical to
assume that a more specialized form of a set inherits these operations,
otherwise a sequence wouldn't be a set. Limiting operations on the
sequence therefore means limiting the freedom a developer has, which is
IMHO unnecessary and solely done to make things less complicated for
linq to objects.
I don't see that as a problem if it's generated by a provider which
knows it's talking to a SQL Server database. I'd expect any good
provider to use any useful vendor-specific extensions when it can.

It shows you never wrote a linq provider otherwise you'd know what you
wrote isn't doable in the case of cross apply: there's no alternative
for databases which don't support cross apply. So if you have a linq
query with two nested froms and a where with a count filter on one side
where the count is on the other side for example, cross apply is
necessary otherwise the query can't be formulated unless it's rewritten.
So if the user has to run the query on oracle, it can't do anything else
than simply give up.

FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
F

Frans Bouma [C# MVP]

Andrus said:
This looks to complicated. A better and simpler way for existing
realtions is to use chained properties

Northwind db = CreateDB();
var query = from e in db.Employees
select new {
Name = e.FirstName,
ReportsTo = e.ParentEmployee.FirstName
};
var list = query.ToList();

which generates left join clause automatically.

Only with nullable FKs

FB

--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
J

Jon Skeet [C# MVP]

On Jul 7, 9:40 am, "Frans Bouma [C# MVP]"

<big snip>

Unsurprisingly I don't think we're ever going to agree on this. By the
way, you asked me to remind you not to reply to me as we never seem to
get very far...
        A sequence is a set with an order. As a sequence is a set, and the
different join operations are operations for sets, it's logical to
assume that a more specialized form of a set inherits these operations,
otherwise a sequence wouldn't be a set. Limiting operations on the
sequence therefore means limiting the freedom a developer has, which is
IMHO unnecessary and solely done to make things less complicated for
linq to objects.

As I said before when we had this discussion, allowing a simple
predicate takes an equijoin from O(n+m) to O(n*m) which is
unacceptable in my view.

If you want joins against arbitrary predicates, SelectMany is the
operator to use. How that's translated is up to the LINQ provider.
(I'll readily admit that it doesn't help for right joins.)
        It shows you never wrote a linq provider otherwise you'd know what you
wrote isn't doable in the case of cross apply: there's no alternative
for databases which don't support cross apply. So if you have a linq
query with two nested froms and a where with a count filter on one side
where the count is on the other side for example, cross apply is
necessary otherwise the query can't be formulated unless it's rewritten.
So if the user has to run the query on oracle, it can't do anything else
than simply give up.

So there are some LINQ queries which will only work on certain
databases. That doesn't surprise me, nor does it particularly
disappoint me. I'd hope that such queries are rare (otherwise they'd
be expressible in other SQL dialects) but I think it's acceptable.
Anyone approaching LINQ with a "write once run anywhere" mindset
(exclusing "test on all target platforms") *will* be disappointed. I
think such an expectation is pretty naive though.

Jon
 
M

Michel Walsh

In MS SQL Server, the two queries:


-----------------
SELECT au_fname, au_ord
FROM authors, titleauthor
WHERE authors.au_id=titleauthor.au_id
------------------

and

---------------------
SELECT au_fname, au_ord
FROM authors INNER JOIN titleauthor
ON authors.au_id=titleauthor.au_id
-----------------------

produce, both, the SAME query plan. Remember that SQL is not about HOW TO
get the result, but about to DESCRIBE the result you want. The HOW TO, the
imperative part, is ideally left to the database.


So, yes, if the query plan would use a cross join to solve the problem, that
would be inefficient, but as writing the description of what I want, there
is NO PROBLEM using a CROSS JOIN to do it, FOLLOWED by a WHERE clause.


When I said that SQL joins are probably easier to understand by STARTING
with cross joins, I meant as a mean to understand them, not HOW the database
should (would) do it. In fact, I am far from the first using that approach,
indeed, I 'borrow' the approach from Joe Celko. (The outer joins are then
explained by the re-injection of each row, from the preserved side, which
would have disappear.)


LINQ can thus allow easy non-equi-join if you start by describing a cross
join, and the use the WHERE clause to describe the condition that you would
normally put in the ON clause of the INNER join. The problem is for OUTER
NON EQUI join, only. In some cases, you can still make use of sub-queries.



Vanderghast, Access MVP
 
M

Michel Walsh

(...)
Btw, The cross joins in MS code often rely on a T-SQL specific statement
'CROSS APPLY'
(...)


Hum, are you sure? CROSS APPLY is allowing an ON clause based on the
evaluation of a 'function' returning one or multiple rows (the rows are
thus supplied by the function rather than by a db.table). Furthermore, only
with MS SQL Server 2005. It did not exist before 2005, while CROSS JOIN
existed long time before. So CROSS JOIN can hardly be said to be based on
CROSS APPLY.


You were probably having something else in mind.



Vanderghast, Access MVP
 
M

Michel Walsh

(...)
Though there are still differences in performance. Take these two:

select c.*
from customers c cross join orders o
cross join employees e cross join [order details] od
where
c.CustomerId = o.CustomerId
and
e.EmployeeId = o.EmployeeId
and o.OrderId = od.OrderId

vs.
select c.*
from customers c inner join orders o
on c.CustomerId = o.CustomerId
inner join employees e on e.EmployeeId = o.EmployeeId
inner join [order details] od on o.OrderId = od.OrderId

THe server trace shows that the duration of the query first query is much
longer than the second query.



The query plans are the same. The tracer may show that the second took less
time simply because some data was still in the data cache!

Look at the execution plans! they are both the same!



Vanderghast, Access MVP
 
M

Michel Walsh

There is only one table! There is no join, so no outer join either :)

You probably pasted the wrong code snippet.


Vanderghast, Access MVP
 
A

Andrus

There are two tables !

e.ParentEmployee !

There is join, left outer join in case of nullable fk as Frans responds.

I pasted correct snippet.

Andrus.
 
M

Michel Walsh

I only see one FROM clause, one table, aliased as e:


var query = from e in db.Employees
select new {
Name = e.FirstName,
ReportsTo = e.ParentEmployee.FirstName}



and e.ParentEmployee.FirstName sounds like part of an undefined relation (at
least, undefined in the code snippet). Surely you will agree that a join CAN
occur over tables which have no predefined relation between them.



Vanderghast, Access MVP
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top