Why is Linq slower?

P

Paul

The reason FindAll was faster probably was because it uses a for-loop instead
of a foreach-loop (I guess its because the foreach requires that extra call
for GetEnumerator() -method). So if I switch to a for-loop in my original
foreach-based code, I get around 1 800 000 ticks. So the good old reliable
for-loop might be the choice for those hardcode fps-game-programmers (for the
ones who dare to come out of the C++ closet that is). For the rest of us, I
guess it couldn't matter less :)
 
P

Paul

That's pretty tricky without having complete source code. Give us a
proper test harness and we can experiment appropriately.

Sure, as you can see i have isolated the important parts.
Paste in a new Console-app:

class Program
{
private static List<ConferenceRoom> rooms = new List<ConferenceRoom>
{
new ConferenceRoom{SeatingCapacity = 3},
new ConferenceRoom{SeatingCapacity = 5},
new ConferenceRoom{SeatingCapacity = 7},
new ConferenceRoom{SeatingCapacity = 9},
new ConferenceRoom{SeatingCapacity = 12}
};

static void Main(string[] args)
{
DateTime start = DateTime.Now;

for(int i = 0; i < 1000000; i++)
ForWithIf(8);

Console.WriteLine((DateTime.Now - start).Ticks);

Console.Read();
}

//Ca: 2 500 000 ticks
public static List<ConferenceRoom> ForeachTest(int minimumSeatingCapacity)
{
List<ConferenceRoom> availableRooms = new List<ConferenceRoom>();

foreach (ConferenceRoom room in rooms)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
availableRooms.Add(room);
}

return availableRooms;
}

//Ca: 4 500 000 ticks
public static List<ConferenceRoom> LinqTest(int minimumSeatingCapacity)
{
return (from room in rooms
where room.SeatingCapacity >= minimumSeatingCapacity
select room).ToList();
}

//Ca: 2 400 000 ticks
public static List<ConferenceRoom> FindAllTest(int minimumSeatingCapacity)
{
return rooms.FindAll(delegate(ConferenceRoom room)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
return true;
return false;
});
}

//Ca: 1 800 000 ticks
public static List<ConferenceRoom> ForTest(int minimumSeatingCapacity)
{
List<ConferenceRoom> list = new List<ConferenceRoom>();
for (int i = 0; i < rooms.Count; i++)
{
if (rooms.SeatingCapacity >= minimumSeatingCapacity)
list.Add(rooms);
}
return list;
}
}

public class ConferenceRoom
{
private int seatingCapacity;

public int SeatingCapacity
{
get { return seatingCapacity; }
set { seatingCapacity = value; }
}
}
 
J

Jon Skeet [C# MVP]

The reason FindAll was faster probably was because it uses a for-loop instead
of a foreach-loop (I guess its because the foreach requires that extra call
for GetEnumerator() -method). So if I switch to a for-loop in my original
foreach-based code, I get around 1 800 000 ticks. So the good old reliable
for-loop might be the choice for those hardcode fps-game-programmers (for the
ones who dare to come out of the C++ closet that is). For the rest of us, I
guess it couldn't matter less :)

Alternatively, use foreach but don't call ToList() at the end of the
LINQ expression. That avoids making a copy to a new list at all, which
may well save rather a lot of time. Only a benchmark would tell, of
course :)

Jon
 
M

Marc Gravell

By my measurement, most of the LINQ time is in the ToList()... I
changed the return (for all) to be IEnumerable<ConferenceRoom>, and
ran with that:

ForeachTest: 1,376,752
LinqTest: 0,856,144
LinqListTest: 4,174,352
FindAllTest: 1,632,382
ForTest: 0,844,985

(where:


Paul said:
That's pretty tricky without having complete source code. Give us a
proper test harness and we can experiment appropriately.

Sure, as you can see i have isolated the important parts.
Paste in a new Console-app:

class Program
{
private static List<ConferenceRoom> rooms = new List<ConferenceRoom>
{
new ConferenceRoom{SeatingCapacity = 3},
new ConferenceRoom{SeatingCapacity = 5},
new ConferenceRoom{SeatingCapacity = 7},
new ConferenceRoom{SeatingCapacity = 9},
new ConferenceRoom{SeatingCapacity = 12}
};

static void Main(string[] args)
{
DateTime start = DateTime.Now;

for(int i = 0; i < 1000000; i++)
ForWithIf(8);

Console.WriteLine((DateTime.Now - start).Ticks);

Console.Read();
}

//Ca: 2 500 000 ticks
public static List<ConferenceRoom> ForeachTest(int
minimumSeatingCapacity)
{
List<ConferenceRoom> availableRooms = new List<ConferenceRoom>();

foreach (ConferenceRoom room in rooms)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
availableRooms.Add(room);
}

return availableRooms;
}

//Ca: 4 500 000 ticks
public static List<ConferenceRoom> LinqTest(int
minimumSeatingCapacity)
{
return (from room in rooms
where room.SeatingCapacity >= minimumSeatingCapacity
select room).ToList();
}

//Ca: 2 400 000 ticks
public static List<ConferenceRoom> FindAllTest(int
minimumSeatingCapacity)
{
return rooms.FindAll(delegate(ConferenceRoom room)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
return true;
return false;
});
}

//Ca: 1 800 000 ticks
public static List<ConferenceRoom> ForTest(int
minimumSeatingCapacity)
{
List<ConferenceRoom> list = new List<ConferenceRoom>();
for (int i = 0; i < rooms.Count; i++)
{
if (rooms.SeatingCapacity >= minimumSeatingCapacity)
list.Add(rooms);
}
return list;
}
}

public class ConferenceRoom
{
private int seatingCapacity;

public int SeatingCapacity
{
get { return seatingCapacity; }
set { seatingCapacity = value; }
}
}
 
M

Marc Gravell

darn... wrong key:

where:

public static IEnumerable<ConferenceRoom> LinqTest(int
minimumSeatingCapacity) {
return (from room in rooms
where room.SeatingCapacity >= minimumSeatingCapacity
select room);
}

public static IEnumerable<ConferenceRoom> LinqListTest(int
minimumSeatingCapacity) {
return (from room in rooms
where room.SeatingCapacity >= minimumSeatingCapacity
select room).ToList();
}

[Jon]>Only a benchmark would tell, of course :)
Get out of my mind! ;-p

Marc
 
M

Marc Gravell

(oops - this is actually a continuation of another branch in this
topic - but you get the idea...)
 
J

Jon Skeet [C# MVP]

(oops - this is actually a continuation of another branch in this
topic - but you get the idea...)

You want to bring continuations into the equation as well? Yikes!

;)

Jon
 
P

Paul

But without the ToList() the methods are not functional equal anymore. It
becomes unfair to compare them since the other methods have to go through and
test all 5 rooms 1 million times where the deferred execution one (LinkTest)
doesn't (until we eventually start iterating). Or am I missing something?

Marc Gravell said:
By my measurement, most of the LINQ time is in the ToList()... I
changed the return (for all) to be IEnumerable<ConferenceRoom>, and
ran with that:

ForeachTest: 1,376,752
LinqTest: 0,856,144
LinqListTest: 4,174,352
FindAllTest: 1,632,382
ForTest: 0,844,985

(where:


Paul said:
That's pretty tricky without having complete source code. Give us a
proper test harness and we can experiment appropriately.

Sure, as you can see i have isolated the important parts.
Paste in a new Console-app:

class Program
{
private static List<ConferenceRoom> rooms = new List<ConferenceRoom>
{
new ConferenceRoom{SeatingCapacity = 3},
new ConferenceRoom{SeatingCapacity = 5},
new ConferenceRoom{SeatingCapacity = 7},
new ConferenceRoom{SeatingCapacity = 9},
new ConferenceRoom{SeatingCapacity = 12}
};

static void Main(string[] args)
{
DateTime start = DateTime.Now;

for(int i = 0; i < 1000000; i++)
ForWithIf(8);

Console.WriteLine((DateTime.Now - start).Ticks);

Console.Read();
}

//Ca: 2 500 000 ticks
public static List<ConferenceRoom> ForeachTest(int
minimumSeatingCapacity)
{
List<ConferenceRoom> availableRooms = new List<ConferenceRoom>();

foreach (ConferenceRoom room in rooms)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
availableRooms.Add(room);
}

return availableRooms;
}

//Ca: 4 500 000 ticks
public static List<ConferenceRoom> LinqTest(int
minimumSeatingCapacity)
{
return (from room in rooms
where room.SeatingCapacity >= minimumSeatingCapacity
select room).ToList();
}

//Ca: 2 400 000 ticks
public static List<ConferenceRoom> FindAllTest(int
minimumSeatingCapacity)
{
return rooms.FindAll(delegate(ConferenceRoom room)
{
if (room.SeatingCapacity >= minimumSeatingCapacity)
return true;
return false;
});
}

//Ca: 1 800 000 ticks
public static List<ConferenceRoom> ForTest(int
minimumSeatingCapacity)
{
List<ConferenceRoom> list = new List<ConferenceRoom>();
for (int i = 0; i < rooms.Count; i++)
{
if (rooms.SeatingCapacity >= minimumSeatingCapacity)
list.Add(rooms);
}
return list;
}
}

public class ConferenceRoom
{
private int seatingCapacity;

public int SeatingCapacity
{
get { return seatingCapacity; }
set { seatingCapacity = value; }
}
}

 
J

Jon Skeet [C# MVP]

But without the ToList() the methods are not functional equal anymore. It
becomes unfair to compare them since the other methods have to go through and
test all 5 rooms 1 million times where the deferred execution one (LinkTest)
doesn't (until we eventually start iterating). Or am I missing something?

The important thing is to work out whether you really *need* a list.
Often you won't - you just need to iterate.

So yes, the benchmark for each should iterate over the filtered
results each time.

We should also test with a larger list - say 1000 elements, and
100,000 elements (with fewer iterations, of course), making sure that
a reasonable proportion (e.g. half) match the filter each time.

Jon
 
M

Marc Gravell

D'oh! Yes you are right.... deferred exectuion; the LINQ (without
ToList()) hasn't done anything!

OK, I've returned Count, using property on lists, and
Enumerable.Count() on the LINQ (w/o list), and added a checksum:

ForeachTest: 1,839,284 2000000
LinqTest: 3,210,923 2000000
LinqListTest: 3,919,258 2000000
FindAllTest: 1,342,625 2000000
ForTest: 0,850,867 2000000

So yes, it looks slower (approx factor of 2)... but even so, you are
talking 1s for 1M iterations, and it appears to scale linearly...

Not a great answer, I'll admit...

Marc
 
M

Marc Gravell

Often you won't - you just need to iterate.
But it helps if you *do* iterate... my mistake ;-/

I also ran some tests for Enumerable.Count vs doing the test and
incrementing an accumulator directly (foreach and indexer); the
results are again roughly a factor of 2 out, /probably/ largely due to
the overhead of calling the delegate.

Marc
 
M

Marc Gravell

OK; I did some more tests... the problem appears to not be LINQ as
much as virtual calls...

Your test rig works directly off the List<T> variable, so (since this
is sealed) it can jump the virtcall and go direct to the method. When
all you have is the interface (IList<T>, IEnumerable<T>, etc), then it
cannot do this.

I constructed another test rig just doing a filtered count (no LINQ)
using 10000 iterations over 10000 objects, with all permutations of:
* interface vs concrete List<T>
* foreach vs indexer
* inline test vs predicate

I haven't included the full rig code (quite long) - but I will keep it
for a little while if anybody wants to verify / etc.
Results are console, using stopwatch, with forced GC between runs,
release without debugger (last column is a checksum to prove all
equal):

ConcreteForeachInline 1439ms [39040000]
InterfaceForeachInline 2479ms [39040000]
ConcreteIndexerInline 678ms [39040000]
InterfaceIndexerInline 675ms [39040000]
ConcreteForeachPredicate 2212ms [39040000]
InterfaceForeachPredicate 3239ms [39040000]
ConcreteIndexerPredicate 1445ms [39040000]
InterfaceIndexerPredicate 1453ms [39040000]

This pretty much ties in with what we were seeing for LINQ, in
particular looking at the Foreach results (since LINQ-to-objects
primarily uses IEnumerable<T>), both for inline and predicate usage.

In reality, you are unlikely to ever face a scenario where this is the
actual bottleneck, but I'm glad I understand it a bit better. For
example, if this was critical you could write a few List<T>-specific
extension methods for Where etc, and still use LINQ without a single
change to your code (except perhaps an extra "using" statement). This
would work because of how the compiler works - but the problem would
be what to return from Where etc - if you return IEnumerable<T> you
are back in interface-ville, but you don't necessarily want the memory
overhead of filling another List. But to demonstrate I went for the
latter option and knocked a few together:

Std LINQ Count 4234ms [39040000]
List LINQ Count 2264ms [39040000]

Std LINQ Where ToList 7797ms [39040000]
List LINQ Where ToList 3160ms [39040000]

Does that fill in a few blanks?

Marc
 
M

Marc Gravell

List LINQ Count 2264ms [39040000]

I should clarify that this was a predicate count; the Count() without
predicate is a lot quicker; the standard LINQ implementation has the
good-sense to test for ICollection<T>, so the unfiltered Count() works
blindingly either way; not worth re-implementing...

Marc
 
M

Marc Gravell

Actually, a caffeine low meant I fluffed the test slightly
(InterfaceIndexer* was using List<T>); I'll try and get my act
together!

The good news is that the revised code further demonstrates this
point, highlighting that the difference also applies with indexers:

ConcreteForeachInline 1916ms [39040000]
InterfaceForeachInline 3297ms [39040000]
ConcreteIndexerInline 888ms [39040000]
InterfaceIndexerInline 1655ms [39040000]
ConcreteForeachPredicate 3022ms [39040000]
InterfaceForeachPredicate 4355ms [39040000]
ConcreteIndexerPredicate 1908ms [39040000]
InterfaceIndexerPredicate 2692ms [39040000]

For the record, I would still recommend coding to interfaces; it makes
the code far more reusable and open to refactoring.

I've treble checked my signatures this time:

static int ConcreteForeachInline(List<Foo> foos, int cutoff) {...}
static int InterfaceForeachInline(IEnumerable<Foo> foos, int cutoff)
{...}
static int ConcreteIndexerInline(List<Foo> foos, int cutoff) {...}
static int InterfaceIndexerInline(IList<Foo> foos, int cutoff) {...}
static int ConcreteForeachPredicate(List<Foo> foos, int cutoff) {...}
static int InterfaceForeachPredicate(IEnumerable<Foo> foos, int
cutoff) {...}
static int ConcreteIndexerPredicate(List<Foo> foos, int cutoff) {...}
static int InterfaceIndexerPredicate(IList<Foo> foos, int cutoff)
{...}

Marc
 
F

Frans Bouma [C# MVP]

Jon said:
The important thing is to work out whether you really need a list.
Often you won't - you just need to iterate.

Though in practise, you will return the list. The thing is: do you
want to pass the query around? Remember, the query can contain
expensive resources like a database connection, and you might not want
to expose that to a given tier, or even can't because the receiver is
on another machine.

FB
--
------------------------------------------------------------------------
Lead developer of LLBLGen Pro, the productive O/R mapper for .NET
LLBLGen Pro website: http://www.llblgen.com
My .NET blog: http://weblogs.asp.net/fbouma
Microsoft MVP (C#)
------------------------------------------------------------------------
 
J

Jon Skeet [C# MVP]

On Feb 1, 8:59 am, "Frans Bouma [C# MVP]"

Though in practise, you will return the list.

To what? It really depends on the precise nature of how this is being
used.
The thing is: do you
want to pass the query around? Remember, the query can contain
expensive resources like a database connection, and you might not want
to expose that to a given tier, or even can't because the receiver is
on another machine.

Well, in this case there's no indication that there's necessarily a
database ever involved, or a different tier. Yes, if you need to
return results to a different tier you're likely to need all the data,
although you may not need to copy it to an actual List<T> first.
Imagine building up some XML to return - you may well only need to
iterate through it to build up the XML - so creating a List<T> would
be a wasteful copy.

It's the classic performance answer, really: "it depends" (on just
about everything).

Jon
 
Top