Out of curiosity: file count and total size from one LINQ query

Harlan Messinger

Playing with LINQ, for code to write out the number of files in a
directory as well as the total bytes in them, I got this:

string[] filePaths = System.IO.Directory.GetFiles("c:\\myfiles", "*.*",
    System.IO.SearchOption.AllDirectories);

var files =
    from path in filePaths
    let file = new System.IO.FileInfo(path)
    select file.Length;

Console.WriteLine(
    "Number of files: {0:0,0}. Total size: {1:0,0} bytes.",
    filePaths.Length, files.Sum());


Is there some way to return both the file count and total bytes from the
query? I was thinking of something like

select new { count = 1, size = file.Length };

but then I don't have separate sequences to apply .Sum() to. Or is there
a way to do the rollups both within the query?
 
Peter Duniho

It's not really clear to me what you're specifically wanting here. You
already have the file count from the original call to GetFiles(). Why
would it be important to then recalculate that as part of the query?

For that matter, what's the point of the query? If all you're doing is
summing the enumeration, you can much more easily just use the Sum()
method directly:

string[] filePaths = System.IO.Directory.GetFiles("c:\\myfiles", "*.*",
    System.IO.SearchOption.AllDirectories);

Console.WriteLine(
    "Number of files: {0:0,0}. Total size: {1:0,0} bytes.",
    filePaths.Length,
    filePaths.Sum(file => new System.IO.FileInfo(file).Length));

Was there some specific reason you wanted to end up with a collection of
integers, rather than just the final tally?

Pete
 
Jeroen Mostert

Harlan said:
Is there some way to return both the file count and total bytes from the
query?

There is no real reason to, as getting the length of the array is a
constant-time operation and using LINQ can only slow things down.

Incidentally, .NET 4.0 happens to improve on this scenario (enumerating files
to get metadata) quite a bit. See http://msdn.microsoft.com/magazine/ee428166.
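
As a rough sketch of what that can look like: .NET 4.0 added DirectoryInfo.EnumerateFiles, which yields FileInfo objects lazily instead of building a full array of path strings first. That this is the feature the linked article has in mind is an assumption here, and the FileTotals class and CountAndSize helper are hypothetical names, not something from the thread.

```csharp
using System;
using System.IO;

static class FileTotals
{
    // Hypothetical helper: walks the directory tree once, counting files
    // and adding up their sizes along the way.
    public static void CountAndSize(string root, out int count, out long size)
    {
        count = 0;
        size = 0L;
        var dir = new DirectoryInfo(root);

        // EnumerateFiles (.NET 4.0 and later) streams FileInfo objects
        // rather than returning a fully populated array.
        foreach (FileInfo file in dir.EnumerateFiles("*", SearchOption.AllDirectories))
        {
            count++;
            size += file.Length;
        }
    }

    static void Main(string[] args)
    {
        // Directory to scan: first argument if given, else the current directory.
        string root = args.Length > 0 ? args[0] : ".";
        int count;
        long size;
        CountAndSize(root, out count, out size);
        Console.WriteLine(
            "Number of files: {0:0,0}. Total size: {1:0,0} bytes.",
            count, size);
    }
}
```

Because the count is accumulated in the same loop as the size, there is no array whose Length could be read, which is the one case where counting inside the pass is actually needed.
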

Harlan said:
I was thinking of something like

select new { count = 1, size = file.Length };

That's one approach.

Harlan said:
but then I don't have separate sequences to apply .Sum() to. Or is there
a way to do the rollups both within the query?

There is a way to do both rollups at once on the same sequence, but not in
the same query (unless you redundantly copy filePaths.Length to every
result, which is just cheating). It involves accumulating extra data along
the way (a typical functional programming technique).

var files =
    ...
    select file.Length;

var fileTotals = files.Aggregate(
    new { count = 0, size = 0L },
    (c, f) => new { count = c.count + 1, size = c.size + f }
);


But don't use this in production code; using LINQ to "count" lengths this
way is inefficient and hardly intuitive (for example, the reason you need to
write "size = 0L" is dangerously subtle). Even if you don't have an array
and .Count() is an O(N) operation, it's still better to use a regular
for-loop and compute the aggregates within that. I can't think of many
scenarios with multiple aggregates where .Aggregate() would be easier to
read than the equivalent plain old for(each) loop.

Actually, .Aggregate() in general is too propeller-headed to be used in most
scenarios; it's just a way of saying "look ma, no loops!" The loops are
still there, of course. They're just under the covers.
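
For comparison, the plain-loop version recommended above might look like the following minimal sketch. The Totals class and the CountAndSum helper are hypothetical names, and the sample lengths are made-up values rather than real file sizes.

```csharp
using System;
using System.Collections.Generic;

static class Totals
{
    // Hypothetical helper: one pass over the sequence, two running totals.
    public static void CountAndSum(IEnumerable<long> lengths, out int count, out long size)
    {
        count = 0;
        size = 0L;
        foreach (long length in lengths)
        {
            count++;          // number of items seen so far
            size += length;   // running total of their lengths
        }
    }

    static void Main()
    {
        // Made-up lengths standing in for the query result.
        var lengths = new long[] { 10, 200, 3000 };
        int count;
        long size;
        CountAndSum(lengths, out count, out size);
        Console.WriteLine(
            "Number of files: {0:0,0}. Total size: {1:0,0} bytes.",
            count, size);
    }
}
```

Here the accumulator types are declared explicitly, so nothing depends on how an anonymous type's property types are inferred from a seed value.
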
 
Harlan Messinger

Peter said:
It's not really clear to me what you're specifically wanting here. You
already have the file count from the original call to GetFiles(). Why
would it be important to then recalculate that as part of the query?

As I said, out of curiosity. I'm experimenting with the capabilities of
LINQ. I happened to be playing with file aggregation when this trivial
version of the question came to me. A more substantive application would
be where you have an IEnumerable<MyClass> where MyClass has four
different integer properties, and I want the sum for each of them.
Obviously it's more efficient to iterate through the sequence once to
accomplish this than it is to iterate through the sequence four times.

Peter said:
For that matter, what's the point of the query? If all you're doing is
summing the enumeration, you can much more easily just use the Sum()
method directly:

string[] filePaths = System.IO.Directory.GetFiles("c:\\myfiles", "*.*",
    System.IO.SearchOption.AllDirectories);

Console.WriteLine(
    "Number of files: {0:0,0}. Total size: {1:0,0} bytes.",
    filePaths.Length,
    filePaths.Sum(file => new System.IO.FileInfo(file).Length));

Ah, OK, I haven't gotten familiar enough yet with the way lambda
expressions work with the aggregation functions. Nevertheless, what I
said above applies here too. If I had

class MyClass
{
    public int Property1;
    public int Property2;
    public int Property3;
    public int Property4;

    public MyClass(int p1, int p2, int p3, int p4)
    {
        Property1 = p1;
        Property2 = p2;
        Property3 = p3;
        Property4 = p4;
    }
}

MyClass[] items = new MyClass[] {
    new MyClass(1,2,3,4),
    new MyClass(2,4,6,8),
    new MyClass(1,4,9,16)
};

// OUTPUT: Totals are: 4, 10, 18, 28

Console.WriteLine("Totals are: {0}, {1}, {2}, {3}",
    items.Sum(item => item.Property1),
    items.Sum(item => item.Property2),
    items.Sum(item => item.Property3),
    items.Sum(item => item.Property4));

one wouldn't praise me for writing efficient code. Is there some means,
using LINQ or lambda expressions or extension methods or whatever, of
accumulating the sum for all four properties in a single pass through the
items array?
 
Peter Duniho

Harlan said:
[...]
Console.WriteLine("Totals are: {0}, {1}, {2}, {3}",
    items.Sum(item => item.Property1),
    items.Sum(item => item.Property2),
    items.Sum(item => item.Property3),
    items.Sum(item => item.Property4));

one wouldn't praise me for writing efficient code. Is there some means,
using LINQ or lambda expressions or extension methods or whatever, of
accumulating the sum for all four properties in a single pass through the
items array?

Your lambda can be a block statement, in which you perform the necessary
aggregation of sums. But at that point, it's debatable whether there's
any value in doing that rather than just writing a plain "foreach" loop
and summing each total within. Especially since to actually do the
calculation, you'd need to force the LINQ result to be evaluated anyway
(which would involve some kind of enumeration of it, implicit or explicit).
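
One way to read the block-lambda suggestion is sketched below: a single call to Sum() whose statement lambda also updates captured local variables as a side effect. This is only an illustration of the idea under that reading. The FourSums class and SumAll helper are hypothetical names, and MyClass is a trimmed copy of the class from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class MyClass
{
    public int Property1, Property2, Property3, Property4;

    public MyClass(int p1, int p2, int p3, int p4)
    {
        Property1 = p1; Property2 = p2; Property3 = p3; Property4 = p4;
    }
}

static class FourSums
{
    // Hypothetical helper: returns the four totals from one enumeration.
    public static int[] SumAll(IEnumerable<MyClass> items)
    {
        int total2 = 0, total3 = 0, total4 = 0;

        // Sum() enumerates immediately, so the captured locals are fully
        // updated by the time it returns.
        int total1 = items.Sum(item =>
        {
            total2 += item.Property2;
            total3 += item.Property3;
            total4 += item.Property4;
            return item.Property1;
        });

        return new[] { total1, total2, total3, total4 };
    }

    static void Main()
    {
        var items = new[] {
            new MyClass(1, 2, 3, 4),
            new MyClass(2, 4, 6, 8),
            new MyClass(1, 4, 9, 16)
        };

        int[] t = SumAll(items);
        // Prints: Totals are: 4, 10, 18, 28
        Console.WriteLine("Totals are: {0}, {1}, {2}, {3}", t[0], t[1], t[2], t[3]);
    }
}
```

The lambda here does three additions by side effect and one by return value, which is arguably harder to follow than a foreach loop with four running totals, so the reservation stated above still applies.
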

Pete
 
