File Search

Guest

Framework version : 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). I tried a recursive method to search for
the files. It works fine for small directories, but it takes 2 or 3 minutes
to search a directory of size 8GB (say "C:"). Is there a quicker way to
determine whether files of the given type (txt) exist in the given
directory, avoiding the sequential search of the recursive method?
 
Marc Gravell

I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.

Under 2.0 the GetFiles method can accept
System.IO.SearchOption.AllDirectories, which recurses on your behalf. To be
honest, if all you want to do is count the files, I'm not sure this is the
best option, as it ends up returning a relatively big array, and I don't
know (without trying) how optimised it is. It *might* still be your best
option to iterate through the directories calling GetFiles.
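
For comparison, here is a minimal sketch of the AllDirectories approach mentioned above. It assumes .NET 2.0 or later (the overload is not available in 1.1), and the class and method names are only illustrative. Note that the whole call fails if any subdirectory cannot be read.

```csharp
using System;
using System.IO;

static class TxtCounter
{
    // Counts files matching 'pattern' under 'root', letting the framework
    // do the recursion. Requires .NET 2.0+. Throws (for example
    // UnauthorizedAccessException) if any subdirectory cannot be read.
    public static int CountAll(string root, string pattern)
    {
        // Returns the full paths of every match as one array.
        string[] paths = Directory.GetFiles(root, pattern, SearchOption.AllDirectories);
        return paths.Length;
    }
}
```

Usage would be something like TxtCounter.CountAll(@"c:\temp", "*.txt"). Since the array holds full paths, the same call also serves if the names are wanted rather than just the count.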

The following takes about 8 seconds to search my c: drive:

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        Console.WriteLine(CheckDir(new DirectoryInfo("c:\\")));
    }

    static int counter = 0; // number of directories visited so far

    static int CheckDir(DirectoryInfo di)
    {
        int count = 0;
        try
        { // watch out for permission denied ;-p
            count += di.GetFiles("*.txt").Length;
            foreach (DirectoryInfo subDi in di.GetDirectories())
            {
                count += CheckDir(subDi);
            }
        }
        catch {} // lazy
        if (++counter % 100 == 0) Console.WriteLine(counter); // just to see it is working
        return count;
    }
}
 
Nick Hounsome

Dhans said:
Framework version : 1.1

For a given directory (which may have subdirectories), I need to identify
the number of text files (*.txt). For this I have tried recursive method to
search files, it works fine for the directory which has smaller size, but it
takes 2 or 3 minutes to search the directory of size 8GB (say "C:"). Is
there any other quicker method to identify whether the given file type (txt)
is available in the given directory, to avoid unnecessary sequential search
in recursive method.

It MIGHT be worth eliminating the recursion.

Start with a list of directories (probably just 1) and an (empty) list for
the txt files.
While the directory list is not empty
{
    take the first directory off the list and examine its contents
    append its subdirectories to the directory list
    append its txt files to the file list
}
 
Marc Gravell

The same occurred to me; the natural choice here would be a
Queue<DirectoryInfo>, which obviously doesn't exist in 1.1 (as per OP)...
however, timings indicate no appreciable difference in performance between
recursive functions and queueing (some variance both up and down on repeated
tests, but within the same range indicating HDD is the cause). Clearly the
file-system is being the slow dog. Recursion isn't necessarily a sensible
option for horrendous trees, so might be worth refactoring as per Nick's
suggestion.

I ran the tests outside of the debugger, which doubles the performance to
roughly 4.1s to scan my disk (over any implementation). My comparison also
highlighted that SearchOption.AllDirectories is not really a very good
option, as it breaks too easily with any permission denial (unless you are
sa, but of course we don't ever run as admin ;-p).

Code for 2.0 follows:

Queue<DirectoryInfo> queue = new Queue<DirectoryInfo>();
queue.Enqueue(di); // root of search
int files = 0;
while (queue.Count > 0) {
    DirectoryInfo current = queue.Dequeue();
    try { // watch out for permission denied ;-p
        // or put into a List<FileInfo> or something
        files += current.GetFiles(pattern).Length;
        foreach (DirectoryInfo subDir in current.GetDirectories()) {
            queue.Enqueue(subDir);
        }
    } catch { } // lazy
}
return files;

Marc
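
As a possible refinement of the "lazy" catch above, the following sketch catches only the exceptions one would expect from an unreadable or vanished directory and records which directories were skipped. It assumes .NET 2.0; the names SafeCounter, Count and skipped are hypothetical, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class SafeCounter
{
    // Breadth-first count of files matching 'pattern' under 'root'.
    // Directories that cannot be read are added to 'skipped' instead of
    // being silently ignored; any other exception still propagates.
    public static int Count(DirectoryInfo root, string pattern, List<string> skipped)
    {
        Queue<DirectoryInfo> queue = new Queue<DirectoryInfo>();
        queue.Enqueue(root);
        int files = 0;
        while (queue.Count > 0)
        {
            DirectoryInfo current = queue.Dequeue();
            try
            {
                files += current.GetFiles(pattern).Length;
                foreach (DirectoryInfo subDir in current.GetDirectories())
                {
                    queue.Enqueue(subDir);
                }
            }
            catch (UnauthorizedAccessException)
            {
                skipped.Add(current.FullName); // permission denied
            }
            catch (DirectoryNotFoundException)
            {
                skipped.Add(current.FullName); // removed while scanning
            }
        }
        return files;
    }
}
```

Checking the skipped list afterwards shows whether the count is complete or whether parts of the tree were not visible to the current user.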
 
Guest

Marc Gravell said:
I guess one question would be: how long does it take Windows to perform the
same search? If it takes about the same time you're probably not doing too
much wrong.

My search takes more or less the same time as the Windows search.

Under 2.0 the GetFiles method can accept
System.IO.SearchOption.AllDirectories which recurses on your behalf, but to
be honest if all you want to do is count them I'm not even sure that this is
the best option, as this will end up returning a relatively big array;

No, I want the file names (full paths) that match the search criteria.
 
Marc Gravell

Ahh; you misled me by saying "number of"... but never mind:

Try this; for me under 1.1 this takes 4 seconds to return the 6000+ dll
files on my c: drive (not including UI time to display them) - about 1/4 of
the Windows search time (use different command-line params to select the
root and pattern); what timings do you get with this? How many txt files /
folders are we talking? If the numbers are *very* high, then resizing the
array might be sucking some cycles, in which case eventing or custom
iterators might help...


using System;
using System.IO;
using System.Collections;

namespace ConsoleApplication3
{
    /// <summary>
    /// Summary description for Class1.
    /// </summary>
    class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static int Main(string[] args)
        {
            try
            {
                DateTime start = DateTime.Now;
                FileInfo[] files = GetFiles(args[0], args[1]);
                DateTime stop = DateTime.Now; // stop now as have results
                foreach (FileInfo file in files)
                    Console.WriteLine(file.FullName);
                Console.WriteLine(files.Length);
                Console.WriteLine(stop.Subtract(start).TotalMilliseconds);
                return 0;
            }
            catch (Exception e)
            {
                Console.WriteLine(e);
                return -1;
            }
        }

        static FileInfo[] GetFiles(string path, string pattern)
        {
            ArrayList queue = new ArrayList(), files = new ArrayList();
            queue.Add(new DirectoryInfo(path));
            while (queue.Count > 0)
            {
                DirectoryInfo dir = (DirectoryInfo) queue[0];
                queue.RemoveAt(0);

                try // watch out for permission denied ;-p
                {
                    files.AddRange(dir.GetFiles(pattern));
                    queue.AddRange(dir.GetDirectories());
                }
                catch {} // lazy
            }
            return (FileInfo[]) files.ToArray(typeof(FileInfo));
        }
    }
}
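
The "eventing" idea mentioned above could be sketched roughly as follows: instead of collecting every match into one growing array, each file is handed to a callback as soon as it is found. This is only an illustration that sticks to 1.1 features (no generics, no anonymous methods); FileFoundHandler, FileWalker and Walk are hypothetical names. It uses the non-generic System.Collections.Queue, which is available in 1.1.

```csharp
using System;
using System.Collections;
using System.IO;

// Hypothetical callback type, invoked once per matching file.
delegate void FileFoundHandler(FileInfo file);

class FileWalker
{
    // Walks the tree under 'path' breadth-first and reports each file
    // matching 'pattern' through 'onFile' as soon as it is found.
    public static void Walk(string path, string pattern, FileFoundHandler onFile)
    {
        Queue queue = new Queue();
        queue.Enqueue(new DirectoryInfo(path));
        while (queue.Count > 0)
        {
            DirectoryInfo dir = (DirectoryInfo) queue.Dequeue();
            FileInfo[] files;
            DirectoryInfo[] subDirs;
            try
            {
                files = dir.GetFiles(pattern);
                subDirs = dir.GetDirectories();
            }
            catch (UnauthorizedAccessException)
            {
                continue; // permission denied: skip this directory
            }
            foreach (FileInfo file in files)
                onFile(file);
            foreach (DirectoryInfo sub in subDirs)
                queue.Enqueue(sub);
        }
    }
}
```

A caller would pass a named method, for example FileWalker.Walk(args[0], args[1], new FileFoundHandler(Print)), where Print is a static method that writes file.FullName to the console. Results then appear while the scan is still running, and no large array has to be resized.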
 
