Wondering how to make my search faster?

brett

I have a program written in VB.NET using the 3.5 framework.

What the program does:
It searches a list of folders on a remote server for all files with
the .PST extension, reports them in a text box list, and also
outputs them to a CSV file.

My question:

There are approx 800,000 files that have to be searched through to get
my listing. The way the program works so far is really slow: it takes
about 4 hours to complete. The following is the bit of code that
does my searching. How can I make this faster?

Private Sub search_loop(ByVal sdir As String)
    Try
        For Each fname As String In IO.Directory.GetFiles(sdir)
            total_files += 1

            Dim d3 As New SetTextCallback(AddressOf filesSearched)
            Me.Invoke(d3, New Object() {total_files.ToString})

            If fname.EndsWith("pst") Then
                psts_found += 1

                Dim d4 As New SetTextCallback(AddressOf pstsFound)
                Me.Invoke(d4, New Object() {psts_found.ToString})

                Dim NewText As String = fname
                ' Check if this method is running on a different thread
                ' than the thread that created the control.
                If Me.pst_found_box.InvokeRequired Then
                    ' It's on a different thread, so use Invoke.
                    Dim d2 As New SetTextCallback(AddressOf SetText2)
                    Me.Invoke(d2, New Object() {[NewText] + vbCr})
                Else
                    ' It's on the same thread, no need for Invoke.
                    Me.pst_found_box.AppendText([NewText] + vbCr)
                End If
            Else
                Dim NewText As String = fname
                ' Check if this method is running on a different thread
                ' than the thread that created the control.
                If Me.pst_status_box.InvokeRequired Then
                    ' It's on a different thread, so use Invoke.
                    Dim d As New SetTextCallback(AddressOf SetText)
                    Me.Invoke(d, New Object() {[NewText] + vbCr})
                Else
                    ' It's on the same thread, no need for Invoke.
                    Me.pst_status_box.AppendText([NewText] + vbCr)
                End If
            End If
        Next

        For Each subdir As String In IO.Directory.GetDirectories(sdir)
            search_loop(subdir)
        Next
    Catch ioex As System.UnauthorizedAccessException
        Dim d5 As New SetTextCallback(AddressOf SetText)
        Me.Invoke(d5, New Object() {ioex.ToString + vbCr})
    Catch generatedExceptionName As Exception
        Dim d6 As New SetTextCallback(AddressOf SetText)
        Me.Invoke(d6, New Object() {generatedExceptionName.ToString + vbCr})
    End Try
End Sub
 
Cor Ligthert[MVP]

Brett,

You know that using extra threads can make the throughput time faster, but
it definitely makes the total processing time of a serial process longer.

This searching is a serial process. (You cannot read two file names on one
disk at the same time)

Cor
 
Robert

Run your program on the server. Otherwise you will be sending a lot of
file metadata over the wire.

On the server, open a command prompt.
In it, try:

    Dir *.pst /B /S > PSTFiles.Txt

This is the speed of light for you.
Maybe you can just parse the above file and be done.
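
Parsing that listing could look roughly like the sketch below. It is only a sketch under assumptions: the module name, the Quote helper, the output file name PSTFiles.csv, and the two-column layout are all hypothetical, and it assumes every non-blank line in PSTFiles.Txt is a full path, which is what Dir /B /S writes.

```vbnet
Imports System.IO

Module PstListToCsv
    ' Hypothetical file names: adjust to wherever the Dir output was saved.
    Private Const InputPath As String = "PSTFiles.Txt"
    Private Const OutputPath As String = "PSTFiles.csv"

    Sub Main()
        Using writer As New StreamWriter(OutputPath)
            writer.WriteLine("Folder,FileName")
            For Each line As String In File.ReadAllLines(InputPath)
                ' Skip blank lines; every other line is assumed to be a full path.
                If line.Trim().Length = 0 Then Continue For
                writer.WriteLine(Quote(Path.GetDirectoryName(line)) & "," & _
                                 Quote(Path.GetFileName(line)))
            Next
        End Using
    End Sub

    ' Wrap a field in double quotes and double any embedded quotes,
    ' which is the usual CSV escaping convention.
    Private Function Quote(ByVal field As String) As String
        If field Is Nothing Then field = ""
        Return """" & field.Replace("""", """""") & """"
    End Function
End Module
```

One thing to watch: Dir writes redirected output in the console code page, while File.ReadAllLines assumes UTF-8 by default, so file names with non-ASCII characters may need an explicit Encoding argument.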

You may also want to look into the new PowerShell betas. It allows you to
transparently run cmdlets on a remote server, and pull the results back.
This would involve only one check of your security credentials.
With your current implementation, you may be getting dinged on every
dir/subdir/file.

More threads might help, if your server has a decent RAID array.

Doing the math, 800k / (4*60*60) = about 55 files per second.
Even with NTFS and fat metadata of, say, 1k per file (roughly 55 KB/s),
this is VERY low disk throughput.
Something else is the bottleneck. Maybe a chatty file sharing protocol,
or maybe security checks, etc.
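
Besides the network, one client-side candidate for that "something else" is the posted loop itself: it makes a synchronous Me.Invoke call and a text box append for every one of the 800,000 files. Whether that matters here is untested. The sketch below shows one way to take it out of the inner loop, assuming the results can be displayed after the scan (or in batches) instead of per file. PstScanner, Found, and Errors are hypothetical names, not part of the original program. Directory.GetFiles(path, searchPattern) is an existing .NET 3.5 overload.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO

' Collects results in memory instead of updating the UI once per file.
Public Class PstScanner
    Public ReadOnly Found As New List(Of String)
    Public ReadOnly Errors As New List(Of String)

    Public Sub Scan(ByVal folder As String)
        Try
            ' Ask the file system for *.pst names only, rather than
            ' listing every file and testing each name on the client.
            For Each fname As String In Directory.GetFiles(folder, "*.pst")
                ' A three-character pattern can also match longer extensions
                ' that start with "pst", so the exact check is kept.
                If fname.EndsWith(".pst", StringComparison.OrdinalIgnoreCase) Then
                    Found.Add(fname)
                End If
            Next

            For Each subdir As String In Directory.GetDirectories(folder)
                Scan(subdir)
            Next
        Catch ex As UnauthorizedAccessException
            ' Record the folder and keep going, as the original loop does.
            Errors.Add(folder & ": " & ex.Message)
        Catch ex As IOException
            Errors.Add(folder & ": " & ex.Message)
        End Try
    End Sub
End Class
```

A form could run Scan on its background thread and then make a single Invoke at the end to fill pst_found_box from Found. The trade-off is that the running count of all files searched is lost, because non-matching files are never returned to the client.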
 
brett

What do you guys think about taking the total listing of first-level
folders (c:\userfolders\username, one folder per username) and
splitting it into two lists: usernames that start with A-M and
usernames that start with N-Z? I would then take both of those lists
and perform the search on each list at the same time using threads.
This would accomplish the idea of performing more than one search at
the same time, but the overall run may take longer since I am sorting
before I am searching.
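
A rough sketch of that idea is below. It is an assumption-laden illustration: the Worker class and its members are hypothetical, the root path is the one mentioned above, and whether two threads actually help depends on the server and the network, as discussed earlier in the thread. The split is a single pass over the first-level folders, with no real sort needed, so its cost should be small next to a scan that runs for hours.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Threading

' Hypothetical worker: scans its own list of folders into its own result list,
' so the two threads never share mutable state.
Public Class Worker
    Public ReadOnly Folders As New List(Of String)
    Public ReadOnly Found As New List(Of String)

    Public Sub Run()
        For Each folder As String In Folders
            Scan(folder)
        Next
    End Sub

    Private Sub Scan(ByVal folder As String)
        Try
            Found.AddRange(Directory.GetFiles(folder, "*.pst"))
            For Each subdir As String In Directory.GetDirectories(folder)
                Scan(subdir)
            Next
        Catch ex As UnauthorizedAccessException
            ' Skip folders that cannot be read.
        Catch ex As IOException
            ' Skip folders that fail with an I/O error.
        End Try
    End Sub
End Class

Module SplitSearch
    Sub Main()
        Dim root As String = "c:\userfolders"
        Dim workerAM As New Worker()
        Dim workerNZ As New Worker()

        ' One pass: names sorting before "N" (including digits) go to the first worker.
        For Each userDir As String In Directory.GetDirectories(root)
            Dim name As String = Path.GetFileName(userDir)
            If name.Length > 0 AndAlso String.Compare(name.Substring(0, 1), "N", _
                    StringComparison.OrdinalIgnoreCase) < 0 Then
                workerAM.Folders.Add(userDir)
            Else
                workerNZ.Folders.Add(userDir)
            End If
        Next

        Dim t1 As New Thread(AddressOf workerAM.Run)
        Dim t2 As New Thread(AddressOf workerNZ.Run)
        t1.Start()
        t2.Start()
        t1.Join()
        t2.Join()

        Console.WriteLine("PST files found: " & (workerAM.Found.Count + workerNZ.Found.Count))
    End Sub
End Module
```

Giving each worker its own Found list avoids locking. The results are only combined after both Join calls return.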

Robert,
I completely agree with you that our throughput sucks! You may be
correct in thinking that, since I am performing this action from a
remote machine against the SAN, every file I search first
authenticates my credentials, then searches, then goes to the next
file, authenticates and searches again, and so on.

Thank you guys for your input.
I appreciate being able to ping people who have an idea of what I am
talking about instead of trying to talk to someone in the office who
has no clue.

-brett
 
