How to search files for a text string most efficiently?

Jim

Hello,

I am working on a small Windows application for a client, and one of the
functions they want is a search that lets them enter a search string, then
searches a directory for all files that contain that string AND displays
the lines that contain it.

They have Windows ME, XP and 2000 systems.

Does anyone have any ideas as to the most efficient way to do this?

Also, if multiple directories are chosen, should threads be used for the
search operation?

Thanks!

Jim
 
Personally, I would do it this way:

Query each directory and load all the filenames into one array (I've
never done that part, so...).

Then, using that array, take each filename and open the file with OPEN AS
BINARY, reading its whole content into a single string buffer sized from
the file size. You can then run the "instr" function on that buffer.

Error checking would be needed all the way through.
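A minimal sketch of the whole-file approach above, written in Python rather than VB, so OPEN AS BINARY becomes open(..., "rb") and "instr" becomes bytes.find. The function name files_containing is hypothetical, and it assumes the search string can be encoded as UTF-8 to match the file bytes (plain ASCII text works). Unreadable files are skipped with a message, as a stand-in for the error checking mentioned above.

```python
import os

def files_containing(directory, needle):
    """Return the names of regular files in `directory` whose bytes contain `needle`.

    Hypothetical helper: each file is read whole into one buffer sized from
    the file size (the "small file" approach), then searched with bytes.find.
    """
    needle_bytes = needle.encode("utf-8")  # assumption: ASCII/UTF-8 text files
    matches = []
    try:
        names = sorted(os.listdir(directory))  # the "array of filenames"
    except OSError as exc:
        print(f"cannot list {directory}: {exc}")
        return matches
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue  # skip subdirectories and other non-files
        try:
            size = os.path.getsize(path)  # buffer sized from the file size
            with open(path, "rb") as f:   # binary mode, like OPEN AS BINARY
                data = f.read(size)       # whole file in a single buffer
        except OSError as exc:
            print(f"skipping {path}: {exc}")
            continue
        if data.find(needle_bytes) != -1:  # InStr-style search
            matches.append(name)
    return matches
```

For example, files_containing(r"C:\data", "invoice") would list every file in that (hypothetical) folder containing "invoice".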
**********************************************************************
Richard Jalbert Programmer-Analyst (e-mail address removed)

Dogs have owners, cats have staff.

http://www3.sympatico.ca/richmann/
**********************************************************************
 
Richard Jalbert said:
> I am working on a small Windows application for a client, and as one of the
> functions they want a search that will let them enter a search string, then
> search a directory for all files that contain that search string AND display
> the lines that contain the search string.
> [...]
> Then using that array, get each filename, get and read by OPEN AS
> BINARY each file, loading its content in a single one string buffer,
> sized after getting the file size, on which you could do a "instr"
> function.

If the files are "small", that's a good approach. If the files are large,
it's trickier: you'll have to read the file in chunks of a certain size and
then perform 'InStr'. Notice that you will have to check separately for
occurrences that overlap the boundary between two chunks.
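A sketch of the chunked variant in Python (find_in_chunks and chunk_size are hypothetical names; the 64 KB default is arbitrary). It carries the last len(needle) - 1 bytes of each buffer into the next one, which catches matches that straddle a chunk boundary; because no complete match fits inside that carried tail, no match is reported twice.

```python
def find_in_chunks(stream, needle, chunk_size=64 * 1024):
    """Return the byte offsets of every occurrence of `needle` in a binary stream.

    Hypothetical helper: reads fixed-size chunks and prepends the last
    len(needle) - 1 bytes of the previous buffer to the next one.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    keep = len(needle) - 1
    offsets = []
    carry = b""    # tail of the previous buffer
    consumed = 0   # bytes read from the stream so far
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf = carry + chunk
        base = consumed - len(carry)  # stream offset of buf[0]
        pos = buf.find(needle)
        while pos != -1:
            offsets.append(base + pos)
            pos = buf.find(needle, pos + 1)  # allow overlapping matches
        consumed += len(chunk)
        carry = buf[-keep:] if keep else b""
    return offsets
```

Usage: with open(path, "rb") as f: hits = find_in_chunks(f, b"search string"). Memory use stays around one chunk no matter how large the file is.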
 
What do you define as a "small" file?

How would you get the line the occurrence is on, so you can show it? Use
INSTR to find the string, then find the previous CRLF and the next CRLF
from that position?
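The prior/next CRLF idea can be sketched in Python as follows, assuming Windows-style CRLF line endings and a search string that contains no line break; matching_lines is a hypothetical helper. After each hit the search resumes at the end of that line, so a line with several hits is listed only once.

```python
def matching_lines(data, needle):
    """Return each CRLF-delimited line of `data` (bytes) that contains `needle`."""
    lines = []
    pos = data.find(needle)
    while pos != -1:
        prev = data.rfind(b"\r\n", 0, pos)       # previous CRLF before the hit
        start = prev + 2 if prev != -1 else 0    # line starts just after it
        nxt = data.find(b"\r\n", pos)            # next CRLF from the hit
        end = nxt if nxt != -1 else len(data)    # or end of data on the last line
        lines.append(data[start:end])
        # Resume after this line; max() guarantees progress in odd cases.
        pos = data.find(needle, max(end, pos + 1))
    return lines
```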

And, what about the threading portion of the question?

Jim

Herfried K. Wagner said:
> If the files are "small", that's a good approach. If the files are large,
> it's trickier, you'll have to read the file in chunks of a certain size
> and then perform 'InStr', notice that you will have to check for
> occurrences that overlap the ends of two chunks separately.
 
Herfried K. Wagner said:
> If the files are "small", that's a good approach.

Is not the maximum size for a string buffer something like 0 to 2
billion characters?

> If the files are large,

What would be a large file?
I have one that is 214 Megs (pi to a million places) and I cannot open
it on my machine (I concatenated it from 20 smaller files).

> it's trickier, you'll have to read the file in chunks of a certain size and
> then perform 'InStr', notice that you will have to check for occurrences that
> overlap the ends of two chunks separately.

Overlap is easily handled: read the first buffer, then, when reading
the second, back the byte pointer up by at least the size of the
substring to be found.
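A sketch of the seek-back method in Python (find_with_seek_back and chunk_size are hypothetical names). Backing up len(needle) - 1 bytes is the minimum that guarantees a straddling match is seen; backing up more is harmless for a yes/no test like this one, but a version that counted matches would then report some of them twice.

```python
import os

def find_with_seek_back(path, needle, chunk_size=64 * 1024):
    """Return True if the file at `path` contains `needle` (bytes).

    Hypothetical helper: after each full chunk, the file position is moved
    back by len(needle) - 1 bytes so the next read overlaps the previous one.
    """
    if len(needle) > chunk_size:
        raise ValueError("chunk_size must be at least len(needle)")
    back = len(needle) - 1
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if needle in chunk:
                return True
            if len(chunk) < chunk_size:
                return False             # short read: end of file reached
            f.seek(-back, os.SEEK_CUR)   # back the byte pointer up
```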

One detail that was not stated: what if the substring is split by a
vbCRLF? That would mean the line breaks have to be removed from the
file before doing the search, no?
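A sketch of that idea in Python (contains_ignoring_line_breaks and its joiner parameter are hypothetical). Whether a removed CRLF should become nothing or a space depends on how the text was wrapped, so the sketch makes that a parameter. Once the breaks are gone, match offsets no longer point into the original lines, so displaying the matching line would need an extra mapping step.

```python
def contains_ignoring_line_breaks(data, needle, joiner=b""):
    """Return True if `needle` occurs in `data` after every CRLF is replaced by `joiner`.

    With the default joiner=b"" the CRLF pairs are simply removed, so a word
    broken across two lines is joined back together before searching.
    """
    flattened = data.replace(b"\r\n", joiner)
    return needle in flattened
```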


Richard Jalbert
 
Richard Jalbert said:
> Is not the maximum size for a string buffer something like 0 to 2
> billion characters?

... but your physical RAM is limited... ;-).

> What would be a large file?
> I have one that is 214 Megs (pi to a million places) and I
> cannot open it on my machine (I concatenated it from 20
> smaller files).

That's a "large" file.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top