Web site spider


ruso

I am writing a web site spider that downloads all the pages of a site
to local disk. After that, I want to search the downloaded pages for
about 5000 keywords. I am doing this search with regular expressions,
but it is slow. Is there a faster search algorithm you can suggest?

Thanks
 
Ruso,

A regex is probably the fastest way. How large are the files, and are
you passing them as complete strings through the RegEx classes? Is there
any way you can break them up into smaller pieces?

The Match method only takes a string, so it's probably the loading of all
the content into the string that is causing the slowdown (all of the
string's contents are loaded into memory).

If you can break the files into smaller pieces, then it would help, as
you wouldn't have to load such large strings into memory.
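
As a rough sketch of that idea, assuming C# and plain-text pages saved on disk: the keywords can be combined into a single alternation pattern and each file scanned one line at a time with a StreamReader, so only one line is held in memory at once. The names `KeywordScanner`, `BuildPattern`, `FindKeywords`, the sample keywords, and the path `page.html` are all made up for illustration. The `\b` word boundaries assume every keyword starts and ends with a letter or digit, and a keyword split across a line break would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

static class KeywordScanner
{
    // Builds one pattern such as \b(?:foo|bar)\b from the keyword list.
    // Regex.Escape keeps characters like '.' or '+' in a keyword literal.
    public static Regex BuildPattern(IEnumerable<string> keywords)
    {
        string alternation = string.Join("|", keywords.Select(k => Regex.Escape(k)));
        return new Regex(@"\b(?:" + alternation + @")\b",
                         RegexOptions.IgnoreCase | RegexOptions.Compiled);
    }

    // Reads the text line by line and returns the distinct keywords found.
    public static HashSet<string> FindKeywords(TextReader reader, Regex pattern)
    {
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (Match m in pattern.Matches(line))
                found.Add(m.Value);
        }
        return found;
    }

    static void Main()
    {
        // Hypothetical keywords; the real list would hold all 5000.
        Regex pattern = BuildPattern(new[] { "spider", "regex", "keyword" });

        // "page.html" is a placeholder path for one downloaded page.
        using (var reader = new StreamReader("page.html"))
        {
            foreach (string hit in FindKeywords(reader, pattern))
                Console.WriteLine(hit);
        }
    }
}
```

Building the pattern once and reusing it for every file means each page is scanned a single time instead of once per keyword. Whether that is actually faster for 5000 keywords is something to measure, since RegexOptions.Compiled adds a one-time startup cost.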

Hope this helps.

--
- Nicholas Paldino [.NET/C# MVP]
 
How can I load the strings into memory?
 
Using a MemoryStream ;)
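
A minimal sketch of what that could look like, assuming the page is already available as a byte array and is UTF-8 encoded (the sample bytes, the class name `MemoryStreamDemo`, and the search word are made up for illustration):

```csharp
using System;
using System.IO;
using System.Text;

static class MemoryStreamDemo
{
    static void Main()
    {
        // Stand-in for the bytes of a downloaded page (hypothetical content).
        byte[] pageBytes = Encoding.UTF8.GetBytes("<html>\nweb site spider\n</html>");

        // Wrap the bytes in a MemoryStream and read them as text, line by line.
        using (var stream = new MemoryStream(pageBytes))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                // Case-insensitive substring check on the current line only.
                if (line.IndexOf("spider", StringComparison.OrdinalIgnoreCase) >= 0)
                    Console.WriteLine("match: " + line);
            }
        }
    }
}
```

With the sample bytes above this prints `match: web site spider`. Note that the whole page still sits in memory as the byte array; the stream only changes how it is read.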

--
Mark Harris
Head Developer
GameHost CP

 