Holding a session in HttpWebRequest

chrisplanters

Hi

I am working on a small screen-scraping project, and the requirement is
to scrape a website that shows search results.

I can scrape the first page of search results successfully, but the
results are paged (divided across many pages), and I am unable to send
a second request for a given page number after my first HttpWebRequest.

I am fairly new to this type of work.
Can anyone tell me how to scrape the search results on all of the
pages?

Regards
Chris
 
Kevin Spencer

Usually, there is a link to further results, and more often than not, it
contains a query string that identifies which "page" it is linking to. If
you can identify the link, you can use it to request each successive "page"
of results.
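
As a rough sketch of that idea in C#: the code below assumes the "next" link only differs by a query-string parameter. The URL, the parameter name "page", and the page count are placeholders I made up, so substitute whatever the real link on your site uses.

```csharp
using System;
using System.IO;
using System.Net;

static class PagedScraper
{
    // Appends a page-number parameter to the search URL.
    // The parameter name is an assumption; copy it from the site's real link.
    public static string BuildPageUrl(string searchUrl, string pageParam, int page)
    {
        string separator = searchUrl.Contains("?") ? "&" : "?";
        return searchUrl + separator + Uri.EscapeDataString(pageParam) + "=" + page;
    }

    // Downloads one page and returns its HTML as a string.
    public static string Download(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        // Hypothetical search URL and page count, for illustration only.
        const string searchUrl = "http://example.com/search?q=widgets";
        for (int page = 1; page <= 3; page++)
        {
            string html = Download(BuildPageUrl(searchUrl, "page", page));
            Console.WriteLine("Page " + page + ": " + html.Length + " characters");
        }
    }
}
```

In practice you would probably stop when a page no longer contains a "next" link rather than looping over a fixed count.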

--
HTH,

Kevin Spencer
Microsoft MVP

Printing Components, Email Components,
FTP Client Classes, Enhanced Data Controls, much more.
DSI PrintManager, Miradyne Component Libraries:
http://www.miradyne.net
 
chrisplanters

Hi Kevin,

I am able to identify the link and request the second page, but
somehow I am not getting the expected response. The next page always
comes back saying that no search results are available. However, when
I do it manually, there are results.

So I assume the session, or some other persisted state, is being
dropped between the first and second requests.

Thanks
Chris
 
Kevin Spencer

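The Session ID will most likely be in a non-persistent (session) cookie
that the server sends back with the response to your first request.
Unless your second request sends that cookie back, the server will
treat it as a brand-new session.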
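
Here is a minimal sketch of carrying that cookie across requests, assuming the site really does track the session with a cookie. An HttpWebRequest has no CookieContainer by default, so cookies from the response are not kept unless you assign one; sharing a single container between requests lets the second request send back what the first one received. The class name and the URLs are made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;

class SessionScraper
{
    // One container shared by every request made through this object, so
    // cookies set by an earlier response are sent with later requests.
    private readonly CookieContainer cookies = new CookieContainer();

    // Creates a request that is wired to the shared cookie container.
    public HttpWebRequest CreateRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.CookieContainer = cookies;
        return request;
    }

    // Downloads a page using the shared cookies and returns its HTML.
    public string Get(string url)
    {
        HttpWebRequest request = CreateRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    static void Main()
    {
        var scraper = new SessionScraper();

        // Hypothetical URLs; the first request establishes the session.
        string first = scraper.Get("http://example.com/search?q=widgets");
        string second = scraper.Get("http://example.com/search?q=widgets&page=2");

        Console.WriteLine(second.Length);
    }
}
```

If the second page still comes back empty with the cookies in place, the site may be relying on something else as well, such as hidden form fields posted with each request.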

--
HTH,

Kevin Spencer
Microsoft MVP

 
