Web Crawler

  • Thread starter: Ernie

Ernie

All,

I am trying to create a crawler that goes out to a site, collects
information, and stores it. I have addressed most of the issues in
collecting the information, but I am stuck on one in particular.

It appears that the page posts back to itself to move to the next result
set. The following is the "View Source" from the page:

<td align="right"><b>Page&nbsp;</b></td>
<td width="14" align="center"><input
name="_ctl1:PageControlTop:PageNumberEdit" type="text" value="1"
maxlength="3" id="_ctl1_PageControlTop_PageNumberEdit" size="2"
style="width:30px;" /></td>
<td><b>&nbsp;of 21</b></td>
<td>
<a id="_ctl1_PageControlTop_GoToBtn"
href="javascript:__doPostBack('_ctl1$PageControlTop$GoToBtn','')"><b>Go</b></a>

</td>
<td>&nbsp;</td>
<td align="right">

Next&nbsp;</td>
<td width="21"><a id="_ctl1_PageControlTop_NextBtn"
href="javascript:__doPostBack('_ctl1$PageControlTop$NextBtn','')"><img
src="/images/buttons/next_button.gif" width="21" height="15" border="0"
alt="Next"></a>

</td>

Any help would be greatly appreciated. This is a Windows
application written in C#. I can also use VB.NET.

A Plan without Action is a DayDream
Action without a Plan is a Nightmare
 
Submit a POST to the form's URL from code (I think you know how to post
from code).
For example, if the link calls:
__doPostBack('_ctl1$PageControlTop$GoToBtn','')
send __EVENTTARGET=_ctl1:PageControlTop:GoToBtn
and __EVENTARGUMENT= (an empty value)
as the post data.
Hope this helps.
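
The mapping from a __doPostBack link to the two post fields can be sketched in C#. This is only an illustration: the class name PostBackLink and the method TryParse are made up for the example, and the '$' to ':' replacement simply follows the suggestion in this reply. If the site expects the control name unchanged, drop the Replace call.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostBackLink
{
    // Matches __doPostBack('target','argument') and captures both arguments.
    private static readonly Regex Pattern = new Regex(
        @"__doPostBack\('(?<target>[^']*)','(?<argument>[^']*)'\)");

    // Returns false when the href is not a __doPostBack link.
    public static bool TryParse(string href, out string eventTarget, out string eventArgument)
    {
        eventTarget = string.Empty;
        eventArgument = string.Empty;

        Match m = Pattern.Match(href);
        if (!m.Success) return false;

        // Assumption: translate '$' to ':' as suggested in the reply above.
        eventTarget = m.Groups["target"].Value.Replace('$', ':');
        // The '' in the link means an empty string, not two quote characters.
        eventArgument = m.Groups["argument"].Value;
        return true;
    }
}
```

For the "Next" link in the page source, this yields _ctl1:PageControlTop:NextBtn as the target and an empty argument.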
 
Crow,

Thank you. I will look into posting the page back to itself. I don't
know how to do it yet, but at least I now know what to look for. Once
again, thanks.

 
Crow,

I thought I did. I was using the following:
HttpWebRequest urlRequest = (HttpWebRequest)WebRequest.Create(newUrl);
HttpWebResponse urlResponse = (HttpWebResponse)urlRequest.GetResponse();
But how do you attach the current page and viewstate to the request
so I can post it back to itself?
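
One possible way to carry the viewstate along is to read the hidden inputs out of the HTML that was just downloaded and send them back as form fields together with __EVENTTARGET and __EVENTARGUMENT. The sketch below is an assumption-laden illustration, not a confirmed solution from this thread: the class PostBackForm and its methods are hypothetical, and the regular expression assumes the hidden inputs are rendered as type, then name, then value, which is how ASP.NET usually writes __VIEWSTATE.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class PostBackForm
{
    // Assumption: hidden inputs look like <input type="hidden" name="..." ... value="..." />.
    private static readonly Regex HiddenInput = new Regex(
        @"<input\s+type=""hidden""\s+name=""(?<name>[^""]*)""[^>]*?\svalue=""(?<value>[^""]*)""",
        RegexOptions.IgnoreCase);

    // Collects every hidden field (including __VIEWSTATE) from the downloaded page.
    public static Dictionary<string, string> ReadHiddenFields(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            // Attribute values are HTML-encoded in the page source.
            fields[m.Groups["name"].Value] = WebUtility.HtmlDecode(m.Groups["value"].Value);
        }
        return fields;
    }

    // Builds an application/x-www-form-urlencoded body for the postback.
    public static string BuildBody(IDictionary<string, string> hiddenFields,
                                   string eventTarget, string eventArgument)
    {
        var all = new Dictionary<string, string>(hiddenFields);
        all["__EVENTTARGET"] = eventTarget;
        all["__EVENTARGUMENT"] = eventArgument;

        var body = new StringBuilder();
        foreach (KeyValuePair<string, string> pair in all)
        {
            if (body.Length > 0) body.Append('&');
            body.Append(WebUtility.UrlEncode(pair.Key));
            body.Append('=');
            body.Append(WebUtility.UrlEncode(pair.Value));
        }
        return body.ToString();
    }
}
```

The page may also expect its visible fields, such as the _ctl1:PageControlTop:PageNumberEdit text box, to be present in the body; that is a guess based on the page source above and would need to be checked against the real site.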

 
 
Just to clarify, I set the Method to POST:
urlRequest.Method = "POST";
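
Setting Method alone does not send any form data; the body also has to be written to the request stream. A minimal sketch of that step follows, assuming the body is already a URL-encoded string and the site uses UTF-8. The class PostBackSender and its method names are hypothetical, and the CookieContainer is included on the assumption that the site tracks the session with cookies.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class PostBackSender
{
    // Configures the request without sending it.
    public static HttpWebRequest CreatePost(string url, byte[] body, CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = body.Length;
        request.CookieContainer = cookies; // reuse the same container for every page
        return request;
    }

    // Sends the form body and returns the HTML of the response page.
    public static string Send(string url, string formBody, CookieContainer cookies)
    {
        byte[] body = Encoding.UTF8.GetBytes(formBody);
        HttpWebRequest request = CreatePost(url, body, cookies);

        // The body must be written before GetResponse is called.
        using (Stream requestStream = request.GetRequestStream())
        {
            requestStream.Write(body, 0, body.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
        {
            // Assumption: the page is UTF-8 encoded.
            return reader.ReadToEnd();
        }
    }
}
```

The HTML returned by Send would then be parsed for results and for the hidden fields needed by the next postback.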

 
 
You can easily write a web crawler with SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm).

SW Explorer Automation (SWEA) creates an object model (automation
interface) for any Web application running in Internet Explorer. It
lets you visually generate test scripts based on the defined object
model.
 
