Web Crawler

Ernie

All,

I am trying to create a crawler to go out to a site, collect
information and store it. While I have addressed a lot of the issues in
the collection of the information I am stuck on one particular issue.

It appears that the page posts back to itself to move to the next result
set. The following is from the page's "View Source":

<td align="right"><b>Page&nbsp;</b></td>
<td width="14" align="center"><input
name="_ctl1:pageControlTop:pageNumberEdit" type="text" value="1"
maxlength="3" id="_ctl1_PageControlTop_PageNumberEdit" size="2"
style="width:30px;" /></td>
<td><b>&nbsp;of 21</b></td>
<td>
<a id="_ctl1_PageControlTop_GoToBtn"
href="javascript:__doPostBack('_ctl1$PageControlTop$GoToBtn','')"><b>Go</b></a>

</td>
<td>&nbsp;</td>
<td align="right">

Next&nbsp;</td>
<td width="21"><a id="_ctl1_PageControlTop_NextBtn"
href="javascript:__doPostBack('_ctl1$PageControlTop$NextBtn','')"><img
src="/images/buttons/next_button.gif" width="21" height="15" border="0"
alt="Next"></a>

</td>

Any help would be greatly appreciated. This is a Windows
application written in C#. I can also use VB.NET.

A Plan without Action is a DayDream
Action without a Plan is a Nightmare
 
The Crow

Submit a POST to the form's URL from code (I think you know how to post
from code).
For example, if the link calls:
__doPostBack('_ctl1$PageControlTop$GoToBtn','')
send __EVENTTARGET=_ctl1:pageControlTop:GoToBtn
and __EVENTARGUMENT= (an empty value)
as post data.
Hope this helps.
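
As a rough sketch of the post data described above, assuming the page is a standard ASP.NET WebForms page that carries its state in a hidden __VIEWSTATE input: the class and method names here (PostBackData, ExtractHiddenField, Build) are made up for illustration, and the regular expression assumes the name attribute comes before the value attribute. A browser would also send the other form fields (such as the page number text box), so a real crawler may need to include those as well.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class PostBackData
{
    // Returns the value of a hidden input such as __VIEWSTATE from the
    // page HTML, or an empty string if the field is not found.
    // Assumption: name="..." appears before value="..." in the tag.
    public static string ExtractHiddenField(string html, string fieldName)
    {
        string pattern = "<input[^>]*name=\"" + Regex.Escape(fieldName) +
                         "\"[^>]*value=\"([^\"]*)\"";
        Match match = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
        return match.Success ? match.Groups[1].Value : string.Empty;
    }

    // Builds an application/x-www-form-urlencoded body for the postback.
    // Each value is URL-encoded because the view state is Base64 text and
    // can contain '+', '/' and '='.
    public static string Build(string eventTarget, string eventArgument,
                               string viewState)
    {
        return "__EVENTTARGET=" + WebUtility.UrlEncode(eventTarget) +
               "&__EVENTARGUMENT=" + WebUtility.UrlEncode(eventArgument) +
               "&__VIEWSTATE=" + WebUtility.UrlEncode(viewState);
    }
}
```

Usage would look something like PostBackData.Build(target, "", PostBackData.ExtractHiddenField(pageHtml, "__VIEWSTATE")), where pageHtml is the HTML of the page you last fetched. To be sure which separator to use in the target, check the __doPostBack function in the page source and see what it writes into __EVENTTARGET.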
 
Ernie

Crow,

Thank you. I will look into posting the page back. I don't know how
to do it yet, but at least I now know what to look for. Thanks once
again.

 
Ernie

Crow,

I thought I did. I was using the following:
HttpWebRequest urlRequest = (HttpWebRequest)WebRequest.Create(newUrl);
HttpWebResponse urlResponse = (HttpWebResponse)urlRequest.GetResponse();
But how do you attach the current page's form data and viewstate to the
request so I can post it back to the same page?

 
Ernie

Just to clarify, I set the Method to POST:
urlRequest.Method = "POST";
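
Setting Method on its own sends an empty body, so the form data still has to be written to the request stream. Here is a minimal sketch of that step, assuming postData is an already URL-encoded form string (the __EVENTTARGET, __EVENTARGUMENT and __VIEWSTATE pairs). PostBackSender, Prepare and Send are names made up for this example. It sticks with HttpWebRequest to match the code in this thread; on newer .NET versions HttpClient is the recommended API and WebRequest.Create produces an obsolete warning.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of any library.
public static class PostBackSender
{
    // Creates the POST request and sets the headers a form submission
    // needs. Nothing is sent until the request stream is opened.
    public static HttpWebRequest Prepare(string url, int bodyLength,
                                         CookieContainer cookies)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = bodyLength;
        // Reuse one container across requests so any session cookie
        // from the first page load is sent back with the postback.
        request.CookieContainer = cookies;
        return request;
    }

    // Writes the form data as the request body and returns the HTML of
    // the page the server sends back.
    public static string Send(string url, string postData,
                              CookieContainer cookies)
    {
        // URL-encoded form data contains only ASCII characters.
        byte[] body = Encoding.ASCII.GetBytes(postData);
        HttpWebRequest request = Prepare(url, body.Length, cookies);

        using (Stream stream = request.GetRequestStream())
        {
            stream.Write(body, 0, body.Length);
        }

        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

The HTML returned by Send contains a fresh viewstate, so each "next page" request would presumably need the viewstate taken from the page returned by the previous one.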

 
