Screen Scraping a web page

Steve

I am working on an application that screen-scrapes information from a web
page. The base code works, but I have to log in before I can get the
information I need. The page is hosted on my router. When I browse to the
router's IP address, I get the following page:

<HTML>
<head>
<meta http-equiv="content-type" content="text/html;charset=iso-8859-1">
<title>Login</title>
</head>

<BODY bgcolor="#f79900">
<form action="LOGIN.HTM" method="post" name="tF">
<input type="hidden" name="page" value="login">
<table border="0" width="100%" height="184" cellspacing="0">
<tr>
<td width="100%" height="103" colspan="2" align="center">
<a href="http://support.speedstream.com"><img border="0"
src="IMAGE/SIELOGOBLACK.JPG" width="270" height="40"></a>
</td>
</tr>
<tr>
<td width="100%" height="19" colspan="2" align="center">
<H2><font face="Arial, Helvetica, sans-serif" color="#FFFFFF">Login
&nbsp;Screen</font></H2>
</td>
</tr>

<tr>
<td width="50%" height="19" align="right">
<font face="Arial, Helvetica, sans-serif" size="2"
color="#FFFFFF">Password&nbsp;&nbsp;&nbsp;:</font></td>
<td width="50%" height="19" align="left">
<INPUT type="password" maxLength=12 size=9 name=pws></td><p>
</tr>
<tr>
<td width="50%" height="19">&nbsp;</td>
<td width="50%" height="19">&nbsp;</td>
</tr>
<tr>
<td width="50%" height="19" align="right">
<INPUT type="submit" value=" Login ">
</td>
<td width="50%" height="19" align="left">
<INPUT class=button onclick=window.close(); type=button value= Cancel >
</td>
</tr>
</table>
</form></BODY>


</HTML>

Using the information on this page, I have developed the following code for
my application, but every time I run it I get:

An unhandled exception of type 'System.Net.WebException' occurred in
system.dll

Additional information: The underlying connection was closed: The server
committed an HTTP protocol violation.

My code is as follows:

Imports System.Net
Imports System.IO

Public Class Form1
    Inherits System.Windows.Forms.Form

#Region " Windows Form Designer generated code "

    Public Sub New()
        MyBase.New()

        'This call is required by the Windows Form Designer.
        InitializeComponent()

        'Add any initialization after the InitializeComponent() call
    End Sub

    'Form overrides dispose to clean up the component list.
    Protected Overloads Overrides Sub Dispose(ByVal disposing As Boolean)
        If disposing Then
            If Not (components Is Nothing) Then
                components.Dispose()
            End If
        End If
        MyBase.Dispose(disposing)
    End Sub

    'Required by the Windows Form Designer
    Private components As System.ComponentModel.IContainer

    'NOTE: The following procedure is required by the Windows Form Designer
    'It can be modified using the Windows Form Designer.
    'Do not modify it using the code editor.
    Friend WithEvents lblMyIp As System.Windows.Forms.Label
    Friend WithEvents txtMyIp As System.Windows.Forms.TextBox

    <System.Diagnostics.DebuggerStepThrough()> Private Sub InitializeComponent()
        Dim resources As System.Resources.ResourceManager = _
            New System.Resources.ResourceManager(GetType(Form1))
        Me.lblMyIp = New System.Windows.Forms.Label
        Me.txtMyIp = New System.Windows.Forms.TextBox
        Me.SuspendLayout()
        '
        'lblMyIp
        '
        Me.lblMyIp.Location = New System.Drawing.Point(8, 8)
        Me.lblMyIp.Name = "lblMyIp"
        Me.lblMyIp.Size = New System.Drawing.Size(56, 23)
        Me.lblMyIp.TabIndex = 0
        Me.lblMyIp.Text = "My Ip:"
        '
        'txtMyIp
        '
        Me.txtMyIp.Location = New System.Drawing.Point(64, 8)
        Me.txtMyIp.Multiline = True
        Me.txtMyIp.Name = "txtMyIp"
        Me.txtMyIp.Size = New System.Drawing.Size(376, 248)
        Me.txtMyIp.TabIndex = 1
        Me.txtMyIp.Text = ""
        '
        'Form1
        '
        Me.AutoScaleBaseSize = New System.Drawing.Size(7, 19)
        Me.ClientSize = New System.Drawing.Size(448, 262)
        Me.Controls.Add(Me.txtMyIp)
        Me.Controls.Add(Me.lblMyIp)
        Me.Font = New System.Drawing.Font("Times New Roman", 12.0!, _
            System.Drawing.FontStyle.Regular, System.Drawing.GraphicsUnit.Point, _
            CType(0, Byte))
        Me.Icon = CType(resources.GetObject("$this.Icon"), System.Drawing.Icon)
        Me.MaximizeBox = False
        Me.MinimizeBox = False
        Me.Name = "Form1"
        Me.StartPosition = System.Windows.Forms.FormStartPosition.CenterScreen
        Me.Text = "MyIpReader"
        Me.ResumeLayout(False)
    End Sub

#End Region

    Private Sub Form1_Load(ByVal sender As System.Object, _
            ByVal e As System.EventArgs) Handles MyBase.Load
        txtMyIp.Text = ReadHTMLPage("http://192.168.1.1:88/login.htm")
    End Sub

    'Posts the login form fields to the router and returns the response body.
    Public Function ReadHTMLPage(ByVal url As String) As String
        Dim result As String = ""
        'Field names come from the login form: hidden "page" and password "pws".
        Dim strPost As String = "page=login&pws=password"
        Dim myWriter As StreamWriter = Nothing
        Dim objRequest As HttpWebRequest = _
            CType(WebRequest.Create(url), HttpWebRequest)

        objRequest.Method = "POST"
        objRequest.ContentLength = strPost.Length
        objRequest.ContentType = "application/x-www-form-urlencoded"

        Try
            myWriter = New StreamWriter(objRequest.GetRequestStream())
            myWriter.Write(strPost)
        Catch e As Exception
            Return e.Message
        Finally
            'myWriter is still Nothing if GetRequestStream() threw.
            If Not (myWriter Is Nothing) Then myWriter.Close()
        End Try

        Dim objResponse As HttpWebResponse = _
            CType(objRequest.GetResponse(), HttpWebResponse)
        Dim sr As StreamReader
        sr = New StreamReader(objResponse.GetResponseStream())
        result = sr.ReadToEnd()
        sr.Close()

        Return result
    End Function

End Class

I can't figure out what I am doing wrong here. Any guidance would be
appreciated.
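
A note on the error itself: this particular WebException is commonly raised
when a server's response headers do not strictly follow the HTTP
specification, which is plausible for embedded router firmware. Assuming that
is the cause here (it has not been confirmed for this router), and assuming a
.NET Framework version that supports the setting (1.1 SP1 or later), a config
fragment along these lines in the application's .config file relaxes the
header parsing. Treat it as a sketch to try, not a confirmed fix:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.net>
    <settings>
      <!-- Assumption: the router's response headers are malformed.
           This tells HttpWebRequest to tolerate them instead of throwing. -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```

The setting applies to every HttpWebRequest in the application, so it is best
limited to a tool like this one that only talks to a trusted device on the
local network.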
 
