Ferreting out broken links

  • Thread starter: Dave

Dave

Is it difficult to write a program that, given an array of URLs, will probe
each one, and return a status of Found or Not Found? How would you approach
it?

While Googling, I found utility after utility that will do something like
that for you, but I would like to write a custom program to do this. It
doesn't have to be VB. C#, javascript, etc. - whatever will run on .NET.

Thanks in advance,

Dave
 
Try something like this. It is not the most intelligent nor elegant
solution, but it will get you what you want.

' Assumes mylist is a small Class defined elsewhere with three public members:
' SiteIndex As Integer, SiteURL As String, SiteValidFlagBoolean As Boolean.
Dim aList As New ArrayList
Dim qXML As New Xml.XmlDocument
Dim oml As mylist

' Create a separate object per site so the list holds distinct entries
oml = New mylist
With oml
    .SiteIndex = 1
    .SiteURL = "http://www.microsoft.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

oml = New mylist
With oml
    .SiteIndex = 2
    .SiteURL = "http://www.yourdomain.com"
    .SiteValidFlagBoolean = False
End With
aList.Add(oml)

For Each oml In aList
    Try
        qXML.Load(oml.SiteURL)
        oml.SiteValidFlagBoolean = True
    Catch exxml As System.Xml.XmlException
        'Page loaded, but was not parsable as XML
        oml.SiteValidFlagBoolean = True
    Catch exweb As System.Net.WebException
        'Page Not Found
        If exweb.ToString.IndexOf("404") >= 0 Then
            oml.SiteValidFlagBoolean = False
        Else
            'Some other network message, probably domain not found.
            MsgBox(exweb.ToString)
        End If
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
Next
 
Dave,

Not exactly sure what you are wanting, but it might be similar to a function
I use in one of my apps. You can call this function within a loop, and if
you don't receive a response it will catch the exception. It uses the
HttpWebRequest class. It might be a little more than you need, but it might
be what you are looking for.

' Requires Imports System.IO and Imports System.Net at the top of the file.
' Assumes two members defined elsewhere in the class (not shown in this post):
' an HTTPMethod enum with HTTP_GET and HTTP_POST values, and a class-level
' HldCookCon As CookieContainer that keeps cookies between calls.
Public Function Send(ByVal URL As String, _
        Optional ByVal PostData As String = "", _
        Optional ByVal Method As HTTPMethod = HTTPMethod.HTTP_GET, _
        Optional ByVal ContentType As String = "") As String
    Dim Request As HttpWebRequest = CType(WebRequest.Create(URL), HttpWebRequest)
    Dim Response As HttpWebResponse
    Dim SW As StreamWriter = Nothing
    Dim SR As StreamReader = Nothing
    Dim ResponseData As String = Nothing
    Dim I As Integer
    Dim RcookCon As New CookieContainer

    ' Prepare Request Object ("HTTP_GET" becomes "GET")
    Request.Method = Method.ToString().Substring(5)
    Request.KeepAlive = True
    Request.AllowAutoRedirect = True
    If HldCookCon.Count > 0 Then
        RcookCon = HldCookCon
    End If
    Request.CookieContainer = RcookCon

    ' Set form/post content-type if necessary
    If (Method = HTTPMethod.HTTP_POST AndAlso PostData <> "" AndAlso _
            ContentType = "") Then
        ContentType = "application/x-www-form-urlencoded"
    End If

    ' Set Content-Type
    If (ContentType <> "") Then
        Request.ContentType = ContentType
        Request.ContentLength = PostData.Length
    End If

    ' Send the request body, if this is a POST
    If (Method = HTTPMethod.HTTP_POST) Then
        Try
            SW = New StreamWriter(Request.GetRequestStream())
            SW.Write(PostData)
        Catch Err As WebException
            MsgBox(Err.Message, MsgBoxStyle.Information, "Error")
        Finally
            Try
                If Not SW Is Nothing Then SW.Close()
            Catch
                'Don't process an error from SW not closing
            End Try
        End Try
    End If

    'Get Response
    Try
        Response = CType(Request.GetResponse(), HttpWebResponse)
        SR = New StreamReader(Response.GetResponseStream())
        ResponseData = SR.ReadToEnd()
        'Keep the response cookies for later requests
        For I = 0 To Response.Cookies.Count - 1
            HldCookCon.Add(Response.Cookies.Item(I))
        Next
    Catch Err As WebException
        'No usable response (404, DNS failure, timeout, ...)
        Return Nothing
    Finally
        Try
            If Not SR Is Nothing Then SR.Close()
        Catch
            'Don't process an error from SR not closing
        End Try
    End Try
    Return ResponseData
End Function

Curtis
 
hmmm...

Should be fairly straightforward using the System.Net.WebClient class.
Or, probably better yet, the System.Net.HttpWebRequest class...

Something like:

Dim request As HttpWebRequest
Dim response As HttpWebResponse

For Each url As String In urls
    request = WebRequest.Create (url)
    response = request.GetResponse ()

    If Response.StatusCode = 404 Then
        Console.WriteLine ("Not Found")
    Else
        Console.WriteLine ("Found")
    End If
Next

Actually, you might want to take a closer look at the StatusCode :)
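
A slightly more defensive take on the loop above, offered as a sketch rather than a finished checker. For HTTP error statuses such as 404, GetResponse() throws a WebException instead of returning a response, so the status code has to be read from the exception's Response property. ProbeUrl and Classify are hypothetical names, and two choices here are assumptions: every 2xx/3xx code counts as Found, and 410 is counted as Not Found alongside 404.

```vbnet
Imports System
Imports System.Net

Module LinkChecker
    ' Maps a numeric HTTP status code to a short label (hypothetical helper).
    Function Classify(ByVal statusCode As Integer) As String
        If statusCode = 404 OrElse statusCode = 410 Then Return "Not Found"
        If statusCode >= 200 AndAlso statusCode < 400 Then Return "Found"
        Return "Error: HTTP " & statusCode.ToString()
    End Function

    ' Probes one URL and returns "Found", "Not Found", or an error description.
    Function ProbeUrl(ByVal url As String) As String
        Try
            Dim request As HttpWebRequest = TryCast(WebRequest.Create(url), HttpWebRequest)
            If request Is Nothing Then Return "Error: not an HTTP URL"
            request.Method = "HEAD"     ' headers only; a few servers reject HEAD
            request.Timeout = 10000     ' milliseconds
            Using response As HttpWebResponse = CType(request.GetResponse(), HttpWebResponse)
                Return Classify(CInt(response.StatusCode))
            End Using
        Catch ex As WebException
            ' HTTP errors arrive here; the server's reply, if any, is in ex.Response
            Dim errorResponse As HttpWebResponse = TryCast(ex.Response, HttpWebResponse)
            If errorResponse Is Nothing Then Return "Error: " & ex.Status.ToString()
            Dim code As Integer = CInt(errorResponse.StatusCode)
            errorResponse.Close()
            Return Classify(code)
        Catch ex As UriFormatException
            Return "Error: malformed URL"
        Catch ex As NotSupportedException
            Return "Error: unsupported URL scheme"
        End Try
    End Function
End Module
```

Calling ProbeUrl(url) for each entry in the array and printing the result gives the Found / Not Found report. If a server rejects HEAD, switching request.Method to "GET" is the obvious fallback.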
 
Amdrit, Curtis, Tom,

I haven't worked all these details out yet, but I didn't want too much time
to pass before I said thanks.

I have only had time to try Curtis & Tom's solution. The
request.GetResponse() line is giving me a problem when I try to use
https:// sites because it cannot establish a trust relationship. This may
not be fatal - it may be acceptable to just inform my users that they will
have to manually check those.

I'm getting a problem when I hit a 404; Tom's line "If Response.StatusCode
= 404 Then" just doesn't work.

I get the crash screen with the message "The remote server returned an
error: (404) Not Found"


So, I'm going to try to work with those methods, then I'll try Amdrit's
method if I can't get them good to go.

I really do appreciate the responses. I had no clue how to approach this
problem, and now I have plenty to work with.

Thanks again.

Dave
 
Dave,

Sounds like you might need to refine what I wrote some :) I did that
off the top, and didn't test any of that - so it may not be exactly
right. Another method that's a bit more work, but not too bad, is to
simply open a socket connection to the server and make the HTTP request
yourself. This would avoid the "non-trust" issues and exceptions :)
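
A rough sketch of the socket idea, assuming plain http:// on port 80 only. It sends a HEAD request by hand and reads the numeric code out of the status line. HeadStatus and ParseStatusCode are hypothetical names; redirects are not followed, and an https:// URL would need an SslStream wrapped around the connection, which this sketch does not attempt.

```vbnet
Imports System
Imports System.IO
Imports System.Net.Sockets
Imports System.Text

Module RawHttpProbe
    ' Pulls the numeric code out of a status line such as "HTTP/1.1 404 Not Found".
    ' Returns -1 when the line cannot be parsed.
    Function ParseStatusCode(ByVal statusLine As String) As Integer
        If statusLine Is Nothing Then Return -1
        Dim parts() As String = statusLine.Split(" "c)
        Dim code As Integer
        If parts.Length >= 2 AndAlso Integer.TryParse(parts(1), code) Then Return code
        Return -1
    End Function

    ' Sends "HEAD path" to host:80 and returns the HTTP status code.
    ' A SocketException propagates if the host cannot be reached.
    Function HeadStatus(ByVal host As String, ByVal path As String) As Integer
        Using client As New TcpClient(host, 80)
            client.ReceiveTimeout = 10000   ' milliseconds
            Dim stream As NetworkStream = client.GetStream()
            Dim requestText As String = "HEAD " & path & " HTTP/1.1" & vbCrLf & _
                                        "Host: " & host & vbCrLf & _
                                        "Connection: close" & vbCrLf & vbCrLf
            Dim requestBytes() As Byte = Encoding.ASCII.GetBytes(requestText)
            stream.Write(requestBytes, 0, requestBytes.Length)

            ' The first line of the reply is the status line
            Dim reader As New StreamReader(stream, Encoding.ASCII)
            Return ParseStatusCode(reader.ReadLine())
        End Using
    End Function
End Module
```

For example, HeadStatus("www.microsoft.com", "/") would return whatever code that server sends back, which can then be compared against 404.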
 
Dave,

This should fix the trust relationship problem. It's kind of a workaround
that I found. Here is a link to an explanation of it:
http://gotdotnet.com/Community/MessageBoard/Thread.aspx?id=40795
I have implemented it in VB.NET by creating a new class with the following
code.

Imports System.Net
Imports System.Security.Cryptography.X509Certificates

Public Class myCertificatePolicy
    Implements ICertificatePolicy

    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal cert As X509Certificate, _
            ByVal request As WebRequest, _
            ByVal certificateProblem As Integer) As Boolean _
            Implements ICertificatePolicy.CheckValidationResult
        'Return True to force the certificate to be accepted.
        Return True
    End Function
End Class

You would then call the class with this line in your application:

'force the certificate to be accepted
System.Net.ServicePointManager.CertificatePolicy = New myCertificatePolicy

This basically overrides a "non-trusted connection" by making your
application always accept the certificates.
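
On .NET 2.0 and later, ServicePointManager.CertificatePolicy is marked obsolete in favor of ServicePointManager.ServerCertificateValidationCallback. A minimal sketch of the same accept-everything behavior with the callback might look like the following. CertificateOverride, AcceptAllCertificates, and Install are hypothetical names. Like the class above, this turns off certificate checking for every request the process makes, so it only fits a tool that cares about reachability and nothing else.

```vbnet
Imports System.Net
Imports System.Net.Security
Imports System.Security.Cryptography.X509Certificates

Module CertificateOverride
    ' Validation callback that accepts every certificate, whatever the errors.
    Function AcceptAllCertificates(ByVal sender As Object, _
            ByVal certificate As X509Certificate, _
            ByVal chain As X509Chain, _
            ByVal errors As SslPolicyErrors) As Boolean
        Return True
    End Function

    ' Call once at startup, before the first https:// request is made.
    Sub Install()
        ServicePointManager.ServerCertificateValidationCallback = _
            AddressOf AcceptAllCertificates
    End Sub
End Module
```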

Curtis
 
