WebClient.DownloadFile - Missing Carriage Returns & Line Feeds


Lila Godel

I am having a problem with the download of web pages via the
WebClient.DownloadFile function in the specialized VB.Net 2003 I.E.
plug-in I am designing to speed up the work on my latest project.

When I open the downloaded pages in Notepad, I see a square wherever I
see a carriage return and a line feed when viewing the source in I.E.

How can I make the web pages download in a readable form so I can easily
make my two manual deletions (different text in each file)?

The code located in modMain.bas (app has no forms) is as follows:

Sub SaveFile()
    'define variables (WebClient requires Imports System.Net)
    Dim clsWebClient As WebClient = New WebClient
    Dim intFileNameStart As Integer
    Dim intFilePathStart As Integer
    Dim strFileName As String
    Dim strLocalFilePath As String
    Dim strRemoteFilePath As String

    'initialize variables
    strRemoteFilePath = System.Environment.GetCommandLineArgs(1).ToString
    intFileNameStart = (InStrRev(strRemoteFilePath, "/") + 1)
    strFileName = Mid$(strRemoteFilePath, intFileNameStart)

    'keep only the last subdirectory of the remote path
    strLocalFilePath = Left$(strRemoteFilePath, (intFileNameStart - 2))
    intFilePathStart = (InStrRev(strLocalFilePath, "/") + 1)
    strLocalFilePath = Mid$(strLocalFilePath, intFilePathStart)

    'download and store file
    clsWebClient.DownloadFile(strRemoteFilePath, _
        conCfgInfo.strDefaultParentPath & _
        strLocalFilePath & "\" & strFileName)

    'clean up
    clsWebClient.Dispose()

End Sub

conCfgInfo.strDefaultParentPath, which contains "D:\Pictures\Big Dig\", is
read from Save File.cfg by another sub. When I select the Save File...
choice on the right-click menu in I.E., VBS code in Save File.htm sets the
first command line argument to the URL of what I right-clicked on.
strLocalFilePath is set to the last subdirectory in the remote path.

So if strRemoteFilePath contains
http://web.archive.org/web/20021009015209/http://www.bigdig.com/thtml/f073100/img001.htm
img001.htm gets downloaded to D:\Pictures\Big Dig\f073100\img001.htm in
my local file system.
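
To make the path logic concrete, here is a minimal sketch of the same derivation written with .NET String methods instead of InStrRev and Mid$. BuildLocalPath and modPathSketch are hypothetical names used only for illustration, and the sketch assumes the URL contains at least one "/".

```vbnet
Imports System

Module modPathSketch
    ' Hypothetical helper: maps a remote URL to a local path of the form
    ' <parent path><last remote subdirectory>\<file name>.
    Function BuildLocalPath(ByVal strRemoteFilePath As String, _
                            ByVal strParentPath As String) As String
        ' Everything after the last "/" is the file name.
        Dim intLastSlash As Integer = strRemoteFilePath.LastIndexOf("/"c)
        Dim strFileName As String = strRemoteFilePath.Substring(intLastSlash + 1)

        ' The text between the previous "/" and the last "/" is the subdirectory.
        Dim strRemoteDir As String = strRemoteFilePath.Substring(0, intLastSlash)
        Dim intPrevSlash As Integer = strRemoteDir.LastIndexOf("/"c)
        Dim strSubDir As String = strRemoteDir.Substring(intPrevSlash + 1)

        ' The parent path is assumed to end with a backslash already.
        Return strParentPath & strSubDir & "\" & strFileName
    End Function
End Module
```

Called with the URL above and a parent path of "D:\Pictures\Big Dig\", BuildLocalPath returns D:\Pictures\Big Dig\f073100\img001.htm.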
 
I just found the problem. I went through numerous revisions trying to
save the web page in an encoding that Notepad would render properly,
something other than the Western European (Windows) encoding that
Internet Explorer uses with Auto-Select checked. Then I noticed that the
MS-DOS Editor and the type command both render the page properly.

Opening the page in the MS-DOS Editor and using the Save command on the
File menu let me see the page correctly in Notepad. However, the file held
the source code of the page you get redirected to from
http://www.bigdig.com/thtml/f073100/img001.htm, not the source code for
http://web.archive.org/web/20021009015209/www.bigdig.com/thtml/f073100/img001.htm.

After that I replaced the

clsWebClient.DownloadFile(strRemoteFilePath, _
conCfgInfo.strDefaultParentPath & _
strLocalFilePath & "\" & strFileName)

code with

WriteWebPage(GetWebPage(strRemoteFilePath), _
conCfgInfo.strDefaultParentPath & _
strLocalFilePath & "\" & strFileName)

where the new GetWebPage function returns a string with the source of the
web page at a passed-in URL, and the new WriteWebPage sub writes a
passed-in web page source to a passed-in file name. I placed

strNewData = Replace(strWebPage, vbLf, vbCrLf)
'eliminates the display of squares in notepad

in the WriteWebPage sub just before the PrintLine command.
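
A minimal sketch of what such a pair of routines could look like follows. It illustrates the idea under assumptions and is not the code from the plug-in: it uses WebClient.DownloadData with Encoding.Default to get the page as a string, a StreamWriter in place of PrintLine, and a hypothetical NormalizeLineEndings helper. The helper first collapses any existing CR LF pairs to LF, so a page that already contains CR LF does not end up with CR CR LF.

```vbnet
Imports System
Imports System.IO
Imports System.Net
Imports System.Text
Imports Microsoft.VisualBasic

Module modWebPageSketch
    ' Returns the source of the page at strUrl as a string.
    ' Assumption: the page can be decoded with the system default code page.
    Function GetWebPage(ByVal strUrl As String) As String
        Dim clsWebClient As New WebClient
        Try
            Dim bytData() As Byte = clsWebClient.DownloadData(strUrl)
            Return Encoding.Default.GetString(bytData)
        Finally
            clsWebClient.Dispose()
        End Try
    End Function

    ' Hypothetical helper: converts every line ending to CR LF.
    Function NormalizeLineEndings(ByVal strWebPage As String) As String
        ' Collapse CR LF to LF first so existing pairs are not doubled.
        Dim strLfOnly As String = strWebPage.Replace(vbCrLf, vbLf)
        Return strLfOnly.Replace(vbLf, vbCrLf)
    End Function

    ' Writes the page source to strFileName with CR LF line endings.
    Sub WriteWebPage(ByVal strWebPage As String, ByVal strFileName As String)
        Dim clsWriter As New StreamWriter(strFileName, False, Encoding.Default)
        Try
            clsWriter.Write(NormalizeLineEndings(strWebPage))
        Finally
            clsWriter.Close()
        End Try
    End Sub
End Module
```

The Try/Finally blocks release the WebClient and close the file even if the download or the write fails.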

As soon as I rebuilt the program and ran it again I got a readable web
page source (still for the wrong web page).

A quick check revealed improper VBS code in Save File.htm.

It originally had:

if external.menuArguments.event.srcElement.tagName = "IMG" then
    userURL = external.menuArguments.event.srcElement.href
else
    userurl = external.menuArguments.event.srcElement.parent.href
end if

because I could not use
userURL = external.menuArguments.event.srcElement.href for .htm files
without getting a script error.

When I replaced the second assignment (the one in the else branch),
userurl = external.menuArguments.event.srcElement.parent.href, with
userurl = external.menuArguments.document.URL, I got the right web page
complete with carriage returns and line feeds.
 