Using WebRequest to get the rendered HTML of a protected page returns the login page

Stephen Miller

I have an ASPX report and I want to capture the rendered HTML and
write to a file on the webserver. Several posts suggest using
WebRequest to make a second call to the page, and screen-scrape the
resulting HTML. The technique typically described is:

'-- Requires: Imports System.IO
'-- Get the current URL and request the page
Dim url As String = System.Web.HttpContext.Current.Request.Url.AbsoluteUri
Dim req As System.Net.WebRequest = System.Net.WebRequest.Create(url)

Dim result As System.Net.WebResponse = req.GetResponse()
Dim ReceiveStream As Stream = result.GetResponseStream()

'-- Read contents and append to StringBuilder.
'-- StreamReader decodes across buffer boundaries, so a multi-byte
'-- UTF-8 character is never split between two reads.
Dim sbPage As New System.Text.StringBuilder()
Dim reader As New StreamReader(ReceiveStream, System.Text.Encoding.UTF8)
sbPage.Append(reader.ReadToEnd())

'-- Release the stream and the connection
reader.Close()
result.Close()

I have two problems with this approach:
Firstly, doesn't it require a second round trip to the server,
adding performance overhead?
Secondly, my report is password protected (authentication mode is
Forms), so the request gets redirected to the designated login form.

Is there another way to get a string representation of the rendered
HTML? I have been fooling around with the OutputStream without any
luck.


As a side note, writing the HTML to file is part of a dodgy workaround
that shells to a DOS program and converts the resulting HTML to PDF
format, prior to flushing the current response and sending the PDF
instead. I have looked at dozens of commercial products but haven't
found one that can convert the rendered ASPX page to PDF on the fly
(allowing me to provide all report layout in ASPX mark-up). Is anyone
aware of a commercial product that can do this?

I know SQL Server 2000 Reporting Services has just become available,
but I don't have VS2003.

Regards,

Stephen
 
Rick Strahl [MVP]

Hi Stephen,

Are the report and the ASPX page in the same application? If so, you might
want to look into calling Server.Execute(), which runs the page, lets you
pass in your own TextWriter (a StringWriter, for example) and then retrieve
the result.

Something like this:

// Requires: using System; using System.IO; using System.Web;
public static string AspTextMerge(string TemplatePageAndQueryString, ref string ErrorMessage)
{
    string MergedText = "";

    // *** Get the current request context
    HttpContext Context = HttpContext.Current;

    // *** Fix up the path to point at the templates directory
    TemplatePageAndQueryString = Context.Request.ApplicationPath +
        "/templates/" + TemplatePageAndQueryString;

    // *** Now call the other page and load into StringWriter
    StringWriter sw = new StringWriter();
    try
    {
        Context.Server.Execute(TemplatePageAndQueryString, sw);
        MergedText = sw.ToString();
    }
    catch (Exception ex)
    {
        // *** Report the failure to the caller instead of swallowing it
        ErrorMessage = ex.Message;
        MergedText = null;
    }

    return MergedText;
}

FWIW, using an HTTP request is not much slower in this situation - the above
code also requires a fair amount of overhead as ASP.Net has to perform some
fixup to 'fake' this request through Execute. I've used HTTP in a number of
situations with good results - your only concern will be not tying up the
ASP.Net thread for too long waiting for the report to finish - if that's the
case you may have to do this asynchronously...
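
If you do stay with the HTTP approach, the redirect to the login form happens because the second request does not carry the forms authentication cookie. Here is a minimal sketch of one way around that, assuming the report is requested from inside an already authenticated request. ReportFetcher and CreateRequestWithCookies are hypothetical names made up for illustration, not part of any library:

```csharp
using System.Net;

// Hypothetical helper (not from the framework): builds a request that
// carries the caller's cookies, including the forms authentication ticket.
public class ReportFetcher
{
    public static HttpWebRequest CreateRequestWithCookies(string url, string cookieHeader)
    {
        // Creating the request does not contact the server yet
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

        // Copy the raw Cookie header from the incoming request, if there is one
        if (cookieHeader != null && cookieHeader.Length > 0)
        {
            req.Headers["Cookie"] = cookieHeader;
        }
        return req;
    }
}
```

Inside a page you would call it as ReportFetcher.CreateRequestWithCookies(url, HttpContext.Current.Request.Headers["Cookie"]) and then read the response as before. Keep in mind that if the page requests its own URL, the nested request runs the same capture code again, so it needs some kind of guard (a query string flag, for instance) to stop it from calling itself repeatedly.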

+++ Rick ---

--

Rick Strahl
West Wind Technologies
http://www.west-wind.com/
http://www.west-wind.com/weblog/
 
Stephen robinson

Hi Stephen,

I have been looking to do exactly the same as you for about 2 weeks now.
What I have found is that although there are lots of commercial products
out there, none really do screen scraping. I think the workaround is as
follows. If you create a response filter you can take a copy of the
output buffer and write it to a file. Place this file in a virtual
directory (so it picks up the stylesheet). Then, using .NET, pass the
HTML file into a 3rd party product such as ABCPDF or HTMLDraw (Image);
check out www.webgoo.com for these products. Image products are much
cheaper than PDF ones. I have the first part working (copy of the file)
but I now need to strip out the JavaScript. Then that should be it. If
you drop me a line on my email I will give you more details. One person
mentioned that in version 2.0 of .NET you can create dynamic images,
which, seeing as we already have the output stream, may be the exact
solution.
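
To make the response filter idea concrete, here is a rough sketch of a pass-through stream that keeps a copy of everything written to it. CaptureFilter and GetCapturedText are hypothetical names chosen for illustration; this is one possible shape for such a filter, not the exact code I am running:

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: forwards every write to the original filter
// stream and keeps a copy of the bytes in memory.
public class CaptureFilter : Stream
{
    private readonly Stream _inner;                           // the original Response.Filter
    private readonly MemoryStream _copy = new MemoryStream(); // captured output

    public CaptureFilter(Stream inner) { _inner = inner; }

    // Keep a copy of the bytes, then pass them on unchanged
    public override void Write(byte[] buffer, int offset, int count)
    {
        _copy.Write(buffer, offset, count);
        _inner.Write(buffer, offset, count);
    }

    // Decode everything captured so far using the given encoding
    public string GetCapturedText(Encoding encoding)
    {
        return encoding.GetString(_copy.ToArray());
    }

    public override void Flush() { _inner.Flush(); }

    // The filter is write-only, so the remaining Stream members are stubs
    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override int Read(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
}
```

In the page you would install it early on, for example in Page_Load, with Response.Filter = new CaptureFilter(Response.Filter), keeping a reference to the filter object. As far as I can tell the filter only receives data when the response is flushed, so call GetCapturedText(Response.ContentEncoding) after that point and then write the string to your file.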

I hope this helps

Steve
 
