Saving a web page

  • Thread starter: alexey_r

alexey_r

Using HttpWebRequest and HttpWebResponse to retrieve a webpage seems
clear enough.

But unless I am missing something, this will only give me the HTML
source of the requested webpage, and not the images, stylesheets and
so on. Is there a simple way to get the entire webpage?

The alternatives I see now:
- Run a WebBrowser control in the background, but this seems very nasty.
  There _has_ to be a better way. Besides, how can I select the correct
  file type and enter the file name in the background?
- Interop with mshtml.dll. See above.
- After getting the HTML file, iterate through the images, etc. and
  request each of them separately.

Thank you in advance!
 
You'll have to find the img tags and download the images yourself; basically,
write code that does what a browser normally does.

So, parse the <img> tags (and <a> tags, if you like), then use
HttpWebRequest to get the images.

HTH
Andy
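
For illustration, the tag-parsing step suggested above could look roughly like this. It is only a sketch, written in Python to keep it short; the class and function names (ResourceCollector, find_resources) are made up for this example, and it assumes that only <img src> and <link rel="stylesheet" href> matter. A .NET version would follow the same steps and fetch each URL with HttpWebRequest.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of images and stylesheets referenced by a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # URL the HTML was downloaded from
        self.resources = []       # absolute URLs, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self._add(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            # Only stylesheet links are page resources; skip rel="alternate" etc.
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self._add(attrs["href"])

    def _add(self, ref):
        # Resolve relative references against the page URL, as a browser would.
        url = urljoin(self.base_url, ref)
        if url not in self.resources:
            self.resources.append(url)


def find_resources(html, base_url):
    """Return the list of resource URLs found in the given HTML text."""
    parser = ResourceCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.resources


page = ('<html><head><link rel="stylesheet" href="/css/site.css"></head>'
        '<body><img src="images/logo.png"></body></html>')
print(find_resources(page, "http://example.com/docs/index.html"))
```

Resolving every reference against the page URL matters because most pages use relative paths such as images/logo.png, which cannot be requested as they stand.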
 
Hi,

Unfortunately, there isn't a simple way. The way web-browsers (usually)
work is that they start rendering the page, and download the
images/stylesheets/whatnot as they need them. They're parsing the HTML,
finding an <img> tag, or a <link> tag and deciding to download the file
that the tag is referencing.

You'll need to do this; i.e. analyse the HTML you've received, and decide
what needs to be downloaded by looking at the tags.
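
Once the resource URLs are known, the download step might be sketched like this. It is again Python, and the helper names (local_name, fetch_bytes, save_resources) are invented for the example. The default fetcher is a plain GET without cookies or credentials, which is an assumption; a password-protected site would need a fetch function that carries the login.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlopen


def local_name(url, used):
    """Pick a file name for url that is not already in the set `used`."""
    name = os.path.basename(urlparse(url).path) or "resource"
    stem, ext = os.path.splitext(name)
    candidate, n = name, 1
    while candidate in used:
        # Two resources can share a file name in different folders.
        candidate = f"{stem}_{n}{ext}"
        n += 1
    used.add(candidate)
    return candidate


def fetch_bytes(url):
    """Default fetcher: plain GET, no cookies or authentication."""
    with urlopen(url) as response:
        return response.read()


def save_resources(urls, target_dir, fetch=fetch_bytes):
    """Download each URL into target_dir; return a {url: local path} map."""
    os.makedirs(target_dir, exist_ok=True)
    used, saved = set(), {}
    for url in urls:
        try:
            data = fetch(url)
        except OSError as error:  # URLError and HTTPError derive from OSError
            print(f"skipped {url}: {error}")
            continue
        path = os.path.join(target_dir, local_name(url, used))
        with open(path, "wb") as handle:
            handle.write(data)
        saved[url] = path
    return saved
```

The returned map can then be used to rewrite the src and href attributes in the saved HTML so that they point at the local copies.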
 
Hello (e-mail address removed),

I'd save the page as MHT (web archive) and then parse it to get the images.
BTW, the images are encoded inside the MHT file.

PS: This library could be used for parsing: http://www.codeproject.com/csharp/mime_project.asp
---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour

"At times one remains faithful to a cause only because its opponents do not
cease to be insipid." (c) Friedrich Nietzsche
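
To make the MHT suggestion above more concrete: an MHT file is a MIME multipart message, so any MIME parser can pull the images out of it. The sketch below uses Python's standard email package rather than the CodeProject library mentioned above; the function name extract_images and the tiny sample archive are made up for the example, and it assumes each image part carries a Content-Location header.

```python
from email import message_from_bytes


def extract_images(mht_bytes):
    """Return {Content-Location: decoded bytes} for each image part."""
    message = message_from_bytes(mht_bytes)
    images = {}
    for part in message.walk():
        if part.get_content_maintype() == "image":
            location = part.get("Content-Location", "unnamed")
            # decode=True undoes the base64 transfer encoding.
            images[location] = part.get_payload(decode=True)
    return images


# A hand-written, minimal archive: one HTML part and one base64 image part.
sample = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="XX"\r\n'
    b"\r\n"
    b"--XX\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Location: http://example.com/\r\n"
    b"\r\n"
    b'<html><body><img src="logo.gif"></body></html>\r\n'
    b"--XX\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://example.com/logo.gif\r\n"
    b"\r\n"
    b"R0lGODlh\r\n"
    b"--XX--\r\n"
)

for location, data in extract_images(sample).items():
    print(location, len(data), "bytes")
```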
 
Michael said:
Hello (e-mail address removed),

I'd save page into MHT (web archive) and then parse it to get images
BTW images are encoded in the MHT

Ah, thank you. But how do I save it as MHT?
 
Tom said:
Hi,

Unfortunately, there isn't a simple way. The way web-browsers (usually)
work is that they start rendering the page, and download the
images/stylesheets/whatnot as they need them. They're parsing the HTML,
finding an <img> tag, or a <link> tag and deciding to download the file
that the tag is referencing.

You'll need to do this; i.e. analyse the HTML you've received, and decide
what needs to be downloaded by looking at the tags.

Thank you.
 
alexey_r said:
Thank you again! Looks like it won't work for websites protected by
password, so I am back to plan A.

Hello (e-mail address removed),

What do you mean by "websites protected by password"?
Any example?
Have you tried saving those sites to MHT via IE?
---
WBR,
Michael Nemtsev :: blog: http://spaces.msn.com/laflour
 
