saving as Html or Mht

Jack B

When saving a web page there are options to do it as Html or Mht. Which is
more practical?

I know Mht is a single file, while Html is a file plus a folder.
Any browser can open Html + files, but not all browsers correctly support Mht.

So, assuming you don't want to have all the separate files with Html, then
it seems best to go with Mht. Is there any downside to storing as Mht?

Jack
 
Kaja

Hi Jack, my knowledge is limited but I may be able to assist you a bit. On
Internet Explorer you can save it as a web page, HTML format. I think the
question you need to ask yourself is what do you want to do with the saved
document? Do you want to print it? Or have an exact copy of that webpage?
Keep in mind websites constantly change and update so what you save will not
reflect any updates or changes on that webpage.

I am a writer, and one option is to go to Tools, Internet Options, Programs
in IE. Then under HTML editing select Microsoft Word if you have that. Hit
Apply and OK to save the setting. Then when you are on a website hit File,
and Edit With Microsoft Word. Word will take a moment to open up and display
the document. At the bottom of the Save As dialog you will see the Save as
type box. It should be HTML web page at first. From there you can click on
the type and you will have many more options such as Word document, etc.
I hope this helps. Please let me know if it does.
Best Regards,
Kaja
 
Jack B

Kaja,

Actually, I'm just interested in preserving the web page for future
reference to the info therein. I'm wondering whether anything saved as HTML
or Mht communicates with the Internet and possibly changes what I've saved
when the website is changed. Or whether deleting cookies or TIFs deletes
anything in a saved HTML or Mht.

Jack
 
Richard

(By TIF we mean Temporary Internet Files, not .tif/.tiff graphics.)

Hi Jack, I've included a lot of generic information in my reply which you
probably already know, but which other readers might find useful. I've
learned a few new things while researching this topic. As for potential
downsides: GIF graphics files in the html "file and folder" format are stored
in the original compact GIF format, but in a larger text-encoded form within
the single-file MHT, so the MHT files take up more space on your drive.
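
For readers curious about the size difference: base64 turns every 3 bytes of binary data into 4 text characters, plus line breaks, so an encoded graphic should come out roughly a third larger than the original. Here is a small Python sketch of that arithmetic. The random bytes are only a stand-in for a GIF, and `base64_overhead` is a name made up for this example.

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return encoded size divided by raw size for line-wrapped base64."""
    # encodebytes() wraps the output at 76 characters per line, which is
    # the line length MIME uses for base64 bodies.
    encoded = base64.encodebytes(raw)
    return len(encoded) / len(raw)


sample = os.urandom(3000)  # stand-in for the binary contents of a GIF
print(f"encoded/raw size ratio: {base64_overhead(sample):.2f}")
```

A real MHT also carries MIME headers and boundary lines for each part, so the whole file can grow somewhat more than this ratio suggests.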

Here's something I noticed in the IE6 "Help", concerning ways "To save a
Web page on your computer:"

"To save all of the information needed to display this page in a single
MIME-encoded file, click Web Archive. This option saves a snapshot of the
current Web page. This option is available only if you have installed
Outlook Express 5 or later."

My winXP-pro system came with OE6 and IE6. I think I read somewhere that the
IE7 and IE8 upgrades still use Outlook Express version 6. I don't know
what third-party browsers can utilize the MHT file type, but I think both
Email and Newsgroup messages are transmitted over the internet with MIME
encoding. MIME encoding is not a Microsoft-only "proprietary" thing.


Kaja said:
[snip]
I think the question you need to ask yourself is what do you want to do
with the saved document? Do you want to print it? Or have an exact
copy of that webpage?
[snip]

Jack B said:
Actually, I'm just interested in preserving the web page for future
reference to the info therein.

The web page "File|SaveAs" dialog in IE6 has 4 options:

1) Web page, complete (*.htm;*.html) - [default]
2) Web Archive, single file (*.mht)
3) Web Page, HTML only (*.htm;*.html)
4) Text File (*.txt)

If you only need text "info" without graphics, option #3 will do. That is
how I normally FileSaveAs a page, and if any graphic is an actual
illustration, I separately right-click SavePictureAs. That doesn’t always
work when viewing OFFLINE: if the web page loads the graphic from a
different folder, or its IMG SRC= is a full URL, you would have to edit
the html code (right-click|ViewSource) to change that graphic’s IMG SRC=
folder path, or create a same-name local sub-folder to store the images. (Or
separately drag/drop the image file into the browser to view it. Argh... :)
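
The IMG SRC= editing could be scripted instead of done by hand. What follows is only a rough Python sketch: the function name `localize_img_src`, the `images` folder, and the example.com address are all made up for illustration, and the simple pattern only handles SRC values written inside quotes, which many older pages do not use.

```python
import re
from posixpath import basename
from urllib.parse import urlparse

# Matches <img ... src="value"> or src='value' (quoted values only).
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc\s*=\s*)(["\'])(.*?)\2',
    re.IGNORECASE | re.DOTALL,
)


def localize_img_src(html: str, folder: str = "images") -> str:
    """Point every quoted IMG SRC at a file of the same name in `folder`."""

    def repl(match):
        prefix, quote, src = match.group(1), match.group(2), match.group(3)
        name = basename(urlparse(src).path)
        if not name:  # nothing usable as a file name, leave the tag alone
            return match.group(0)
        return f"{prefix}{quote}{folder}/{name}{quote}"

    return IMG_SRC.sub(repl, html)


page = '<p><IMG SRC="http://www.example.com/pics/logo.gif" alt="logo"></p>'
print(localize_img_src(page))
# <p><IMG SRC="images/logo.gif" alt="logo"></p>
```

You would still need to save each graphic into that folder yourself, under the same file name the page uses.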

Saving as #4 (txt) stores the page as plain text only, without html tags or
formatting such as bold, colors, font size, tables, graphics, etc. That
method normally produces the smallest file sizes on your disk/drive.

Option #1, which stores an *.htm page and a folder with other stuff, does
not always include everything necessary for full "dynamic" page function.
Another drawback to that method is that the folder and htm are keyed
together somehow, so if you delete the folder, the htm file also gets
deleted. I don’t like that default behavior. (I just tried option #1 with
my local drive home page displayed in IE6, which does NOT have any style
sheet links or graphics, and it saved it as an htm file only with no folder,
which would have been empty anyway.)

Another drawback to the page+folder method would be saving several pages
from the same site that use the same graphics and style sheets in each page,
and you end up with the same graphics and CSS files stored in EACH
page+folder file combo. Likewise with the MHT files, with MIME encoded
graphics duplicated in each page of a series. When viewing pages on the
internet, the common files and graphics are downloaded to your TIF once, and
when you view other pages that use the same files, IE6 fetches the graphics
from TIF, rather than download them again with each page in a series.
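
If you want to see how much duplication you have collected, a short script can group identical files by content. This Python sketch assumes the saved pages sit under one root folder, with sub-folders named like `page1_files`; the folder names and `find_duplicates` are only for illustration.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Return groups of files under `root` whose contents are identical."""
    by_hash = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


# Demo on a throwaway folder: two saved pages that share one graphic.
with tempfile.TemporaryDirectory() as root:
    for folder in ("page1_files", "page2_files"):
        target = Path(root, folder)
        target.mkdir()
        (target / "logo.gif").write_bytes(b"GIF89a same bytes")
    Path(root, "page1_files", "style.css").write_bytes(b"body {}")

    for group in find_duplicates(root):
        print([p.relative_to(root).as_posix() for p in group])
# ['page1_files/logo.gif', 'page2_files/logo.gif']
```

This only reports the duplicates. Deleting any of them would break the pages that point at them unless the html is edited as well.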

Jack B said:
I'm wondering if saved as HTML or Mht if anything saved communicates with
the Internet and possibly changes what I've saved when the website is
changed. Or, if deleting cookies or TIFs deletes anything in a saved
HTML or Mht.


If viewing the page from your local drive while OFFLINE, there should not be
any "changes" due to scripts adding in updated or different data, or cycling
advertisement graphics. Whether online or offline, relative links to www
pages will not work from the local copy, but a link with a full URL
(including the http:// part, not just the file name) may cause your browser
to connect, or try to connect, to the internet. (An exception would be if
the page includes the html BASE HREF element, in which case relative links
are resolved against that full base URL rather than the local folder.)
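
To illustrate how relative and full links resolve, here is a small Python sketch using the standard `urljoin` function. The file path and the example.com addresses are invented, and `resolve` is just a helper name for this example; a real browser does the equivalent internally.

```python
from urllib.parse import urljoin


def resolve(link, page_location, base_href=None):
    """Resolve a link the way a browser would: against BASE HREF if the
    page declares one, otherwise against the page's own location."""
    return urljoin(base_href or page_location, link)


local_page = "file:///C:/saved/page.htm"

# Relative link, no BASE HREF: stays on the local drive.
print(resolve("news/today.htm", local_page))
# file:///C:/saved/news/today.htm

# Same relative link, but the page declares a BASE HREF: goes to the web.
print(resolve("news/today.htm", local_page,
              base_href="http://www.example.com/docs/"))
# http://www.example.com/docs/news/today.htm

# A full URL ignores the page location entirely.
print(resolve("http://www.example.com/count.gif", local_page))
# http://www.example.com/count.gif
```

So a saved page only reaches out to the internet for links or graphics that end up as http addresses, either written in full or through BASE HREF.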

OK, I just did an MHT FileSaveAs of my local copy of my website home page,
which has a separate style sheet file and 5 graphics files, totalling about
14K. The single-file MHT was about 21K, and I was surprised that it was
only about 50% larger. I right-clicked|OpenWith|Notepad, and discovered that
the file itself is a plain ASCII text file, in MIME format, with the
graphics included as base64 encoded ASCII characters (upper and lower case
letters, numerals, and a few punctuation characters, but no extended ANSI
characters above CharNum 127).

After double-clicking the MHT it displayed in the browser the same as usual,
and the animated GIF was animating - another surprise. Also surprising is
that the MHT included the page counter graphic which is generated by the
web-based CGI script. I was offline, so that graphic should not have
appeared, or so I thought. I found a copy in the TIF folder from my last
visit to my website, so IE used that to display the page.

So, back to the question about deleting TIFs: If I had deleted the TIF
before displaying my local drive copy of the page, there would have been no
"count.gif" from TIF to display, or to include when I saved the page as MHT.
After deleting TIF, however, the base64 encoded gif is still in the MHT.
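
Since the MHT is an ordinary MIME message, Python's standard `email` module can take one apart. The sketch below builds a tiny made-up MHT in memory (the boundary string, the example.com addresses, and the placeholder "GIF" bytes are all invented) and lists each part with its decoded size. It is an illustration of the format, not a description of exactly what IE6 writes.

```python
import base64
from email import message_from_bytes

GIF_BYTES = b"GIF89a" + bytes(range(32))  # placeholder, not a real image

SAMPLE_MHT = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="----=_Part"\r\n'
    b"\r\n"
    b"------=_Part\r\n"
    b'Content-Type: text/html; charset="us-ascii"\r\n'
    b"Content-Transfer-Encoding: 7bit\r\n"
    b"Content-Location: http://www.example.com/index.htm\r\n"
    b"\r\n"
    b'<html><body><img src="count.gif"></body></html>\r\n'
    b"------=_Part\r\n"
    b"Content-Type: image/gif\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://www.example.com/count.gif\r\n"
    b"\r\n"
    + base64.encodebytes(GIF_BYTES).replace(b"\n", b"\r\n")
    + b"------=_Part--\r\n"
)


def list_parts(mht_bytes):
    """Return (content type, original location, decoded bytes) per part."""
    message = message_from_bytes(mht_bytes)
    parts = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip the outer container, keep only the leaf parts
        data = part.get_payload(decode=True)  # undoes the base64 encoding
        parts.append((part.get_content_type(), part["Content-Location"], data))
    return parts


for content_type, location, data in list_parts(SAMPLE_MHT):
    print(content_type, location, len(data), "bytes")
```

Writing the decoded bytes of an image part to a file should give back the graphic as it was stored in the MHT, independent of anything in TIF.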

And back to your question if anything saved communicates with the internet:
If I had been CONNECTED when I accessed my local copy of the above page with
page visit counter, and there had NOT been a copy of count.gif in TIF, I
assume that page would have accessed the CGI script on the internet website
to fetch a next higher number count.gif, since the graphic in that page has
a full URL internet address. I haven't verified that.

Another downside to the MHT appeared when I right-clicked on the animated
GIF display, chose SavePicture, but was only given the option to save it in
.BMP format, which would only save the 1st of several pictures from the
original GIF animation. The GIF files in the folder of the Page+Folder
method retain the original GIF structure. (I also got an IE6 security
warning about running "active content" on my local computer when I
right-clicked the animation, and chose Properties, which identified the
picture in the MHT as Unknown Protocol.)

Deleting cookies or TIFs (Temporary Internet Files) should have no effect on
files in folders located outside the TIF folder, (with incidental exceptions
like the count.gif case mentioned above.) Of course if there is any unsaved
file in TIF that you want a copy of, you need to open TIF, locate the
file(s), select, and right-click|copy, and paste into another folder
somewhere, before you delete your TIF.

Another way to save text from web pages, with some limited formatting such
as bold, underline, text-color, is to SelectAll (or select only certain
info on the page), copy (ctrl-C), and then switch to Windows WordPad,
paste it into a blank page, and FileSaveAs *.RTF (Rich Text Format). Tip: On
some multi-column pages with advertising or navigation bars to the side of
the article, you can select only the article: click just before the first
word of the article, scroll to the bottom (mouse wheel, PageDown key, or End
key), and then while holding the SHIFT key, click at the end of the article
to select everything in between.

Summary:
: If you only need the text, use option #3, #4, or copy to RTF.
: If you need the graphics also, use option #1 or #2.
: If you don't want multiple copies of graphics, use #3 and separately save
graphics once. (And possibly edit html or create a sub-folder for the
graphics with the folder name the page is looking for. Argh... :)

For those who don't know:
WordPad location: Start | [All] Programs | Accessories | WordPad
To open TIF from browser: Tools | Internet Options | Settings | View Files
or outside the browser:
Start | Settings | Control Panel | Internet Options | Settings | View Files

(Triple-Click here to: Have a nice day! --Richard :)

- - -
The Pilgrims and the Mayflower Compact
http://www.avbtab.org/rc/pilgrims.htm
 
