How to index HTML files locally even with ROBOTS noindex?

  • Thread starter: Guest

Guest
I have a local mirror copy of the Web sites I manage. Some of the HTML pages
I don't want to be indexed by Web spiders / robots, so I put the ROBOTS
meta-tag with "noindex" in them.
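
For reference, the tag I'm talking about is the standard robots meta-tag in each page's <head>:

    <meta name="robots" content="noindex">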

However, I would like those files to be indexed locally, so that I can find
things in them with Windows' local indexed search. But the Windows HTML
filter intentionally does NOT index files with ROBOTS noindex, so those
files never show up in my local searches.

Is there a way to tell the HTML filter to go ahead and index HTML files even
if they have the ROBOTS noindex meta-tag? I want my local and remote copies
to be identical, so I don't want to have ROBOTS index locally and ROBOTS
noindex remotely.

Has anybody else run into this problem? Does anyone have a solution?

Thanks!

YMA
 
YMA said:
Is there a way to tell the HTML filter to go ahead and index HTML files
even if they have the ROBOTS noindex meta-tag? ...


I just put a robots.txt file in the root folder of the website instead. I'm
not familiar with the meta-tag, but the text file may be more flexible, as you
can define which folders the spiders may and may not index.
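
For example, a minimal robots.txt that keeps spiders out of a couple of
folders would look like this (the folder names below are just placeholders):

    # Applies to all well-behaved spiders
    User-agent: *
    Disallow: /private/
    Disallow: /drafts/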

Loads more info here:
http://www.google.co.uk/search?sourceid=navclient&ie=UTF-8&rlz=1T4SUNA_enGB227GB227&q=robots.txt

ss.
 
Synapse Syndrome said:
I just put a robots.txt file in the root folder of the website instead. ...


Also, according to this page, not all spiders obey the meta-tag:
http://www.robotstxt.org/wc/exclusion.html#meta

ss.
 
Thanks for your answer, but I don't have access to the root folder of my
websites (with just one exception), so I really need a way to tweak the
local HTML filter on my machine...

BTW, I am aware of the limitations of the ROBOTS noindex meta-tag, but I can
live with them.

YMA
 