Chad A. Beckner said:
Agreed, but the (main) problem is that I am going to be serving templated
.htm files, which will run through asp.net. I need to retain the title and
meta tags. If you have a better idea of how to do this, trust me, I'm all
ears (well, actually, in this case, eyes...). We have files published
that our users edit, in .htm format, and I really don't want to have to
change all of them over to .aspx files every time they get updated (or change
the 2000+ files we already have). Thoughts?
Sorry, Chad, I forgot about your problem with the legacy .htm files. In
particular, you're saying that users have to be able to generate these .htm
files _as_ .htm files. Perhaps the users use "Save As Web Page" from within
some application?
The advice I gave earlier stems from the time when I had to implement
templating. We had a large number of .aspx pages which had been translated
(barely) from .asp pages. I was able to get at the title, meta and other
tags by writing a "simple" line editor, and then writing the code to run it
over the existing pages and make the necessary edits. Some of those edits
involved finding the title and meta tags and translating them into their
server-side equivalents.
Now, you can't do this if you can't modify the source files. Instead, you
will have to parse the .htm files. So, I think you should do two things: go
ahead with the "iframe trick" that I mentioned earlier, but also read each
.htm file into a string and use regular expressions to search it for the
title and meta tags you need.
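As a rough illustration of that second step, here is a minimal sketch in Python (in ASP.NET the same job would be done with System.Text.RegularExpressions; Python is used here only for brevity). The names `extract_head_info` and `extract_from_file` are hypothetical, and the sketch assumes the legacy files are UTF-8; change the encoding if they were saved in a Windows code page.

```python
import html
import re
from pathlib import Path

# Case-insensitive, and DOTALL so a title split across lines still matches.
TITLE_RE = re.compile(r'<title\b[^>]*>(.*?)</title\s*>', re.IGNORECASE | re.DOTALL)
# Grabs each whole <meta ...> tag; picking attributes out of it is a separate step.
META_RE = re.compile(r'<meta\b[^>]*>', re.IGNORECASE)


def extract_head_info(text):
    """Return (title, list of raw <meta> tags) found in an HTML string."""
    m = TITLE_RE.search(text)
    # Decode entities such as &amp; so the title can be re-emitted server-side.
    title = html.unescape(m.group(1).strip()) if m else None
    return title, META_RE.findall(text)


def extract_from_file(path):
    # Assumption: UTF-8 files; 'replace' keeps stray bytes from raising errors.
    text = Path(path).read_text(encoding='utf-8', errors='replace')
    return extract_head_info(text)
```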
This can seem like a daunting challenge, especially if you don't have much
experience with regular expressions. Here are a few tips.
1) Like mathematical expressions, regular expressions can be broken down into
pieces simple enough to understand and then combined to make more
complicated expressions. For instance, once you understand what expressions
"x" and "y" do, you should have no trouble understanding what "(x)|(y)"
does.
2) I wrote a smallish Windows Forms application to help me test out my
regular expressions. This saved my sanity and that of my colleagues. The
current issue of MSDN Magazine mentions a tool called "Regulator" which
seems to do similar things; they give http://royo.is-a-geek.com/regulator
as the URL.
3) One of the big problems I had in parsing HTML was handling all of the
attributes a particular tag might have. Although true regex gurus might have
been able to do it better, I wound up creating regular expressions which
could handle any combination of any of the attributes, and which would store
the value of the attribute in a named match whose name was based on the
attribute name. In fact, this happened so frequently that I created a method
which would take a tag name and an array of attribute names, and which would
return a regular expression to take care of all the possibilities. This was
not fun until I got it to work, believe me!
4) If any of this HTML was created by hand, you will find a fair number of
typos which a browser just happens not to complain about, but which your
regular expressions won't like. You'll want error handling that clearly
points out each error so that it can be fixed.
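Tips 1 and 3 can be sketched together in Python. The original .NET code isn't available, so this is only an illustration of the idea, not the method described above. It builds the attribute-value pattern from three small pieces that can each be tested alone, then generates one lookahead per attribute so the attributes may appear in any order or be missing. `build_tag_regex`, `unquote` and `tag_attributes` are hypothetical names; the sketch assumes attribute values never contain `>` and attribute names contain only letters and hyphens.

```python
import re

# Small pieces first (tip 1); each is easy to test on its own.
DOUBLE_QUOTED = r'"[^"]*"'
SINGLE_QUOTED = r"'[^']*'"
UNQUOTED = r'''[^\s"'>]+'''
VALUE = '(?:%s|%s|%s)' % (DOUBLE_QUOTED, SINGLE_QUOTED, UNQUOTED)


def build_tag_regex(tag, attributes):
    """Compile a regex for <tag ...> whose listed attributes may appear in
    any order, or not at all. Each attribute's raw value is captured in a
    named group; hyphens in attribute names become underscores."""
    lookaheads = []
    for attr in attributes:
        group = attr.replace('-', '_')  # group names must be identifiers
        # Either the lookahead finds ' attr=value' before the closing '>'
        # (capturing the value), or the empty alternative matches instead.
        lookaheads.append(r'(?:(?=[^>]*?\s%s\s*=\s*(?P<%s>%s))|)'
                          % (re.escape(attr), group, VALUE))
    pattern = r'<%s\b%s[^>]*>' % (re.escape(tag), ''.join(lookaheads))
    return re.compile(pattern, re.IGNORECASE)


def unquote(value):
    """Strip the matching quotes that VALUE may have captured."""
    if value and value[0] in '"\'':
        return value[1:-1]
    return value


def tag_attributes(regex, text):
    """Yield a dict of the attributes present on each matching tag."""
    for m in regex.finditer(text):
        yield {k: unquote(v) for k, v in m.groupdict().items() if v is not None}
```

For example, `build_tag_regex('meta', ['name', 'content', 'http-equiv'])` applied to `<META content="Widgets, gadgets" name=keywords>` yields `{'name': 'keywords', 'content': 'Widgets, gadgets'}`. Lookaheads are used because each one rescans the tag from the same starting point, which is what makes attribute order irrelevant.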
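For tip 4, a minimal sketch of error reporting that gives the file, line and column for two typos of the kind browsers tolerate: a meta tag that never gets its closing `>`, and a `<title>` that is never closed. The checks and the names `CHECKS` and `find_problems` are hypothetical examples, not a complete validator.

```python
import re

# Hypothetical checks for two hand-editing typos that browsers tolerate.
CHECKS = [
    # "<meta ..." followed by another "<" (or end of text) before any ">".
    (re.compile(r'<meta\b[^<>]*(?=<|\Z)', re.IGNORECASE),
     'meta tag is missing its closing ">"'),
    # "<title>" with no "</title>" anywhere after it.
    (re.compile(r'<title\b[^>]*>(?:(?!</title\s*>).)*\Z',
                re.IGNORECASE | re.DOTALL),
     '<title> is never closed'),
]


def find_problems(text, filename='<string>'):
    """Return 'file:line:column: message' strings for every check that fires."""
    problems = []
    for regex, message in CHECKS:
        for m in regex.finditer(text):
            line = text.count('\n', 0, m.start()) + 1
            line_start = text.rfind('\n', 0, m.start()) + 1
            column = m.start() - line_start + 1
            problems.append('%s:%d:%d: %s' % (filename, line, column, message))
    return problems
```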
Good luck. Unfortunately, I don't own the code I wrote, or I'd contribute
it.