Page Source Code In Memory??

Chad A. Beckner · Jun 4, 2004

Ok, here's the situation:

I want to read the currently executing .aspx page "source code" from memory
as it is executing so I can grab certain values from within the page (for
example, say the page title). Any ideas?

Chad

John Saunders · Jun 4, 2004

Chad A. Beckner said:
Ok, here's the situation:

I want to read the currently executing .aspx page "source code" from memory
as it is executing so I can grab certain values from within the page (for
example, say the page title). Any ideas?

Chad, I've already told you "no". The source code is not in memory at all.
It will already have been compiled.

It's not like you're on the client and can use the DOM.

Chad A. Beckner · Jun 4, 2004

Ok, ok, just trying to figure out different ways...

I can use a streamreader and open up the file, place the contents into a
string, and use a regular expression, right? How much server load do you
think that will add?

Chad

Chad A. Beckner · Jun 4, 2004

Ok, ok, I was just looking into all possibilities. :-)

How much overhead would a file reader add?

Chad

John Saunders · Jun 4, 2004

Chad A. Beckner said:
Ok, ok, I was just looking into all possibilities.

How much overhead would a file reader add?

Chad

Chad, surely there's a better way to do this? I think you should consider
re-evaluating your requirements.

What if the <title> tag doesn't actually appear in the .aspx file? It can be
generated by script, for instance. It could also be generated within a user
control or a custom server control. In this case, you'll never see it.

You said that you need access to the title and to the meta tags. Did you
realize that you can turn both into server controls simply by adding
runat="server"?

<html>
<head>
<title runat="server" id="myTitle">Some Title</title>
<meta runat="server" id="myMeta1" name="GENERATOR" content="Microsoft Visual Studio.NET 7.0">
</head>
<body>

</body>
</html>

Both myTitle and myMeta1 will appear in your codebehind as
HtmlGenericControl instances, and you'll be able to access their attributes,
InnerText and InnerHtml.
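
As a rough illustration, the code-behind for the page above might look like the sketch below. The class name TitlePage is hypothetical, and the sketch assumes ASP.NET 1.x, where both tags surface as HtmlGenericControl; later versions of ASP.NET may map these tags to other control types.

```vb
Imports System
Imports System.Web.UI
Imports System.Web.UI.HtmlControls

' Hypothetical code-behind class for the markup above (ASP.NET 1.x assumed).
Public Class TitlePage
    Inherits Page

    ' Field names must match the id attributes in the .aspx markup.
    Protected myTitle As HtmlGenericControl
    Protected myMeta1 As HtmlGenericControl

    Private Sub Page_Load(ByVal sender As Object, ByVal e As EventArgs) Handles MyBase.Load
        ' Read the current title text and the meta tag's content attribute.
        Dim currentTitle As String = myTitle.InnerText
        Dim generator As String = myMeta1.Attributes("content")

        ' Change the title before the page renders.
        myTitle.InnerText = currentTitle & " (" & generator & ")"
    End Sub
End Class
```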

Chad A. Beckner · Jun 4, 2004

Agreed, but the (main) problem is that I am going to be serving templated
.htm files, which will run through asp.net. I need to retain the title and
meta tags. If you have a better idea of how to do this, trust me, I'm all
ears (well, actually, in this case, eyes...). We have files published
that our users edit, in .htm format, and I really don't want to have to
change them all over to .aspx files every time they get updated (or change
the 2000+ files we already have). Thoughts?

(and thanks!)

Chad

Kevin Spencer · Jun 4, 2004

You're barking way up the wrong tree here, Chad. Forget about source code.
Think about objects. The Page is an object. The Page title is an object. In
.NET, everything is an object.

--
HTH,
Kevin Spencer
.Net Developer
Microsoft MVP
Big things are made up
of lots of little things.

bruce barker · Jun 4, 2004

Try an HttpHandler.

Chad A. Beckner said:
Agreed, but the (main) problem is that I am going to be serving templated
.htm files, which will run through asp.net. I need to retain the title and
meta tags. If you have a better idea of how to do this, trust me, I'm all
ears (well, actually, in this case, eyes...). We have files published
that our users edit, in .htm format, and I really don't want to have to
change them all over to .aspx files every time they get updated (or change
the 2000+ files we already have). Thoughts?

(and thanks!)

Chad

Chad A. Beckner · Jun 4, 2004

Ok, how would I get the page title using a HTTPHandler? (sorry, somewhat new
to HTTPHandlers). I know I can route requests for .htm files to be
processed by the asp.net compiler, etc, but how would I retrieve the title
of a page?

Chad

Chad A. Beckner · Jun 4, 2004

That was my initial thought, but I can't figure out how to access a page's
title element!! :-( I can't add "runat=server" to all our .htm pages, it
just isn't possible. Do you have an example that I could investigate?

Thanks

Chad

Chad A. Beckner · Jun 4, 2004

Here is what I have ended up with (for now). Hopefully someone has a better
solution. Remember, I can't go through and change all 2000+ .htm files to
.aspx and add in <title id="pageTitle" runat="server" />, but if you have a
better idea, such as a way to access the page title using VB, I'll be very
happy!

' Requires at the top of the file:
'   Imports System.IO
'   Imports System.Text.RegularExpressions

' Read the requested file from disk. Request.Path omits the query string,
' which Request.RawUrl would include.
Dim Header As String = ""
Dim Page_Contents As String
Dim File_Stream As New StreamReader(Server.MapPath(Request.Path))
Try
    Page_Contents = File_Stream.ReadToEnd()
Finally
    File_Stream.Close()
End Try

' Capture everything between <head ...> and </head>. Singleline lets "."
' match line breaks; the lazy ".*?" stops at the first closing tag.
Dim RegExp As New Regex("<head\b[^>]*>(.*?)</head>", _
    RegexOptions.IgnoreCase Or RegexOptions.Singleline)
Dim Matcher As Match = RegExp.Match(Page_Contents)
If Matcher.Success Then
    ' Group 1 holds the head contents without the surrounding tags.
    Header = Matcher.Groups(1).Value
End If

Chad

Kevin Spencer · Jun 4, 2004

Okay, I'm beginning to follow you somewhat. How are these .htm files loaded
into the ASPX pages? It seems that you would have to fetch them from the
file system somewhere, read them into a string, and then put them into the
page somehow. If at any point you have the HTML content as a string, you
could use a Regular Expression to find and work with the title.
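
A minimal sketch of that idea, assuming the HTML is already in a string and contains at most one title element. The module name TitleSketch and the function ExtractTitle are made up for illustration.

```vb
Imports System
Imports System.Text.RegularExpressions

Module TitleSketch
    ' Returns the text between <title ...> and </title>, or Nothing if absent.
    Function ExtractTitle(ByVal html As String) As String
        ' Singleline lets "." span line breaks; ".*?" stops at the first </title>.
        Dim m As Match = Regex.Match(html, "<title\b[^>]*>(.*?)</title>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        Dim sample As String = "<html><head><TITLE>Some Title</TITLE></head><body></body></html>"
        Console.WriteLine(ExtractTitle(sample))
    End Sub
End Module
```

A title that is generated by script or by a control will not be found this way, since it never appears in the file's text.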

--
HTH,
Kevin Spencer
.Net Developer
Microsoft MVP
Big things are made up
of lots of little things.

Chad A. Beckner · Jun 4, 2004

I plan on using server.execute in order to conserve memory (and time). I
also want to be able to "template" the .aspx files (which is working) and
grab the title. I know about the runat="server" tag for the title header,
but I also do not want to go through and put it in the code behind...

Chad

John Saunders · Jun 4, 2004

Chad A. Beckner said:
Agreed, but the (main) problem is that I am going to be serving templated
.htm files, which will run through asp.net. I need to retain the title and
meta tags. If you have a better idea of how to do this, trust me, I'm all
ears (well, actually, in this case, eyes...). We have files published
that our users edit, in .htm format, and I really don't want to have to
change them all over to .aspx files every time they get updated (or change
the 2000+ files we already have). Thoughts?

Sorry, Chad, I forgot about your problem with the legacy .htm files. In
particular, you're saying that users have to be able to generate these .htm
files _as_ .htm files. Perhaps the users use "Save As Web Page" from within
some application?

The advice I gave earlier stems from the time when I had to implement
templating. We had a large number of .aspx pages which had been translated
(barely) from .asp pages. I was able to get at the title, meta and other
tags by writing a "simple" line editor and then the code to call it on the
existing pages to do the necessary edits. Some of those edits involved
finding the title and meta tags and translating them into something
server-side.

Now, you can't do this if you can't modify the source files. Instead, you
will have to parse the .htm files. So, I think you should do two things. Go
ahead with the "iframe trick" that I mentioned earlier, but also you will
have to read the .htm files into a string and use regular expressions to
search them for the title and meta tags you need.

This can seem like a daunting challenge, especially if you don't have much
experience with regular expressions. Here are a few tips.

1) Like mathematical expressions, regular expressions can be broken down into
pieces simple enough to understand and then combined to make more
complicated expressions. For instance, once you understand what expressions
"x" and "y" do, you should have no trouble understanding what "(x)|(y)"
does.

2) I wrote a smallish Windows Forms application to help me test out my
regular expressions. This saved my sanity and that of my colleagues. The
current issue of MSDN magazine mentions a tool called "Regulator" which
seems to do similar things. They give http://royo.is-a-geek.com/regulator as
the URL.

3) One of the big problems I had in parsing HTML was in handling all of the
attributes a particular tag might have. Although true regex gurus might have
been able to do it better, I wound up creating regular expressions which
could handle any combination of any of the attributes, and which would store
the value of the attribute in a named match whose name was based on the
attribute name. In fact, this happened so frequently that I created a method
which would take a tag name and an array of attribute names, and which would
return a regular expression to take care of all the possibilities. This was
not fun until I got it to work, believe me!

4) If any of this HTML was created by hand, you will find a fair number of
typos which a browser just happens not to complain about, but which your
regular expressions won't like. You'll want to have error processing which
will clearly point out the error so that it can be fixed.
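
The helper described in tip 3 could be sketched roughly as follows. This is not the original code (which is not available); the names TagPatternSketch and BuildTagPattern are invented for illustration, and the sketch assumes double-quoted attribute values and attribute names that are valid regex group names (so no hyphens, as in http-equiv). It also shows tip 1 in practice: the final pattern is assembled from small alternatives joined with "|".

```vb
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module TagPatternSketch
    ' Builds a pattern for <tagName ...> that accepts the listed attributes in
    ' any order and stores each value in a group named after the attribute.
    ' Assumes double-quoted values and attribute names usable as group names.
    Function BuildTagPattern(ByVal tagName As String, ByVal attributeNames() As String) As String
        Dim alternatives As New StringBuilder()
        For Each attr As String In attributeNames
            ' One alternative per known attribute: attr="(captured value)"
            alternatives.Append(Regex.Escape(attr) & "\s*=\s*""(?<" & attr & ">[^""]*)""|")
        Next
        ' Fallback: any other attribute, with or without a double-quoted value.
        alternatives.Append("[-\w:]+(?:\s*=\s*""[^""]*"")?")
        Return "<" & Regex.Escape(tagName) & "\b(?:\s+(?:" & alternatives.ToString() & "))*\s*/?>"
    End Function

    Sub Main()
        Dim pattern As String = BuildTagPattern("meta", New String() {"name", "content"})
        Dim html As String = "<META content=""Microsoft Visual Studio.NET 7.0"" name=""GENERATOR"">"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)
        If m.Success Then
            Console.WriteLine(m.Groups("name").Value & " = " & m.Groups("content").Value)
        End If
    End Sub
End Module
```

A tag with an unquoted value such as width=100 will not match this pattern, which is the kind of input tip 4 warns about; a failed match is the place to report the offending tag.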

Good luck. Unfortunately, I don't own the code I wrote, or I'd contribute
it.

mikeb · Jun 4, 2004

John said:
...snip...

Now, you can't do this if you can't modify the source files. Instead, you
will have to parse the .htm files. So, I think you should do two things. Go
ahead with the "iframe trick" that I mentioned earlier, but also you will
have to read the .htm files into a string and use regular expressions to
search them for the title and meta tags you need.

This can seem like a daunting challenge, especially if you don't have much
experience with regular expressions. Here are a few tips.

An alternative to using regex's might be Chris Lovett's SgmlReader
class. I'm not very experienced with it, but the general idea is that
you can run an HTML document through it and it'll spit out equivalent
XHTML that can be manipulated using XmlDocument methods.

You can find the title and meta tags easily, modify them, then send that
XHTML document down to the browser.

SgmlReader:

http://www.gotdotnet.com/Community/...mpleGuid=B90FDDCE-E60D-43F8-A5C4-C3BD760564BC

Kevin Spencer · Jun 7, 2004

Here's a link to an awesome freeware utility for building Regular
Expressions. It's called "Regex Coach" and it enables you to develop and
test Regular Expressions easily:

http://www.weitz.de/regex-coach/

--
HTH,
Kevin Spencer
.Net Developer
Microsoft MVP
Big things are made up
of lots of little things.
