Regex help

JKJ

I need help with a regular expression that will pull the
title and all the meta tags held in the head section of an
HTML file (including the head tags themselves). I want to
exclude everything else, such as link tags, script tags, etc.
I have a pretty big process that pulls this stuff now using
simple regular expressions, but I know I'm not using
regexes to their fullest...
 
Michael Lippert

I think you need to explain what you want a little more. What exactly is
the input to the regular expression, and what are you expecting as the
output? Perhaps a simple example and/or a sample of what you're doing now?
 
David Elliott

Here is an example of what I have done, with an explanation:

using System.Text.RegularExpressions;

public static MatchCollection HtmlMatchCollection(string input, string matchstr)
{
    string expression = HttpUtilities.FullHtmlExpression(matchstr);

    // Singleline lets "." match newlines, so elements that span several
    // lines are captured. Multiline only changes ^ and $, which the
    // pattern does not use.
    MatchCollection mc = Regex.Matches(input, expression,
        RegexOptions.Singleline |
        RegexOptions.IgnoreCase |
        RegexOptions.IgnorePatternWhitespace);

    return mc;
}

public static string FullHtmlExpression(string str)
{
    /* Example
     * <td colspan=2><img src="/images/b.gif" alt="" width="1" height="25"></td>
     *
     * data1 ==> " colspan=2" (including the leading space)
     * data2 ==> <img src="/images/b.gif" alt="" width="1" height="25">
     */
    string tag = Regex.Escape(str); // Treat the tag name as literal text
    string expression =
        "<" + tag + "\\b" +  // Match "<" plus the whole tag name ("head" must not match "<header")
        "(?<data1>.*?)" +    // Capture the attributes between the tag name and ">"
        ">" +                // Match the ">" that ends the opening tag
        "(?<data2>.*?)" +    // Capture the content between the opening and closing tags
        "</" + tag + ">";    // Match the closing tag

    return expression;
}
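
For the original question (the title and meta tags inside head, with the head tags kept), the same idea might be applied roughly as below. This is only a sketch under a few assumptions: the names HeadExtractor and ExtractHead are made up for illustration, attribute values are assumed never to contain a literal ">", and nothing inside a script or comment in the head is assumed to look like a title or meta tag. Note that meta has no closing tag, so the paired open/close pattern above would not match it; the sketch gives it its own branch in an alternation.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the code posted above.
public static class HeadExtractor
{
    // Singleline: "." also matches newlines. IgnoreCase: <HEAD> and <head> both match.
    private const RegexOptions Options =
        RegexOptions.Singleline | RegexOptions.IgnoreCase;

    // Returns the opening head tag, the title element, every meta tag,
    // and the closing head tag, one per line. Returns "" if there is no head.
    public static string ExtractHead(string html)
    {
        // Isolate the head section first so tags in the body are ignored.
        Match head = Regex.Match(
            html,
            @"(?<open><head\b[^>]*>)(?<inner>.*?)(?<close></head>)",
            Options);
        if (!head.Success)
        {
            return string.Empty;
        }

        // Either a complete title element or a single meta tag.
        // Link, script and other tags match neither branch and are skipped.
        MatchCollection tags = Regex.Matches(
            head.Groups["inner"].Value,
            @"<title\b[^>]*>.*?</title>|<meta\b[^>]*>",
            Options);

        var lines = new List<string> { head.Groups["open"].Value };
        foreach (Match tag in tags)
        {
            lines.Add(tag.Value);
        }
        lines.Add(head.Groups["close"].Value);
        return string.Join("\n", lines);
    }

    public static void Main()
    {
        string html =
            "<html><head>\n" +
            "<title>Sample page</title>\n" +
            "<link rel=\"stylesheet\" href=\"site.css\">\n" +
            "<meta name=\"keywords\" content=\"regex, html\">\n" +
            "<script src=\"site.js\"></script>\n" +
            "</head><body><p>Body text</p></body></html>";

        Console.WriteLine(ExtractHead(html));
    }
}
```

Matching the head section first and then scanning only its contents keeps each pattern short, and it means a stray meta tag in the body is never picked up. For anything beyond reasonably well-formed pages, an HTML parser is likely to be more reliable than regular expressions.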

Cheers,
Dave
 
