Regex to retain only the HTML body


Karch

If you run this:

string result = "<html><head></head><body>The body</body></html>";
result = retainBody.Replace(result, "$1");


With the following Regex:

private static readonly Regex retainBody = new Regex(
    @"<\s*body[^>]*>(.*)<[\s/]*body[^>]*>",
    RegexOptions.Compiled | RegexOptions.IgnoreCase | RegexOptions.Singleline);


You get this as the result:

<html><head></head>The body</html>

I want this instead:

The body
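Why the Replace call keeps the wrapper: Regex.Replace substitutes only the span the pattern matched, and everything outside that span (here `<html><head></head>` and `</html>`) is copied through unchanged. A minimal sketch, assuming you want to keep the Replace style: extend the pattern so it also consumes the text before and after the body element, so that `$1` replaces the whole input. The name `wholeDocument` is hypothetical, and the closing-tag part is tightened to require the `/`.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical pattern: the leading "^.*?" and trailing ".*$" consume everything
// outside the body tags, so the single match covers the whole input string.
Regex wholeDocument = new Regex(
    @"^.*?<\s*body[^>]*>(.*)<\s*/\s*body[^>]*>.*$",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

string result = "<html><head></head><body>The body</body></html>";
result = wholeDocument.Replace(result, "$1");
Console.WriteLine(result); // prints: The body
```

If the input has no body element, the pattern does not match and Replace returns the string unchanged.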
 
Try matching instead of replacing, then read the named group:

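using System;
using System.Text.RegularExpressions;

string result = "<html><head></head><body>The body</body></html>";
// Singleline lets '.' span line breaks; the named group "body" holds the content.
Regex reg = new Regex(@"<\s*body[^>]*>(?<body>.*)<\s*/\s*body[^>]*>",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);
Match body = reg.Match(result);
Console.WriteLine(body.Groups["body"].Value);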
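The same match-and-read-the-group approach can be wrapped in a small reusable helper. This is a sketch under stated assumptions: `ExtractBody` is a hypothetical name, and returning the input unchanged when no body element is found is a design choice made here, not something from the thread. The sample input uses upper-case tags, an attribute, and a multi-line body to show why IgnoreCase and Singleline matter.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: returns the text between <body ...> and </body>,
// or the input unchanged when no body element is found.
static string ExtractBody(string html)
{
    // IgnoreCase accepts <BODY> as well as <body>;
    // Singleline lets '.' match newlines so multi-line bodies are captured.
    Match m = Regex.Match(html,
        @"<\s*body[^>]*>(?<body>.*)<\s*/\s*body[^>]*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);
    return m.Success ? m.Groups["body"].Value : html;
}

string page = "<HTML><BODY class=\"main\">\nLine one\nLine two\n</BODY></HTML>";
// The captured text keeps the newlines that sit just inside the BODY tags.
Console.WriteLine(ExtractBody(page));
```

Without Singleline, '.' stops at the first newline and this input would not match at all.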
 