Regex for HTML

  • Thread starter: TJoker .NET

TJoker .NET

Hi all.
I have this database table (inherited from a legacy application) that
contains some information that I want to extract.
Basically, in one of the tables, there's a column containing a description
that starts with a number, but the number can be preceded by some raw HTML elements.
Examples:
ex1:
<p>12 this is the first item ....
ex2:
<p>12. this is the first item ....
ex3:
<span id="my id" style="width:3" ><p>12. this is the first item ....
ex4:
12. this is the first item ....

I'm trying to extract the number ("12" in all of the examples above).

The closest I got was with the following regular expression pattern:
string pattern = @"(<\w*>)*(?<digit>(\d+)).+";

It didn't put the number in the right match group ("digit"). I'm
still new to regular expressions.
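A small C# sketch (assuming a modern .NET console project with top-level statements) that runs the pattern above against the four examples helps show what goes wrong. `\w*` only matches letters, digits and underscores, so `<\w*>` cannot consume `<span id="my id" style="width:3" >`. Because the pattern is not anchored, `Regex.Match` then retries at later start positions and succeeds at the first digit it finds, which is the "3" in `width:3`.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern from the question, unchanged.
const string original = @"(<\w*>)*(?<digit>(\d+)).+";

// The four example descriptions (ex1..ex4).
string[] samples =
{
    "<p>12 this is the first item ....",
    "<p>12. this is the first item ....",
    "<span id=\"my id\" style=\"width:3\" ><p>12. this is the first item ....",
    "12. this is the first item ....",
};

foreach (string s in samples)
{
    // Regex.Match is unanchored: if no match starts at position 0, it retries
    // at each later position. <\w*> cannot consume the spaces, '=' or quotes in
    // <span id="my id" ...>, so for ex3 the first successful start is the "3"
    // in width:3.
    Match m = Regex.Match(s, original);
    Console.WriteLine(m.Groups["digit"].Value);
}
```

This prints 12, 12, 3 and 12: only ex3, the one with attributes in its first tag, captures the wrong number.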

Has anybody come across a similar situation?

Thanks a bunch!

TJ
 

Hmm, I'd try the following .NET regular expression:

"(<[^>]+>)*(?<digit>\d+)[^\d]"
This matches zero or more tags, where a tag is a '<', followed by at
least one character that is not a '>', followed by a '>'.

That is followed by the "digit" group, which captures a run of digits (at
least one) up to but not including the first non-digit, and [^\d] then
requires that non-digit character to be present. This could be a problem
if the number can be the last thing on the line; it only works if some
character always follows the number.
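A minimal C# sketch (again assuming top-level statements) compares the pattern above with a hypothetical variant; the variant is an editorial suggestion, not something proposed in the thread. It adds `^` so the match must start at the beginning of the description instead of at some later digit, and it drops the trailing `[^\d]`: the greedy `\d+` already stops at the first non-digit, and since nothing else has to follow it, a number at the very end of the text still matches.

```csharp
using System;
using System.Text.RegularExpressions;

// The pattern suggested in the reply, unchanged.
const string suggested = @"(<[^>]+>)*(?<digit>\d+)[^\d]";

// Hypothetical variant (editorial assumption, not from the thread):
// ^ forces the match to begin at the start of the description, and the
// trailing [^\d] is dropped because the greedy \d+ already stops at the
// first non-digit and nothing else has to follow it.
const string variant = @"^(?:<[^>]+>)*(?<digit>\d+)";

string[] samples =
{
    "<p>12 this is the first item ....",
    "<p>12. this is the first item ....",
    "<span id=\"my id\" style=\"width:3\" ><p>12. this is the first item ....",
    "12. this is the first item ....",
    "<p>12", // the number is the last thing in the text
};

// Returns the "digit" group, or a marker when the pattern does not match.
static string Digit(Match m) => m.Success ? m.Groups["digit"].Value : "(no match)";

foreach (string s in samples)
{
    Console.WriteLine(Digit(Regex.Match(s, suggested)) + " | " + Digit(Regex.Match(s, variant)));
}
```

The first column reads 12 for the four examples and "(no match)" for `<p>12`; the variant returns 12 in every case. Both patterns still assume no attribute value contains a literal '>', since `[^>]+` stops at the first one.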

Mike
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Back
Top