Regular Expression for all attributes in an HTML tag


Gert Conradie

I need to list all the key/value pairs of an HTML tag. I already have
the complete tag as a text string.

For example (a worst-case scenario, where standards were not followed in
the past):
<myTag key1="aaa" key2 = "bbb" key3='ccc' key4=444 key5= 555
key5="Please click here" >

I ended up with two versions, each with its own flaw, and I can't seem to
merge them:
A) Allows values without " or ' around them, but fails when there is a space in
the attribute value:
\b(?<Keyword>[^>\s][\w]+)[\s]*=[\s]*[",']?(?<Value>[\w]*)[",']?

B) Allows spaces in attribute values, but misses values that have no " or '
around them:
\b(?<Keyword>[^>\s][\w]+)[\s]*=[\s]*[",']?(?<Value>[\w\s]*)[",']

This is my attempt at merging them. It finds all the keys and the integer values,
but not the text values:
\b(?<Keyword>[^>\s][\w]+)[\s]*=[\s]*(?<Value>((?<!["'])[\d]+(?!["']))|((?<=["']?)[\w\s]*(?=["']?)))
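The sketch below puts the two flaws side by side. It is a rough Python translation of attempts A and B, run against the sample tag with the tag joined onto one line. Python's `re` writes named groups as `(?P<name>...)` rather than .NET's `(?<name>...)`. Apart from that, the patterns are copied unchanged. That includes `[",']`, which also matches a comma and was probably meant to be `["']`.

The merge attempt is left out for two reasons. Its `(?<=["']?)` lookbehind is variable-width, which Python's `re` rejects. Also, because the quote inside `(?<=["']?)` and `(?=["']?)` is optional, those two lookarounds always succeed and constrain nothing.

```python
import re

# The sample tag from the question, joined onto one line.
TAG = """<myTag key1="aaa" key2 = "bbb" key3='ccc' key4=444 key5= 555 key5="Please click here" >"""

# Attempt A: quotes are optional, but the value is only [\w]*, so it stops at a space.
ATTEMPT_A = r"""\b(?P<Keyword>[^>\s][\w]+)[\s]*=[\s]*[",']?(?P<Value>[\w]*)[",']?"""
# Attempt B: the value may contain spaces, but a closing quote is mandatory.
ATTEMPT_B = r"""\b(?P<Keyword>[^>\s][\w]+)[\s]*=[\s]*[",']?(?P<Value>[\w\s]*)[",']"""


def pairs(pattern: str, tag: str) -> list:
    """Return (Keyword, Value) tuples for every match of pattern in tag."""
    return [(m.group("Keyword"), m.group("Value")) for m in re.finditer(pattern, tag)]


print(pairs(ATTEMPT_A, TAG))
# [('key1', 'aaa'), ('key2', 'bbb'), ('key3', 'ccc'), ('key4', '444'), ('key5', '555'), ('key5', 'Please')]
print(pairs(ATTEMPT_B, TAG))
# [('key1', 'aaa'), ('key2', 'bbb'), ('key3', 'ccc'), ('key5', 'Please click here')]
```

A cuts "Please click here" down to "Please". B drops key4 and the first key5 entirely, because neither has a closing quote.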

Thanks in advance - help here would be much appreciated.

Gert
 

Jani Järvinen [MVP]

Hi Gert,

> I need to list all the key/value pairs of an HTML tag. I already have
> the complete tag as a text string.
> (Worst-case scenario, where standards were not followed in the past)

Since your parser needs to be aware of all kinds of ways to write
attributes, I think trying to cover all of them with a single regular
expression quickly becomes a steep uphill climb.

I would probably forget about regular expressions altogether, and instead
write a simple text parser of my own. I think that would be simpler.
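Below is a rough Python illustration of the hand-written parser idea. It is not Jani's code. The function name `parse_attributes` and its rules are assumptions made for this sketch.

The function walks the tag one character at a time and handles:

- optional spaces around `=`;
- double-quoted, single-quoted and unquoted values;
- duplicate names, which are kept in source order;
- attributes without `=`, which are reported with the value `None`.

It is not a full HTML tokenizer. Entities are not decoded, and an unterminated quote simply runs to the end of the string.

```python
from __future__ import annotations


def parse_attributes(tag: str) -> list[tuple[str, str | None]]:
    """Return (name, value) pairs from an HTML start tag, in source order.

    Hypothetical helper: duplicates are kept, and a bare attribute such as
    `disabled` (no '=') is reported with the value None.
    """
    pairs: list[tuple[str, str | None]] = []
    i, n = 0, len(tag)
    # Skip the leading '<' and the element name.
    if tag.startswith("<"):
        i = 1
    while i < n and not tag[i].isspace() and tag[i] not in "/>":
        i += 1
    while i < n:
        # Skip whitespace (and a stray '/') between attributes; stop at '>'.
        if tag[i].isspace() or tag[i] == "/":
            i += 1
            continue
        if tag[i] == ">":
            break
        # Attribute name: everything up to whitespace, '=', '/' or '>'.
        start = i
        while i < n and not tag[i].isspace() and tag[i] not in "=/>":
            i += 1
        name = tag[start:i]
        # Look past optional whitespace for an '='.
        j = i
        while j < n and tag[j].isspace():
            j += 1
        value = None
        if j < n and tag[j] == "=":
            i = j + 1
            while i < n and tag[i].isspace():
                i += 1
            if i < n and tag[i] in "\"'":
                # Quoted value: read up to the matching quote (spaces allowed).
                quote = tag[i]
                end = tag.find(quote, i + 1)
                if end == -1:  # unterminated quote: take the rest of the tag
                    end = n
                value = tag[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: read up to whitespace or '>'.
                start = i
                while i < n and not tag[i].isspace() and tag[i] != ">":
                    i += 1
                value = tag[start:i]
        if name:  # a stray '=' with no name in front of it is ignored
            pairs.append((name, value))
    return pairs


TAG = """<myTag key1="aaa" key2 = "bbb" key3='ccc' key4=444 key5= 555 key5="Please click here" >"""
print(parse_attributes(TAG))
# [('key1', 'aaa'), ('key2', 'bbb'), ('key3', 'ccc'), ('key4', '444'), ('key5', '555'), ('key5', 'Please click here')]
```

Each branch handles exactly one case, so adding another rule (for example, lower-casing names) stays a local change. A single regex has to carry every case at once.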

Just a thought, I'm not saying you can't do it with regex.

--
Regards,

Mr. Jani Järvinen
C# MVP
Helsinki, Finland
(e-mail address removed)
http://www.saunalahti.fi/janij/
 

Kevin Spencer

This ought to do it for you:

(\w+)=(?:["']?([^"'>=]*)["']?)

Translation: a sequence of one or more word characters (letters, digits,
or underscores), followed by an equals sign, followed by 0 or 1 single quote
or double quote, followed by any number of characters that are not a single
quote, a double quote, a right angle bracket, or an equals sign, followed by
0 or 1 single or double quotes.
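A quick Python check of this pattern against Gert's sample tag (joined onto one line here) shows where it falls short. The pattern uses no .NET-specific syntax, so it is copied unchanged. `(\w+)=` allows no space before the `=`, so `key2 = "bbb"` is skipped. An unquoted value runs on until the next quote, `>` or `=`, so `key4=444` swallows the next attribute name, and the `key5= 555` pair is lost.

```python
import re

# Kevin's pattern, unchanged.
KEVIN = r"""(\w+)=(?:["']?([^"'>=]*)["']?)"""
TAG = """<myTag key1="aaa" key2 = "bbb" key3='ccc' key4=444 key5= 555 key5="Please click here" >"""

# With two capture groups, findall returns (name, value) tuples.
print(re.findall(KEVIN, TAG))
# [('key1', 'aaa'), ('key3', 'ccc'), ('key4', '444 key5'), ('key5', 'Please click here')]
```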

--
HTH,

Kevin Spencer
Microsoft MVP
Chicken Salad Surgery

It takes a tough man to make a tender chicken salad.
 

Gert Conradie

Hi Kevin & others,

> (\w+)=(?:["']?([^"'>=]*)["']?)

This one misses the "key4=444" in my example, but it surely makes my attempt
look like a goods train in comparison. :) I will use it as a starting point
and try again.
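One possible way to merge attempts A and B is sketched below. It was not posted in this thread. The idea is to stop making the quotes optional and instead list the three value forms as alternatives: double-quoted, single-quoted, or unquoted. Whichever alternative matches captures the value. Spaces are therefore allowed only inside quotes, and an unquoted value ends at the first whitespace or `>`.

A few details are assumptions beyond the thread:

- The group names `key`, `dq`, `sq` and `bare` are made up for this example.
- `[\w-]` allows hyphens in names, as in `data-id`; the thread uses plain `\w`.
- The same alternation should carry over to .NET by writing the groups as `(?<name>...)`.

```python
import re

ATTR_RE = re.compile(
    r"""
    (?P<key>[\w-]+) \s* = \s*     # attribute name, optional spaces around '='
    (?:
        "(?P<dq>[^"]*)"           # double-quoted value (may contain spaces)
      | '(?P<sq>[^']*)'           # single-quoted value
      | (?P<bare>[^\s"'>]+)       # unquoted value: up to whitespace or '>'
    )
    """,
    re.VERBOSE,
)


def attributes(tag: str) -> list:
    """Return (name, value) tuples for every attribute match, in order."""
    result = []
    for m in ATTR_RE.finditer(tag):
        # Exactly one of the three value groups takes part in each match.
        value = next(v for v in m.group("dq", "sq", "bare") if v is not None)
        result.append((m.group("key"), value))
    return result


TAG = """<myTag key1="aaa" key2 = "bbb" key3='ccc' key4=444 key5= 555 key5="Please click here" >"""
print(attributes(TAG))
# [('key1', 'aaa'), ('key2', 'bbb'), ('key3', 'ccc'), ('key4', '444'), ('key5', '555'), ('key5', 'Please click here')]
```

Unlike a hand-written parser, this still ignores attributes that have no `=` at all (such as `disabled`).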

Jani & Winista, I will try the parser approach and let you know the results.

Thanks, Gert