Problem Creating Regex Expression

Sean

I am finally taking the time to get to know regex, but it seems I have
taken a bit of a tumble.

I have the following (dummy) data:

<td>Name:</td> <td>Kherie Kali</td>

If I use this expression: "<td>Name:</td>\s*<td>Kherie Kali</td>"

I indeed get a match.

The next step I took is to get this without knowing the name in
advance. I then used the following expression:
"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)</td>"

and I didn't get a thing.

Shouldn't the "([a-zA-Z_$][a-zA-Z0-9_$]*)" part match any word or
symbol and then the * make it multiple words or symbols? Any light
that could be shed on this situation is appreciated. I don't
necessarily want the answer to my quandary so much as insight into what
I am doing wrong.

Thank you,

-Sean

Sergey Zyuzin

Hi, Sean

"([a-zA-Z_$][a-zA-Z0-9_$]*)" will match any letter, underscore or '$'
character followed by zero or more letters, digits, underscores or '$'
characters.

It seems you don't take into account the space in the middle of "Kherie
Kali"; a space is not in either character class.
If you can write more specific requirements, I could write a regex for you.
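To see the effect of the missing space concretely, here is a minimal sketch in Python's `re` module. Python is an assumption; the thread's regexes appear to target .NET, but this pattern behaves the same way. The variable names are illustrative. The same pattern that fails on the two-word name succeeds on a one-word name:

```python
import re

# Illustrative variable names; the row is the dummy data from the question.
row = "<td>Name:</td> <td>Kherie Kali</td>"
one_word = "<td>Name:</td> <td>Kherie</td>"

# Sean's pattern: one letter, '_' or '$', then zero or more letters,
# digits, '_' or '$'. No space is allowed anywhere inside the group.
pattern = r"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]*)</td>"

# The class stops at the space in "Kherie Kali", so the literal "</td>"
# cannot follow the group and the whole search fails.
no_match = re.search(pattern, row)

# A one-word name fits the class, so the same pattern matches it.
m = re.search(pattern, one_word)

print(no_match)     # None
print(m.group(1))   # Kherie
```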

Thanks,
Sergey

Kevin Spencer

The following will capture all content inside <td></td> tags:

(?<=td>)(.*?)(?=<)

The first part is a positive look-behind, indicating that a match must be
preceded by the character sequence "td>" (the "td>" itself is not part of
the match). The second part matches any character 0 or more times with a
lazy quantifier, meaning that it will match as few characters as possible.
The third part is a positive look-ahead, indicating that the match must be
followed by a "<" character. Since there are no "<" characters in the
actual tag's content, this stops the match at the end of the tag's content.
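Here is a minimal sketch of this pattern run over the dummy row with Python's `re.findall`. Python is an assumption; its `re` accepts this fixed-width look-behind. The variable names are illustrative.

Because "</td>" also ends in "td>", the whitespace between the two cells comes back as a match of its own. The name is the last match, not the first. If a tool reports only the first match, one plausible result is seeing just "Name:".

```python
import re

row = "<td>Name:</td> <td>Kherie Kali</td>"

# (?<=td>) : look-behind; the position must follow "td>" (not included)
# (.*?)    : lazily capture as few characters as possible
# (?=<)    : look-ahead; the next character must be "<" (not consumed)
pattern = r"(?<=td>)(.*?)(?=<)"

cells = re.findall(pattern, row)
print(cells)  # ['Name:', ' ', 'Kherie Kali']
```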

--
HTH,

Kevin Spencer
Chicken Salad Surgeon
Microsoft MVP

Sean

Kevin and Sergey,

Kevin: Thank you for the explanation! This finds the Name: tag, but
won't find the dummy person's actual name. I think I have to add in a
place for spaces like Sergey said.

Sergey: I have been fiddling with the last sequence by attempting to
add spaces, but I still can't get it to work for some reason. There
really are no specific requirements; I'm just trying to pull that name
out.

Thank you both for the explanations; they were very helpful. I'm just
still having trouble understanding why it won't work. The latest one
I used was "([a-zA-Z_$][a-zA-Z0-9_$]\s*)"

-Sean

Sergey Zyuzin

You should put \s inside the brackets: "([a-zA-Z_$][a-zA-Z0-9_$\s]*)"
If you don't have specific requirements, then you could probably use an
expression similar to what Kevin suggests, or something like
"<td>Name:</td>\s*<td>(.*?)</td>"
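Here is a minimal Python sketch comparing three patterns on the dummy row. Python is an assumption, and the variable names are illustrative. It also assumes Sean's latest attempt was placed inside the same surrounding `<td>Name:</td>\s*<td>...</td>` pattern. The sketch compares:

* that attempt, where `\s*` outside the class only allows whitespace after exactly two characters,
* the `\s`-inside-the-class fix,
* the lazy `(.*?)` variant.

```python
import re

row = "<td>Name:</td> <td>Kherie Kali</td>"

# Sean's latest attempt (assumed to sit in the same surrounding pattern):
# one start character, exactly ONE more character, then optional whitespace.
attempt = r"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$]\s*)</td>"

# \s moved inside the character class: spaces may now appear anywhere
# after the first character, and * repeats the whole class.
in_class = r"<td>Name:</td>\s*<td>([a-zA-Z_$][a-zA-Z0-9_$\s]*)</td>"

# Lazy alternative: take as little as possible up to the next "</td>".
lazy = r"<td>Name:</td>\s*<td>(.*?)</td>"

attempt_result = re.search(attempt, row)
in_class_name = re.search(in_class, row).group(1)
lazy_name = re.search(lazy, row).group(1)

print(attempt_result)  # None
print(in_class_name)   # Kherie Kali
print(lazy_name)       # Kherie Kali
```

The character-class version still rejects names containing characters such as hyphens or apostrophes. The lazy version accepts anything up to the next "</td>".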

Thanks,
Sergey

Sean

Perfect, thank you both!

I used your last suggestion, Sergey, and did the following:
"<td>Name:</td>\s*<td>(?<a0>(.*?))</td>", which correctly matched
"Kherie Kali" and put it into the a0 group.
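Here is a minimal sketch of the same named-group pattern in Python. Python is an assumption here. The `(?<a0>...)` form used above is .NET syntax; Python's `re` spells named groups `(?P<a0>...)`. The inner parentheses create a second, unnamed group that captures the same text, so they are optional.

```python
import re

row = "<td>Name:</td> <td>Kherie Kali</td>"

# Sean's final pattern, rewritten with Python's (?P<name>...) syntax.
pattern = r"<td>Name:</td>\s*<td>(?P<a0>(.*?))</td>"

m = re.search(pattern, row)
name = m.group("a0")   # the named group (also group 1)
inner = m.group(2)     # the inner, unnamed group holds the same text

print(name)   # Kherie Kali
print(inner)  # Kherie Kali
```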

I appreciate the help from both of you!

-Sean