About Regular Expressions

norton

Hi,

I am learning regular expressions and am currently trying to capture
information from a web page.
I wrote the following code to capture the ID as well as the title:

Imports System.Text.RegularExpressions

' Captures the topic ID and title from each topic link.
Dim regex As New Regex( _
    "viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>", _
    RegexOptions.IgnoreCase _
    Or RegexOptions.Multiline _
    Or RegexOptions.Compiled _
)
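A minimal VB.NET sketch of how the ID and Title captures can be read back, assuming a console project. The module name, the FirstTopic function and the sample line are hypothetical; the pattern is the one from the post, unchanged, run on a single line of input:

```vb
Imports System
Imports System.Text.RegularExpressions

Module TopicDemo
    ' The ID/Title pattern from the question, unchanged.
    Private ReadOnly TopicRegex As New Regex( _
        "viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>", _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline Or RegexOptions.Compiled)

    ' Hypothetical helper: returns "ID|Title" for the first match, or Nothing.
    Public Function FirstTopic(ByVal html As String) As String
        Dim m As Match = TopicRegex.Match(html)
        If Not m.Success Then Return Nothing
        ' Named groups are looked up by the name given in (?<Name>...).
        Return m.Groups("ID").Value & "|" & m.Groups("Title").Value
    End Function

    Sub Main()
        Dim line As String = "<a href=""viewtopic.php?t=245090"" class=""topictitle"">" & _
            "Can someone help me import a database</a></span>"
        ' Prints: 245090|Can someone help me import a database
        Console.WriteLine(FirstTopic(line))
    End Sub
End Module
```

The greedy "(?<Title>.*)" works here only because there is a single "</a>" on the line; with two links on one line it would run on to the last "</a>".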

It works fine.
Now I am trying to get the post date by using the following regex:
"s=""postdetails"">(?<Date>.*)\<br />"

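The date pattern above appears to work on this page partly because "." does not match a line feed by default: the "2" and "126" postdetails spans have no "<br />" on their own line, so only the date span can match. A small sketch checking that, with hypothetical names (DateDemo, AllDates, SampleCells) and a shortened copy of the three postdetails cells:

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module DateDemo
    ' The Date pattern from the question, unchanged.
    Private ReadOnly DateRegex As New Regex( _
        "s=""postdetails"">(?<Date>.*)\<br />", _
        RegexOptions.IgnoreCase Or RegexOptions.Multiline)

    ' Hypothetical helper: collects the Date capture of every match.
    Public Function AllDates(ByVal html As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each m As Match In DateRegex.Matches(html)
            result.Add(m.Groups("Date").Value)
        Next
        Return result
    End Function

    ' The three postdetails cells from the question's row, one per line.
    Public Function SampleCells() As String
        Return String.Join(vbCrLf, New String() { _
            "class=""postdetails"">2</span></td>", _
            "class=""postdetails"">126</span></td>", _
            "nowrap=""nowrap""><span class=""postdetails"">Wed Dec 08, 2004 6:31 pm<br /><a"})
    End Function

    Sub Main()
        ' Only the date cell has "<br />" on the same line, so one match is printed.
        For Each d As String In AllDates(SampleCells())
            Console.WriteLine(d)
        Next
    End Sub
End Module
```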
It works fine too, but when I combine the two regexes (that is,
"viewtopic.php\?t=(?<ID>\d+)""\sclass=""topictitle"">(?<Title>.*)\</a>.*s=""postdetails"">(?<Date>.*)\<br />"),
it doesn't capture the correct data. Does anyone know how I can fix this?
What's wrong with my regex?

Thx a lot
Regards,
Norton


(Here is the source text the regex runs against. The ID I want to capture
is 245090, the title is "Can someone help me import a database", and the date is
Wed Dec 08, 2004 6:31 pm.)

<tr>
<td class="row1" align="center" valign="middle" width="20"><img
src="templates/subSilver/images/folder.gif" width="19" height="18" alt="No
new posts" title="No new posts" /></td>
<td class="row1" width="100%"><span class="topictitle"><a
href="viewtopic.php?t=245090" class="topictitle">Can someone help me import
a database</a></span><span class="gensmall"><br />
</span></td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">2</span></td>
<td class="row3" align="center" valign="middle"><span class="name"><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a></span>
</td>
<td class="row2" align="center" valign="middle"><span
class="postdetails">126</span></td>
<td class="row3Right" align="center" valign="middle"
nowrap="nowrap"><span class="postdetails">Wed Dec 08, 2004 6:31 pm<br /><a
href="profile.php?mode=viewprofile&amp;u=154279">savethesquirrels</a> <a
href="viewtopic.php?p=1344942#1344942"><img
src="templates/subSilver/images/icon_latest_reply.gif" alt="View latest
post" title="View latest post" border="0" /></a></span></td>
</tr>
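One possible fix, sketched under the assumption that the page looks like the row above: set RegexOptions.Singleline so the gap between the title and the date can cross lines, make that gap lazy (".*?") so it stops at the first postdetails span that completes the match, and capture with "[^<]*" so the title and date cannot run past the next tag. This also follows the advice in the thread to minimize ".*". The module and function names are hypothetical, and the sample is a shortened copy of the row:

```vb
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module RowDemo
    ' Singleline: ".*?" may cross line breaks.
    ' [^<]*: each capture stops at the next tag.
    ' .*? (lazy): take the first postdetails span that completes the match.
    Private ReadOnly RowRegex As New Regex( _
        "viewtopic\.php\?t=(?<ID>\d+)""\s+class=""topictitle"">(?<Title>[^<]*)</a>" & _
        ".*?class=""postdetails"">(?<Date>[^<]*)<br />", _
        RegexOptions.IgnoreCase Or RegexOptions.Singleline)

    ' Hypothetical helper: returns "ID|Title|Date" for the first row, or Nothing.
    Public Function ParseRow(ByVal html As String) As String
        Dim m As Match = RowRegex.Match(html)
        If Not m.Success Then Return Nothing
        Return m.Groups("ID").Value & "|" & m.Groups("Title").Value & _
            "|" & m.Groups("Date").Value
    End Function

    ' A shortened copy of the table row from the question.
    Public Function SampleRow() As String
        Return String.Join(vbCrLf, New String() { _
            "<td class=""row1"" width=""100%""><span class=""topictitle""><a", _
            "href=""viewtopic.php?t=245090"" class=""topictitle"">Can someone help me import a database</a></span><span class=""gensmall""><br />", _
            "</span></td>", _
            "<td class=""row2"" align=""center"" valign=""middle""><span", _
            "class=""postdetails"">2</span></td>", _
            "<td class=""row2"" align=""center"" valign=""middle""><span", _
            "class=""postdetails"">126</span></td>", _
            "<td class=""row3Right"" align=""center"" valign=""middle""", _
            "nowrap=""nowrap""><span class=""postdetails"">Wed Dec 08, 2004 6:31 pm<br /><a", _
            "href=""viewtopic.php?p=1344942#1344942"">latest</a></span></td>"})
    End Function

    Sub Main()
        ' Prints: 245090|Can someone help me import a database|Wed Dec 08, 2004 6:31 pm
        Console.WriteLine(ParseRow(SampleRow()))
    End Sub
End Module
```

Because ".*?" is lazy, the engine first tries the "2" and "126" spans; "[^<]*" followed by "<br />" fails there, so it moves on to the date span. A greedy ".*" with Singleline would instead jump to the last possible match in the input, which gives the wrong date once the page holds more than one row.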
 
Victor Urnyshev [MSFT]

Hi Norton,

This seems to be a bug in the Regex implementation. It has been passed to our
developers. I cannot promise anything at this point, but we will try to address
it in a future release.

--
Regards,
Victor Urnyshev

This posting is provided "AS IS" with no warranties, and confers no rights.

--------------------
| From: Victor Urnyshev [MSFT]
| Organization: Microsoft
| Date: Mon, 13 Dec 2004 14:08:46 GMT
| Subject: RE: About Regular Expressions
| Newsgroups: microsoft.public.dotnet.languages.vb
|
| Hi Norton,
|
| I'll look into the issue further. I don't have an answer yet about what is
| wrong, but I can reproduce your issue.
| So far, I can only advise building more specific regular expressions that
| minimize ".*" usage. I suspect this is the root of the problem. As soon
| as I get more information, I'll get back to you.
|
|
| --------------------
| | From: "norton"
| | Subject: About Regular Expressions
| | Date: Fri, 10 Dec 2004 01:15:23 +0800
| | Newsgroups: microsoft.public.dotnet.languages.vb
 
