Regex and screen scraping

Chris Wertman

Hello all,

Well, I have to say I'm getting excited about my app; it's almost there.
I have added a button to IE, and I'm calling the current instance of IE
and grabbing the URL out just fine. I'm using the WebClient to grab the
HTML; so far so good, and I'm only half bald.

Now I am at the point where I need to extract a couple of fields from
the HTML itself. I have read about using regex to do this but am a
little confused; maybe I've just been staring at the screen too long.

I get this HTML returned.

<b>Binding:</b> Paperback<br> <b>Publisher:</b>

What I need to extract is the word Paperback from the above string.

Here is what I have so far; I have no idea if it's right:

' Requires "Imports System.Text.RegularExpressions" at the top of the file
Dim regex As New Regex("<b>Binding:</b>((.|\n)*?)<br> <b>Publisher:", _
    RegexOptions.IgnoreCase)

But, uh, now what do I do with that to return just the word
Paperback?
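Since the pattern already contains a capture group, the word can be read from the match's Groups collection rather than from the whole match. A minimal sketch in VB.NET, assuming the HTML is already in a String; the module and function names are made up for illustration. It also swaps the (.|\n)*? alternation for RegexOptions.Singleline, which lets "." match newlines as well:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module BindingExample
    ' Returns the text between "<b>Binding:</b>" and the next "<br>",
    ' or Nothing when the label is not present in the HTML.
    Function ExtractBinding(ByVal html As String) As String
        ' (.*?) is a lazy capture group; Singleline lets "." also match newlines,
        ' replacing the (.|\n) alternation from the original pattern.
        Dim regex As New Regex("<b>Binding:</b>(.*?)<br>", _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        Dim m As Match = regex.Match(html)
        If Not m.Success Then Return Nothing
        ' Groups(0) is the whole match; Groups(1) is the first (...) group.
        ' Trim removes the space that follows "</b>".
        Return m.Groups(1).Value.Trim()
    End Function

    Sub Main()
        Dim html As String = "<b>Binding:</b> Paperback<br> <b>Publisher:</b>"
        Console.WriteLine(ExtractBinding(html)) ' prints "Paperback"
    End Sub
End Module
```

Without the Trim call, Groups(1).Value would be " Paperback", because the space after </b> falls inside the group.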

I have several items on the same page that need to be returned. I am a
little lost about what I need to read the HTML into, or how: do I need to
put it into a StreamReader, or... well, what do I do with it then?
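On the StreamReader question: WebClient.DownloadString returns the whole page as a String, so the regex can run on it directly. One way to handle several fields is a single helper that builds the pattern from the label. This is a sketch that assumes every field follows the "<b>Label:</b> value<br>" layout quoted above; ExtractField, DownloadHtml and currentUrl are hypothetical names, and "Example Press" is made-up sample text because the quoted HTML stops at the Publisher label:

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module FieldExample
    ' Assumes each field looks like "<b>Label:</b> value<br>".
    ' Returns Nothing when the label is not found.
    Function ExtractField(ByVal html As String, ByVal label As String) As String
        ' Regex.Escape guards against labels containing regex metacharacters.
        Dim pattern As String = "<b>" & Regex.Escape(label) & ":</b>(.*?)<br>"
        Dim m As Match = Regex.Match(html, pattern, _
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then Return m.Groups(1).Value.Trim()
        Return Nothing
    End Function

    ' WebClient.DownloadString already returns the page as a String,
    ' so no StreamReader is needed before running the regex.
    Function DownloadHtml(ByVal url As String) As String
        Using client As New WebClient()
            Return client.DownloadString(url)
        End Using
    End Function

    Sub Main()
        ' In the add-in this would be DownloadHtml(currentUrl) (hypothetical variable);
        ' a literal string keeps this demo offline.
        Dim html As String = "<b>Binding:</b> Paperback<br> <b>Publisher:</b> Example Press<br>"
        For Each label As String In New String() {"Binding", "Publisher"}
            Console.WriteLine(label & " = " & ExtractField(html, label))
        Next
    End Sub
End Module
```

Adding another field is then just another label in the array, as long as the page uses the same bold-label layout.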

Chris
 
Cor Ligthert

Hi Chris,

If you want to do it the Document Object Model way, you can use mshtml.
You have to set a reference to it via

Project -> Add Reference -> .NET -> Microsoft.mshtml

Do not add an Imports statement for it, because that freezes your IDE;
instead, reference it by its full name every time you need it.

However, did you know that the newsgroup

microsoft.public.dotnet.languages.vb is much better suited to this kind of
question?

I hope this helps.

Cor