Web Scraper project using VB.NET

Radith

Hi all,

I am about to start a web scraper project using VB.NET and SQL Server. I
am currently a uni student with no commercial experience (btw, I am doing
this as a commercial project). The aim of the scraper is to scrape content
off a website and store it in a SQL Server database. The content is dynamic.
Before the scraping, however, some HTML parsing must be done, i.e. I have to
iterate through all possible registration numbers (feeding the reg. numbers
in via my app), most likely through the use of an HTML parser.

So, I just want to know whether HTML parsing is the right approach for that
sort of iteration, and I would also like some help with the scraping
component. Can someone please point me in the right direction?

All forms of help appreciated. Thanks in advance for all your help...

Cheers,
Radith
 
Cor Ligthert[MVP]

Radith,

I assume that many of us who have been busy with .NET have made a web scraper
at some point. Be aware that it is more difficult than it used to be, simply
because web pages are not as easy to scrape as they were in the past.

It seems an easy challenge and it really is fun to do. A simple HTML page is
easy: you can just use the WebBrowser class and MSHTML (the DOM-related
code) for that. To download you can use the WebClient class (part of
System.Net), even asynchronously, and you can use the HTTP classes to get
information about the websites. In fact, that is the basis that Google is
built around.
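
As a rough sketch of the download step only, something like the following could be a starting point. The address, the "reg" query parameter and the range of registration numbers are all made up for illustration, because the real site is not named in this thread. Pulling the title out with a regular expression is just a stand-in for whatever parsing the real page needs.

```vbnet
Imports System
Imports System.Net
Imports System.Text.RegularExpressions

Module ScraperSketch

    ' Hypothetical address and query parameter; replace with the real ones.
    Private Const UrlTemplate As String = "http://www.example.com/lookup?reg={0}"

    ' Builds the request URL for one registration number.
    Function BuildUrl(ByVal regNumber As Integer) As String
        Return String.Format(UrlTemplate, regNumber)
    End Function

    ' Returns the text inside the <title> element, or Nothing if there is none.
    Function ExtractTitle(ByVal html As String) As String
        Dim m As Match = Regex.Match(html, "<title[^>]*>(.*?)</title>", _
                                     RegexOptions.IgnoreCase Or RegexOptions.Singleline)
        If m.Success Then
            Return m.Groups(1).Value.Trim()
        End If
        Return Nothing
    End Function

    Sub Main()
        Using client As New WebClient()
            ' Assumed range of registration numbers, for illustration only.
            For regNumber As Integer = 1000 To 1004
                Try
                    ' Download the page as a string and print what was extracted.
                    Dim html As String = client.DownloadString(BuildUrl(regNumber))
                    Console.WriteLine("{0}: {1}", regNumber, ExtractTitle(html))
                Catch ex As WebException
                    ' One failed request should not stop the whole run.
                    Console.WriteLine("{0}: request failed ({1})", regNumber, ex.Message)
                End Try
            Next
        End Using
    End Sub

End Module
```

A regular expression is fine for one simple field like this, but for anything nested the DOM route (WebBrowser and MSHTML) is probably less fragile.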

However, a web page is not as simple as it was, let's say, 5 years ago.
Large parts of the content are in SQL databases, all kinds of ActiveX modules
are widely used, etc. In my opinion, by going around those (which cannot be
retrieved in the standard way, as the HTML parts and images can in a web
browser with a right click), your scraping becomes stealing.

Therefore, if you want to go further than the basics I described, think
about the fact that in a lot of countries (for instance in the EU and the US)
you are probably committing a crime.

Just my thoughts about your question.

Cor
 
alex

You can look at SWExplorerAutomation (SWEA) from http://webius.net.
SWEA creates an automation API for any Web application which uses HTML
and DHTML and works with Microsoft Internet Explorer. The Web
application becomes programmatically accessible from any .NET
language.

The SWEA API provides access to Web application controls and content. The
API is generated using the SWEA Visual Designer. SWEA records and replays
test scripts and generates C# or VB.NET script code.
 
