Internet access and getting page source - possible?


Alex Meleta

Hi ofiras,

The general idea:

1. Get the source of the index page.
WebRequest webRequest = WebRequest.Create("http://aspnetlibrary.com/articledetails.aspx?article=Retrieve-data-from-a-web-page");
string htmlContext;
// Dispose the response so its connection is released before the next request.
using (WebResponse webResponse = webRequest.GetResponse())
using (StreamReader streamReader = new StreamReader(webResponse.GetResponseStream()))
{
    htmlContext = streamReader.ReadToEnd();
}

2. Match <a href> tags to find the list of links.
3. Keep only the links that are internal to this site.
4. Repeat step 1 for each internal link to find more pages.
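Steps 2 to 4 can be sketched as a small breadth-first crawler. This is a minimal sketch, not a robust spider: the names SiteCrawler, ExtractLinks, IsInternal and Crawl are hypothetical, the link pattern is a simple regular expression rather than a real HTML parser, "internal" is assumed to mean "same host name" (so hahaha.com and www.hahaha.com count as different sites), and the page download from step 1 is passed in as a Func<Uri, string> so the crawl logic can be tried without network access.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SiteCrawler
{
    // Step 2 (assumption): a simple pattern for href="..." or href='...'.
    // Regex is not an HTML parser; unquoted attributes, for example, are missed.
    // Everything from '#' on is left out, so "page.html#top" yields "page.html".
    private static readonly Regex HrefPattern =
        new Regex("<a\\s[^>]*href\\s*=\\s*[\"']([^\"'#]+)", RegexOptions.IgnoreCase);

    // Returns absolute URIs for every <a href> in the page,
    // resolving relative links against the page's own address.
    public static List<Uri> ExtractLinks(string html, Uri pageUri)
    {
        var links = new List<Uri>();
        foreach (Match m in HrefPattern.Matches(html))
        {
            if (Uri.TryCreate(pageUri, m.Groups[1].Value, out Uri link))
            {
                links.Add(link);
            }
        }
        return links;
    }

    // Step 3 (assumption): a link is internal if it is http/https on the same host.
    public static bool IsInternal(Uri link, Uri site) =>
        (link.Scheme == Uri.UriSchemeHttp || link.Scheme == Uri.UriSchemeHttps) &&
        string.Equals(link.Host, site.Host, StringComparison.OrdinalIgnoreCase);

    // Step 4: breadth-first crawl from the start page, visiting at most maxPages.
    // fetch stands in for the WebRequest code from step 1.
    public static ISet<Uri> Crawl(Uri start, Func<Uri, string> fetch, int maxPages)
    {
        var visited = new HashSet<Uri>();
        var queue = new Queue<Uri>();
        queue.Enqueue(start);

        while (queue.Count > 0 && visited.Count < maxPages)
        {
            Uri page = queue.Dequeue();
            if (!visited.Add(page)) continue;   // already crawled

            string html = fetch(page);
            foreach (Uri link in ExtractLinks(html, page))
            {
                if (IsInternal(link, start) && !visited.Contains(link))
                {
                    queue.Enqueue(link);
                }
            }
        }
        return visited;
    }
}
```

To crawl a real site, pass a delegate that runs the WebRequest code from step 1 for the given Uri, and keep maxPages small so a large site does not turn into thousands of requests.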

Kind Regards, Alex Meleta
[TechBlog] http://devkids.blogspot.com



o> Hi everyone,
o> Does anyone know if and how I can access internet URLs? Can I get
o> the source code of a page into a string variable?
o> For example:
o> There is a website www.hahaha.com, and it has some pages whose
o> URLs are:
o> www.hahaha.com/1,
o> www.hahaha.com/2,
o> www.hahaha.com/6.
o> I know the URL www.hahaha.com, and I want to get the URLs of all those
o> pages even though I don't know them or how many there are...
o> Is it possible? If yes, how do I do it?
o> Please help,
o> Ofir
 

ofiras

Thanks a lot, it was very helpful.
Well, I have some more questions now...
Is there a way to search for words in a string and get their position?
Is there a way of getting the part of a string between two character
positions? (From ? to ?)
Thanks again,
Ofir.
 

Alex Meleta

Hi ofiras,

o> Is there a way to search for words in a string and get their place?...
Here you go:
http://www.peachpit.com/articles/article.asp?p=31938&seqNum=12&rl=1

PS. If you want to find strings in the whole text by pattern matching, use
regular expressions (Regex):
http://geekswithblogs.net/royhwa/archive/2005/12/11/62816.aspx
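As a sketch of the regex approach, assuming you want the starting index of every whole-word, case-insensitive occurrence of a word (the WordFinder class and FindWord method are hypothetical names), Regex.Matches returns each Match together with its Index in the original string:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WordFinder
{
    // Returns the zero-based start index of every whole-word,
    // case-insensitive occurrence of word in text.
    public static List<int> FindWord(string text, string word)
    {
        // Regex.Escape makes characters such as '.' or '+' in word literal;
        // \b limits matches to word boundaries, so "cat" does not match "catalog".
        var pattern = new Regex(@"\b" + Regex.Escape(word) + @"\b", RegexOptions.IgnoreCase);
        var positions = new List<int>();
        foreach (Match m in pattern.Matches(text))
        {
            positions.Add(m.Index);
        }
        return positions;
    }
}
```

For "The cat sat on the catalog, Cat!" and "cat" this returns 4 and 28; "catalog" is skipped because of the word boundaries. The \b anchors assume the word starts and ends with letters or digits.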

Kind Regards, Alex Meleta
[TechBlog] http://devkids.blogspot.com



 

Göran Andersson

ofiras said:
o> Thanks a lot, it was very helpful.
o> Well, I have some more questions now...
o> Is there a way to search for words in a string and get their position?
IndexOf

o> Is there a way of getting the part of a string between two character
o> positions? (From ? to ?)
Substring

o> Thanks again,
o> Ofir.
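Combining the two answers, here is a minimal sketch that cuts out the text between two markers (TextBetween and Between are hypothetical names, and ordinal comparison is an assumption). Note that Substring takes a start index and a length, not a start and an end position:

```csharp
using System;

public static class TextBetween
{
    // Returns the text between the first occurrence of startMarker and the
    // next endMarker after it, or null if either marker is missing.
    public static string Between(string text, string startMarker, string endMarker)
    {
        int from = text.IndexOf(startMarker, StringComparison.Ordinal);
        if (from < 0) return null;            // IndexOf returns -1 when not found
        from += startMarker.Length;           // skip past the marker itself

        int to = text.IndexOf(endMarker, from, StringComparison.Ordinal);
        if (to < 0) return null;

        // Substring takes (startIndex, length), so convert the end position to a length.
        return text.Substring(from, to - from);
    }
}
```

For example, Between("<title>Ofir - Wikipedia</title>", "<title>", "</title>") returns "Ofir - Wikipedia".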
 

ofiras

Wow, thank you both so much... I never knew about those methods, and I
needed them badly...
In PHP I always go to http://www.php.net/ and find the method I want,
but in C# I don't know where to look...
And... I need another method: one that erases a range from the
string... like Substring, just the opposite...
Substring gives me the range I asked for, and I need something that gives
me the rest...
Thank you so much,
Ofir.
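For reference, .NET already has the opposite of Substring: String.Remove(startIndex, count) returns the string with that range taken out. A small sketch (RemoveDemo and SplitAt are hypothetical names) showing the two side by side:

```csharp
public static class RemoveDemo
{
    // Substring(start, length) returns the selected range;
    // Remove(start, length) returns everything except that range.
    public static (string Selected, string Rest) SplitAt(string text, int start, int length)
    {
        return (text.Substring(start, length), text.Remove(start, length));
    }
}
```

With "Hello, world", 5 and 7, Selected is ", world" and Rest is "Hello".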
 

ofiras

Never mind... I wrote my own method to do it...
So... my last question (I hope): for some URLs, an exception is thrown
when I try to get the response, saying "The remote server returned an
error: (403) Forbidden.", and in the details it says "Connection: close".
What is the problem, and is there a way around it?
Thanks,
Ofir.
 

Alex Meleta

Hi ofiras,
o> I try to get response, and it says "The remote server returned an error:
o> (403) Forbidden.", and in the details it is written "Connection: close".

The server understood the request but refused to fulfill it (maybe access to
the page is forbidden, or the request is incorrect: query format or something
else). The server should describe the reason for the refusal in the response
body (the entity).

Some info about status codes:
http://support.microsoft.com/kb/318380/en-us
http://htmlhelp.com/tools/valet/http.txt

Kind Regards, Alex Meleta
[TechBlog] http://devkids.blogspot.com



 

ofiras

So, isn't there any way of getting the source code from it?
For example:

WebRequest webRequest = WebRequest.Create("http://en.wikipedia.org/wiki/Ofir");
WebResponse webResponse = webRequest.GetResponse();
StreamReader streamReader = new StreamReader(webResponse.GetResponseStream());
string html = streamReader.ReadToEnd();

This page throws an exception, but this one: http://en.wikipedia.org/wiki/A
doesn't.
Is there a way of getting the source code?

Please help,
Ofir.
 

Alex Meleta

Hi ofiras,

Wrap the code in a "try ... catch" block that catches WebException.
I need exception.Message, exception.Response...Stream and exception.Status
to analyse the problem.

Kind Regards, Alex Meleta
[TechBlog] http://devkids.blogspot.com
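A minimal sketch of that try ... catch, assuming the goal is simply to collect the requested information into a readable report (WebErrorReport, Describe and TryFetch are hypothetical names). For HTTP errors, ex.Response is an HttpWebResponse whose stream carries the server's explanation; for failures such as DNS errors it is null:

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class WebErrorReport
{
    // Collects Message, Status and, for HTTP errors, the status code and
    // the body the server sent back (often the reason for the refusal).
    public static string Describe(WebException ex)
    {
        var report = new StringBuilder();
        report.AppendLine("Message: " + ex.Message);
        report.AppendLine("Status: " + ex.Status);

        // Response is null when nothing came back (DNS failure, timeout, ...).
        if (ex.Response is HttpWebResponse http)
        {
            report.AppendLine("HTTP status: " + (int)http.StatusCode + " " + http.StatusDescription);
            using (var reader = new StreamReader(http.GetResponseStream()))
            {
                report.AppendLine("Body: " + reader.ReadToEnd());
            }
        }
        return report.ToString();
    }

    // Wraps the download in try/catch so a failing URL produces a report
    // instead of an unhandled exception.
    public static bool TryFetch(string url, out string html, out string error)
    {
        html = null;
        error = null;
        try
        {
            WebRequest request = WebRequest.Create(url);
            using (WebResponse response = request.GetResponse())
            using (var reader = new StreamReader(response.GetResponseStream()))
            {
                html = reader.ReadToEnd();
            }
            return true;
        }
        catch (WebException ex)
        {
            error = Describe(ex);
            return false;
        }
    }
}
```

Only WebException is caught here; a malformed URL passed to WebRequest.Create still throws its own exception type.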



 

ofiras

The information is:

Message: "The remote server returned an error: (403) Forbidden."

Response: System.Net.HttpWebResponse
Response Headers:
"X-Cache: MISS from sq18.wikimedia.org,MISS from knsq2.knams.wikimedia.org,MISS from knsq2.knams.wikimedia.org
X-Cache-Lookup: MISS from sq18.wikimedia.org:3128,MISS from knsq2.knams.wikimedia.org:3128,MISS from knsq2.knams.wikimedia.org:80
Connection: close
Content-Length: 35
Content-Type: text/html
Date: Sat, 07 Jul 2007 19:30:07 GMT
Server: Apache
Via: 1.0 sq18.wikimedia.org:3128 (squid/2.6.STABLE12), 1.0 knsq2.knams.wikimedia.org:3128 (squid/2.6.STABLE12), 1.0 knsq2.knams.wikimedia.org:80 (squid/2.6.STABLE12)
X-Powered-By: PHP/5.1.4"

Status: System.Net.WebExceptionStatus.ProtocolError

Stream: Didn't find it...

StackTrace:
   at System.Net.HttpWebRequest.GetResponse()
   at wikidic.Form1..ctor() in E:\c#\wikidic\wikidic\Form1.cs:line 24
   at wikidic.Program.Main() in E:\c#\wikidic\wikidic\Program.cs:line 17
   at System.AppDomain.nExecuteAssembly(Assembly assembly, String[] args)
   at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
   at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
   at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
   at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.ThreadHelper.ThreadStart()

Thanks for helping,
Ofir.
 

Alex Meleta

Hi ofiras,

By Stream I mean exception.Response...GetResponseStream(), which gives you the
error message provided by the server. I'm sure the key to the problem is there.

In the meantime, try changing your code to:

HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://en.wikipedia.org/wiki/Ofir");
webRequest.UserAgent = "Mozilla/4.0";

(HttpWebRequest gives you access to some headers, such as User-Agent, that
cannot be set through the base WebRequest class.) Many servers need some
extra headers to process your request (for statistics and the like), and
you should provide them.

Hope this helps.

Regards, Alex Meleta
[TechBlog] http://devkids.blogspot.com
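Putting that suggestion together, here is a minimal sketch (PoliteFetcher, CreateRequest and FetchHtml are hypothetical names) that sets User-Agent through the HttpWebRequest property before downloading. The "Mozilla/4.0" value comes from the reply above; whether a particular server accepts it is up to that server:

```csharp
using System.IO;
using System.Net;

public static class PoliteFetcher
{
    // HttpWebRequest exposes User-Agent as a property;
    // the base WebRequest class has no such property.
    public static HttpWebRequest CreateRequest(string url, string userAgent)
    {
        // WebRequest.Create returns an HttpWebRequest for http/https URLs.
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.UserAgent = userAgent;
        return request;
    }

    // Downloads the page with the given User-Agent. Failures such as a
    // 403 still surface as a WebException for the caller to inspect.
    public static string FetchHtml(string url, string userAgent)
    {
        HttpWebRequest request = CreateRequest(url, userAgent);
        using (WebResponse response = request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Usage: string html = PoliteFetcher.FetchHtml("http://en.wikipedia.org/wiki/Ofir", "Mozilla/4.0");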



 

ofiras

It worked, thank you so much.
I was stuck on that, and you solved it...
Just when I had a good idea for a program...
Thanks a lot,
Ofir.
 
