Reading web page

  • Thread starter: lorenzo

lorenzo

Hi, I have a problem while reading the text of a web page.
I use the following code:

System.Net.WebClient webc = new System.Net.WebClient();
webc.Headers.Clear();

// Here I add some headers...

byte[] bytes = webc.DownloadData(url);

System.Text.UTF8Encoding decoder = new System.Text.UTF8Encoding();
string dataread = decoder.GetString(bytes);

return dataread;

The code works well with English pages, but I have problems with
other European languages: characters like è, à, ò, ù are skipped, and I
cannot find them in the data I read.
I guess it's caused by the encoding; maybe UTF8Encoding isn't
correct...
Am I right?
Thanks,
Lorenzo
 
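You should look at the Content-Type response header to see whether a
character set is specified, and also at the page's
<meta http-equiv="Content-Type"> tag; if the tag specifies a character set,
use it in place of the one from the header. Then decode the bytes with that
encoding instead of always using UTF-8.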

-- bruce (sqlwork.com)
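
A minimal sketch of the approach described above, not a definitive implementation. The class name PageReader and the methods PickEncoding and ReadPage are hypothetical, introduced only for illustration. It assumes the charset name appears within the first 4096 bytes of the page, falls back to UTF-8 when nothing usable is found, and follows the precedence given in the reply (meta tag first, then header); note that browsers following the HTML standard give the HTTP header priority, so swap the two checks if you want that behaviour. On newer .NET versions WebClient is marked obsolete, and some code pages (for example windows-1252) are only available after registering an extra encoding provider; names that cannot be resolved fall back to UTF-8 here.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

public static class PageReader
{
    // Matches "charset=NAME" in a Content-Type header value.
    static readonly Regex HeaderCharset = new Regex(
        @"charset\s*=\s*[""']?([A-Za-z0-9_\-:.]+)", RegexOptions.IgnoreCase);

    // Matches a charset declared inside a <meta ...> tag.
    static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*charset\s*=\s*[""']?([A-Za-z0-9_\-:.]+)", RegexOptions.IgnoreCase);

    // Hypothetical helper: choose an encoding from the meta tag, then the header.
    public static Encoding PickEncoding(string contentTypeHeader, byte[] body)
    {
        string name = null;

        // ISO-8859-1 maps every byte to one character, so the ASCII markup
        // of the page head survives regardless of the real encoding.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");
        string head = latin1.GetString(body, 0, Math.Min(body.Length, 4096));

        Match meta = MetaCharset.Match(head);
        if (meta.Success)
        {
            name = meta.Groups[1].Value;
        }
        else if (contentTypeHeader != null)
        {
            Match header = HeaderCharset.Match(contentTypeHeader);
            if (header.Success) name = header.Groups[1].Value;
        }

        if (name != null)
        {
            try { return Encoding.GetEncoding(name); }
            catch (ArgumentException) { /* unknown charset name: use the fallback */ }
        }
        return Encoding.UTF8; // assumption: UTF-8 when nothing is declared
    }

    public static string ReadPage(string url)
    {
        using (WebClient webc = new WebClient())
        {
            byte[] bytes = webc.DownloadData(url);
            // ResponseHeaders is populated once the download has completed.
            string contentType = webc.ResponseHeaders["Content-Type"];
            return PickEncoding(contentType, bytes).GetString(bytes);
        }
    }
}
```

With this in place, the original snippet would call PageReader.ReadPage(url) instead of decoding with UTF8Encoding directly. A page served as ISO-8859-1 stores è as the single byte 0xE8, which is not a valid UTF-8 sequence on its own; that is the likely reason the accented characters went missing.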

