HttpWebResponse.CharacterSet

Guest

Why does HttpWebResponse.CharacterSet always return ISO-8859-1? I am
accessing a Chinese Web site (http://cn.yahoo.com) -- View Source in IE shows
the character set to be "gb2312", but the HttpWebResponse shows it to be
ISO-8859-1. Websites like www.cnn.com also show the character set to be
ISO-8859-1.

Here is the code snippet:

string pageAddress = "http://cn.yahoo.com";
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(pageAddress);
HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
Console.WriteLine(resp.CharacterSet);
 
HttpWebResponse has to rely on the HTTP headers to obtain this
information. Unfortunately, cn.yahoo.com isn't really playing nice here
and uses an HTML META tag to specify the character encoding...

<meta http-equiv="content-type" content="text/html; charset=gb2312">

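... whereas the Content-Type header only says:

Content-Type: text/html

Since that header carries no charset parameter, CharacterSet falls back
to ISO-8859-1, the HTTP/1.1 default for text types.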
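
If you need the real encoding, one workaround is to check the header first and then look for the META declaration in the page itself. What follows is only a rough sketch of that idea, not anything built into the framework: the CharsetSniffer class, the DetectCharset method and the 4096-byte scan window are all made up for illustration, and the regular expression is a simple approximation rather than a full HTML parser.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of System.Net.
public static class CharsetSniffer
{
    // Matches "charset=..." in a Content-Type header value.
    private static readonly Regex HeaderCharset = new Regex(
        @"charset\s*=\s*[""']?([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // Matches "charset=..." inside a <meta ...> tag, e.g.
    // <meta http-equiv="content-type" content="text/html; charset=gb2312">
    private static readonly Regex MetaCharset = new Regex(
        @"<meta[^>]*?charset\s*=\s*[""']?\s*([A-Za-z0-9_\-]+)",
        RegexOptions.IgnoreCase);

    // headerContentType: raw Content-Type header value (may be null).
    // body: the undecoded response bytes.
    public static string DetectCharset(string headerContentType, byte[] body)
    {
        // 1. An explicit charset parameter in the HTTP header wins.
        if (!string.IsNullOrEmpty(headerContentType))
        {
            Match h = HeaderCharset.Match(headerContentType);
            if (h.Success) return h.Groups[1].Value;
        }

        // 2. Otherwise scan the start of the body for a META declaration.
        //    Decoding as ASCII is enough here because the tag is ASCII.
        //    The 4096-byte window is an arbitrary assumption.
        int n = Math.Min(body.Length, 4096);
        string head = Encoding.ASCII.GetString(body, 0, n);
        Match m = MetaCharset.Match(head);
        if (m.Success) return m.Groups[1].Value;

        // 3. Nothing declared anywhere: use the same default
        //    that CharacterSet reports.
        return "ISO-8859-1";
    }
}
```

To use it, you would pass resp.ContentType rather than resp.CharacterSet (so an explicit charset can be told apart from the default), read the bytes from resp.GetResponseStream() into a buffer, and then decode them with Encoding.GetEncoding(name). Whether a name such as "gb2312" is accepted by Encoding.GetEncoding depends on the runtime and the code pages available to it.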

Cheers,
 