Storing a web page written in Chinese in a file

Antonio

Hi,
I want to read an HTML page written in Chinese and store it in a file
with the .aspx extension. I'm not sure where the problem is. I use the
following lines of code:

String sAddress = "http://babelfish.altavista.com/babe...h&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8 );
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter( "prova.aspx" , false,
System.Text.Encoding.UTF8) ;

writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

But the file produced doesn't contain the Chinese characters. How
can I solve the problem?

Many thanks in advance.

Ing. Antonio D'Ottavio
 
Jon Skeet [C# MVP]

Antonio said:
I want to read an HTML page written in Chinese and store it in a file
with the .aspx extension. I'm not sure where the problem is. I use the
following lines of code:

String sAddress = "http://babelfish.altavista.com/babelfish/trurl_pagecontent?lp=en_zh&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8 );
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter( "prova.aspx" , false,
System.Text.Encoding.UTF8) ;

writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

But the file produced doesn't contain the Chinese characters. How
can I solve the problem?

Are you sure that it's returning the data in UTF-8? How are you
checking whether or not the file contained Chinese characters?

I'd look in more depth myself, but using the code above, it's
complaining that the server committed an HTTP protocol violation :(
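
Both questions can be checked without guessing. The following is only a minimal sketch, not the code from this thread: FromContentType and ContainsCjk are hypothetical helper names, and the temporary file name is made up for the example. It picks the encoding from a Content-Type header value (in the real program that string would come from result.ContentType) and then reads the saved file back to see whether any Chinese characters survived the round trip.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingCheck
{
    // Hypothetical helper: picks an Encoding from a Content-Type header value
    // such as "text/html; charset=UTF-8". Falls back to UTF-8 when there is
    // no charset parameter or the charset name is not recognised.
    public static Encoding FromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    string name = p.Substring("charset=".Length).Trim('"', ' ');
                    try { return Encoding.GetEncoding(name); }
                    catch (ArgumentException) { /* unknown charset: use the fallback */ }
                }
            }
        }
        return Encoding.UTF8;
    }

    // Returns true when the text contains at least one character in the
    // CJK Unified Ideographs block (U+4E00..U+9FFF).
    public static bool ContainsCjk(string text)
    {
        foreach (char c in text)
        {
            if (c >= '\u4E00' && c <= '\u9FFF') return true;
        }
        return false;
    }

    public static void Main()
    {
        // Assumed header value; the real one would be read from the response.
        Encoding enc = FromContentType("text/html; charset=UTF-8");

        // Write a small sample containing two Chinese characters, then read
        // it back with the same encoding to confirm they are still present.
        string path = Path.Combine(Path.GetTempPath(), "prova_check.aspx");
        File.WriteAllText(path, "<p>\u5927\u5B66</p>", enc);
        string roundTripped = File.ReadAllText(path, enc);

        Console.WriteLine("Encoding used: " + enc.WebName);
        Console.WriteLine("Contains Chinese characters: " + ContainsCjk(roundTripped));
    }
}
```

Checking the string in code rather than opening the file in an editor separates two different failures: the characters being lost while reading or writing, and the characters being present but displayed with the wrong encoding or font.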
 
Antonio

Hi,
I simply connected to the URL

http://babelfish.altavista.com/babe...h&trurl=http://www.etantonio.it/EN/index.aspx

with Internet Explorer. This is the result, where I can see that the
charset is UTF-8 and the Chinese characters display normally:


<html><meta http-equiv="content-type" content="text/html;
charset=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!-- removed --><meta http-equiv="Content-Type" content="text/html ;
CHARSET=UTF-8"><base href="http://www.etantonio.it/EN/index.aspx">
<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<head>
<title>Etantonio</title>
<meta name="author" content="Antonio DOttavio">
<meta name="description" content="Etantonio Index">
<link href="Stili.css" rel="stylesheet" type="text/css">
</head>
<body>

<script language=JavaScript src="menu_array.js"
type=text/javascript></script>
<script language=JavaScript src="mmenu.js"
type=text/javascript></script>

<table width="750" height="430" border="0" cellpadding="0"
cellspacing="0" background="/images/EsserSpettatoriNonEstSerioElefante.jpg">
<tr>
<td valign="top">

<table width="90%" border="0" align="center" cellspacing="12">
<tr height="70" valign="top">
<td>&nbsp;</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babe...p://www.etantonio.it/EN/Universita/index.aspx"
class="testoMedioVerde">大学</a></p>
<p align="center"
class="testoPiccolissimoVerde">学士路线的笔记在工程学电子,
论文、研究方法和适当尊敬对起源村庄。
</p>
</td>
<td width="25%" rowspan="2">
<p align="center"><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Economia/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Economia/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Economia/index.aspx"
class="testoMedioVerde">经济</a> </p>
<p align="center"
class="testoPiccolissimoVerde">委员会、为财政社区的技术和仪器,
详尽阐述对您在,
在供选择变迁之间,
持续从1994
年个人经验的基地。</p></td>
<td width="25%">&nbsp;</td>
</tr>
<tr height="140" valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babe...=http://www.etantonio.it/EN/Lavoro/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...=http://www.etantonio.it/EN/Lavoro/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...=http://www.etantonio.it/EN/Lavoro/index.aspx"
class="testoMedioVerde">工作</a> </p>
<p align="center"
class="testoPiccolissimoVerde">简历,
图象证实对您,
和一些仪器和参考为工作机会查寻。
</p>
</td>
<td width="25%">
<p align="center" ><a
href="http://babelfish.altavista.com/babe...www.etantonio.it/EN/Web/GifAnimate/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...www.etantonio.it/EN/Web/GifAnimate/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...www.etantonio.it/EN/Web/GifAnimate/index.aspx"
class="testoMedioVerde">网</a> </p>
<p align="center"
class="testoPiccolissimoVerde">搜索引擎在无数GIF
赋予生命从我选择了和详尽阐述了,
随后将来网的被插入的实验。
</p>
</td>
</tr>
<tr valign="top">
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babe...l=http://www.etantonio.it/EN/Varie/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...l=http://www.etantonio.it/EN/Varie/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...l=http://www.etantonio.it/EN/Varie/index.aspx"
class="testoMedioVerde">数</a> </p>
<p align="center"
class="testoPiccolissimoVerde">巨大我的利益发现这里出气孔,
艺术, 旅行,
激情以远对我的热点表的链接。
</p>
</td>
<td width="25%"> <div align="center"></div></td>
<td width="25%"> <div align="center"></div></td>
<td width="25%">
<p align="center"><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Contatti/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Contatti/index.aspx"
class="testoMedioVerde"></a><a
href="http://babelfish.altavista.com/babe...ttp://www.etantonio.it/EN/Contatti/index.aspx"
class="testoMedioVerde">联络</a></p>
<p align="center"
class="testoPiccolissimoVerde">这里它是可能接触对我为每必要或理事会是通过编写形式或插入消息nel
论坛delle 想法的邮件。
</p>
</td>
</tr>
</table>

</td>
</tr>
</table>
<script>InserisciFooter();</script>
<br>
<a href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/EN/Universita/index.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/EN/Universita/index.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/EN/Universita/index.aspx"
class="trasparente">Universita 用 </a><a
href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/FR/Universita/index.aspx"
class="trasparente">Universita</a>
<a href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/FR/Universita/index.aspx"
class="trasparente"></a><a
href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/EN/Universita/index.aspx"
class="trasparente">英语
</a><a href="http://babelfish.altavista.com/babe.../www.etantonio.it/EN/FR/Universita/index.aspx"
class="trasparente">用法语</a>
</td>
</a>
<td>


</body>
</html>



I'm trying to read this page and store it in a file with the .aspx
extension, but in the result many characters are not rendered
correctly. I use the following lines of code:

String sAddress = "http://babelfish.altavista.com/babe...h&trurl=http://www.etantonio.it/EN/index.aspx";

WebRequest req = WebRequest.Create(sAddress);
WebResponse result = req.GetResponse();
Stream ReceiveStream = result.GetResponseStream();
StreamReader reader = new StreamReader(ReceiveStream, Encoding.UTF8 );
String sHtmlTradotto = reader.ReadToEnd();

StreamWriter writer = new StreamWriter( "prova.aspx" , false,
System.Text.Encoding.UTF8) ;

writer.Write(sHtmlTradotto);
writer.Flush();
writer.Close();

Can you help me solve the problem?

Many thanks in advance.

Ing. Antonio D'Ottavio
 
Guest

Hi Jon,
The problem you are getting (the HTTP protocol violation) can be
resolved by adding one entry for unsafe header parsing to the
machine.config file:

<httpWebRequest useUnsafeHeaderParsing="true" />

Add this entry under the <system.net><settings> section of the
machine.config file.
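
For reference, this is roughly how that entry nests in a .NET Framework configuration file. In the configuration schema the element names are lowercase: system.net and settings. The same block can also go in the application's own app.config or web.config instead of machine.config, which limits the relaxed header parsing to that one application; treat this as a sketch to adapt, since the rest of your configuration file is not shown here.

```xml
<configuration>
  <system.net>
    <settings>
      <!-- Relaxes validation of HTTP response headers so that responses
           from non-compliant servers no longer raise a protocol violation. -->
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
```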
 
