UTF-8 Encoding

Guest

During the development cycle I receive HTML files from designers
who work on Macs and PCs but use tools other than Visual Studio, so these
files are sometimes not UTF-8 encoded.

I see that Visual Studio creates a globalization tag with UTF-8 as the
requestEncoding and responseEncoding.

I have three questions regarding this:
1. Does the globalization tag convert an ANSI-encoded file into UTF-8 when
it compiles the ASPX and ASCX pages?
2. Is there an MS tool (or a third-party one) that can quickly tell me whether
a file is UTF-8 encoded and batch convert a set of files to UTF-8? I have
UltraEdit, but it requires me to open each file, view the encoding, and
select the conversion from the menu.

Thanks.
 
Steven Cheng[MSFT]

Hi Jmh,

Welcome to the ASP.NET newsgroup.
The encoding of ASPX pages in VS.NET and the ASP.NET runtime follows the
rules below:

The strings we hardcode in code files (.cs or .vb) are compiled into the
assembly at compile time, so we don't need to worry about them. The strings
in aspx (and ascx) files are compiled dynamically at runtime, and those are
what we need to take care of.

1. When developing an aspx page in VS.NET, the IDE saves the page using the
machine's default ANSI code page (set by the System Locale) by default. We
can also use the save options to change this to UTF-8 or Unicode encoding.

Then, in the web.config's <globalization> element, there is a fileEncoding
attribute that specifies the encoding of the aspx files and other dynamic
resources (ascx...). The ASP.NET runtime uses this encoding to parse the
aspx pages. By default, <globalization> does not explicitly set
fileEncoding, which means the runtime uses the machine's default ANSI
code page (System Locale) to load aspx files. This is fine when we develop
and run the pages on the same machine. But if we develop pages on one box
and deploy them to another server, that server may have a different System
Locale setting. In that case it is recommended that we explicitly save the
aspx files in a specific charset (encoding) and specify the same value as
the fileEncoding in web.config.
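
As an illustration of the recommendation above, a web.config fragment might
look like the following. This is only a sketch: it assumes the aspx/ascx
files have all been saved as UTF-8, and the three values shown are that
assumption, not something required by ASP.NET.

```xml
<configuration>
  <system.web>
    <!-- fileEncoding: how the runtime reads .aspx/.ascx files from disk.
         requestEncoding / responseEncoding: how incoming data is decoded
         and how rendered output is encoded. -->
    <globalization fileEncoding="utf-8"
                   requestEncoding="utf-8"
                   responseEncoding="utf-8" />
  </system.web>
</configuration>
```

If the files are saved in some other charset, fileEncoding should name that
charset instead, so that it matches how the files are actually stored.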

2. After the ASP.NET runtime has successfully parsed the aspx file and
loaded its strings into memory, all strings (characters) in .NET are
represented as UTF-16 in memory, no matter which charset the source file
was encoded in. When ASP.NET is about to render the page content to the
client, it encodes the in-memory strings using the charset (encoding)
specified in the "responseEncoding" attribute of the <globalization>
settings.
The "requestEncoding" attribute specifies the charset (encoding) used to
decode the bytes coming from the client side (such as querystring,
cookie...).

Both of them can be overridden in code through Request.ContentEncoding
/ Response.ContentEncoding.
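
The decode-then-encode steps above can be illustrated outside ASP.NET with a
small Python sketch. It is not how the runtime is implemented; parse_page and
render_page are hypothetical helper names, and cp1252/cp1251 are just example
ANSI code pages standing in for two machines with different System Locale
settings.

```python
def parse_page(file_bytes: bytes, file_encoding: str) -> str:
    """Decode the bytes on disk using the declared file encoding."""
    return file_bytes.decode(file_encoding)


def render_page(text: str, response_encoding: str) -> bytes:
    """Encode the in-memory string using the response encoding."""
    return text.encode(response_encoding)


# A page saved on a machine whose ANSI code page is Windows-1252.
saved = "café".encode("cp1252")

# Parsed with the matching encoding, the text comes back intact.
good = parse_page(saved, "cp1252")

# Parsed with a different ANSI code page (here Cyrillic cp1251),
# the same bytes turn into different characters.
bad = parse_page(saved, "cp1251")

# The in-memory string is then encoded for the client, here as UTF-8.
body = render_page(good, "utf-8")

print(repr(good), repr(bad), body)
```

The point of the sketch is that the bytes on disk never change; only the
encoding used to read them decides whether the in-memory text is correct.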

In addition, as for the question you mentioned:
=================
Is there a MS tool (or 3rd party) that can quickly tell me if a file is
UTF-8 encoded and batch convert a set of files to UTF-8? I
================
I am not aware of any existing ones. However, we can check whether a file
is UTF-8 encoded by reading its first bytes: many UTF-8 encoded files begin
with a three-byte BOM, EF BB BF (like the two-byte BOM of a Unicode text
file), although a file without a BOM can still be valid UTF-8. See below:

#Byte Order Mark FAQ (from www.unicode.org)
http://www.websina.com/bugzero/kb/unicode-bom.html
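
The BOM check described above can be sketched in Python as follows. The
function names detect_bom and looks_like_utf8 are made up for this example.
Since a UTF-8 file is not required to carry a BOM, the second function also
falls back to a strict decode attempt, which is a heuristic and not proof of
the author's intended encoding.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return a codec name if data starts with a known BOM, else None."""
    # UTF-32 LE is checked before UTF-16 LE because its BOM
    # (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
    boms = [
        (codecs.BOM_UTF8, "utf-8-sig"),
        (codecs.BOM_UTF32_LE, "utf-32"),
        (codecs.BOM_UTF32_BE, "utf-32"),
        (codecs.BOM_UTF16_LE, "utf-16"),
        (codecs.BOM_UTF16_BE, "utf-16"),
    ]
    for bom, name in boms:
        if data.startswith(bom):
            return name
    return None


def looks_like_utf8(data: bytes) -> bool:
    """True if data has a UTF-8 BOM or decodes cleanly as UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return True
    try:
        data.decode("utf-8")  # strict: raises on invalid byte sequences
        return True
    except UnicodeDecodeError:
        return False
```

Note that a pure ASCII file passes looks_like_utf8, because ASCII is a
subset of UTF-8.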

Also, we can open a text file in Notepad and choose Save As: if Notepad
has successfully loaded the file as UTF-8, the encoding in the Save As
dialog will already be set to UTF-8.
As for batch converting files between code pages/charsets, we can use the
.NET System.Text and System.IO APIs to convert them ourselves, as long as
we know the source and destination charsets.
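
A batch conversion along these lines could look like the Python sketch
below. It uses Python rather than the .NET APIs mentioned above, purely to
show the idea. The names convert_to_utf8 and convert_tree, the file
patterns, and the cp1252 default source encoding are all assumptions for
the example; the real source charset has to be known, as noted above.

```python
import codecs
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> bool:
    """Rewrite path as UTF-8 with a BOM. Return True if it was rewritten."""
    raw = path.read_bytes()
    if raw.startswith(codecs.BOM_UTF8):
        return False  # already marked as UTF-8, leave it alone
    try:
        # Already valid UTF-8 (this includes pure ASCII): keep the text.
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Otherwise assume the stated source encoding.
        text = raw.decode(source_encoding)
    # Work in bytes so line endings are not translated.
    path.write_bytes(codecs.BOM_UTF8 + text.encode("utf-8"))
    return True


def convert_tree(root: Path,
                 patterns=("*.aspx", "*.ascx", "*.html"),
                 source_encoding: str = "cp1252"):
    """Convert every matching file under root; return the rewritten paths."""
    converted = []
    for pattern in patterns:
        for path in sorted(root.rglob(pattern)):
            if convert_to_utf8(path, source_encoding):
                converted.append(path)
    return converted
```

Calling convert_tree(Path("site")) would rewrite the matching files under a
hypothetical "site" folder in place, so it is worth running it on a copy
first.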

These are just some of my understandings. If you have any other questions
or ideas, please feel free to post here.
Hope this helps.

Steven Cheng
Microsoft Online Support

Get Secure! www.microsoft.com/security
(This posting is provided "AS IS", with no warranties, and confers no
rights.)
 
