Odd problems caused by Encoding Auto-Select being on by default (IE6/SP2)


Al Reynolds

Afternoon,

In a thread (http://tinyurl.com/5v4aa) on comp.lang.javascript,
I described a problem I was having which was rather bizarrely
solved by changing one line in my script from:
"inputbox.value = numq+ag-cw-cc;"
to:
"inputbox.value = numq+(ag)-(cw)-(cc);"

This didn't seem to make any sense, and the change was needed
only in IE6, not in any other browser I tried. I have now
solved the mystery of why inserting the brackets removed the
problem. The browser version was:
6.0.2900.2180.xpsp_sp2_rtm.040803-2158

IE6 is, I believe, the first version of the IE browser to have
"Auto-Select" for text encoding (character set) turned on by
default. The process IE uses to determine the character set
(and its rather arbitrary nature) is nicely illustrated by the
three short examples below. To see the full effect, make sure
Auto-Select is turned on for text encoding before you view any
of the three pages.

(1) http://www.ex.ac.uk/cimt/dev/oddity/plusminus-oddity-1.htm

<HTML>
<HEAD><TITLE>plus minus oddity 1</TITLE></HEAD>
<BODY>
foo+stuff-bar
</BODY>
</HTML>

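This displays:
foo<oriental symbol>bar
IE has decided that the document is Unicode (UTF-7). In UTF-7, a
"+" starts a run of base64-encoded characters and a "-" ends it, so
"+stuff-" is decoded as a non-Latin character instead of being shown
as text.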
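The decoding step can be sketched in a few lines of JavaScript. This is a minimal decoder written for illustration only (the name decodeUtf7 is hypothetical, not an IE or library function), assuming IE followed the basic UTF-7 rules of RFC 2152: base64 digits are collected six bits at a time, every complete 16 bits becomes one character, leftover bits are discarded, and a closing "-" is absorbed. It does not validate ill-formed input.

```javascript
// Hypothetical minimal UTF-7 decoder (sketch after RFC 2152), for illustration only.
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function decodeUtf7(text) {
  let out = "";
  let i = 0;
  while (i < text.length) {
    if (text[i] !== "+") {
      out += text[i]; // ordinary characters pass through unchanged
      i++;
      continue;
    }
    i++; // step past the '+' that opens a shift sequence
    if (text[i] === "-") {
      out += "+"; // "+-" encodes a literal '+'
      i++;
      continue;
    }
    let bits = 0;  // buffer of base64 bits not yet turned into a character
    let nbits = 0; // number of bits currently held in the buffer
    while (i < text.length && BASE64.includes(text[i])) {
      bits = (bits << 6) | BASE64.indexOf(text[i]); // append 6 bits
      nbits += 6;
      if (nbits >= 16) {
        // Enough bits for one UTF-16 code unit: emit it.
        nbits -= 16;
        out += String.fromCharCode((bits >> nbits) & 0xffff);
        bits &= (1 << nbits) - 1; // keep only the unused low bits
      }
      i++;
    }
    // Any leftover bits (fewer than 16) are discarded here.
    if (text[i] === "-") i++; // a '-' that closes the run is absorbed, not shown
  }
  return out;
}

console.log(decodeUtf7("foo+stuff-bar") === "foo\uB2DBbar"); // true
console.log(decodeUtf7("inputbox.value = numq+ag-cw-cc;")); // inputbox.value = numqcw-cc;
```

Under these rules "+stuff-" yields the single character U+B2DB, which lies in the Hangul Syllables block, consistent with the "oriental symbol" above. Applied to the original script line, "+ag-" carries only 12 bits, so it disappears together with its "-", and the expression would plausibly reach the script engine as `numqcw-cc`.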

(2) http://www.ex.ac.uk/cimt/dev/oddity/plusminus-oddity-2.htm

<HTML>
<HEAD><TITLE>plus minus oddity 2</TITLE></HEAD>
<BODY>
foo+stuff-bar<BR>
foo+ stuff -bar
</BODY>
</HTML>

This displays:
foo+stuff-bar
foo+ stuff -bar
IE has decided that this document is Western European (Windows).
How it has decided this is unclear to me. It contains the same first
line as example (1), but something in the second line makes it change
its mind. Perhaps it is the appearance of "stuff" without the "+"
directly in front?

(3) http://www.ex.ac.uk/cimt/dev/oddity/plusminus-oddity-3.htm

<HTML>
<HEAD><TITLE>plus minus oddity</TITLE>
<META HTTP-EQUIV="Content-Type"
CONTENT="text/html; CHARSET=iso-8859-1">
</HEAD>
<BODY>
foo+stuff-bar
</BODY>
</HTML>

This displays:
foo+stuff-bar
IE has correctly treated this document as Western European (ISO),
as specified in the META tag.

I'm sure that some of you will tell me that I should always have set
the character set for every HTML page I have ever written.

Anyway, I have learnt my lesson.

I can see two potential ongoing problems. Firstly, it seems odd (to
me) that the page's text encoding is also applied to the script
within the page. There will be plenty of occasions where a variable
is enclosed between a "+" and a "-", and each of these could
potentially lead to an error. Do people script in non-Latin charsets?
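A rough way to find such spots in existing script is to search for a "+" followed directly by base64 characters (letters, digits, "+" or "/"), since that is what a UTF-7 decoder would treat as the start of an encoded run. The helper findUtf7Runs below is hypothetical and only a sketch: it looks at the source text alone and cannot tell whether IE will actually choose UTF-7 for the page.

```javascript
// Hypothetical helper: list the "+..." runs in a piece of script that a UTF-7
// decoder would read as base64 shift sequences instead of as plain text.
function findUtf7Runs(source) {
  // A '+' followed by one or more base64 characters, plus an optional
  // closing '-'. "+-" (the UTF-7 escape for a literal '+') is not matched.
  const pattern = /\+[A-Za-z0-9+\/]+-?/g;
  return source.match(pattern) || [];
}

console.log(findUtf7Runs("inputbox.value = numq+ag-cw-cc;")); // [ '+ag-' ]
console.log(findUtf7Runs("inputbox.value = numq+(ag)-(cw)-(cc);")); // []
```

Under this pattern, the bracketed version of the line from the original thread produces no matches.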

Secondly, the way in which IE decides the encoding depends fairly
arbitrarily on things that appear *later* in the code and/or page,
which makes the problem harder to track down. Removing a working
section of code might make the problem disappear, even though there
was no fault in that section of code.

Anyway, there is an easy solution: make sure the text encoding is
specified on every page.
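When auditing existing pages, a quick check for a charset declaration in the markup might look like the sketch below. The function hasCharsetDeclaration is hypothetical, and the regular expression is an approximation rather than an HTML parser. It also cannot see a charset sent in the HTTP Content-Type header, which is another way a server can specify the encoding.

```javascript
// Hypothetical check: does an HTML string contain a <META ...> tag that
// declares a charset? A regex approximation, not a full HTML parser.
function hasCharsetDeclaration(html) {
  // Matches e.g. <META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
  // [^>]* also spans line breaks, so a tag split across lines is still found.
  return /<meta\b[^>]*\bcharset\s*=/i.test(html);
}

const page1 =
  "<HTML>\n<HEAD><TITLE>plus minus oddity 1</TITLE></HEAD>\n<BODY>\nfoo+stuff-bar\n</BODY>\n</HTML>";
const page3 =
  '<HTML>\n<HEAD><TITLE>plus minus oddity</TITLE>\n<META HTTP-EQUIV="Content-Type"\n' +
  'CONTENT="text/html; CHARSET=iso-8859-1">\n</HEAD>\n<BODY>\nfoo+stuff-bar\n</BODY>\n</HTML>';

console.log(hasCharsetDeclaration(page1)); // false
console.log(hasCharsetDeclaration(page3)); // true
```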

Al
 
