Their assessment is that the problem is with IE6's encoding capabilities
and not my personal computer.
I think the problem is with the page.
The coding doesn't make sense to me as UTF-8.
The <head> section contains this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=utf-8" />
Note, though, that we don't know which character set the source file
was saved with.
In the TIF, the source of those boxes, as we see it in hex, is C291 and C292.
According to
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
in UTF-8 we are supposed to interpret C291 in bits as
110 xxxxx 10 xxxxxx
so C291 would be interpreted as
110 00010 10 010001
so this apparently was trying to encode 0091 (000 10010001)
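The bit arithmetic above can be checked mechanically. This is only a minimal sketch in Python (my choice of language, not something used in the original investigation); the helper name decode_two_byte_utf8 is made up for illustration and handles nothing beyond the 2-byte pattern.

```python
def decode_two_byte_utf8(lead: int, trail: int) -> int:
    """Decode one 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point.

    Hypothetical helper for illustration only: it does not reject
    overlong forms (lead bytes C0 and C1) or handle longer sequences.
    """
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte does not match 110xxxxx")
    if trail & 0b11000000 != 0b10000000:
        raise ValueError("trail byte does not match 10xxxxxx")
    # Keep the 5 payload bits of the lead byte and the 6 of the trail byte.
    return ((lead & 0b00011111) << 6) | (trail & 0b00111111)


code_point = decode_two_byte_utf8(0xC2, 0x91)
print(f"U+{code_point:04X}")  # prints U+0091

# Cross-check against Python's own UTF-8 decoder.
builtin = ord(bytes([0xC2, 0x91]).decode("utf-8"))
print(builtin == code_point)  # prints True
```

The same call with 0xC2, 0x92 gives U+0092 for the second box.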
The problem with that, I think, is that code position U+0091 is undefined,
at least in the Unicode character set that my CharMap knows about.
To see this, choose Font: Arial, which I think is the one that is being
used by the page. Turn on the Advanced view. Choose Character set:
Unicode. Group by: All.
Hovering my mouse over the tilde (~) I see "U+007E: Tilde".
Beside that there is a blank, which seems to be undefined,
and beside that there is "U+00A1: Inverted Exclamation Mark".
Hence my conclusion that "U+0091" is undefined.
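As a cross-check on the CharMap reading, the Unicode database that ships with Python can be queried directly. This is a sketch under the assumption that Python's unicodedata module is an acceptable stand-in for CharMap; it was not part of the original investigation. It suggests that U+0091 is not so much unassigned as a control code (general category Cc) with no name and no glyph, which would explain why a font like Arial has nothing to draw there.

```python
import unicodedata

# Code points mentioned in this post, plus U+0092, U+00A0 and U+2019
# for comparison.
for cp in (0x007E, 0x0091, 0x0092, 0x00A0, 0x00A1, 0x2018, 0x2019):
    ch = chr(cp)
    # unicodedata.name() raises ValueError for unnamed characters unless
    # a default is supplied; control codes have no name.
    name = unicodedata.name(ch, "<no name>")
    category = unicodedata.category(ch)
    print(f"U+{cp:04X}  {category}  {name}")

# "Cc" is the general category for control characters.
print(unicodedata.category("\u0091"))  # prints Cc
```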
Switching to Group by: Unicode Subrange and picking General Punctuation
I can see a "U+2018: Left Single Quotation Mark"
Let's try encoding that. Using the same reference page we see
this pattern seems appropriate:
1110 xxxx 10 xxxxxx 10 xxxxxx
1110 0010 10 000000 10 011000
in hex that's E2 80 98
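The 3-byte encoding worked out by hand above can be reproduced in a few lines. Again this is just a sketch in Python; encode_three_byte_utf8 is a hypothetical helper name and covers only the 1110xxxx 10xxxxxx 10xxxxxx pattern.

```python
def encode_three_byte_utf8(code_point: int) -> bytes:
    """Encode a code point in U+0800..U+FFFF as 3 UTF-8 bytes.

    Hypothetical helper for illustration; surrogates are rejected
    because they are not valid in UTF-8.
    """
    if not 0x0800 <= code_point <= 0xFFFF:
        raise ValueError("code point needs a different sequence length")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogate code points cannot be encoded")
    b1 = 0b11100000 | (code_point >> 12)                 # top 4 bits
    b2 = 0b10000000 | ((code_point >> 6) & 0b00111111)   # middle 6 bits
    b3 = 0b10000000 | (code_point & 0b00111111)          # low 6 bits
    return bytes([b1, b2, b3])


encoded = encode_three_byte_utf8(0x2018)
print(" ".join(f"{b:02X}" for b in encoded))   # prints E2 80 98
print(encoded == "\u2018".encode("utf-8"))     # prints True
```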
I was able to fix the display problem by using that 3-byte replacement
for the 2 bytes that we were given.
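For anyone who wants to apply the same fix to a whole file rather than by hand, something like the following might do. It is a sketch only: the function name and the sample bytes are made up, and mapping C2 92 to U+2019 (Right Single Quotation Mark) is my assumption by analogy with the C2 91 case, not something verified against the page.

```python
# Map each bad 2-byte sequence to the UTF-8 bytes of the intended quote.
# C2 91 -> U+2018 follows the fix described above; C2 92 -> U+2019 is
# an assumption made by analogy.
REPLACEMENTS = {
    b"\xc2\x91": "\u2018".encode("utf-8"),  # E2 80 98
    b"\xc2\x92": "\u2019".encode("utf-8"),  # E2 80 99
}


def repair_quotes(raw: bytes) -> bytes:
    """Replace the C2 91 / C2 92 sequences with real quotation marks."""
    for bad, good in REPLACEMENTS.items():
        raw = raw.replace(bad, good)
    return raw


# Made-up sample input, not taken from the page in question.
sample = b"It\xc2\x92s a \xc2\x91test\xc2\x92"
fixed = repair_quotes(sample)
print(fixed.decode("utf-8"))
```

Working on bytes rather than decoded text keeps the sketch independent of whatever the editor thinks the file's character set is.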
What is interesting is that when I switch the character set that I save the
file under to "Latin 9 (ISO) - Codepage 28605", those box characters
change to the single bytes 91 and 92 (with no lead-in byte). So perhaps
there is another character set, which I don't have, that would allow those
two characters to be saved the way they were. In that case I suspect that
the META statement they are using should be showing some CHARSET=
parameter other than utf-8.
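One candidate for that other character set is Windows-1252, where the single bytes 91 and 92 are the left and right single quotation marks. The sketch below is a guess at how the page could have ended up with C2 91 and C2 92, assuming the text started out as Windows-1252 and was converted to UTF-8 as if it were Latin-1; I have no way of confirming that this is what actually happened. The codec names are the ones Python uses.

```python
raw = b"\x91quoted\x92"  # made-up sample: single bytes 91 and 92

# Read as Windows-1252, the bytes are real quotation marks.
as_cp1252 = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in (as_cp1252[0], as_cp1252[-1])])
# prints ['U+2018', 'U+2019']

# Read as Latin-1, the same bytes become the control codes
# U+0091 and U+0092.
as_latin1 = raw.decode("latin-1")

# Re-encoding that misread text as UTF-8 yields the C2 91 / C2 92
# pairs seen in the page.
page_bytes = as_latin1.encode("utf-8")
print(page_bytes)  # prints b'\xc2\x91quoted\xc2\x92'

# Saving U+0091 / U+0092 as Latin 9 gives the bare bytes 91 and 92,
# matching the observation above.
print("\u0091\u0092".encode("iso8859_15"))  # prints b'\x91\x92'
```

If that guess is right, either CHARSET=windows-1252 with the original single bytes, or utf-8 with E2 80 98 and E2 80 99, would display properly.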
Note: this is my first foray into trying to understand Unicode.
You may want to look for a second opinion in a programming newsgroup.
This newsgroup is mainly for IE6 operation and recovery issues.
There isn't very much programming expertise here.
Unfortunately there are no newsgroups specifically aimed at
programming for IE6, so when this type of question comes up
I usually point out the ie5.programming and ie55.programming
sets of newsgroups. I haven't looked at any of these to see how
active they are or whether there has been any previous discussion of
this topic.
Hmm... here's a reply which may help explain the issue in more detail
and which seems to be consistent with my conjectures.
<http://groups.google.com/[email protected]&rnum=8>
Thanks very much for posting such an interesting problem
with such complete documentation. It was a rare pleasure
to work on it.
Robert Aldwinckle