> Their assessment is that the problem is with IE6's encoding capabilities
> and not my personal computer.
I think the problem is with the page.
The coding doesn't make sense to me as UTF-8.
The <head> section contains this:
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=utf-8" />
Note though that we don't know what character set the source file
was saved with.
In the TIF the source of those boxes we see in hex is C291 and C292
According to:
<http://www.cl.cam.ac.uk/~mgk25/unicode.html>
in UTF-8 we are supposed to interpret C291 in bits as
110 xxxxx 10 xxxxxx
so C291 would be interpreted as
110 00010 10 010001
so this apparently was trying to encode 0091 (000 10010001)
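The bit arithmetic above can be checked with a short Python sketch. The helper name decode_two_byte_utf8 is made up for illustration; it simply strips the 110 and 10 marker bits and joins the remaining payload bits, then compares the result with Python's own UTF-8 decoder.

```python
def decode_two_byte_utf8(b1: int, b2: int) -> int:
    """Decode a 2-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point."""
    # The lead byte must start with bits 110, the trail byte with bits 10.
    if b1 >> 5 != 0b110 or b2 >> 6 != 0b10:
        raise ValueError("not a 2-byte UTF-8 sequence")
    # Keep the 5 payload bits of the lead byte and the 6 of the trail byte.
    return ((b1 & 0b00011111) << 6) | (b2 & 0b00111111)


for pair in (b"\xC2\x91", b"\xC2\x92"):
    cp = decode_two_byte_utf8(pair[0], pair[1])
    # Cross-check against the built-in decoder.
    assert cp == ord(pair.decode("utf-8"))
    print(f"{pair.hex().upper()} -> U+{cp:04X}")
```

So the two byte pairs are well-formed UTF-8; the question is only what the resulting code positions 0091 and 0092 are supposed to mean.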
The problem with that, I think, is that this code position is undefined
at least in the Unicode character set that my CharMap knows about.
To see this, choose Font: Arial, which I think is the one that is being
used by the page. Turn on the Advanced view. Choose Character set
Unicode. Group by: All.
Hovering my mouse over the tilde (~) I see "U+7e: Tilde".
Beside that there is a blank which seems to be undefined
and beside that there is "U+00A1: Inverted Exclamation mark"
Hence my conclusion that "U+0091" is undefined.
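As a cross-check that does not depend on CharMap or on the Arial font, Python's standard unicodedata module can be asked about the same code positions. This is only a sketch, and the "<no name>" placeholder is my own. On the interpreters I know of it reports U+0091 as category Cc (a control character with no name), which would fit CharMap having no glyph to show for it. It also suggests the blank slot beside the tilde may be U+00A0 (no-break space) rather than an undefined position, though I have not verified that against CharMap itself.

```python
import unicodedata

# Code positions discussed above: tilde, the suspect 0091,
# and the two positions just before and at the inverted exclamation mark.
for cp in (0x7E, 0x91, 0xA0, 0xA1):
    ch = chr(cp)
    # Control characters have no name, so supply a placeholder default.
    name = unicodedata.name(ch, "<no name>")
    print(f"U+{cp:04X}  category={unicodedata.category(ch)}  {name}")
```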
Switching to Group by: Unicode Subrange and picking General Punctuation
I can see a "U+2018: Left Single Quotation Mark"
Let's try encoding that. Using the same reference page we see
this pattern seems appropriate:
1110 xxxx 10 xxxxxx 10 xxxxxx
1110 0010 10 000000 10 011000
in hex that's E2 80 98
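The same hand encoding can be reproduced in Python. This is a minimal sketch; encode_three_byte_utf8 is a hypothetical helper name, and it only handles the 3-byte pattern shown above (code points U+0800 through U+FFFF, excluding surrogates).

```python
def encode_three_byte_utf8(cp: int) -> bytes:
    """Encode a code point using the pattern 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= cp <= 0xFFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("code point does not use the 3-byte pattern")
    return bytes([
        0b11100000 | (cp >> 12),                # top 4 bits
        0b10000000 | ((cp >> 6) & 0b00111111),  # middle 6 bits
        0b10000000 | (cp & 0b00111111),         # low 6 bits
    ])


for cp in (0x2018, 0x2019):  # left and right single quotation marks
    encoded = encode_three_byte_utf8(cp)
    # Cross-check against the built-in encoder.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X} -> {encoded.hex(' ').upper()}")
```

By the same rule the right single quotation mark, U+2019, comes out as E2 80 99.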
I was able to fix the display problem by using that 3 byte replacement
for the 2 bytes that we were given.
What is interesting is that when I switch the character set I save the
file under to "Latin 9 (ISO) - Codepage 28605", those box characters
change to 91 and 92 (only, without any lead-in byte). So perhaps there is
another character set, which I don't have, that would allow those two
characters to be saved the way they were. In that case I suspect that they
should be showing some other CHARSET= parameter than utf-8 in that
META statement they are using.
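One candidate for that other character set is Windows-1252, where the single bytes 91 and 92 are the left and right single quotation marks. The sketch below rests on an assumption that nothing in this thread confirms: that the text was authored in Windows-1252, and that somewhere along the way each byte was treated as if it were ISO-8859-1 (Latin-1) before being converted to UTF-8. Under that assumption the result is exactly the C2 91 and C2 92 pairs seen in the page source.

```python
# Assumption (not confirmed by the page's authors): the original bytes were
# Windows-1252, but were converted to UTF-8 as if they were Latin-1.
for raw in (b"\x91", b"\x92"):
    intended = raw.decode("cp1252")      # what the author presumably meant
    misread = raw.decode("latin-1")      # same byte taken as a Latin-1 code
    on_the_wire = misread.encode("utf-8")
    correct = intended.encode("utf-8")
    print(
        f"byte {raw.hex().upper()}: "
        f"intended U+{ord(intended):04X}, "
        f"misconverted to {on_the_wire.hex(' ').upper()}, "
        f"should have been {correct.hex(' ').upper()}"
    )
```

If that is what happened, either the conversion step or the CHARSET= declaration would need to change on the server side; changing the View/Encoding setting in the browser would not recover the quotation marks.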
Note: this is my first foray into trying to understand Unicode.
You may want to look for a second opinion in a programming newsgroup.
This newsgroup is mainly for IE6 operation and recovery issues.
There isn't very much programming expertise here.
Unfortunately there are no newsgroups specifically aimed at
programming for IE6 so when this type of question comes up
I usually point out the ie5.programming and ie55.programming
set of newsgroups. I haven't looked at any of these to see how
active they are or if there has been any previous discussion about
this topic.
Hmm... here's a reply which may help explain the issue in more detail
and which seems to be consistent with my conjectures.
<http://groups.google.com/groups?q=un...ngen.de&rnum=8>
Thanks very much for posting such an interesting problem
with such complete documentation. It was a rare pleasure
to work on it.
Robert Aldwinckle
---
<(E-Mail Removed)> wrote in message
news:2c1001c3a890$40e12600$(E-Mail Removed)...
> Since posting this problem for the first time, I have
> learned much more about the issue. A new posting was
> submitted earlier today that explains much better what is
> happening.
>
> The new text is as follows:
>
> In the process of a search for a news item I noticed that
> some of the search results were displaying squares
> instead of apostrophes, dashes and perhaps other marks. The
> URL where the problem was first noticed is <MSNBC.com> (or
> precisely <http://www.msnbc.com/news/default.asp?0ct=-34o>)
>
> In order to investigate this further, I typed in a word in
> the Search MSNBC box that had an apostrophe and clicked
> on "GO." The word was "can't".
>
> The first 37 results are listed under the subcategory Web
> Directory Sites. All of these were read correctly. The
> results from 38 on are listed under the subcategory Web
> Pages. It is in these that the problem appears. All of
> the quotation marks, apostrophes and dashes appear as
> squares. For example, here is number 38: 'Thank God the
> Iraqis can't aim'.
>
> I noticed that IE6 selects Unicode (UTF-8) to read the
> site. In an effort to learn more about the situation I
> changed IE6's encoding and fiddled with the font settings
> in Internet Options. When changing the encoding to other
> settings such as to Western ISO, the squares were changed
> into other characters, such as the capital letter "A", but
> the marks were still not resolved into their proper form.
>
> I called the computing help desk at James Madison
> University (where I teach). The helpers looked at the
> same list using IE6 on both Windows 2000 and on Windows
> XP. They reported the same results, squares in place of
> the proper marks. Their assessment is that the problem is
> with IE6's encoding capabilities and not my personal
> computer.
>
> It should be noted that the Netscape 7 browser has no
> problem reading the same list. Just like IE6, it selects
> Unicode (UTF-8), which I assume means that this is
> correct.
>
> Since first discovering this problem, I spent some time
> trying searches on other Web sites. I have found the
> problem elsewhere, but not with the same frequency as with the
> search on the MSNBC site.
>
> Is there a solution that will correct this situation? If
> there is, I will relay it to James Madison University's
> computing center.
>
> Thanks for your kind attention to this problem.
>
> Charles
>
>
>
>
>
> >-----Original Message-----
> >Try to change your View/Encoding settings.
> >
> >
> >--
> >
> >Henri Leboeuf
> >Web page: http://www.generation.net/~hleboeuf/index.htm
> >
> >
> >
> >"Charles Maddox" <(E-Mail Removed)> wrote in message
> >news:007a01c3a62f$baf814c0$(E-Mail Removed)...
> >> In certain circumstances, apostrophes and dashes are
> >> displayed as small squares when using Internet Explorer 6.
> >> (My OS is Windows XP.) This does not happen when viewing
> >> the same information using my Netscape browser. This was
> >> discovered when searching for news on MSNBC.
> >>
> >> I went to MSN.com, clicked on "more" next to MSN Top
> >> Headlines. In the "Search MSNBC" box I typed the
> >> word "can't" then clicked on "Go." The first set of
> >> results listed under "Web Directory Sites" have no
> >> squares. The squares appear in the list of results
> >> under "Web Pages." (When the web pages are opened, the
> >> error does not appear.)
> >>
> >> Can you help me to fix this problem?
> >
> >.
> >