Apostrophes and dashes appear as squares

Charles Maddox · Nov 8, 2003

In certain circumstances, apostrophes and dashes are
displayed as small squares when using Internet Explorer 6.
(My OS is Windows XP.) This does not happen when viewing
the same information using my Netscape browser. This was
discovered when searching for news on MSNBC.

I went to MSN.com and clicked on "more" next to MSN Top
Headlines. In the "Search MSNBC" box I typed the
word "can't" then clicked on "Go." The first set of
results, listed under "Web Directory Sites," has no
squares. The squares appear in the list of results
under "Web Pages." (When the web pages are opened, the
error does not appear.)

Can you help me to fix this problem?

H Leboeuf · Nov 9, 2003

Try to change your View/Encoding settings.

Chqrles · Nov 9, 2003

Dear Henri:

Thank you for your suggested solution. Each of the
feasible settings was tried, and yet the problem was not
resolved. Do you have any other suggestions?

Thank you again for your help.

Sincerely,

Charles Maddox

H Leboeuf · Nov 10, 2003

Can you look at the Source code when you get the wrong character(s)?

Part of the text may be generated by Java or some script.

I would then check my Java and script support.

Also, is there any popup stopper or security software that could cause this?

Guest · Nov 10, 2003

Before trying your further suggestion I want to convey
something learned this morning. I contacted the computing
help desk at James Madison University to have them try the
same thing on their Windows XP. The results were the same.
So, at least the problem was not with my computer alone.

In the meantime, I learned about and experimented with the
encoding languages in IE6 and Netscape 7 browser. There
seemed to be a difference in the Western ISO (that is the
default language in Netscape, the one that read the
information correctly).

I then searched the FAQs in Microsoft Support using the
keyword "Western ISO." In reading the results I found
that this may actually be a problem with IE6. The
information suggested that we wait for SP2 (whenever that
will be released). There was a quick fix that requires
changing the registry, something that I should perhaps
not try unless I am certain that what I would be doing
will work.

I will, however, follow your further suggestions and let
you know of the results just in case what I learned is not
correct.

Sincerely,

Charles

Charles · Nov 11, 2003

Dear Henri:

I can look at the Source code, but I have no idea how to
read it for the information you would like to have. The
squares turn up there along with words repeated from the
site. The remainder is a new and mysterious world for me.

Could the problem be a corrupt font?

I was going to try to refresh the fonts used by IE6, but
decided not to try as there are questions about this task
that would need answering before I could try, such as,
which fonts to refresh and where does one find the
uncorrupted replacement fonts. I could not locate them on
the Windows Installation CD.

I have considered uninstalling and reinstalling IE6, but
doing so presents other difficulties as it seems that the
program cannot simply be uninstalled.

Guest · Nov 11, 2003

Since posting this problem for the first time, I have
learned much more about the issue. A new posting was
submitted earlier today that explains much better what is
happening.

The new text is as follows:

In the process of a search for a news item I noticed that
some of the search results were displaying squares
instead of apostrophes, dashes and perhaps other marks. The
URL where the problem was first noticed is <MSNBC.com> (or
precisely <http://www.msnbc.com/news/default.asp?0ct=-34o>)

In order to investigate this further, I typed in a word in
the Search MSNBC box that had an apostrophe and clicked
on "GO." The word was "can't".

The first 37 results are listed under the subcategory Web
Directory Sites. All of these were read correctly. The
results from 38 on are listed under the subcategory Web
Pages. It is in these that the problem appears. All of
the quotation marks, apostrophes and dashes appear as
squares. For example, here is number 38: 'Thank God the
Iraqis can't aim'.

I noticed that IE6 selects Unicode (UTF-8) to read the
site. In an effort to learn more about the situation I
changed IE6's encoding and fiddled with the font settings
in Internet Options. When changing the encoding to other
settings such as to Western ISO, the squares were changed
into other characters, such as the capital letter "A", but
the marks were still not resolved into their proper form.

I called the computing help desk at James Madison
University (where I teach). The helpers looked at the
same list using IE6 on both Windows 2000 and on Windows
XP. They reported the same results, squares in place of
the proper marks. Their assessment is that the problem is
with IE6's encoding capabilities and not my personal
computer.

It should be noted that the Netscape 7 browser has no
problem reading the same list. Just as IE6, it selects
Unicode (UTF-8), which I assume means that this is
correct.

Since first discovering this problem, I spent some time
trying searches on other Web sites. I have found the
problem elsewhere, but not with the same frequency as in the
search on the MSNBC site.

Is there a solution that will correct this situation? If
there is, I will relay it to James Madison University's
computing center.

Thanks for your kind attention to this problem.

Charles

Robert Aldwinckle · Nov 14, 2003

"Their assessment is that the problem is with IE6's encoding capabilities
and not my personal computer."

I think the problem is with the page.
The coding doesn't make sense to me as UTF-8.

The <head> section contains this:

<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=utf-8" />

Note though that we don't know what character set the source file
was saved with.

In the TIF, the source of those boxes, seen in hex, is C291 and C292.

According to:

< http://www.cl.cam.ac.uk/~mgk25/unicode.html >

in UTF-8 we are supposed to interpret C291 in bits as
110 xxxxx 10 xxxxxx
so C291 would be interpreted as
110 00010 10 010001

so this apparently was trying to encode 0091 (000 10010001)
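
The bit arithmetic above can be checked with a short Python sketch. The function name decode_two_byte_utf8 is made up for illustration; it handles only the two-byte pattern 110xxxxx 10xxxxxx discussed here and is not a general UTF-8 decoder.

```python
def decode_two_byte_utf8(lead: int, trail: int) -> int:
    """Decode one two-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point."""
    if lead & 0b11100000 != 0b11000000:
        raise ValueError("lead byte is not of the form 110xxxxx")
    if trail & 0b11000000 != 0b10000000:
        raise ValueError("trail byte is not of the form 10xxxxxx")
    # Keep the 5 payload bits of the lead byte and the 6 of the trail byte.
    return ((lead & 0b00011111) << 6) | (trail & 0b00111111)


code_point_91 = decode_two_byte_utf8(0xC2, 0x91)
code_point_92 = decode_two_byte_utf8(0xC2, 0x92)
print(f"U+{code_point_91:04X}", f"U+{code_point_92:04X}")

# Cross-check against Python's built-in UTF-8 decoder.
assert bytes([0xC2, 0x91]).decode("utf-8") == chr(code_point_91)
assert bytes([0xC2, 0x92]).decode("utf-8") == chr(code_point_92)
```

Both the hand-rolled decoding and the built-in decoder agree that C2 91 and C2 92 stand for code positions 0091 and 0092.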

The problem with that, I think, is that this code position is undefined,
at least in the Unicode character set that my CharMap knows about.

To see this choose Font: Arial which I think is the one that is being
used by the page. Turn on the Advanced view. Choose Character set
Unicode. Group by: All.

Hovering my mouse over the tilde (~) I see "U+7e: Tilde".
Beside that there is a blank which seems to be undefined
and beside that there is "U+00A1: Inverted Exclamation mark"
Hence my conclusion that "U+0091" is undefined.

Switching to Group by: Unicode Subrange and picking General Punctuation
I can see a "U+2018: Left Single Quotation Mark"
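
For anyone without CharMap, roughly the same lookup can be sketched with Python's standard unicodedata module. The helper name describe is hypothetical. One hedge: Python reports U+0091 as a control character (category Cc) with no name, which may be why CharMap shows a blank there rather than a glyph; that is an inference, not something CharMap itself states.

```python
import unicodedata


def describe(ch: str) -> tuple:
    """Return (code position, general category, name) for one character."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.category(ch),
        # Control characters have no name, so supply a placeholder.
        unicodedata.name(ch, "<no name>"),
    )


# The tilde, the position in question, its CharMap neighbour, and the
# left single quotation mark from the General Punctuation subrange.
for ch in ("~", "\u0091", "\u00a1", "\u2018"):
    print(describe(ch))
```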

Let's try encoding that. Using the same reference page we see
this pattern seems appropriate:
1110 xxxx 10 xxxxxx 10 xxxxxx
1110 0010 10 000000 10 011000
in hex that's E2 80 98

I was able to fix the display problem by using that 3 byte replacement
for the 2 bytes that we were given.
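
The encoding and the byte replacement can be sketched in Python as follows. The names encode_three_byte_utf8 and repair are made up for illustration. Mapping C2 92 to U+2019 (the right single quotation mark) is an assumption by analogy with C2 91, and the sample bytes are a hypothetical reconstruction of result number 38, not data taken from the actual page.

```python
def encode_three_byte_utf8(code_point: int) -> bytes:
    """Encode a code point using the pattern 1110xxxx 10xxxxxx 10xxxxxx."""
    if not 0x0800 <= code_point <= 0xFFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("code point does not fit the three-byte pattern")
    return bytes([
        0b11100000 | (code_point >> 12),            # top 4 bits
        0b10000000 | ((code_point >> 6) & 0x3F),    # middle 6 bits
        0b10000000 | (code_point & 0x3F),           # low 6 bits
    ])


left_quote = encode_three_byte_utf8(0x2018)
right_quote = encode_three_byte_utf8(0x2019)  # assumed counterpart for C2 92


def repair(raw: bytes) -> bytes:
    """Swap the two-byte sequences for the three-byte quotation marks."""
    return raw.replace(b"\xc2\x91", left_quote).replace(b"\xc2\x92", right_quote)


# Hypothetical bytes for the search result quoted earlier in the thread.
sample = b"\xc2\x91Thank God the Iraqis can\xc2\x92t aim\xc2\x92"
fixed = repair(sample).decode("utf-8")
print(left_quote.hex(" "), right_quote.hex(" "))
print(fixed)
```

A blanket byte replacement like this is only safe if C2 91 and C2 92 never occur legitimately in the file, which seems reasonable for ordinary Web page text but is not guaranteed.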

What is interesting is that switching the character set that I saved the
file under to "Latin 9 (ISO) - Codepage 28605" those box characters
change to 91 and 92 (only--without any lead-in). So perhaps there is
another character set which I don't have which would allow those two
characters to be saved the way they were. In that case I suspect that they
should be showing some other CHARSET= parameter than utf-8 in that
META statement they are using.
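
One candidate for that other character set is Windows-1252, where the single bytes 91 and 92 are the left and right single quotation marks. The Python sketch below shows how bytes C2 91 and C2 92 could arise if such text were converted to UTF-8 as though it were ISO-8859-1. This is only a guess at how the page was produced; the sample string is hypothetical.

```python
# Hypothetical source text saved as Windows-1252: curly quotes are 0x91 / 0x92.
raw_cp1252 = b"\x91can\x92t\x92"

# Read with the right code page, the quotes come out as U+2018 / U+2019.
as_windows_1252 = raw_cp1252.decode("cp1252")

# Read as ISO-8859-1 instead, the same bytes become U+0091 / U+0092 ...
as_latin1 = raw_cp1252.decode("latin-1")

# ... and converting that to UTF-8 yields the C2 91 / C2 92 pairs seen in hex.
mis_converted = as_latin1.encode("utf-8")
correct = as_windows_1252.encode("utf-8")

print(mis_converted.hex(" "))
print(correct.hex(" "))

# Viewing the mis-converted bytes as a Western single-byte encoding turns the
# C2 lead byte into a capital A with a circumflex, which would be consistent
# with the capital "A" reported earlier in the thread.
print(repr(mis_converted.decode("latin-1")[0]))
```

If this is what happened, the cleaner remedy would be on the server side: either convert the source text from Windows-1252 properly, or declare the character set the bytes were really saved in.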

Note: this is my first foray into trying to understand Unicode.
You may want to look for a second opinion in a programming newsgroup.
This newsgroup is mainly for IE6 operation and recovery issues.
There isn't very much programming expertise here.
Unfortunately there are no newsgroups specifically aimed at
programming for IE6 so when this type of question comes up
I usually point out the ie5.programming and ie55.programming
set of newsgroups. I haven't looked at any of these to see how
active they are or if there has been any previous discussion about
this topic.

Hmm... here's a reply which may help explain the issue in more detail
and which seems to be consistent with my conjectures.

< http://groups.google.com/[email protected]&rnum=8 >

Thanks very much for posting such an interesting problem
with such complete documentation. It was a rare pleasure
to work on it.

Robert Aldwinckle
