Is there a way to open a unicode file in Notepad with the correct script


Andrew McLaren

Randem said:
I need to find a way to tell Notepad to open a Unicode Japanese file and use
the Japanese script so that the letters show correctly. Notepad cannot do
this on its own.

Notepad uses a set of heuristics to determine whether a file is Unicode,
or some single-byte character set ... in other words, it tries to guess.

Canonically, Unicode (UTF-16) files begin with the two-byte "Byte Order
Mark": 0xFE 0xFF for big-endian, or vice versa (0xFF 0xFE) for
little-endian. Pretty well any text file which begins with a BOM will be
opened by Notepad as Unicode.
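If you want to check for yourself whether a file starts with a BOM, something like the following would do it. This is only a rough sketch in Python; the function name sniff_bom is made up for illustration, and it is not how Notepad itself does the check. It only looks for the UTF-8 and UTF-16 marks and does not try to handle UTF-32.

```python
import codecs

# Byte-order marks to look for, paired with the encoding each one implies.
# codecs.BOM_UTF16_LE is b"\xff\xfe", codecs.BOM_UTF16_BE is b"\xfe\xff",
# and codecs.BOM_UTF8 is b"\xef\xbb\xbf".
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

You would call it on the first few bytes of the file, for example sniff_bom(open(path, "rb").read(4)). A result of None means there is no BOM, so any program opening the file has to guess.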

After that, it may come down to Notepad's default font for displaying
the text. On an English-language system, the default Notepad font
(Lucida Console?) does not contain the CJK range of Unicode chars. Try
setting the default font to a specifically Unicode font such as Arial
Unicode MS, or Lucida Sans Unicode, and try again. If glyphs for the
Unicode characters are present in the font set, then the text is likely
to display correctly. If you open a file with Unicode chars which are not
in the current font, then it cannot display correctly, whether it is
correctly detected as Unicode or not.

Hope it helps,

Andrew
 

Andrew McLaren

Randem said:
That I already knew... but the question remains...

If "the question remains" then you probably haven't described your
question precisely enough. I'm not going to keep lobbing bits of
information at you, trying to guess what you already know, and what you
don't know.

But, for the record ...

To reduce it to a single API, Notepad on XP uses the Win32
IsTextUnicode() API. As a professional developer, you'll be aware that
this API has a reputation for being a bit hit-and-miss. This has been
discussed extensively in Microsoft developer blogs for several years,
such as:

"Some files come up strange in Notepad"
http://blogs.msdn.com/oldnewthing/archive/2004/03/24/95235.aspx

"The Notepad file encoding problem, redux"
http://blogs.msdn.com/oldnewthing/archive/2007/04/17/2158334.aspx

and

"Why I don't like the IsTextUnicode API"
http://blogs.msdn.com/michkap/archive/2005/01/30/363308.aspx

"The Notepad encoding detection issues keep coming up"
http://blogs.msdn.com/michkap/archive/2007/04/22/2239345.aspx


You can read all about IsTextUnicode() here:
http://msdn.microsoft.com/en-us/library/dd318672(VS.85).aspx

In Vista SP1 and later, Notepad uses its own algorithms, which are a bit
smarter than IsTextUnicode(), to try to detect whether text is Unicode
or not. This reduces the frequency of misjudgements, but there is
still an element of chance and guessing.
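If you control how the file is produced, one way to take the guessing out of it is to make sure the file carries a BOM in the first place. Here is a minimal sketch in Python, assuming you already know the file's real encoding. The function name to_utf16le_with_bom and the source_encoding parameter are hypothetical names chosen for this example.

```python
import codecs


def to_utf16le_with_bom(data: bytes, source_encoding: str) -> bytes:
    """Re-encode text as UTF-16 little-endian with a leading BOM (FF FE)."""
    text = data.decode(source_encoding)
    # Drop a BOM that survived decoding so the output does not end up with two.
    if text.startswith("\ufeff"):
        text = text[1:]
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")
```

For example, you could read the original bytes with open(path, "rb"), pass them through this function along with the encoding you know the file to be in, and write the result back with open(path, "wb"). The output is the byte layout that Notepad labels "Unicode" in its encoding list.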

If your question revolves around the interactive use of Notepad as an
end-user, then to open a file as Unicode is easy. Go to the File menu,
Open, and select the appropriate encoding from the drop-down list in the
File Open dialogue: ANSI, Unicode, Unicode Big Endian, or UTF-8. Then
select the file you want to open. Again, CJK chars will only display
correctly if Notepad is using a font which contains the necessary CJK
glyphs.

If this does not answer your question then please explain IN DETAIL what
you are trying to do, what the configuration is (e.g. what language
version of Windows you are using, what you have configured in Regional
Settings, etc.) and what results you are getting; preferably so we can
reproduce and debug the problem here.

Thanks

Andrew
 
