Unicode (or a subset of it) in DOS window?

Guest · Mar 30, 2005

This question may concern all NT platforms (ie NT4, 2K, 2K3, XP).

As you know, if we type something other than pure ASCII in a text file, like:
ça a été très joli
in notepad, save the file and display it under DOS, those "extended"
characters (eg ç) aren't displayed correctly because DOS uses CP850 (or CP437
for pure English version). The same problem happens the other way round, ie
when the text file is created under DOS (eg from logging output) and opened
in Windows environment.

Even though I don't expect DOS to support Unicode, I'm wondering if there is
any hidden, undocumented feature of DOS which makes it accept ISO-8859-1 (a
subset of Unicode) instead of CP850 or CP437 or anything else.
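
The mismatch described above can be reproduced outside DOS. A minimal sketch in Python, assuming Notepad saved the file as CP1252 and the console interprets it as CP850 (the codec names `cp1252` and `cp850` are Python's standard ones; the variable names are only for illustration):

```python
# Text as typed in Notepad and saved in the Windows "ANSI" code page
# (assumed here to be CP1252, the Western European default).
text = "ça a été très joli"
raw = text.encode("cp1252")

# What a CP850 console shows when it interprets those same bytes.
shown_in_dos = raw.decode("cp850")

# The reverse case: a file written under DOS (CP850) and opened in Notepad (CP1252).
shown_in_windows = text.encode("cp850").decode("cp1252")

print(shown_in_dos)
print(shown_in_windows)
```

Both printed lines differ from the original sentence only in the accented letters, because the plain ASCII range is identical in the two code pages.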

Detlev Dreyer · Mar 30, 2005

TFS said:
As you know, if we type something other than pure ASCII in a text
file, like: ça a été très joli in notepad, save the file and display
it under DOS, those "extended" characters (eg ç) aren't displayed
correctly because DOS uses CP850 (or CP437 for pure English version).

That's not really a matter of the code page; it is caused by the
difference between the ANSI (Windows) and ASCII (DOS) charsets.

Even though I don't expect DOS to support Unicode, I'm wondering if
there is any hidden, undocumented feature of DOS which makes it accept
ISO-8859-1 (a subset of Unicode) instead of CP850 or CP437 or anything
else.

Console applications (Windows programs without a GUI) are usually able
to handle Unicode, unlike DOS applications. You need to convert the
files from ANSI to ASCII or vice versa. Such converters came with Win3.0
and Borland IDEs; you should be able to find them on the internet as
well (Google).

Guest · Mar 30, 2005

Charset IS codepage (two names for the same thing). Windows uses
CP1252 (which contains more characters than ISO-8859-1 and thus isn't a
subset of Unicode) while DOS uses CP850/437.

And then, strictly speaking, ASCII uses 7 bits. If DOS only accepts
ASCII, it won't even display strange characters.
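
Both points can be checked mechanically. A small sketch in Python, assuming the standard codecs `cp850`, `cp1252` and `iso-8859-1`; the helper `is_pure_ascii` is a made-up name for illustration:

```python
def is_pure_ascii(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (0x00-0x7F)."""
    return all(b < 0x80 for b in data)

sample = "ça a été très joli"

# Plain English text is 7-bit; the accented sample is not, whatever 8-bit
# code page is used to store it.
plain_is_ascii = is_pure_ascii(b"plain text")
sample_is_ascii = is_pure_ascii(sample.encode("cp850"))

# CP1252 puts printable characters in 0x80-0x9F, where ISO-8859-1 has
# only control codes. The euro sign is one example.
euro_cp1252 = "€".encode("cp1252")
try:
    "€".encode("iso-8859-1")
    euro_in_latin1 = True
except UnicodeEncodeError:
    euro_in_latin1 = False

print(plain_is_ascii, sample_is_ascii, euro_cp1252, euro_in_latin1)
```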

And then, the problem couldn't be solved by converting ANSI to ASCII (!!!).

Sorry, but your answer is no good.

Detlev Dreyer · Mar 30, 2005

TFS said:
And then, the problem couldn't be solved by converting ANSI to
ASCII (!!!).

Well, before giving advice, I always try it first. Tested under DOS
CP850, it was absolutely no problem to read your sample text
"ça a été très joli" using EDIT.COM after saving with Notepad (ANSI)
and converting to ASCII. It worked vice versa (tested with the German
characters ÄäÜüÖüß) as well after converting from ASCII to ANSI.

Sorry, but your answer is no good.

You won't get any better advice, most likely.

Guest · Mar 30, 2005

I don't want to frustrate you, but you have to accept that your wording
was wrong.

First, your conversion isn't from ANSI to *ASCII*, but ANSI to CP850
because ASCII only contains 7-bit data. On the other hand, ANSI means
nothing. If you're using say a PC in Central Europe, the so-called ANSI is
actually CP1250 instead of CP1252 for Western Europe. And then your
so-called ASCII also changes according to system's locale. It could be CP850
(Western Europe multi-language), CP437 (pure American English with block
drawings), etc, etc. So, a conversion from ANSI to ASCII can't help, but a
conversion from CP1252 to CP850 does help, but this isn't the subject of my
question.
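
For reference, the CP1252 to CP850 conversion mentioned above is a two-step transcoding: decode the bytes to Unicode, then encode them in the target code page. A minimal sketch in Python; the function name `transcode` and its defaults are assumptions for illustration, not a tool named in this thread:

```python
def transcode(data: bytes, src: str = "cp1252", dst: str = "cp850") -> bytes:
    """Re-encode bytes from one 8-bit code page to another via Unicode.

    Characters that do not exist in the target code page become "?".
    Bytes that are undefined in the source code page raise UnicodeDecodeError.
    """
    return data.decode(src).encode(dst, errors="replace")

windows_bytes = "ça a été très joli".encode("cp1252")
dos_bytes = transcode(windows_bytes)

# Converting back restores the original bytes for this sample.
round_trip = transcode(dos_bytes, src="cp850", dst="cp1252")

# The euro sign exists in CP1252 but not in CP850, so it is replaced.
lossy = transcode("€".encode("cp1252"))

print(dos_bytes, round_trip == windows_bytes, lossy)
```

The conversion is only lossless for characters present in both code pages, which is why the source and target have to be named explicitly.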

And then, my original question was about how to change the way DOS
behaves, not how to change our way of doing things to suit DOS!

Detlev Dreyer said:
Well, before giving advice, I always try it first. Tested under DOS
CP850, it was absolutely no problem to read your sample text
"ça a été très joli" using EDIT.COM after saving with Notepad (ANSI)
and converting to ASCII. It worked vice versa (tested with the German
characters ÄäÜüÖüß) as well after converting from ASCII to ANSI.

You won't get any better advice, most likely.

I'll wait for somebody else.

Detlev Dreyer · Mar 30, 2005

TFS said:
First, your conversion isn't from ANSI to *ASCII*, but ANSI to CP850
because ASCII only contains 7-bit data.

Yes and no. DOS (and the DOS emulation) uses the extended ASCII charset
(High ASCII). http://encyclopedia.laborlawtalk.com/Extended_ASCII

It could be CP850 (Western Europe multi-language), CP437 (pure
American English with block drawings),

That's simply wrong. Excerpt from above and verified by the charset:

| DOS computers built for the American market, for example, used
| codepage 437, which included accented characters needed for French,
| German, and a few other European languages, as well as some graphical
| line-drawing characters.
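
How far each code page actually goes can be tested directly. A rough sketch in Python, assuming the standard `cp437` and `cp850` codecs; the helper `encodable` and the third and fourth sample strings are made up for illustration:

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

samples = [
    "ça a été très joli",  # the French sample from this thread
    "ÄäÜüÖöß",             # German umlauts and sharp s
    "ÈÊËÃõ",               # accented letters outside the CP437 repertoire
    "╡",                   # a mixed single/double box-drawing character
]

coverage = {s: (encodable(s, "cp437"), encodable(s, "cp850")) for s in samples}
for s, (in_437, in_850) in coverage.items():
    print(s, "cp437:", in_437, "cp850:", in_850)
```

The French and German samples fit in both code pages and get the same byte values in each. CP850 gives up some of the line-drawing characters to make room for additional accented letters.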

I'll wait for somebody else.

Good luck!
