Unicode (or a subset of it) in DOS window?

Guest

This question may concern all NT platforms (ie NT4, 2K, 2K3, XP).

As you know, if we type something other than pure ASCII in a text file, like:
ça a été très joli
in Notepad, save the file and display it under DOS, those "extended"
characters (e.g. ç) aren't displayed correctly because DOS uses CP850 (or CP437
for the pure English version). The same problem happens the other way round, i.e.
when the text file is created under DOS (e.g. from logging output) and opened
in a Windows environment.
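
To make the mismatch concrete, here is a minimal Python sketch. It assumes Notepad saves in CP1252 (Western European Windows) and the console displays CP850; the helper name as_seen_by is made up for illustration.

```python
SAMPLE = "ça a été très joli"


def as_seen_by(text: str, saved_as: str, shown_as: str) -> str:
    """Encode text with one code page, then decode the same bytes with another."""
    # errors="replace" covers byte values the displaying code page leaves undefined.
    return text.encode(saved_as).decode(shown_as, errors="replace")


# Assumed setup: file saved by Notepad as CP1252, displayed by a CP850 console.
print(as_seen_by(SAMPLE, "cp1252", "cp850"))

# The other way round: file written by a CP850 console program, opened in Notepad.
print(as_seen_by(SAMPLE, "cp850", "cp1252"))
```

The bytes on disk never change; only the table used to interpret them does (ç is byte 0xE7 in CP1252 but 0x87 in CP850).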

Even though I don't expect DOS to support Unicode, I'm wondering if there is
any hidden, undocumented feature of DOS which makes it accept ISO-8859-1 (a
subset of Unicode) instead of CP850 or CP437 or anything else.
 
Detlev Dreyer

TFS said:
As you know, if we type something other than pure ASCII in a text
file, like: ça a été très joli in notepad, save the file and display
it under DOS, those "extended" characters (eg ç) aren't displayed
correctly because DOS uses CP850 (or CP437 for pure English version).

That's not really a matter of the code page; this is caused by the
difference between the ANSI (Windows) and ASCII (DOS) charset.

TFS said:
Even though I don't expect DOS to support Unicode, I'm wondering if
there is any hidden, undocumented feature of DOS which makes it accept
ISO-8859-1 (a subset of Unicode) instead of CP850 or CP437 or anything
else.

Console applications (Windows programs without a GUI) are usually able to
handle Unicode, unlike DOS applications. You need to convert the
files from ANSI to ASCII or vice versa. Such converters came with Win3.0
and Borland IDEs; you should be able to find them on the Internet as well
(Google).
 
Guest

Charset IS codepage (two names for the same thing). Windows uses
CP1252 (which contains more characters than ISO-8859-1 and thus isn't a
subset of Unicode) while DOS uses CP850/437.

And then, strictly speaking, ASCII uses 7 bits. If DOS only accepts
ASCII, it won't even display strange characters.
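
A quick way to check both points is to try encoding a character in each charset. This is only a sketch using Python's built-in codecs; the helper can_encode and the choice of € as the test character are my own assumptions, not something from the posts above.

```python
def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# "latin-1" is Python's name for ISO-8859-1.
for codec in ("ascii", "latin-1", "cp1252", "cp850", "cp437"):
    # ç needs the 8th bit; € is one of the characters CP1252 adds over ISO-8859-1.
    print(codec, can_encode("ç", codec), can_encode("€", codec))
```

Only 7-bit ASCII rejects ç, and of the five only CP1252 accepts €.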

And then, the problem couldn't be solved by converting ANSI to ASCII (!!!).

Sorry, but your answer is no good.
 
Detlev Dreyer

TFS said:
And then, the problem couldn't be solved by converting ANSI to
ASCII (!!!).

Well, before giving advice, I always try it first. Tested under DOS
CP850, it was absolutely no problem to read your sample text
"ça a été très joli" using EDIT.COM after saving with Notepad (ANSI)
and converting to ASCII. It worked vice versa (tested with the German
characters ÄäÜüÖüß) as well after converting from ASCII to ANSI.

TFS said:
Sorry, but your answer is no good.

You won't get any better advice, most likely.
 
Guest

I don't want to frustrate you, but you have to accept that your wording
was wrong.

First, your conversion isn't from ANSI to *ASCII*, but from ANSI to CP850,
because ASCII only contains 7-bit data. On the other hand, "ANSI" on its own
means nothing. If you're using, say, a PC in Central Europe, the so-called ANSI
is actually CP1250 instead of the CP1252 used in Western Europe. And your
so-called ASCII also changes according to the system's locale. It could be CP850
(Western Europe multi-language), CP437 (pure American English with block
drawings), etc. So a conversion from "ANSI" to "ASCII" can't help, whereas a
conversion from CP1252 to CP850 does help, but this isn't the subject of my
question.
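
For the record, the conversion I mean can be sketched in a few lines of Python. The function name transcode and its defaults are hypothetical; the point is that both code pages have to be named explicitly, and they would differ on a system with another locale.

```python
def transcode(data: bytes, src: str = "cp1252", dst: str = "cp850") -> bytes:
    """Reinterpret bytes written in code page src as bytes in code page dst."""
    # Decoding is strict: a byte that src leaves undefined raises UnicodeDecodeError.
    # Characters missing from dst are replaced with "?" instead of failing.
    return data.decode(src).encode(dst, errors="replace")


# Assumed input: the sample text as Notepad would save it in CP1252.
notepad_bytes = "ça a été très joli".encode("cp1252")
console_bytes = transcode(notepad_bytes)

print(notepad_bytes.hex(" "))
print(console_bytes.hex(" "))

# The opposite direction restores the original bytes for this sample.
assert transcode(console_bytes, "cp850", "cp1252") == notepad_bytes
```

Encoding the same text as "ascii" instead would raise UnicodeEncodeError, since 7-bit ASCII has no ç, é or è.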

And my original question was how to change the way DOS behaves, not how
to change our way of doing things to suit DOS!

Detlev Dreyer said:
Well, before giving advice, I always try it first. Tested under DOS
CP850, it was absolutely no problem to read your sample text
"ça a été très joli" using EDIT.COM after saving with Notepad (ANSI)
and converting to ASCII. It worked vice versa (tested with the German
characters ÄäÜüÖüß) as well after converting from ASCII to ANSI.


You won't get any better advice, most likely.

I wait for somebody else.
 
Detlev Dreyer

TFS said:
First, your conversion isn't from ANSI to *ASCII*, but ANSI to CP850
because ASCII only contains 7-bit data.

Yes and no. DOS (and the DOS emulation) uses the extended ASCII charset
(High ASCII). http://encyclopedia.laborlawtalk.com/Extended_ASCII

TFS said:
It could be CP850 (Western Europe multi-language), CP437 (pure
American English with block drawings),

That's simply wrong. Excerpt from the page linked above, verified against the charset:

| DOS computers built for the American market, for example, used
| codepage 437, which included accented characters needed for French,
| German, and a few other European languages, as well as some graphical
| line-drawing characters.
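
The excerpt can be checked against Python's cp437 codec with a small sketch. The helper missing_from is a made-up name, and the character Ã is my own pick of a letter that CP850 has but CP437 lacks.

```python
def missing_from(text: str, codec: str) -> list[str]:
    """List the distinct characters of text that the code page cannot represent."""
    missing = set()
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    return sorted(missing)


# The two test strings used earlier in this thread.
for sample in ("ça a été très joli", "ÄäÜüÖüß"):
    print(sample, missing_from(sample, "cp437"), missing_from(sample, "cp850"))

# An assumed extra example: Ã exists in CP850 but not in CP437.
print(missing_from("Ã", "cp437"), missing_from("Ã", "cp850"))
```

Both thread samples come back with nothing missing under either code page, so CP437 does cover these French and German letters; the two pages differ elsewhere.
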

TFS said:
I wait for somebody else.

Good luck!
 
