How can I make sure that a text file is saved in a given codepage?

Bent M

Currently I have a problem: I need to make sure that a
text-file is saved in a given codepage (in this case:
ISO8859-5, which is Russian). If I install MUI and type
the Russian characters, everything looks ok, but when I
then take a hex-editor and look at the file, I can see
that it is NOT saved in that codepage. Furthermore, when I
transfer the text-file to my target (which runs Linux),
the only thing I see is garbage.
Please give me some hints to how to save the textfile in
the correct codepage.

Best regards,
Bent
 
Gary Smith

Bent M said:
Currently I have a problem: I need to make sure that a
text-file is saved in a given codepage (in this case:
ISO8859-5, which is Russian). If I install MUI and type
the Russian characters, everything looks ok, but when I
then take a hex-editor and look at the file, I can see
that it is NOT saved in that codepage. Furthermore, when I
transfer the text-file to my target (which runs Linux),
the only thing I see is garbage.
Please give me some hints to how to save the textfile in
the correct codepage.

Text files don't have code pages. Code pages are a function of the
display process. ISO 8859-5 is indistinguishable from ISO 8859-1 at the
file level. It's the interpretation of the 256 possible byte values that
makes the difference between one code page and another. If you want your
Linux system to display your text as Russian, you'll have to set the
appropriate terminal display parameter on the Linux system, but doing that
will make every text file display that way until you change the parameter
again.
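
A minimal sketch of this point in Python (not part of the original reply; the byte values are chosen purely for illustration): the same six bytes decode to different text depending on which character set the reader assumes.

```python
# Six bytes with no encoding information attached to them.
raw = bytes([0xBF, 0xE0, 0xD8, 0xD2, 0xD5, 0xE2])

# The bytes are identical; only the interpretation differs.
as_cyrillic = raw.decode("iso8859_5")  # read as ISO 8859-5 (Cyrillic)
as_latin1 = raw.decode("latin_1")      # read as ISO 8859-1 (Western European)

print(as_cyrillic)
print(as_latin1)
```

The file itself carries no marker saying which reading is intended, which is why the writer and the reader have to agree on the character set.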
 
Guest

Hi Gary,

thanks for your reply, but my problem is NOT to get the Linux target to display the Russian characters, but to get W2k Pro to save the file in the correct codepage. You are right in saying that ISO 8859-1 is indistinguishable from ISO 8859-5 at the file level, but I know that my Linux system requires 8859-5. How do I then make sure that my W2k saves in this codepage? First, not all characters in 8859-5 exist in 8859-1, and second, they are arranged differently.

When I set up my W2k for Russian, I see the correct characters in WordPad. But when I save the file as MS-DOS formatted text and open it in a hex editor, I can compare each hex value to the character written in WordPad. The hex value written in the file, cross-referenced with my table of codepage 8859-5, does not correspond to the character I see (if I assume that W2k uses CP 8859-5). Do you know where I can get information about which codepage the files are saved in? And, if possible, how to change it?

Best regards,
Bent
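
The manual comparison described above (typed character versus hex value in the file) can be sketched in Python. This is only an illustration: the function name, the candidate list, and the sample bytes are assumptions, not something taken from the thread.

```python
# Hypothetical list of encodings commonly used for Russian text.
CANDIDATES = ("iso8859_5", "cp1251", "cp866", "koi8_r")

def matching_encodings(typed_text, file_bytes, candidates=CANDIDATES):
    """Return the candidate encodings that turn typed_text into file_bytes."""
    matches = []
    for name in candidates:
        try:
            encoded = typed_text.encode(name)
        except UnicodeEncodeError:
            continue  # this encoding cannot represent the text at all
        if encoded == file_bytes:
            matches.append(name)
    return matches

# Bytes as they might be read off a hex editor (illustrative values).
observed = bytes.fromhex("8f e0 a8 a2 a5 e2")
print(matching_encodings("Привет", observed))
```

If the list comes back empty, the file was written in an encoding that is not among the candidates.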
 
John Thow

This is not a codepage issue. It's about the internal values used by
different operating systems to implement extensions to the ASCII character
set. (Ie characters above 127.)

Values from 128 to 255 are _not_ part of the ASCII standard and are
assigned different sets of characters by different computer manufacturers
and software developers.

If you create a Russian character text file on your Linux system and import it
to windoze, I bet that will look like rubbish too.

What you seem to need is an editor that will run on windoze but output files
suitable for reading on Linux. I'm really not sure where you might find one
of those. However http://ex-code.com/gtk2edit/ points at an editor which
seems to support multiple (human) languages. Its example page shows an HTML
edit incorporating English & Hebrew and says it 'supports many encodings'.

Might help?

--
John Thow
an optimist is a guy/ that has never had/ much experience -
certain maxims of archie; Don Marquis.

To e-mail me, replace the DOTs in the Reply-To: address with dots!
 
Paul Gorodyansky

Hi,

Bent said:
Currently I have a problem: I need to make sure that a
text-file is saved in a given codepage (in this case:
ISO8859-5, which is Russian). If I install MUI and type
the Russian characters, everything looks ok, but when I
then take a hex-editor and look at the file, I can see
that it is NOT saved in that codepage. Furthermore, when I
transfer the text-file to my target (which runs Linux),
the only thing I see is garbage.
Please give me some hints to how to save the textfile in
the correct codepage.

I do it all the time with all codepages - Russian, Polish,
Japanese (I work as I18n software engineer), so I can answer
all your questions. It's all explained on my instructional site
"Cyrillic (Russian): instructions for Windows and Internet":
http://ourworld.compuserve.com/homepages/PaulGor/ but shortly:

1) First, to work with Russian _texts_ you do NOT need any MUI,
do NOT need to '...setup ... w2k to Russian'.
One can read/write in Russian in editors, browsers, e-mail
on a pure English, German, or even Japanese Windows 2000.

Obviously, you need to _activate_ Cyrillic support there -
in Control Panel/Regional Options/General mark "Cyrillic"
in the "Language Settings for the system" frame - but
it's not a change to the OS itself such as MUI, etc.; it just
_adds_ Cyrillic support without making your Windows 'Russian'
in any way.

2) Second, MS Windows does NOT support iso-8859-5 for Russian, that
is when you see 'Russian' or 'Cyrillic' in fonts name or keyboard
mode it's a different encoding - "Windows(Cyrillic), code page 1251".
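
To illustrate point 2, here is a small Python sketch (an illustration only, not taken from my site) showing that the same Russian word is stored as different bytes in Windows-1251, in ISO 8859-5, and in the DOS code page 866. That WordPad's "MS-DOS format" corresponds to code page 866 on a given machine is an assumption; it depends on the system's OEM code page setting.

```python
text = "Привет"

win_bytes = text.encode("cp1251")     # Windows (Cyrillic), code page 1251
iso_bytes = text.encode("iso8859_5")  # what the Linux target expects
dos_bytes = text.encode("cp866")      # a common DOS/OEM Cyrillic code page

for label, data in (("cp1251", win_bytes),
                    ("iso8859_5", iso_bytes),
                    ("cp866", dos_bytes)):
    print(label, data.hex())

# Reading Windows-1251 bytes as if they were ISO 8859-5 gives the wrong letters.
misread = win_bytes.decode("iso8859_5")
print(misread)
```

This is the "garbage" effect: the bytes are valid in both encodings, but they stand for different characters.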

3) So to get iso-8859-5 text files you have 2 options:

a) simplest one - if you have Word 2000 or newer. It allows you
to type - on pure English system - any text - Russian or German
or Japanese - and then lets you _specify_ in what encoding you
want the document to be saved as Plain Text in a .txt file.
All encodings are supported by Word, including iso-8859-5.
Same goes for opening a .TXT file in Word when you know that
the encoding of that file is different from System Code Page.

How to do it, and why does it work? I don't want to write the steps
here, since maybe you do NOT have Word 2000 or newer, so if you do:
- find "Unicode and Cyrillic: issues and solutions" section
on my site
- inside find Chapter 2 "Copy/Paste; Word and .TXT"
- see then "Word and .TXT" part of that page.

b) Another way is to prepare the Russian text in the encoding that is
'native' to MS Windows, Cyrillic (Windows-1251), and then *convert* it
to iso-8859-5. That is, find a non-Unicode Plain Text editor
(i.e. it's not Wordpad nor Word nor Notepad)
that lets you choose a Russian font (e.g. Courier New (Cyrillic))
and type in Russian. I use http://www.UltraEdit.com.
Type your text and then you can convert it to iso-8859-5
either in Clipboard or using 'source file' - 'target file'
mode. Both are available in CVT32 converter - see
"Encoding Conversion" section of my site.
 
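For readers without a converter at hand, the 'source file' to 'target file' conversion in option b) can be sketched in a few lines of Python. This is not how CVT32 works internally; the function name and the file names in the usage comment are hypothetical.

```python
from pathlib import Path

def convert_file(src, dst, src_enc="cp1251", dst_enc="iso8859_5"):
    """Re-encode a text file from src_enc to dst_enc.

    Raises UnicodeEncodeError if a character has no byte in dst_enc.
    """
    # Read and write raw bytes so line endings pass through unchanged.
    text = Path(src).read_bytes().decode(src_enc)
    # Strict encoding: fail loudly instead of silently writing '?'.
    Path(dst).write_bytes(text.encode(dst_enc))

# Usage (hypothetical file names):
# convert_file("russian_cp1251.txt", "russian_iso8859_5.txt")
```

The strict mode is a deliberate choice here: Windows-1251 contains some characters (certain punctuation, for example) that ISO 8859-5 lacks, and an error is easier to notice than a quietly damaged file.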
Guest

Hi Paul

Thanx for your help

That was the solution....

Thank you VERY much..

Sometimes the solution is right under your nose.. but you cannot see it ;-)

Best regards
Bent
 
