Convert DOS Cyrillic text to Unicode

Nikolay Petrov

I have read this and other information on the Unicode topic.
My question is how I can do it in VB. I need the code.
 
Jon Skeet [C# MVP]

Nikolay Petrov said:
I have read this and other information on the Unicode topic.
My question is how I can do it in VB. I need the code.

I provide some C# code to read a file in one encoding and write it in
another. It's very simple code - it should be easy to understand and
rewrite in VB.NET. The important thing is really just the creation of
the StreamReader with the right encoding.
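A minimal sketch of the read-in-one-encoding, write-in-another idea, written in Python only as a neutral illustration (it is not the C# code referred to above). It assumes the DOS Cyrillic file is in code page 866; the function name and file names are hypothetical. In .NET the equivalent step would be passing `Encoding.GetEncoding(866)` to the `StreamReader` constructor.

```python
import os
import tempfile


def transcode_file(src_path, dst_path, src_encoding="cp866", dst_encoding="utf-8"):
    """Read src_path as src_encoding and write the same text to dst_path as dst_encoding."""
    # newline="" leaves the original line endings (e.g. DOS CR LF) untouched.
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding=dst_encoding, newline="") as dst:
        dst.write(text)
    return text


if __name__ == "__main__":
    # Build a small CP866 sample file so the sketch is self-contained.
    workdir = tempfile.mkdtemp()
    dos_path = os.path.join(workdir, "dos.txt")
    utf8_path = os.path.join(workdir, "utf8.txt")
    with open(dos_path, "wb") as f:
        f.write("Привет, мир\r\n".encode("cp866"))
    print(transcode_file(dos_path, utf8_path))
```

The only part that matters is the source encoding given when the file is opened; once the text is decoded correctly, it can be written out in any Unicode encoding.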
 
Nikolay Petrov

My problem is that I'm not reading a file.
The DOS Cyrillic text is pasted into a text box, and should appear in another.
That's all.
I don't have anything in binary.
 
Cor Ligthert

Hi Jon,

In the language.VB newsgroup I pointed Nikolay to you and Jay B, who has
answered a message there, although not completely enough for
Nikolay. Jay B will probably not be active on this newsgroup before 13:00
GMT.

I am curious as well: which encoding do you think is the right one for this
Cyrillic problem?

Nikolay wrote in the language.VB group that he pasted it from Notepad,
so I guess UTF-16?

:)

Cor

....
 
Jon Skeet [C# MVP]

Nikolay Petrov said:
My problem is that I'm not reading a file.
The DOS Cyrillic text is pasted into a text box, and should appear in another.
That's all.
I don't have anything in binary.

If it's in a text box, you should have it as Unicode text already. All
strings are in Unicode in .NET.
 
Jon Skeet [C# MVP]

Cor Ligthert said:
In the language.VB newsgroup I pointed Nikolay to you and Jay B, who has
answered a message there, although not completely enough for
Nikolay. Jay B will probably not be active on this newsgroup before 13:00
GMT.

I am curious as well: which encoding do you think is the right one for this
Cyrillic problem?

Not sure - but it sounds like it won't actually be a problem, as if
he's got the data in notepad to start with, there's no encoding change
required - cut and paste should sort everything out.

Cor Ligthert said:
Nikolay wrote in the language.VB group that he pasted it from Notepad,
so I guess UTF-16?

No way - DOS precedes UTF16 by a long time!
 
Nikolay Petrov

The user pastes text from text files which contain DOS Cyrillic characters.
When it is pasted into a text box, or even into a Notepad window, it looks
like garbage.
I am not sure whether I can post a file here as an attachment, so you can see it?
 
Jon Skeet [C# MVP]

Nikolay Petrov said:
The user pastes text from text files which contain DOS Cyrillic characters.

What does he have the text open in? It sounds like the existing app is
probably not putting it into the clipboard in Unicode :(

Nikolay Petrov said:
When it is pasted into a text box, or even into a Notepad window, it looks
like garbage.

Ah - I thought you meant he had it working in notepad to start with.

Nikolay Petrov said:
I am not sure whether I can post a file here as an attachment, so you can see it?

It's probably best if you email it to me.
 
Jon Skeet [C# MVP]

Cor Ligthert said:
I am also interested in this question, so why not mail to the
newsgroup?

It's more that depending on the way of attaching the file, it might get
converted during the attachment process - that's less likely to happen
in a mail message.
 
Cor Ligthert

Jon Skeet [C# MVP] said:
It's more that depending on the way of attaching the file, it might get
converted during the attachment process - that's less likely to happen
in a mail message.

So I'll wait for the results, and then maybe you can send it to me when all is
clear?

Cor
 
Jon Skeet [C# MVP]

Cor Ligthert said:
So I'll wait for the results, and then maybe you can send it to me when all is
clear?

Yup, sure. I suspect there's nothing particularly interesting about the
file though - it's just I should be able to work out what encoding it's
in, so that if the OP *does* want to read it directly (rather than with
c'n'p) he should be able to.
 
Nikolay Petrov

OK guys, I have mailed it to both of you.

I'll also put some of this DOS text here, in case anyone else is interested:

???<?'? ?? 6 ?. 2004??".
 
Nikolay Petrov

New problem ;-(
The text is encoded only partially.
All capital letters are fine, and some of the lowercase ones, but not all.
What may have caused this?
 
Jon Skeet [C# MVP]

Nikolay Petrov said:
New problem ;-(
The text is encoded only partially.

At what stage?

Nikolay Petrov said:
All capital letters are fine, and some of the lowercase ones, but not all.
What may have caused this?

No idea - are you saying the original files are corrupt, basically?
 
Paul Gorodyansky

Hi,

Nikolay Petrov said:
New problem ;-(
The text is encoded only partially.
All capital letters are fine, and some of the lowercase ones, but not all.
What may have caused this?

You did not answer Jon's question, but it is critical:
in what _program_ does your user open the text file with DOS Cyrillic?

I have been working with Cyrillic encodings since 1995 :) so I have dealt
with most of them, including CP-866.

The easiest way in your scenario would be:

Open that DOS Cyrillic .txt file in MS Word 2000 or newer,
choosing "Cyrillic (DOS)" encoding in the process:
http://ourworld.compuserve.com/homepages/PaulGor/cp_e.htm#open

Now your user should see normal Russian text, already converted to
Unicode by Word, and can paste it into your text box.

Otherwise, if you try to open a file that contains text in
DOS Cyrillic encoding in some regular MS Windows text editor,
you *will* see just gibberish - the editor expects one of the _Windows_
encodings, not a DOS one.
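The gibberish effect can be reproduced, and in simple cases undone, with a short sketch. It is written in Python purely for illustration and assumes the file is really CP866 and was misread as Windows-1251; the function name is hypothetical. In .NET the same round trip would use `Encoding.GetEncoding(1251).GetBytes` followed by `Encoding.GetEncoding(866).GetString`.

```python
def repair_dos_cyrillic(garbled, wrong_encoding="cp1251", real_encoding="cp866"):
    """Recover text that was decoded with wrong_encoding instead of real_encoding."""
    # Undo the wrong decoding to get the original bytes back...
    raw = garbled.encode(wrong_encoding)
    # ...then decode those bytes with the encoding the file was really written in.
    return raw.decode(real_encoding)


if __name__ == "__main__":
    original = "Привет"
    # Simulate a Windows editor reading DOS (CP866) bytes as Windows-1251.
    garbled = original.encode("cp866").decode("cp1251")
    print(garbled)
    print(repair_dos_cyrillic(garbled))
```

This only works while every garbled character still maps back to its original byte. Anything that was replaced or dropped on the way, for example by a lossy clipboard conversion, cannot be recovered this way.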

There are many more ways to get it done, say converter programs that
make "Cyrillic (Windows), 1251" text from your DOS Cyrillic text, or
I18n-aware editors that, like Word, let you specify explicitly
what the encoding of your file is, such as
http://www.esperanto.mv.ru/UniRed/ENG/
etc., etc.
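What such a converter program does can be sketched in a few lines. This is a Python illustration, not any particular tool; it assumes the input bytes are CP866, and the function name is hypothetical.

```python
def dos_to_windows_cyrillic(data):
    """Convert CP866 ("Cyrillic (DOS)") bytes to Windows-1251 bytes."""
    text = data.decode("cp866")
    # CP866 box-drawing characters have no Windows-1251 equivalent;
    # errors="replace" turns them into "?" instead of raising an error.
    return text.encode("cp1251", errors="replace")


if __name__ == "__main__":
    dos_bytes = "Привет".encode("cp866")
    print(dos_to_windows_cyrillic(dos_bytes).decode("cp1251"))
```

Going through Unicode in the middle avoids maintaining a hand-written byte-to-byte table. The cost is that characters that exist only in the DOS code page are lost.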

--
Regards,
Paul Gorodyansky
"Cyrillic (Russian): instructions for Windows and Internet":
http://RusWin.net
Russian On-screen Keyboard: http://Kbd.RusWin.net
 
