Convert the text from one encoding to another

Pavils Jurjans

Hello,

I am looking for a way to convert the given text from one character encoding
to another. The source comes in as a byte array, and it must be converted to
unicode, given any kind of character encoding that could possibly be
installed on the system. Then, some work is done with the unicode text, and
afterwards it should be encoded to another character encoding, usually
different than the source encoding.

The encodings could be either single-byte or multi-byte. The source text is
email bodies or HTML pages, therefore a wide range of supported character
sets is needed.

I was expecting I could do something like this: (pseudo code)

TextStream myStream = new TextStream();
// windows-1257 is just an example; it could be any other encoding
myStream.Write(sourceByteArray, "windows-1257");
string myUnicodeString = myStream.Text;
// do stuff with the unicode string
ByteArray targetByteArray = new ByteArray();
myStream.Read(targetByteArray, "utf-8");

At least, I used to do something like this with ADO.Stream object in the
days before .NET.

Could someone please provide guidance on how I can achieve this?

Thanks,

Pavils
 
Jon Skeet [C# MVP]

Pavils Jurjans said:
I am looking for a way to convert the given text from one character encoding
to another. The source comes in as a byte array, and it must be converted to
unicode, given any kind of character encoding that could possibly be
installed on the system. Then, some work is done with the unicode text, and
afterwards it should be encoded to another character encoding, usually
different than the source encoding.

Okay, so you need to convert a byte array to text: do that either with
a TextReader (e.g. StreamReader) created with the appropriate encoding,
or just call GetString on the appropriate Encoding instance.

Then do your work.

Then write using a TextWriter (e.g. StreamWriter) with the appropriate
encoding, or call GetBytes on the target Encoding instance.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
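
The steps described above can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the class and method names (TranscodeSketch, Decode, Encode, DecodeFromStream) are made up for this example, and iso-8859-1 stands in for the source encoding only because it is available on every .NET runtime. On .NET Framework a name such as "windows-1257" can be passed to Encoding.GetEncoding directly; on .NET Core and later, code-page encodings like that one generally need Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) to be called first.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class TranscodeSketch
{
    // Decode bytes in a named source encoding into a .NET string.
    public static string Decode(byte[] source, string sourceEncodingName)
    {
        return Encoding.GetEncoding(sourceEncodingName).GetString(source);
    }

    // Encode a string into bytes using a named target encoding.
    public static byte[] Encode(string text, string targetEncodingName)
    {
        return Encoding.GetEncoding(targetEncodingName).GetBytes(text);
    }

    // Stream-based variant of Decode, using a StreamReader (a TextReader).
    public static string DecodeFromStream(Stream input, string sourceEncodingName)
    {
        using (var reader = new StreamReader(input, Encoding.GetEncoding(sourceEncodingName)))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        // "café" encoded as iso-8859-1 (one byte per character).
        byte[] sourceBytes = { 0x63, 0x61, 0x66, 0xE9 };

        string text = Decode(sourceBytes, "iso-8859-1");

        // Do some work with the string.
        string processed = text + "!";

        byte[] targetBytes = Encode(processed, "utf-8");

        // Prints 63-61-66-C3-A9-21 (the é becomes two bytes in UTF-8).
        Console.WriteLine(BitConverter.ToString(targetBytes));
    }
}
```

If the input is already a stream (a file or a network response, say), DecodeFromStream avoids buffering the raw bytes yourself; a StreamWriter constructed with the target encoding is the matching way to write the result back out.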
 
Eric Cadwell

Try the Encoding class for converting:
// Encoding.Unicode is UTF-16; use Encoding.GetEncoding(name) for other encodings
string data = System.Text.Encoding.Unicode.GetString(sourceBytes, start, length);
byte[] bytes = System.Text.Encoding.Unicode.GetBytes(data);

-Eric
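
When no work needs to be done on the text in between, the static Encoding.Convert method does the decode and re-encode in one call. The sketch below assumes that case; ConvertSketch and Transcode are illustrative names, and "utf-16" (the same encoding as Encoding.Unicode) is used as the source only to keep the example small.

```csharp
using System;
using System.Text;

// Hypothetical helper class, named for illustration only.
public static class ConvertSketch
{
    // Re-encode bytes from one named encoding to another in a single step.
    public static byte[] Transcode(byte[] source, string fromName, string toName)
    {
        Encoding from = Encoding.GetEncoding(fromName);
        Encoding to = Encoding.GetEncoding(toName);
        return Encoding.Convert(from, to, source);
    }

    public static void Main()
    {
        // "Hi" as UTF-16 little-endian: two bytes per character.
        byte[] utf16Bytes = { 0x48, 0x00, 0x69, 0x00 };

        byte[] utf8Bytes = Transcode(utf16Bytes, "utf-16", "utf-8");

        // Prints 48-69
        Console.WriteLine(BitConverter.ToString(utf8Bytes));
    }
}
```

Characters that the target encoding cannot represent are replaced by that encoding's fallback (typically "?"), so a Unicode target such as UTF-8 is the safe choice when the source character set is not known in advance.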
 
