Convert text from one encoding to another

Pavils Jurjans

Hello,

I am looking for a way to convert text from one character encoding to
another. The source comes in as a byte array, and it must be converted to
Unicode from whatever character encoding it uses, which could be any
encoding installed on the system. Then some work is done with the Unicode
text, and afterwards it should be encoded to another character encoding,
usually different from the source encoding.

The encodings could be either single-byte or multi-byte. The source texts
are email bodies or HTML pages, so a wide range of character sets must be
supported.

I was expecting to be able to do something like this (pseudo code):

TextStream myStream = new TextStream();
// any encoding could appear here; windows-1257 is just an example
myStream.Write(sourceByteArray, "windows-1257");
string myUnicodeString = myStream.Text;
// do stuff with the Unicode string, then put the result back
myStream.Text = myUnicodeString;
ByteArray targetByteArray = new ByteArray();
myStream.Read(targetByteArray, "utf-8");

At least, that is how I used to do it with the ADO Stream object in the
days before .NET.

Could someone please provide guidance on how I can achieve this?

Thanks,

Pavils
 
Pavils Jurjans said:
I am looking for a way to convert the given text from one character encoding
to another. The source comes in as a byte array, and it must be converted to
unicode, given any kind of character encoding that could possibly be
installed on the system. Then, some work is done with the unicode text, and
afterwards it should be encoded to another character encoding, usually
different than the source encoding.

Okay, so you need to convert a byte array to text: do that either with
a TextReader (e.g. StreamReader) with the appropriate encoding, or simply
with Encoding.GetString.
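Both decoding routes can be sketched in C# roughly as follows. This is a minimal sketch, assuming .NET Core 3.0 or later, where legacy Windows code pages such as windows-1257 only become available after registering CodePagesEncodingProvider (on the .NET Framework they are built in and that registration line can be dropped). The names TextDecoding, WithGetString and WithStreamReader are hypothetical, not part of any library.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class: two ways to turn raw bytes into a .NET string.
public static class TextDecoding
{
    // Assumption: .NET Core 3.0+; makes code pages such as windows-1257 available.
    static TextDecoding() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Route 1: decode the whole array in one call with Encoding.GetString.
    public static string WithGetString(byte[] source, string charset)
    {
        Encoding encoding = Encoding.GetEncoding(charset);
        return encoding.GetString(source);
    }

    // Route 2: read the bytes through a StreamReader (a TextReader).
    // Handy when the data already arrives as a Stream rather than a byte[].
    public static string WithStreamReader(byte[] source, string charset)
    {
        using var reader = new StreamReader(new MemoryStream(source), Encoding.GetEncoding(charset));
        return reader.ReadToEnd();
    }
}
```

For example, TextDecoding.WithGetString(sourceByteArray, "windows-1257") returns the decoded string. One difference worth knowing: StreamReader constructed this way lets a byte-order mark at the start of the data override the encoding you pass in, whereas GetString does not look for one.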

Then do your work.

Then write it out using a TextWriter (e.g. StreamWriter) with the
appropriate encoding, or use Encoding.GetBytes.
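The encoding side mirrors the decoding side. A minimal sketch under the same .NET Core 3.0+ assumption; TextEncoding, WithGetBytes and WithStreamWriter are hypothetical names. One practical difference between the two routes: StreamWriter writes the encoding's preamble (byte-order mark) at the start of an empty stream, so Encoding.UTF8 yields a leading EF BB BF, while GetBytes never adds one.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper class: two ways to turn a .NET string into bytes.
public static class TextEncoding
{
    // Assumption: .NET Core 3.0+; makes code pages such as windows-1257 available.
    static TextEncoding() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Route 1: encode in one call. No byte-order mark is ever added.
    public static byte[] WithGetBytes(string text, string charset)
    {
        return Encoding.GetEncoding(charset).GetBytes(text);
    }

    // Route 2: write through a StreamWriter (a TextWriter) into memory.
    // The writer emits the encoding's preamble first; pass new UTF8Encoding(false)
    // instead of Encoding.UTF8 if the output must not start with a BOM.
    public static byte[] WithStreamWriter(string text, Encoding encoding)
    {
        var buffer = new MemoryStream();
        using (var writer = new StreamWriter(buffer, encoding))
        {
            writer.Write(text);
        } // disposing the writer flushes it and closes the stream
        return buffer.ToArray(); // ToArray still works on a closed MemoryStream
    }
}
```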

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
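Since the charset names come from email headers and HTML pages, it may help to resolve them defensively: Encoding.GetEncoding throws ArgumentException for a name the system does not recognize, and by default it silently replaces data it cannot map rather than failing. A minimal sketch, assuming .NET Core 3.0+; CharsetLookup and TryGet are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical helper: resolves a charset label (e.g. from a Content-Type
// header or an HTML meta tag) to an Encoding, or null if unsupported.
public static class CharsetLookup
{
    static CharsetLookup()
    {
        // Assumption: .NET Core 3.0+; legacy code pages need this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding TryGet(string charsetName, bool strict = false)
    {
        try
        {
            if (strict)
            {
                // Throw on unmappable characters or invalid bytes instead of
                // silently replacing them.
                return Encoding.GetEncoding(charsetName,
                    EncoderFallback.ExceptionFallback,
                    DecoderFallback.ExceptionFallback);
            }
            return Encoding.GetEncoding(charsetName);
        }
        catch (ArgumentException)
        {
            // Unknown name, or a code page not supported on this machine.
            return null;
        }
    }
}
```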
 
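Try the Encoding class for converting:
// Encoding.Unicode is UTF-16 (little-endian); for any other charset use
// System.Text.Encoding.GetEncoding("windows-1257") or similar instead.
string data = System.Text.Encoding.Unicode.GetString(sourceBytes, start, length);
byte[] bytes = System.Text.Encoding.Unicode.GetBytes(data);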

-Eric
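When no processing is needed between the two steps, Encoding.Convert decodes with one encoding and re-encodes with another in a single call. Because Encoding.Unicode in the snippet above is specifically UTF-16, the Encoding objects for arbitrary charsets have to come from Encoding.GetEncoding. A minimal sketch, again assuming .NET Core 3.0+; Transcoder is a hypothetical name.

```csharp
using System.Text;

// Hypothetical helper: byte[] in one charset -> byte[] in another.
public static class Transcoder
{
    // Assumption: .NET Core 3.0+; makes code pages such as windows-1257 available.
    static Transcoder() => Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Characters the target charset cannot represent are silently substituted
    // unless the target Encoding was created with an exception fallback.
    public static byte[] Convert(byte[] source, string fromCharset, string toCharset)
    {
        Encoding from = Encoding.GetEncoding(fromCharset);
        Encoding to = Encoding.GetEncoding(toCharset);
        return Encoding.Convert(from, to, source);
    }
}
```

This only fits the case where the text passes through unchanged; when the Unicode string has to be edited in between, decoding and encoding separately is the more natural structure.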
 