How do I convert from iso-8859-1 to utf-8 (bom)?

Du Dang

I tried to convert a block of text from ISO-8859-1 to UTF-8, but all I got
after the conversion is gibberish.

===============================

FileStream fs = File.Open("text.txt", FileMode.Open, FileAccess.Read);
byte[] b = new byte[length];
fs.Read(b, 0, length);

b = Encoding.Convert(Encoding.GetEncoding(28591), Encoding.UTF8, b);
return System.Text.Encoding.UTF8.GetString(b);

===============================

When I skipped the conversion line ( b = Encoding.Convert ....) the text is
legible but still in ISO-8859-1 encoding.

Does anyone know what I'm doing wrong, or know a better way of doing this?

Thanks,

Du
 
Jon Skeet [C# MVP]

Du Dang said:
I tried to convert a block of text from ISO-8859-1 to UTF-8, but all I got
after the conversion is gibberish.

===============================

FileStream fs = File.Open("text.txt", FileMode.Open, FileAccess.Read);
byte[] b = new byte[length];
fs.Read(b, 0, length);

For a start, you should always use the return value of Stream.Read: it may
read fewer bytes than you asked for.

b = Encoding.Convert(Encoding.GetEncoding(28591), Encoding.UTF8, b);
return System.Text.Encoding.UTF8.GetString(b);

===============================
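
On the point about Stream.Read's return value, here is a minimal sketch of a read loop that keeps going until the requested number of bytes has arrived. StreamUtil and ReadExactly are hypothetical names chosen for illustration; they are not from the original post.

```csharp
using System;
using System.IO;

static class StreamUtil
{
    // Hypothetical helper (not from the thread): reads exactly `count` bytes,
    // or throws if the stream ends before that many bytes are available.
    public static byte[] ReadExactly(Stream stream, int count)
    {
        byte[] buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            // Read may return fewer bytes than requested; 0 means end of stream.
            int read = stream.Read(buffer, offset, count - offset);
            if (read == 0)
            {
                throw new EndOfStreamException(
                    "Stream ended after " + offset + " of " + count + " bytes.");
            }
            offset += read;
        }
        return buffer;
    }
}
```

With this in place, the read in the quoted code could become `byte[] b = StreamUtil.ReadExactly(fs, length);`. If the whole file is wanted and the framework version provides it, File.ReadAllBytes("text.txt") is probably simpler.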

When I skipped the conversion line ( b = Encoding.Convert ....) the text is
legible but still in ISO-8859-1 encoding.

Does anyone know what I'm doing wrong, or know a better way of doing this?

Well, to start with, why are you bothering with UTF-8 at all if you're
returning a string? Just decode it from ISO-8859-1. Converting it to
UTF-8 and back won't have any effect.
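
A minimal sketch of that suggestion, assuming the bytes really are ISO-8859-1: decode them straight to a string with code page 28591, with no UTF-8 round trip. The second method is an assumption about what the "(bom)" in the subject line is after, namely writing the text back out as a UTF-8 file with a byte order mark. Latin1Example, DecodeLatin1 and WriteUtf8WithBom are hypothetical names.

```csharp
using System;
using System.IO;
using System.Text;

static class Latin1Example
{
    // Decodes ISO-8859-1 bytes directly into a .NET string (UTF-16 in memory).
    public static string DecodeLatin1(byte[] bytes)
    {
        return Encoding.GetEncoding(28591).GetString(bytes);
    }

    // Assumption: the goal is a UTF-8 file that starts with a BOM.
    // UTF8Encoding(true) asks for the byte order mark to be emitted.
    public static void WriteUtf8WithBom(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(true));
    }
}
```

The encoding only matters at the boundaries: once the bytes are decoded, the string itself has no "ISO-8859-1" or "UTF-8" property, and UTF-8 comes into play only when the text is written out again.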

Is your file *definitely* in ISO-8859-1, and not in, say,
Encoding.Default? Which characters are being mis-converted? Could you
email me a sample file?
 
Du Dang

Thanks Jon,

please check your email.

