Converting letters to their Unicode representation

Nikola Skoric

What I have is a bunch of text in Arabic, and a series of Unicode bytes
which represent those Arabic words (like this: \'c2\'e4\'f6\'d3\'f3\'c9
\'f1). Now I have to figure out how to convert my Arabic text to a bunch
of \'somethings. If I understood Unicode correctly (and I'm not sure
I did), I first have to figure out which encoding this is (UTF-16 or
UTF-32 or some other) and then convert the letters to their byte
representation. I think I could figure out the first part by trial and
error (by trying every encoding and comparing to the result I already
have), if only I knew how to do the second part.

So, my question is: how do I convert a Unicode character into its byte
representation?
 
Jon Skeet [C# MVP]

The first thing is to understand that "series of Unicode bytes"
doesn't make sense - it's like saying "series of letter digits". It
sounds like you've got a series of bytes which is an encoded string -
but you need to know which encoding you're dealing with.

The System.Text.Encoding class is the central class which manages
encoding and decoding text - converting it between strings/chars
in .NET and bytes.
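
As a minimal sketch of the "second part" of the question, the following shows Encoding.GetBytes turning a string into bytes and printing them in the \'xx notation used above. The class and method names (EncodingDemo, ToEscapedBytes) are made up for illustration, and the three encodings shown are just examples of what can be passed in, not a claim about which one produced the original data.

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // Hypothetical helper: encodes the text with the given encoding and
    // writes each resulting byte as \'xx (two lowercase hex digits).
    public static string ToEscapedBytes(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        var sb = new StringBuilder();
        foreach (byte b in bytes)
        {
            sb.Append("\\'").Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    static void Main()
    {
        // U+0633 ARABIC LETTER SEEN, written as an escape so the
        // encoding of the source file itself does not matter.
        string text = "\u0633";

        Console.WriteLine(ToEscapedBytes(text, Encoding.UTF8));             // \'d8\'b3
        Console.WriteLine(ToEscapedBytes(text, Encoding.Unicode));          // \'33\'06 (UTF-16, little-endian)
        Console.WriteLine(ToEscapedBytes(text, Encoding.BigEndianUnicode)); // \'06\'33
    }
}
```

The same character gives different bytes under each encoding, which is why the encoding has to be known first. The sample in the question uses one byte per letter, so a single-byte Arabic code page is a plausible candidate (this is an assumption, the thread does not confirm it); Encoding.GetEncoding(1256) would be one to try, though on .NET Core it only works after calling Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).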

See http://pobox.com/~skeet/csharp/unicode.html for more on this.

Jon
 
Nikola Skoric

> The first thing is to understand that "series of Unicode bytes"
> doesn't make sense - it's like saying "series of letter digits".

Yes, my understanding of Unicode is quite limited. I will study the link
you posted, thanks.
 