Encoding conversions... optimized two-stage table?

ThunderMusic

Hi,
I want to convert UTF-16 (or Unicode) to ISO-8859-1. The .NET
encoding does a pretty good job, but some characters are not converted, like
"O" that becomes "?"; I want it to become "oe". So, what I want to know
is the method used by the .NET encoders to convert from one encoding to the
other: is it using an optimized two-stage table, a multistate table, or some
other method?

Since I know the first 256 characters are the same, it's easy to convert those
256, but for the others we have to build a correspondence. Would an
optimized two-stage table be the best way to go? Does somebody know where I
could get such a table so I don't have to type it all myself?

thanks

ThunderMusic
 
Jon Skeet [C# MVP]

ThunderMusic said:
I want to convert UTF-16 (or Unicode) to ISO-8859-1. The .NET
encoding does a pretty good job, but some characters are not converted, like
"O" that becomes "?"; I want it to become "oe". So, what I want to know
is the method used by the .NET encoders to convert from one encoding to the
other: is it using an optimized two-stage table, a multistate table, or some
other method?

Converting from one encoding to another is just a matter of decoding
from a byte array to Unicode (.NET's "native" UTF-16 string format), then
encoding from that Unicode text to a byte array.
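
The decode-then-encode round trip described above could be sketched in C# roughly as follows. This is only an illustration: the class and method names (`TranscodeSketch`, `Transcode`) are made up for this example, while `Encoding.GetString` and `Encoding.GetBytes` are the standard framework calls.

```csharp
using System.Text;

public static class TranscodeSketch
{
    // Hypothetical helper: decode the bytes into a .NET string (UTF-16),
    // then encode that string with the target encoding.
    public static byte[] Transcode(byte[] input, Encoding source, Encoding target)
    {
        string text = source.GetString(input); // bytes -> UTF-16 string
        return target.GetBytes(text);          // UTF-16 string -> bytes
    }
}
```

For example, `TranscodeSketch.Transcode(Encoding.Unicode.GetBytes("caf\u00E9"), Encoding.Unicode, Encoding.GetEncoding("ISO-8859-1"))` should yield the four bytes 63 61 66 E9. The framework's own `Encoding.Convert(source, target, bytes)` performs the same two steps in one call.
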

Since I know the first 256 characters are the same, it's easy to convert those
256, but for the others we have to build a correspondence. Would an
optimized two-stage table be the best way to go? Does somebody know where I
could get such a table so I don't have to type it all myself?

The process that the .NET encodings use won't help you much,
unfortunately. It sounds like the conversion you need happens entirely
in text form: from the single combined character to its multi-character
spelling, before any encoding takes place.
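
One way to do that text-level substitution is to rewrite the string first and only then hand it to the ISO-8859-1 encoder. The sketch below assumes a small hand-written table; the class name `Latin1Fallback`, its methods, and the three table entries are hypothetical and would need extending for real data.

```csharp
using System.Collections.Generic;
using System.Text;

public static class Latin1Fallback
{
    // Hypothetical, deliberately tiny substitution table; extend as needed.
    private static readonly Dictionary<char, string> Substitutions =
        new Dictionary<char, string>
        {
            { '\u0153', "oe" }, // LATIN SMALL LIGATURE OE
            { '\u0152', "OE" }, // LATIN CAPITAL LIGATURE OE
            { '\u2019', "'" },  // RIGHT SINGLE QUOTATION MARK
        };

    // Replace characters outside ISO-8859-1 with a multi-character
    // spelling, working purely on the string.
    public static string ToLatin1Text(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            string replacement;
            if (c <= '\u00FF')
                sb.Append(c);           // maps directly to one ISO-8859-1 byte
            else if (Substitutions.TryGetValue(c, out replacement))
                sb.Append(replacement); // table entry, possibly several chars
            else
                sb.Append('?');         // no mapping known in this sketch
        }
        return sb.ToString();
    }

    // Substitute first, then encode with the standard ISO-8859-1 encoding.
    public static byte[] ToLatin1Bytes(string text)
    {
        return Encoding.GetEncoding("ISO-8859-1").GetBytes(ToLatin1Text(text));
    }
}
```

With this table, `Latin1Fallback.ToLatin1Text("c\u0153ur")` returns "coeur". A plain dictionary is used here for readability; a two-stage table would be an optimization of the same lookup, not a different conversion.
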
 
Mihai N.

I want to convert UTF-16 (or Unicode) to ISO-8859-1. The .NET
encoding does a pretty good job, but some characters are not converted, like
"O" that becomes "?"; I want it to become "oe". So, what I want to know
is the method used by the .NET encoders to convert from one encoding to the
other: is it using an optimized two-stage table, a multistate table, or some
other method?

The real solution is to move everything to Unicode, rather than trying to
"squeeze" the whole of Unicode through some code page hole with a
non-standard, patchy conversion.
 
ThunderMusic

Actually, I just saw that the "O" character went wrong in the post (it must
be plain US-ASCII). What I wanted to post is the single "oe" ligature
character, and you have just seen that the conversion is not perfect, because
only the "O" went through. Well, I'll try my best anyway...

thanks everyone...

ThunderMusic
 
