Encoding.Convert Characters Replaced by Question Mark

Paw Pedersen · Mar 22, 2006

When using Encoding.Convert to convert from UTF-8 to ISO646-US, special
characters like ÆØÅ are replaced with a question mark (?).
Is there any other way to convert between encodings where you can set the
character used to replace the special characters, or where the special
characters are simply left out instead? Since this is for an EDI document,
a question mark won't work.

Regards Paw

dorminey · Mar 22, 2006

I'm not sure how you would do this in .NET 1.1, but in 2.0 (VS2005) you
can use a class called EncoderReplacementFallback. Here's some code to
demonstrate:

using System;
using System.Text;

namespace DemonstrateEncoding
{
    class Program
    {
        static void Main(string[] args)
        {
            string convertMe = "an arabic character ﻒ in a string";

            // The empty replacement string means unconvertible characters
            // are dropped from the output instead of becoming '?'.
            Encoding asciiEncoding = Encoding.GetEncoding(
                "ISO646-US",
                new EncoderReplacementFallback(""),
                new DecoderReplacementFallback());

            Encoding utf8Encoding = Encoding.GetEncoding("UTF-8");

            byte[] utf8Bytes = utf8Encoding.GetBytes(convertMe);

            // Convert decodes with the source encoding, then encodes with
            // the target encoding, so the fallback set above applies here.
            byte[] asciiBytes = Encoding.Convert(
                utf8Encoding, asciiEncoding, utf8Bytes);

            Console.WriteLine(asciiEncoding.GetString(asciiBytes));
        }
    }
}

Joerg Jooss · Mar 22, 2006

Thus wrote "Paw Pedersen" newsATpaws.dk,

When using Encoding.Convert to convert from UTF-8 to ISO646-US, special
characters like ÆØÅ are replaced with a question mark (?).
Is there any other way to convert between encodings where you can set the
character used to replace the special characters, or where the special
characters are simply left out instead? Since this is for an EDI document,
a question mark won't work.

ISO-646-US is a fancy name for Ye Olde US-ASCII, which is completely unable
to represent any of the characters you've posted.

There's really no solution here other than masking all Unicode content with
some escape notation (think \uxxxx) or using replacement characters like
ae, oe, or ue for ä, ö, or ü in German for example.

Cheers,
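
For illustration, the escape-notation idea could be sketched roughly as
below. This is only a sketch: the AsciiEscaper class and its Escape method
are hypothetical names, not part of the framework, and the \uXXXX format
is just one possible convention that the receiving side would have to
agree on. It also does not escape literal backslashes already present in
the input, so the result is not guaranteed to be reversible.

```csharp
using System;
using System.Text;

// Hypothetical helper, not a framework class.
class AsciiEscaper
{
    // Replaces every char outside 7-bit ASCII with a \uXXXX escape,
    // so no information is lost when the result is encoded as ASCII.
    public static string Escape(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c <= 0x7F)
            {
                sb.Append(c);
            }
            else
            {
                // Four upper-case hex digits of the UTF-16 code unit.
                sb.Append("\\u").Append(((int)c).ToString("X4"));
            }
        }
        return sb.ToString();
    }
}

class Program
{
    static void Main()
    {
        // The input is Æ, Ø, Å written as escapes so the source file
        // encoding does not matter.
        Console.WriteLine(AsciiEscaper.Escape("\u00C6\u00D8\u00C5"));
        // prints \u00C6\u00D8\u00C5
    }
}
```

Since the loop works on UTF-16 code units, a character outside the Basic
Multilingual Plane would come out as two escapes, one per surrogate.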

Joerg Jooss · Mar 22, 2006

Thus wrote Joerg,

Thus wrote "Paw Pedersen" newsATpaws.dk,

ISO-646-US is a fancy name for Ye Olde US-ASCII, which is completely
unable to represent any of the characters you've posted.

There's really no solution here other than masking all Unicode content
with some escape notation (think \uxxxx) or using replacement
characters like ae, oe, or ue for ä, ö, or ü in German for example.

Argh, next time I promise to read through the entire post before
responding ;-)

As for automatically inserting replacements, see the other post, which
shows the use of EncoderReplacementFallback and DecoderReplacementFallback.

Cheers,

Paw Pedersen · Mar 24, 2006

Thank you for the answer,

Unfortunately this is in .NET 1.1, so I'm afraid I have to replace all the
special characters before changing the encoding.

Regards Paw
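
A rough sketch of what replacing the special characters up front might
look like, using only classes that exist in .NET 1.1. The AsciiFolder
class, its Fold method, and the AE/OE/AA mapping are assumptions made for
illustration; the mapping actually required depends on what the EDI
partner accepts.

```csharp
using System;
using System.Text;

// Hypothetical helper, not a framework class.
class AsciiFolder
{
    // Maps the Danish letters to assumed ASCII transliterations and
    // silently drops any other character outside 7-bit ASCII.
    public static string Fold(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            switch (c)
            {
                case '\u00C6': sb.Append("AE"); break; // Æ
                case '\u00D8': sb.Append("OE"); break; // Ø
                case '\u00C5': sb.Append("AA"); break; // Å
                case '\u00E6': sb.Append("ae"); break; // æ
                case '\u00F8': sb.Append("oe"); break; // ø
                case '\u00E5': sb.Append("aa"); break; // å
                default:
                    // Keep plain ASCII; leave everything else out.
                    if (c <= 0x7F) sb.Append(c);
                    break;
            }
        }
        return sb.ToString();
    }

    static void Main()
    {
        string folded = Fold("K\u00F8benhavn \u00C6\u00D8\u00C5");

        // After folding, the string is pure ASCII, so the encoder
        // never has to substitute a question mark.
        byte[] asciiBytes = Encoding.ASCII.GetBytes(folded);
        Console.WriteLine(Encoding.ASCII.GetString(asciiBytes));
        // prints Koebenhavn AEOEAA
    }
}
```

To substitute a fixed character instead of dropping the unmapped ones, the
default branch could append that character whenever c is above 0x7F.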

