Encoding.Convert: characters replaced by question mark

  • Thread starter: Paw Pedersen

Paw Pedersen

When I use Encoding.Convert to convert from UTF-8 to ISO646-US, special
characters such as ÆØÅ are replaced with a question mark (?).
Is there another way to convert between encodings that lets you set the
replacement character, or that simply leaves the special characters out?
This is for an EDI document, so a question mark won't work.

Regards Paw
 
I'm not sure how you would do this in .NET 1.1 but in 2.0 (VS2005) you
can use a class called EncoderReplacementFallback. Here's some code to
demonstrate:

using System;
using System.Text;

namespace DemonstrateEncoding
{
    class Program
    {
        static void Main(string[] args)
        {
            // \uFED2 is an Arabic letter that ASCII cannot represent
            string convertMe = "an arabic character \uFED2 in a string";

            // An empty replacement string drops unconvertible characters;
            // pass " " instead to substitute a blank space
            Encoding asciiEncoding = Encoding.GetEncoding(
                "ISO646-US",
                new EncoderReplacementFallback(""),
                new DecoderReplacementFallback());

            Encoding utf8Encoding = Encoding.GetEncoding("UTF-8");

            byte[] utf8Bytes = utf8Encoding.GetBytes(convertMe);

            byte[] asciiBytes =
                Encoding.Convert(utf8Encoding, asciiEncoding, utf8Bytes);

            Console.WriteLine(asciiEncoding.GetString(asciiBytes));
        }
    }
}
 
Thus wrote Paw Pedersen (newsATpaws.dk),
> When using Encoding.Convert to convert from UTF-8 to ISO646-US the
> special chars like ÆØÅ are replaced with a question mark (?).
> Is there any other way to convert between encodings where you can set
> the char used to replace the special chars, or where the special chars
> are simply left out instead? Since this is for an EDI document it won't
> work with a question mark.

ISO-646-US is a fancy name for Ye Olde US-ASCII, which is completely unable
to represent any of the characters you've posted.

There's really no solution here other than masking all non-ASCII content
with some escape notation (think \uxxxx) or using replacement sequences
such as ae, oe, or ue for ä, ö, or ü in German, for example.
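
To make the two options concrete, here is a minimal sketch that combines them: known characters get a replacement sequence, anything else falls back to \uXXXX notation. The class name AsciiTransliterator and the Danish mappings (AE, OE, AA) are my own assumptions for illustration, not anything the framework provides; substitute whatever your EDI partner expects.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public class AsciiTransliterator
{
    // Returns an ASCII stand-in for characters we know, or null otherwise.
    // The Danish mappings are assumptions; the German ones follow the
    // ae/oe/ue convention mentioned above.
    private static string Lookup(char c)
    {
        switch (c)
        {
            case '\u00C6': return "AE"; // Æ
            case '\u00D8': return "OE"; // Ø
            case '\u00C5': return "AA"; // Å
            case '\u00E6': return "ae"; // æ
            case '\u00F8': return "oe"; // ø
            case '\u00E5': return "aa"; // å
            case '\u00E4': return "ae"; // ä
            case '\u00F6': return "oe"; // ö
            case '\u00FC': return "ue"; // ü
            default: return null;
        }
    }

    public static string ToAscii(string input)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (c < 128)
            {
                sb.Append(c); // already ASCII, keep as-is
                continue;
            }
            string replacement = Lookup(c);
            if (replacement != null)
                sb.Append(replacement);
            else
                sb.Append("\\u").Append(((int)c).ToString("X4")); // escape notation
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // prints "Roedgroed med floede"
        Console.WriteLine(ToAscii("R\u00F8dgr\u00F8d med fl\u00F8de"));
    }
}
```

The result contains only ASCII characters, so a later conversion to ISO646-US has nothing left to replace with a question mark.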

Cheers,
 
Thus wrote Joerg,
> ISO-646-US is a fancy name for Ye Olde US-ASCII, which is completely
> unable to represent any of the characters you've posted.
>
> There's really no solution here other than masking all Unicode content
> with some escape notation (think \uxxxx) or using replacement
> characters like ae, oe, or ue for ä, ö, or ü in German for example.

Argh, next time I promise to read through the entire post before responding
;-)

As for automatically inserting replacements, see the other post, which shows
the use of EncoderReplacementFallback and DecoderReplacementFallback.

Cheers,
 
Thank you for the answer.

Unfortunately this is .NET 1.1, so I'm afraid I'll have to replace all the
special characters before changing the encoding.
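
Roughly what I have in mind is the sketch below, which avoids the 2.0-only fallback classes. AsciiFilter and ReplaceNonAscii are just names I made up for this; the idea is to walk the string, keep everything in the 7-bit range, and substitute a replacement of my own choosing (or an empty string to drop the character) before handing the result to Encoding.ASCII.

```csharp
using System;
using System.Text;

// Hypothetical helper; uses only APIs that should be available in .NET 1.1.
public class AsciiFilter
{
    // Replaces every character above U+007F with 'replacement'.
    // Pass "" to drop such characters entirely.
    // Note: a surrogate pair counts as two characters, so it is replaced twice.
    public static string ReplaceNonAscii(string input, string replacement)
    {
        StringBuilder sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (c <= '\u007F')
                sb.Append(c);
            else
                sb.Append(replacement);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // "\u00C6ble" is "Æble"; the Æ is dropped before encoding.
        string cleaned = ReplaceNonAscii("\u00C6ble", "");
        byte[] asciiBytes = Encoding.ASCII.GetBytes(cleaned);
        Console.WriteLine(Encoding.ASCII.GetString(asciiBytes)); // prints "ble"
    }
}
```

Because the string is already pure ASCII when it reaches GetBytes, the encoder never has to substitute a question mark.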

Regards Paw

 