Extended ASCII Encoding in .NET

Guest

Does anyone know how to encode an extended ASCII string into extended ASCII bytes?

For example, "ä" is 228 in the extended ASCII character set.

ASCIIEncoding supports only 7-bit ASCII, so every character in the extended set
is encoded as "?".

UnicodeEncoding, UTF7Encoding, UTF8Encoding and UTF32Encoding do not give the
result I need either. I was thinking of writing a new encoder class, but before
that I would like to know whether some class in .NET can already do the
encoding.
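
The "?" behaviour described above can be reproduced with a minimal sketch like the following. The class name AsciiFallbackDemo and the helper EncodeAscii are made up for illustration; only Encoding.ASCII and GetBytes come from the framework. Encoding.ASCII has no mapping for "ä", so it substitutes the byte for "?":

```csharp
using System;
using System.Text;

static class AsciiFallbackDemo
{
    // Hypothetical helper: encodes text with the 7-bit ASCII encoding.
    public static byte[] EncodeAscii(string text)
    {
        return Encoding.ASCII.GetBytes(text);
    }

    static void Main()
    {
        byte[] bytes = EncodeAscii("\u00E4"); // U+00E4 is "ä"
        Console.WriteLine(bytes[0]);          // 63, the code of "?"
    }
}
```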
 
Gerrit H

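Reading a specific encoding is supported in .NET. Try code similar to the
following:

// WindowsCodePage is an int code page number, so wrap it in GetEncoding
// to obtain the Encoding object that StreamReader expects.
_inputReader = new StreamReader(inputStream,
    System.Text.Encoding.GetEncoding(System.Text.Encoding.ASCII.WindowsCodePage));
_line = _inputReader.ReadLine();

Replace System.Text.Encoding.ASCII.WindowsCodePage with the code page number of
your requested encoding and the StreamReader will do the rest.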

G
 
Jon Skeet [C# MVP]

JoeUser said:
Does anyone know how to encode an extended ASCII string into extended ASCII bytes?

There is no one "extended ASCII" encoding.

JoeUser said:
For example, "ä" is 228 in the extended ASCII character set.

In *which* "extended ASCII" character set? There are lots of code pages
which might all be called "extended ASCII".

You need to find out which code page you really mean, then ask for the
encoding for that code page.

See http://www.pobox.com/~skeet/csharp/unicode.html for more
information.
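
As a rough illustration of "ask for the encoding for that code page", the sketch below encodes the same character under two different code pages. The class CodePageDemo and the helper EncodeWith are hypothetical names; the code page numbers 28591 and 437 are taken from later posts in this thread. It assumes a runtime where CodePagesEncodingProvider is available (.NET Core / .NET 5+); on .NET Framework the legacy code pages are built in and the registration line can be dropped.

```csharp
using System;
using System.Text;

static class CodePageDemo
{
    // Hypothetical helper: encodes text with the encoding for a code page number.
    public static byte[] EncodeWith(int codePage, string text)
    {
        return Encoding.GetEncoding(codePage).GetBytes(text);
    }

    static void Main()
    {
        // Assumption: on .NET Core / .NET 5+ legacy code pages such as 437
        // must be registered before GetEncoding can find them.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string text = "\u00E4"; // "ä"
        Console.WriteLine(EncodeWith(28591, text)[0]); // 228 (ISO-8859-1)
        Console.WriteLine(EncodeWith(437, text)[0]);   // 132 (IBM437)
    }
}
```

The same character gets a different byte value depending on the code page, which is why "extended ASCII" alone does not identify an encoding.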
 
Guest

Hello Jon,

Thank you for your comments and help. The problem still exists, but I found a
workaround.

By the extended ASCII set I meant the characters with ASCII codes 128 ... 255; this
set is also called the "IBM character set" or 8-bit ASCII. In the
following, I refer to this set of characters.

The closest code page in my application is 28591.

Gerrit's reply gave me the idea to try the Encoding class as follows:

Encoding enc = Encoding.GetEncoding(28591);
byte[] encodedBytes = enc.GetBytes(myString);

However, this does not produce the extended ASCII character set. For example:
encodedBytes = enc.GetBytes("ä");
// encodedBytes[0] = 228, encodedBytes.Length = 1
// "ä" is character number 132 in the Extended ASCII set

However, it is not absolutely necessary to have an ASCII conversion as long as
the conversion is unique, that is, each character in the extended set gets a
unique value between 128 ... 255. A conversion to 8-bit ASCII would have
been the best option, but this other option seems to provide a workable
solution.
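
The uniqueness requirement above could be checked with a sketch along these lines. RoundTripDemo and RoundTripsAllBytes are hypothetical names, not framework APIs; the check decodes all 256 byte values and verifies that encoding the result gives back the same bytes, which only holds if the mapping is one-to-one.

```csharp
using System;
using System.Text;

static class RoundTripDemo
{
    // Hypothetical helper: returns true if every byte value 0..255 decodes
    // to text that encodes back to the same byte.
    public static bool RoundTripsAllBytes(Encoding enc)
    {
        byte[] all = new byte[256];
        for (int i = 0; i < all.Length; i++)
        {
            all[i] = (byte)i;
        }

        byte[] back = enc.GetBytes(enc.GetString(all));
        if (back.Length != all.Length)
        {
            return false;
        }
        for (int i = 0; i < all.Length; i++)
        {
            if (back[i] != all[i])
            {
                return false;
            }
        }
        return true;
    }

    static void Main()
    {
        Encoding latin1 = Encoding.GetEncoding(28591);
        Console.WriteLine(RoundTripsAllBytes(latin1)); // True for ISO-8859-1
    }
}
```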
 
Stefan Simek

Hi,

The following encodings seem to fulfill your 'ä' = 132 request

437 - IBM437
775 - ibm775
850 - ibm850
852 - ibm852
857 - ibm857
858 - IBM00858
861 - ibm861
865 - IBM865
29001 - x-Europa

But I expect you require one of the IBM 850/852 encodings, which are/were
used widely. But I've never heard any of them referred to as
"Extended ASCII" ;)

HTH,
Stefan
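
One way a list like the one above could be produced (a sketch, not necessarily how it was actually compiled) is to loop over Encoding.GetEncodings() and keep every encoding that turns the character into the expected single byte. FindEncodingsDemo and FindSingleByteMatches are hypothetical names. The result depends on which encodings the runtime exposes, so it may differ from the list above or be empty on some platforms.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class FindEncodingsDemo
{
    // Hypothetical helper: returns "codepage - name" for every listed encoding
    // that encodes the text as exactly one byte with the expected value.
    public static List<string> FindSingleByteMatches(string text, byte expected)
    {
        var matches = new List<string>();
        foreach (EncodingInfo info in Encoding.GetEncodings())
        {
            try
            {
                byte[] bytes = info.GetEncoding().GetBytes(text);
                if (bytes.Length == 1 && bytes[0] == expected)
                {
                    matches.Add(info.CodePage + " - " + info.Name);
                }
            }
            catch (Exception)
            {
                // Skip encodings that are listed but cannot be used here.
            }
        }
        return matches;
    }

    static void Main()
    {
        foreach (string match in FindSingleByteMatches("\u00E4", 132))
        {
            Console.WriteLine(match);
        }
    }
}
```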
 
Jon Skeet [C# MVP]

JoeUser said:
Thank you for your comments and help. The problem still exists, but I found a
workaround.

By the extended ASCII set I meant the characters with ASCII codes 128 ... 255; this
set is also called the "IBM character set" or 8-bit ASCII.

That doesn't describe a single set of characters. What Unicode
character do you want byte 128 to mean, for instance? What Unicode
character do you want byte 129 to mean?

JoeUser said:
In the following, I refer to this set of characters.

The closest code page in my application is 28591.

Gerrit's reply gave me the idea to try the Encoding class as follows:

Encoding enc = Encoding.GetEncoding(28591);
byte[] encodedBytes = enc.GetBytes(myString);

However, this does not produce the extended ASCII character set. For example:
encodedBytes = enc.GetBytes("?");
// encodedBytes[0] = 228, encodedBytes.Length = 1
// "?" is character number 132 in the Extended ASCII set

What actual character is it?

JoeUser said:
However, it is not absolutely necessary to have an ASCII conversion as long as
the conversion is unique, that is, each character in the extended set gets a
unique value between 128 ... 255. A conversion to 8-bit ASCII would have
been the best option, but this other option seems to provide a workable
solution.

If you could tell us which Unicode character you expect to get from
each byte, we could probably work out which encoding you actually mean.
Did you read the link I referenced, by the way?
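
To answer the "which Unicode character does each byte mean" question for a candidate code page, a sketch like the following can help. DecodeByteDemo and CodePointOf are hypothetical names; the code pages 28591 and 437 come from this thread. As before, the provider registration is an assumption that applies to .NET Core / .NET 5+ only.

```csharp
using System;
using System.Text;

static class DecodeByteDemo
{
    // Hypothetical helper: decodes one byte with the given code page and
    // returns the code point of the first resulting character as U+XXXX.
    public static string CodePointOf(int codePage, byte value)
    {
        string s = Encoding.GetEncoding(codePage).GetString(new byte[] { value });
        return "U+" + ((int)s[0]).ToString("X4");
    }

    static void Main()
    {
        // Assumption: needed on .NET Core / .NET 5+ for legacy code pages.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        foreach (byte b in new byte[] { 128, 129, 132 })
        {
            Console.WriteLine(b + ": 28591 -> " + CodePointOf(28591, b)
                + ", 437 -> " + CodePointOf(437, b));
        }
    }
}
```

Under IBM437, byte 132 decodes to U+00E4 ("ä"); under 28591 the same byte decodes to the control character U+0084, while "ä" sits at byte 228.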
 
