C# Chr and Asc Function Equivalents - The Undocumented Truth!

  • Thread starter Darrell Sparti, MCSD

Darrell Sparti, MCSD

There have been many postings about this subject on this newsgroup.
Unfortunately, they're incorrect. You can't just cast a value in C#
and have it work for all ASCII characters. Nor can you use the ASCII
encoding as some have suggested.

The undocumented truth is, Microsoft uses the Western European encoding
in these functions. If you don't believe me, use 137 in your VB Chr
function then compare the C# output if you just cast it or use the
ASCII encoding. You'll see they don't match! You can even do a quick
loop and check all the values from 0 to 255 and you'll see that there
are many that won't match the VB function's output.
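The comparison described above can be sketched as follows. This is only an illustration, not part of the original post: the class and method names (`ChrComparison`, `ViaCast`, `ViaAscii`, `ViaCodePage1252`) are made up for the example, and code page 1252 is assumed as the reference encoding.

```csharp
using System;
using System.Text;

public static class ChrComparison
{
    static ChrComparison()
    {
        // Assumption: on .NET Core / .NET 5+ code page 1252 is only available
        // after registering this provider. On .NET Framework the encoding is
        // built in and this line can be dropped.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Plain cast: treats the value as a Unicode code point.
    public static char ViaCast(int value) => (char)value;

    // ASCII decoding: bytes above 127 are not valid ASCII.
    public static char ViaAscii(int value) =>
        Encoding.ASCII.GetString(new[] { (byte)value })[0];

    // Windows code page 1252 decoding.
    public static char ViaCodePage1252(int value) =>
        Encoding.GetEncoding(1252).GetString(new[] { (byte)value })[0];

    public static void Main()
    {
        // Print every byte value for which the three approaches disagree.
        for (int value = 0; value <= 255; value++)
        {
            char cast = ViaCast(value);
            char ascii = ViaAscii(value);
            char cp1252 = ViaCodePage1252(value);
            if (cast != cp1252 || ascii != cp1252)
            {
                Console.WriteLine("{0,3}: cast=U+{1:X4} ascii=U+{2:X4} cp1252=U+{3:X4}",
                    value, (int)cast, (int)ascii, (int)cp1252);
            }
        }
    }
}
```

For 137, the cast yields U+0089 (a control character), while code page 1252 yields U+2030 (the per mille sign). Values 0 to 127 agree across all three approaches.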

Now if you're doing simple stuff, that may be OK, but if you're writing
components in one language and expect them to communicate with components
written in the other, you're going to have a real problem. Case in
point: you've got an older VB6 object (or VB.NET object for that
matter) that uses its own encryption algorithm, and it must communicate
with a C# object that must mimic the encryption function. If you don't
use the proper implementation of the Chr and Asc functions in your C#
component, you'll never be able to decipher the encrypted data from the
VB component.

Here are the true implementations of the Asc and Chr functions:

// Requires: using System; using System.Text;

// Mimics VB's Chr for byte values by decoding with Windows code page 1252.
internal static string Chr(int p_intByte)
{
    if ((p_intByte < 0) || (p_intByte > 255))
    {
        throw new ArgumentOutOfRangeException("p_intByte", p_intByte,
            "Must be between 0 and 255.");
    }
    byte[] bytBuffer = new byte[] { (byte)p_intByte };
    return Encoding.GetEncoding(1252).GetString(bytBuffer);
}

// Mimics VB's Asc for a single character by encoding with Windows code page 1252.
internal static int Asc(string p_strChar)
{
    if (p_strChar == null)
    {
        throw new ArgumentNullException("p_strChar");
    }
    if (p_strChar.Length != 1)
    {
        throw new ArgumentOutOfRangeException("p_strChar", p_strChar,
            "Must be a single character.");
    }
    char[] chrBuffer = { Convert.ToChar(p_strChar) };
    byte[] bytBuffer = Encoding.GetEncoding(1252).GetBytes(chrBuffer);
    return (int)bytBuffer[0];
}

I hope this answers the question once and for all and puts an end to
the huge amount of misinformation that exists out there on this
subject.

Darrell Sparti, MCSD
Bikers Against Child Abuse National Webmaster
(e-mail address removed)
www.bacausa.com
Because No Child Should Live In Fear
 
Jon Skeet [C# MVP]

Darrell Sparti said:
> There have been many postings about this subject on this newsgroup.
> Unfortunately, they're incorrect. You can't just cast a value in C#
> and have it work for all ASCII characters. Nor can you use the ASCII
> encoding as some have suggested.

You can for *ASCII* characters. Don't forget that ASCII only extends to
126 or 127 (I can never remember whether 127 is considered to be part
of it or not; it's not particularly important though as it's
unprintable).

> The undocumented truth is, Microsoft uses the Western European encoding
> in these functions.

If you mean the VB.NET functions, it's perfectly well documented, and
it's not the Western European encoding - it's whatever the default
encoding is for the thread.

From the docs for Asc:

<quote>
Asc returns the code point, or character code, for the input character.
This can be 0 through 255 for single-byte character set (SBCS) values
and -32768 through 32767 for double-byte character set (DBCS) values.
The returned value depends on the code page for the current thread,
which is contained in the ANSICodePage property of the TextInfo class.
TextInfo.ANSICodePage can be obtained by specifying
System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
</quote>

And from the docs for Chr:

<quote>
Chr uses the Encoding class in the System.Text namespace to determine
if the current thread is using a single-byte character set (SBCS) or a
double-byte character set (DBCS). It then takes CharCode as a code
point in the appropriate set. The range can be 0 through 255 for SBCS
characters and -32768 through 65535 for DBCS characters. The returned
character depends on the code page for the current thread, which is
contained in the ANSICodePage property of the TextInfo class.
TextInfo.ANSICodePage can be obtained by specifying
System.Globalization.CultureInfo.CurrentCulture.TextInfo.ANSICodePage.
</quote>

> If you don't believe me, use 137 in your VB Chr
> function then compare the C# output if you just cast it or use the
> ASCII encoding. You'll see they don't match! You can even do a quick
> loop and check all the values from 0 to 255 and you'll see that there
> are many that won't match the VB function's output.

And that's what I'd expect, as ASCII doesn't have any values above 127,
and Unicode 128-159 is not the same as most ANSI code pages for the
same range.

> Now if you're doing simple stuff, that may be OK, but if you're writing
> components in one language and expect them to communicate with components
> written in the other, you're going to have a real problem. Case in
> point: you've got an older VB6 object (or VB.NET object for that
> matter) that uses its own encryption algorithm, and it must communicate
> with a C# object that must mimic the encryption function. If you don't
> use the proper implementation of the Chr and Asc functions in your C#
> component, you'll never be able to decipher the encrypted data from the
> VB component.
>
> Here are the true implementations of the Asc and Chr functions:

<snip>

Those would be fine if the thread's default code page is 1252, but
otherwise it's not correct.
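A minimal sketch of a variant that follows the documented behaviour quoted above, taking the code page from a culture's `TextInfo.ANSICodePage` instead of hard-coding 1252. The class name `AnsiChars` and the explicit `culture` parameter are assumptions made for the example; it only handles single-byte code pages and is not claimed to reproduce what any particular VB version actually does.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class AnsiChars
{
    static AnsiChars()
    {
        // Assumption: needed on .NET Core / .NET 5+ so that Windows code pages
        // such as 1251 and 1252 can be resolved. Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Looks up the ANSI code page of the given culture.
    private static Encoding AnsiEncodingFor(CultureInfo culture) =>
        Encoding.GetEncoding(culture.TextInfo.ANSICodePage);

    // Decodes a single byte value using the culture's ANSI code page (SBCS only).
    public static char Chr(int charCode, CultureInfo culture)
    {
        if (charCode < 0 || charCode > 255)
        {
            throw new ArgumentOutOfRangeException("charCode", charCode,
                "Must be between 0 and 255.");
        }
        return AnsiEncodingFor(culture).GetString(new[] { (byte)charCode })[0];
    }

    // Encodes a single character using the culture's ANSI code page (SBCS only).
    public static int Asc(char c, CultureInfo culture)
    {
        byte[] bytes = AnsiEncodingFor(culture).GetBytes(new[] { c });
        return bytes[0];
    }
}
```

With a culture whose ANSI code page is 1252 (for example `new CultureInfo("en-US")`), `AnsiChars.Chr(137, culture)` gives U+2030, the same as the hard-coded version. With a Cyrillic culture such as `new CultureInfo("ru-RU")` (ANSI code page 1251), `AnsiChars.Chr(240, culture)` would be expected to give U+0440 instead of U+00F0.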

I've had a bit of an experiment, and unfortunately the behaviour varies
depending on whether you're using .NET 1.1 or .NET 2.0, which doesn't
help matters. For instance, try the following program:

Option Strict On

Imports Microsoft.VisualBasic
Imports System
Imports System.Threading
Imports System.Globalization

Public Class Test

    Shared Sub Main()
        ' Switch the thread to the culture with LCID 7194 before calling Chr.
        Thread.CurrentThread.CurrentCulture = New CultureInfo(7194)
        Dim x As Char = Chr(240)
        ' Print the Unicode code point of the character that Chr returned.
        Console.WriteLine(AscW(x))
    End Sub

End Class

Using .NET 1.1, this prints 240. Using .NET 2.0 it prints 1088. I've no
idea what it would do on VB6.

Changing the current culture of the thread makes a difference in *some*
situations but not others, which is plain bizarre.

Fortunately, C# is considerably more consistent in these matters. If
you need to interoperate with legacy VB code, I'd strongly suggest you
make sure you know *exactly* what that VB code is going to produce in
terms of actual encodings, including what happens in various cultures.
Once you know that, getting the C# side to work should be easy...
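As a rough illustration of that last step: once the code page used by the VB side is known, the C# side can name it explicitly rather than rely on any thread default. The class `LegacyText` and its method names are hypothetical, and the code page passed in is whatever the legacy component has been confirmed to use.

```csharp
using System;
using System.Text;

public static class LegacyText
{
    static LegacyText()
    {
        // Assumption: needed on .NET Core / .NET 5+ to resolve Windows code pages.
        // Not needed on .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Strict encoding: throws on unmappable data instead of substituting '?'.
    private static Encoding Strict(int codePage) =>
        Encoding.GetEncoding(codePage,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

    // Bytes produced by the legacy component -> .NET string.
    public static string Decode(byte[] data, int codePage) =>
        Strict(codePage).GetString(data);

    // .NET string -> bytes in the form the legacy component expects.
    public static byte[] Encode(string text, int codePage) =>
        Strict(codePage).GetBytes(text);
}
```

For example, `LegacyText.Decode(new byte[] { 0x89 }, 1252)` returns the string containing U+2030, and `LegacyText.Encode("\u2030", 1252)` returns the single byte 0x89. The exception fallbacks are a design choice for the sketch: a mismatch between the assumed and actual code page then tends to surface as an error rather than as silently corrupted data.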

> I hope this answers the question once and for all and puts an end to
> the huge amount of misinformation that exists out there on this
> subject.

Personally I think it just added to the misinformation, I'm afraid...
 
Bob Grommes

This is a perfect example of why the real solution to coding components that
will work in cross-language environments is to not use language-specific
libraries that are, after all, mostly there for backward compatibility
anyway. Use DirectCast() in VB in the same way you'd use C# casting and you
should get the same results. Should, at any rate; you'd have to test to be
sure there isn't some kind of tap dance going on under the hood. You never
know with VB.

It's all a matter of perspective. If you're used to VB6 and think that's
the "right" way that everything should work then you'll use VB.NET
constructs that produce those "correct" results and then rail against C# for
its "incorrect" results.

What is "correct" for mixed language projects and components is that which
uses the framework and CLR without embellishment. What is "correct" for
porting legacy code to .NET -- at least as a first step -- might, arguably,
be to use compatibility functions. But to steer the best course in any
situation, you have to step back from a parochial viewpoint and look at the
bigger picture of how your components and apps will interact with the rest
of the managed world.

--Bob
 
Jon Skeet [C# MVP]

Bob Grommes said:
> This is a perfect example of why the real solution to coding components that
> will work in cross-language environments is to not use language-specific
> libraries that are, after all, mostly there for backward compatibility
> anyway. Use DirectCast() in VB in the same way you'd use C# casting and you
> should get the same results. Should, at any rate; you'd have to test to be
> sure there isn't some kind of tap dance going on under the hood. You never
> know with VB.
>
> It's all a matter of perspective. If you're used to VB6 and think that's
> the "right" way that everything should work then you'll use VB.NET
> constructs that produce those "correct" results and then rail against C# for
> its "incorrect" results.
>
> What is "correct" for mixed language projects and components is that which
> uses the framework and CLR without embellishment. What is "correct" for
> porting legacy code to .NET -- at least as a first step -- might, arguably,
> be to use compatibility functions. But to steer the best course in any
> situation, you have to step back from a parochial viewpoint and look at the
> bigger picture of how your components and apps will interact with the rest
> of the managed world.

I think it can only be correct to use the compatibility functions if
you're absolutely sure about what they do. Unfortunately, having
experimented with the 2.0 and 1.1 implementations of Chr and Asc, it's
far from obvious to me exactly what they do when the current thread's
culture changes. Sometimes they seem "sticky" (taking the encoding of
the thread which first calls them) and sometimes they don't - and as
I've said, the results seem to vary depending on the version of the
framework used. (This could be an issue with the beta of 2.0, of
course.)

Hopefully the VB6 semantics are better defined, so that anyone wanting
to interoperate with data produced by VB6 can do so in a precise
manner.
 