I wanted to share this with anyone who may stumble across this thread.
X.690 specifies that a BMPString is encoded in the 2-octet canonical
form (specified in ISO/IEC 10646-1) [1] of UCS (also specified in
ISO/IEC 10646-1) [2]. Note that ASN.1 does not include an endianness
marker in its octet stream [3].
ISO 10646 prefers Big Endianness, though it is not standardized [4].
According to Microsoft, CodePage 1200 is Unicode, and CodePage 1201 is
Unicode Big Endian [5].
In memory, the byte[] from the ASN.1 sequence is (in decimal)
[00 77 00 105 00 99 00 114 00 111 00 115 00 111 00 102 00 116 ...].
This is a Big Endian serialization [6] from Microsoft's Cryptographic
Service Provider.
When interpreted as Unicode (CodePage 1200, little endian), I receive
a non-printable string. When interpreted as Big Endian Unicode
(CodePage 1201), I receive 'Microsoft'.
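
A minimal sketch of the two interpretations, assuming C# 9 or later
(top-level statements) and using only the 'Microsoft' prefix of the
octets quoted above. Encoding.Unicode and Encoding.BigEndianUnicode
are the built-in equivalents of code pages 1200 and 1201:

```csharp
using System;
using System.Text;

// Content octets of the BMPString (decimal), limited to the
// "Microsoft" prefix quoted in the post.
byte[] value =
{
    0, 77, 0, 105, 0, 99, 0, 114, 0, 111,
    0, 115, 0, 111, 0, 102, 0, 116
};

// UTF-16 little endian (code page 1200): each pair is read low byte first.
string littleEndian = Encoding.Unicode.GetString(value);

// UTF-16 big endian (code page 1201): each pair is read high byte first.
string bigEndian = Encoding.BigEndianUnicode.GetString(value);

Console.WriteLine(bigEndian);                    // prints "Microsoft"
Console.WriteLine(littleEndian == "Microsoft");  // prints "False"
```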
So, a test of the first byte (Value[0]) is required to properly
return the string: for text like the above, a leading 00 means the
high-order byte comes first, i.e. big endian. The return will be either
Encoding.GetEncoding(1200).GetString(Value); or
Encoding.GetEncoding(1201).GetString(Value). This is expected, since
ASN.1 is a Presentation Layer protocol - it is up to the
implementation to interpret the stream at the Application Layer.
Simply using Encoding.Unicode does not produce expected results in
every case, since Encoding.Unicode is UTF-16 little endian in C#.
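
The first-byte test could look roughly like the sketch below.
DecodeBmpString is a hypothetical helper name, not part of any
library, and the check is only a heuristic: it assumes the first
character is below U+0100, so that exactly one of its two octets is
zero. It again assumes C# 9 or later:

```csharp
using System;
using System.Text;

// Hypothetical helper: guess the byte order from the first octet.
// Assumption: the first character is below U+0100, so a leading 0
// means the high-order byte comes first (big endian).
static string DecodeBmpString(byte[] value)
{
    if (value.Length == 0)
    {
        return string.Empty;
    }

    Encoding encoding = value[0] == 0
        ? Encoding.BigEndianUnicode   // same as Encoding.GetEncoding(1201)
        : Encoding.Unicode;           // same as Encoding.GetEncoding(1200)

    return encoding.GetString(value);
}

byte[] bmpBigEndian = { 0, 77, 0, 83 };     // "MS", high byte first
byte[] bmpLittleEndian = { 77, 0, 83, 0 };  // "MS", low byte first

Console.WriteLine(DecodeBmpString(bmpBigEndian));     // prints "MS"
Console.WriteLine(DecodeBmpString(bmpLittleEndian));  // prints "MS"
```

A string whose first character is U+0100 or above would defeat this
check, so it is a convenience for data like the certificate fields
discussed here rather than a general detector.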
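The same is true for UCS (sometimes referred to as UCS-4, the 4-octet
canonical form of ISO 10646). The two conversions of interest are
below. CP 65005 is little endian; 65006 is big endian:
    Encoding.GetEncoding(65005).GetString(Value)
    Encoding.GetEncoding(65006).GetString(Value)
Note that the .NET documentation lists UTF-32 under code pages 12000
(little endian) and 12001 (big endian), so those identifiers may be
needed instead, depending on the framework version.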
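
To sidestep the code page numbers altogether, the 4-octet case can be
sketched with the UTF32Encoding class, which takes the byte order as a
constructor argument. The sample octets are made up for illustration,
and C# 9 or later is assumed:

```csharp
using System;
using System.Text;

// "MS" as 4-octet code units in both byte orders (illustrative data).
byte[] ucs4BigEndian = { 0, 0, 0, 77, 0, 0, 0, 83 };
byte[] ucs4LittleEndian = { 77, 0, 0, 0, 83, 0, 0, 0 };

// The first argument selects big endian; the second controls whether
// a byte order mark is emitted when encoding.
Encoding utf32Le = new UTF32Encoding(bigEndian: false, byteOrderMark: false);
Encoding utf32Be = new UTF32Encoding(bigEndian: true, byteOrderMark: false);

Console.WriteLine(utf32Be.GetString(ucs4BigEndian));     // prints "MS"
Console.WriteLine(utf32Le.GetString(ucs4LittleEndian));  // prints "MS"
```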
Microsoft defaults to little endian, so UTF-16 uses CP 1200 and
UTF-32 defaults to CP 65005. Interestingly, when decoding a Digital
Certificate, Microsoft's own CSP encodes some data using ASN.1's
BMPString - but it is Big Endian.
Jeff
[1] X.690, p. 16, Section 8.21.8
[2] X.690, p. 16, Section 8.21.9
[3] X.690, p. 16, Section 8.21.9. Note 2b.
[4] ISO-10646, Section 6.3
[5] http://msdn2.microsoft.com/en-US/lib...odinginfo.aspx
[6] RFC 2781, Section 3.1