MultiByteToWideChar in .NET - Multibyte to Unicode conversion

groups

I have a C# application that needs to convert multibyte strings to
Unicode. However, I cannot get MultiByteToWideChar to behave as expected
within .NET. I have declared it as follows:

[DllImport("Kernel32", CharSet = CharSet.Auto)]
static extern Int32 MultiByteToWideChar(
    UInt32 codePage,
    UInt32 dwFlags,
    [In, MarshalAs(UnmanagedType.LPStr)] String lpMultiByteStr,
    Int32 cbMultiByte,
    [Out, MarshalAs(UnmanagedType.LPWStr)] StringBuilder lpWideCharStr,
    Int32 cchWideChar);

And I am using it as follows:

private string ConvertToUnicode(string str, uint codepage)
{
    int l = str.Length;
    int i = 0;
    i = MultiByteToWideChar(codepage, 0, str, -1, null, 0);
    StringBuilder wideStr = new StringBuilder(i);
    i = MultiByteToWideChar(codepage, 0, str, -1, wideStr,
        wideStr.Capacity);
    string s = wideStr.ToString();
    return s;
}

If I initialize a C# string with the following bytes: 43, 3A, 5C, 83,
88, 83, 45, 83, 52, 83, 5C, 00 (hex) and use the ConvertToUnicode
function above with code page 932 (Japanese), I get garbage
(C:\???E?R?\). However, using a pure .NET solution (below), I get the
correct string (C:\ヨウコソ):

private string MultibyteToUnicodeNETOnly(string str, int codepage)
{
    byte[] source = MCBSToByte(str);
    Encoding e1 = Encoding.GetEncoding(codepage);
    Encoding e2 = Encoding.Unicode;
    byte[] target = Encoding.Convert(e1, e2, source);
    return e2.GetString(target);
}

private byte[] MCBSToByte(string s)
{
    byte[] b = new byte[s.Length];
    int i = 0;
    foreach (char c in s)
        b[i++] = (byte)c;
    return b;
}

Any insights on a way to get MultiByteToWideChar to work, or a better
solution? Thanks in advance.
 

Michael (michka) Kaplan [MS]

What is wrong with the pure managed solution? It is even better and faster
in .NET 2.0...
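
For reference, a leaner managed version can decode the raw bytes directly with Encoding.GetString, skipping both the intermediate string and the Encoding.Convert round trip. This is a minimal sketch, assuming .NET Core 3.0 or later, where code page 932 must first be enabled by registering CodePagesEncodingProvider; the class ManagedCodePage and its ToUnicode method are hypothetical names, not something from the thread.

```csharp
using System.Text;

static class ManagedCodePage
{
    static ManagedCodePage()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, which ship only a few encodings by
        // default; registering this provider makes Windows code pages such as 932 available.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    // Decodes the raw multibyte data in one step: no intermediate string
    // and no Encoding.Convert round trip through UTF-16 bytes.
    public static string ToUnicode(byte[] bytes, int codePage)
    {
        return Encoding.GetEncoding(codePage).GetString(bytes);
    }
}
```

Note that GetString does not stop at a zero byte: the trailing 00 in the original data decodes to a U+0000 character, so trim it (or decode only bytes.Length - 1 bytes) if the terminator is not wanted.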


--
MichKa [Microsoft]
NLS Collation/Locale/Keyboard Technical Lead
Globalization Infrastructure, Fonts, and Tools
Blog: http://blogs.msdn.com/michkap

This posting is provided "AS IS" with
no warranties, and confers no rights.


Mattias Sjögren

> [DllImport("Kernel32", CharSet = CharSet.Auto)]

There's no point in specifying CharSet.Auto here, since there's only
one MultiByteToWideChar function (no separate A and W versions).

> [In, MarshalAs(UnmanagedType.LPStr)] String lpMultiByteStr,

This should be a byte[] instead of a String.
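
A minimal sketch of such a declaration, assuming Windows and a modern .NET runtime. The class NativeCodePage and the helper ToUnicode are hypothetical, and the char[] output buffer (in place of the original StringBuilder) is an editorial choice rather than something from the thread. Because the byte array is passed through untouched, MultiByteToWideChar sees the actual Shift-JIS bytes.

```csharp
using System.ComponentModel;
using System.Runtime.InteropServices;

static class NativeCodePage
{
    // ExactSpelling: kernel32 exports a single MultiByteToWideChar (no A/W pair).
    // CharSet.Unicode: makes the char[] buffer marshal as 16-bit WCHARs.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = true)]
    private static extern int MultiByteToWideChar(
        uint codePage,
        uint dwFlags,
        byte[] lpMultiByteStr,      // raw bytes, passed without any ANSI conversion
        int cbMultiByte,            // explicit byte count, so no terminating zero is required
        [Out] char[] lpWideCharStr, // null on the sizing call
        int cchWideChar);

    public static string ToUnicode(byte[] bytes, uint codePage)
    {
        // The API rejects a zero-length input, so handle it here.
        if (bytes.Length == 0)
            return string.Empty;

        // First call: ask how many UTF-16 code units the output needs.
        int needed = MultiByteToWideChar(codePage, 0, bytes, bytes.Length, null, 0);
        if (needed == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        // Second call: convert into a buffer of exactly that size.
        var buffer = new char[needed];
        int written = MultiByteToWideChar(codePage, 0, bytes, bytes.Length, buffer, buffer.Length);
        if (written == 0)
            throw new Win32Exception(Marshal.GetLastWin32Error());

        return new string(buffer, 0, written);
    }
}
```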

> If I initialize a C# string with the following bytes: 43, 3A, 5C, 83,
> 88, 83, 45, 83, 52, 83, 5C, 00

That's your problem: you shouldn't use a string to store byte values.
A string is already Unicode in .NET, so the conversion has already
taken place.
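
To illustrate this point, the following sketch reproduces the garbage from the original post in purely managed code. It assumes the string was built with one char per byte (the inverse of MCBSToByte) and that the system ANSI code page is 1252. The LPStr marshaler performs a comparable ANSI conversion, but its exact fallback behavior is only approximated here with a replacement fallback. The class GarbageRepro and its Simulate method are hypothetical.

```csharp
using System.Text;

static class GarbageRepro
{
    static GarbageRepro()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, where code pages 1252 and 932
        // are only available after registering this provider.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static string Simulate(byte[] sjisBytes)
    {
        // Step 1: one char per byte, so 0x83 becomes U+0083 (a C1 control
        // character), not half of a Japanese character.
        var chars = new char[sjisBytes.Length];
        for (int i = 0; i < sjisBytes.Length; i++)
            chars[i] = (char)sjisBytes[i];
        string str = new string(chars);

        // Step 2: approximates LPStr marshaling by encoding with the ANSI code page
        // (1252 assumed). U+0083 and U+0088 have no 1252 mapping, so each becomes '?'.
        Encoding ansi = Encoding.GetEncoding(1252,
            EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback);
        byte[] marshaled = ansi.GetBytes(str);

        // Step 3: code page 932 now receives plain ASCII bytes, so the '?' survive.
        return Encoding.GetEncoding(932).GetString(marshaled);
    }
}
```

Run on the bytes from the original post, Simulate returns "C:\???E?R?\" followed by a U+0000 character: the question marks are already there before code page 932 is ever consulted.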

> Any insights on a way to get MultiByteToWideChar to work, or a better
> solution?

Any reason you don't want to use the Encoding classes?


Mattias
 
