Unicode

Guest

If I call "€".getBytes() in Java I get 0x80.
If I call myUTF8Encoding.GetBytes("€") in C# I get 0xe2, 0x82, 0xac.

In case you can't read it, the character in double quotes is a Euro
symbol.

I need to pass strings to Java via JNI so I need to have the same
Unicode bytes.

I know this has something to do with using different character sets, but I
can't work out how to get C# to return a single byte 0x80 for the Euro
symbol.

A similar thing happens for the Yen symbol.
 
If I call "€".getBytes() in Java I get 0x80.
If I call myUTF8Encoding.GetBytes("€") in C# I get 0xe2, 0x82, 0xac.

Encoding.Default.GetBytes("€")

Till
 
If I call "€".getBytes() in Java I get 0x80.

The answer is not to call String.getBytes(), but
String.getBytes(encodingName) in Java. Otherwise you're just using the
default encoding, which is rarely a good idea.
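
To make the point about the default encoding concrete, here is a minimal sketch. The class name EuroBytes and the helper hex are made up for illustration, and windows-1252 is only an assumption about which default charset produced the single 0x80 byte; the thread does not say. The Euro sign is written as \u20AC so the source file's own encoding does not affect the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EuroBytes {
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "e2 82 ac".
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // Euro sign, escaped to avoid source-encoding issues

        // Explicit UTF-8: the same three bytes the C# UTF-8 encoder reports.
        System.out.println(hex(euro.getBytes(StandardCharsets.UTF_8))); // e2 82 ac

        // Explicit windows-1252 (assumed here): the single byte from the question.
        System.out.println(hex(euro.getBytes(Charset.forName("windows-1252")))); // 80

        // No argument: the result depends on the platform default charset,
        // so no expected output is given for this line.
        System.out.println(Charset.defaultCharset() + ": " + hex(euro.getBytes()));
    }
}
```

If both sides name the same encoding explicitly, the bytes should match regardless of what the machine's default happens to be.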

It's not your C# code which is the problem here - it's your Java code.
 
If I call "€".getBytes() in Java I get 0x80.
If I call myUTF8Encoding.GetBytes("€") in C# I get 0xe2, 0x82, 0xac.
This is because in C# you are explicitly asking for UTF-8.
 
Solved it. This was a bit of a red herring - I had called
getBytes("UTF-8") to check the contents but a fault in my NUnit test
was showing just 0x80 - the results from an old test. Once I'd spotted
this I was able to find my problem and I've got Java/C# Unicode strings
passing both ways with no problem. Thanks all for your help anyway.
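
For anyone landing here later, a rough sketch of the Java half of such a round trip, assuming the two sides exchange raw UTF-8 byte arrays. The poster's actual JNI code is not shown in the thread, so the class Utf8RoundTrip and its encode/decode methods are hypothetical names, not what was really used.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Encodes text to the bytes that would be handed to the other side.
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes received from the other side, assuming they are UTF-8.
    public static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The three bytes the C# side reported for the Euro sign.
        byte[] fromCSharp = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC};

        String text = decode(fromCSharp);
        System.out.println(text.equals("\u20AC"));                   // true
        System.out.println(Arrays.equals(encode(text), fromCSharp)); // true
    }
}
```

The point is only that the same charset is named on both the encoding and the decoding side, so neither direction falls back to a platform default.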
 