UTF8 Decoder

Guest

I am using the following code to decode a UTF-8 string from an XML element.
The font being used is Arial Unicode MS, and the UTF-8 characters are not
corrupt, as they display correctly within the raw XML page. The code below
works fine with the majority of characters, with the exception of various
characters such as U+00FC and U+00FE. Any assistance would be greatly
appreciated.

Luke

Dim MessageString As String
Dim MessageBuffer() As Byte
Dim MessageChar() As Char
Dim Decoder As System.Text.Decoder
Dim UTF8Code As String
'
MessageString = CurrentMessage.Value
'
' Decode the UTF bytes that are displayed within the XML Service
MessageBuffer = System.Text.Encoding.UTF8.GetBytes(MessageString.ToCharArray)

ReDim MessageChar(MessageBuffer.Length)
Decoder = System.Text.Encoding.UTF8.GetDecoder()
Decoder.GetChars(MessageBuffer, 0, MessageBuffer.Length, MessageChar, 0)
' Loop through the Char() array
UTF8Code = String.Empty
For Each Character As Char In MessageChar
    ' Format for RTB output
    UTF8Code &= "\u" & Convert.ToUInt32(Character).ToString() & "?"
Next Character
MessageString = UTF8Code
 
Jon Skeet [C# MVP]

It's not entirely clear what you're seeing that you don't expect.

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.
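
As an illustration of what such a short but complete program might look like, here is a minimal sketch. It assumes a console application and uses a hard-coded sample string in place of CurrentMessage.Value; the module name, function name and sample text are hypothetical and not taken from the original code. It shows two things: encoding a String to UTF-8 bytes and decoding those bytes again returns the original characters, so the \u escapes can be built from the string directly; and RTF reads the number after \u as a signed 16-bit decimal, so code units above 32767 are written as negative values. Working from the string also sidesteps the ReDim call, which in VB allocates one element more than the upper bound given and so leaves trailing null characters in the array.

```vb
Imports System
Imports System.Text

Module Utf8RoundTripDemo ' hypothetical name, for illustration only

    ' Builds an RTF \uN? escape for each UTF-16 code unit in the text.
    ' RTF treats N as a signed 16-bit decimal, so values above 32767
    ' are written as negative numbers.
    Function ToRtfEscapes(ByVal text As String) As String
        Dim result As New StringBuilder()
        For Each c As Char In text
            Dim code As Integer = Convert.ToInt32(c)
            If code > 32767 Then code -= 65536
            result.Append("\u").Append(code).Append("?")
        Next
        Return result.ToString()
    End Function

    Sub Main()
        ' Sample text containing U+00FC and U+00FE (an assumption;
        ' the real input would come from the XML element).
        Dim original As String = "gr" & Convert.ToChar(&HFC) & "n " & Convert.ToChar(&HFE)

        ' Encode to UTF-8 bytes and decode again: a round trip.
        Dim bytes() As Byte = Encoding.UTF8.GetBytes(original)
        Dim roundTripped As String = Encoding.UTF8.GetString(bytes)

        ' Prints True: the round trip does not change the string.
        Console.WriteLine(String.Equals(original, roundTripped, StringComparison.Ordinal))

        ' Prints \u103?\u114?\u252?\u110?\u32?\u254?
        Console.WriteLine(ToRtfEscapes(original))
    End Sub

End Module
```

If the output of a program like this looks right but the text in the RichTextBox does not, the problem is more likely in how the string is read from the XML or written to the control than in the UTF-8 handling itself.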
 