Display Unicode characters on Winforms

Bill Nguyen

I'm getting data from a MySQL database (default character set = UTF-8).
I need to display the data as Unicode, but I only get character codes like
this: Ph&#7841;m Th&#7883; Ng&#7885;c

I changed the textbox font to Arial Unicode MS, but it still does not work.

Do I need to convert the data stored in the MySQL database before displaying it?
Thanks

Bill
 
Herfried K. Wagner [MVP]

Windows Forms controls do not decode character entities such as
'&#7841;' into the corresponding character. You could replace each string
"&#<number>;" with the value of 'ChrW(<number>)', or simply avoid storing the
characters in the database in that encoded form.
 
Bill nguyen

Herfried;
I did not encode the data. It must be part of the ISP's procedure.
The text is displayed correctly in browsers, both IE and Firefox.
It's going to be a big task to convert those <number> codes with ChrW because
they are mixed in with ordinary characters all over the text.

Bill
 
Bill nguyen

Herfried;

I don't know if this will work, but I need help trying it.
Here's a sample of the text string:

"Nghi&ecirc;n C&#7913;u - Ph&ecirc; B&igrave;nh"

I need to read each byte in the text string, then use ChrW to convert it to
Unicode.

I tried ChrW(AscW(textString)), but it only converts the first letter.

Is there a function to read all the bytes in the text string in one pass?
Thanks

Bill
 
Jay B. Harlow [MVP - Outlook]

Bill,
You could use a RegEx to convert the char escape codes to chars.

You could implement what Herfried suggested with something like:

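' Requires: Imports System.Text.RegularExpressions
' Inside a method (Static locals are only valid there):
Const input As String = "Nghi&ecirc;n C&#7913;u - Ph&ecirc; B&igrave;nh"

' Matches a 4-digit numeric character reference such as &#7913;
Const pattern As String = "\&\#\d{4}\;"
Static parser As New Regex(pattern, RegexOptions.Compiled)
Dim output As String = parser.Replace(input, AddressOf MatchEvaluator)

' At class level: called once per match, returns the decoded character.
Private Function MatchEvaluator(ByVal input As Match) As String
    ' Skip the leading "&#" and take the four digits.
    Dim value As String = input.Value.Substring(2, 4)
    Return ChrW(CInt(value))
End Function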


Does the 7913 represent a 4-digit decimal or hexadecimal number? You may
need to change the call to CInt accordingly...
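
To illustrate the decimal versus hexadecimal point: in HTML, a reference written as &#7913; is decimal, while the hexadecimal form carries an "x" prefix, as in &#x1EE9;. The sketch below shows one way the conversion could branch on that. The module and function names (ReferenceParsing, CharFromReference) are hypothetical and not part of any library.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module ReferenceParsing

    ' Hypothetical helper: converts the digits of a character reference
    ' into a one-character string. isHex selects base 16 instead of base 10.
    Public Function CharFromReference(ByVal digits As String, ByVal isHex As Boolean) As String
        Dim code As Integer = If(isHex, Convert.ToInt32(digits, 16), Convert.ToInt32(digits, 10))
        ' ChrW accepts code points up to 65535, which covers the 4-digit codes here.
        Return ChrW(code)
    End Function

End Module
```

Both CharFromReference("7913", False) and CharFromReference("1EE9", True) should yield the same character, since hexadecimal 1EE9 equals decimal 7913.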

--
Hope this helps
Jay B. Harlow [MVP - Outlook]
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net


 
Bill Nguyen

Jay;

If you look at the string again, you'll see that it's not only the 4-digit
groups that need to be translated but other sequences as well (those in
square brackets below):

Nghi[&ecirc;]n C&#7913;u - Ph[&ecirc;] B[&igrave;]nh

I'm using phpWebsite and a MySQL database from an ISP (IpowerWeb.com).
The input text is Unicode when a webpage is created or updated, but
the text string above is what gets stored in the MySQL table.
I guess I have to convert the text back to Unicode to view or edit it, then
put it back. MySQL probably converts the text to the above format by itself.

Any suggestion on how to accomplish this?

Thanks again

Bill


 
Jay B. Harlow [MVP - Outlook]

Bill,
I would extend the pattern to also match the sequences you marked with
square brackets, then modify the MatchEvaluator function to handle either
the first kind of escape sequence or the second...
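
A minimal sketch of that idea, assuming the bracketed sequences are HTML named entities such as &ecirc; and &igrave;. The module name, the function names, and the two-entry lookup table are hypothetical; a real table would need every entity name that occurs in the data.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module EntityDecoder

    ' Hypothetical lookup table holding only the two names from the sample string.
    Private ReadOnly NamedEntities As New Dictionary(Of String, Char) From {
        {"ecirc", ChrW(234)},
        {"igrave", ChrW(236)}
    }

    ' Group "num" captures decimal digits; group "name" captures an entity name.
    Private ReadOnly Parser As New Regex(
        "&(?:#(?<num>\d{1,5})|(?<name>[A-Za-z]+));", RegexOptions.Compiled)

    Public Function DecodeEntities(ByVal text As String) As String
        Return Parser.Replace(text, AddressOf ReplaceEntity)
    End Function

    ' Called once per match; handles the numeric form first, then the named form.
    Private Function ReplaceEntity(ByVal m As Match) As String
        If m.Groups("num").Success Then
            Dim code As Integer = Integer.Parse(m.Groups("num").Value)
            ' ChrW only accepts values up to 65535; leave larger codes untouched.
            If code <= 65535 Then Return ChrW(code)
            Return m.Value
        End If
        Dim ch As Char
        If NamedEntities.TryGetValue(m.Groups("name").Value, ch) Then Return ch
        ' Unknown entity name: keep the original text.
        Return m.Value
    End Function

End Module
```

Assigning DecodeEntities(textFromDatabase) to a TextBox's Text property would then show the decoded string, where textFromDatabase stands for whatever value was read from MySQL. On .NET Framework 4.0 or later, System.Net.WebUtility.HtmlDecode decodes both numeric and named entities without a hand-written table, so it may be the simpler choice where available.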



--
Hope this helps
Jay B. Harlow [MVP - Outlook]
.NET Application Architect, Enthusiast, & Evangelist
T.S. Bradley - http://www.tsbradley.net


 
