MSXML and UTF-8 Chinese characters

K

I have an XML file in UTF-8.
It contains some Chinese characters (both Simplified Chinese and
Traditional Chinese).

When loading the XML file with the MSXML parser, I used the code below to
retrieve the data in a node. The CString was then displayed in a CListCtrl.
The Traditional Chinese characters were shown correctly, but for the
Simplified characters I encountered many "?", although some characters were
correct.

if (MSXML::NODE_ELEMENT == pChild->nodeType)
{
    MSXML::IXMLDOMNamedNodeMapPtr pAttrs = pChild->attributes;
    MSXML::IXMLDOMNodePtr pAttr;

    pAttr = pAttrs->getNamedItem(L"id");
    CString id = OLE2T(pAttr->text);

    MSXML::IXMLDOMNodePtr pWording = pChild->firstChild;
    CString wording = OLE2T(pWording->text);

    // add the wording to the language
    pMessageLanguage->m_wordingList.insert(MessageWordingListPair(id, wording));
}
 
Jochen Kalmbach

K said:
I have an XML file in UTF-8.
It contains some Chinese characters (both Simplified Chinese and
Traditional Chinese).

When loading the XML file with the MSXML parser, I used the code below to
retrieve the data in a node. The CString was then displayed in a
CListCtrl. The Traditional Chinese characters were shown
correctly, but for the Simplified characters I encountered many "?",
although some characters were correct.

You should compile with UNICODE and _UNICODE defined!
Otherwise you have to convert the Unicode text to MBCS...
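
To make the difference concrete, here is a minimal portable sketch of what a Unicode-to-MBCS conversion does to characters the target code page lacks. It is only an illustration: `roundTripThroughToyCodePage` and the "code page" (a plain set of representable characters) are hypothetical stand-ins, not a real Windows conversion table or API.

```cpp
#include <set>
#include <string>

// Toy model of a lossy Unicode -> MBCS -> display round trip.
// 'codePage' is a hypothetical set of representable non-ASCII characters;
// real code pages are much larger tables.
std::u16string roundTripThroughToyCodePage(const std::u16string& text,
                                           const std::set<char16_t>& codePage)
{
    std::u16string out;
    for (char16_t ch : text) {
        if (ch < 0x80 || codePage.count(ch) != 0) {
            out += ch;    // representable: survives the conversion
        } else {
            out += u'?';  // not representable: replaced by a question mark
        }
    }
    return out;
}
```

With a set that contains U+570B but not U+56FD, the first character survives and the second comes back as "?", which would match the "some characters are correct, others are ?" symptom. In a build where the text stays wide from the BSTR to the control, no such narrowing step should be involved.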

--
Greetings
Jochen

Do you need a memory-leak finder ?
http://www.codeproject.com/tools/leakfinder.asp
 
MerkX Zyban

K,

Does your XML file begin with the following line?

<?xml version="1.0" encoding="UTF-8" ?>

If not, add this line and see what happens. If you do have this line (or
you add it) and still have problems, then you may be using characters that
Windows cannot support or your fonts cannot display (e.g. Traditional
Chinese).

Windows supports Unicode up to version 2.1 only. The XML parser converts
your XML source to UTF-16 and parses it internally. When the XML parser sees
the line above, it will convert your XML file from UTF-8 with no loss of
information. However, without this line (specifically without the encoding
clue) the system default ANSI code page will be used when converting to
UTF-16.

Even with this line, you may still have characters that your fonts can't
display; however, no loss will occur in the conversion to/from UTF-8.
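
As an illustration of why decoding the bytes as UTF-8 loses nothing, here is a minimal portable sketch of a UTF-8 decoder. It is not how MSXML is implemented; `decodeUtf8` is a hypothetical helper, and for brevity it does not reject overlong encodings or surrogate code points.

```cpp
#include <cstddef>
#include <string>

// Decode UTF-8 bytes into Unicode code points.
// Malformed or truncated input yields U+FFFD (replacement character).
std::u32string decodeUtf8(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += U'\uFFFD'; ++i; continue; }  // stray continuation byte

        if (i + extra >= bytes.size()) {           // sequence cut off at the end
            out += U'\uFFFD';
            break;
        }
        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);           // append 6 payload bits
        }
        if (ok) { out += cp; i += extra + 1; }
        else    { out += U'\uFFFD'; ++i; }
    }
    return out;
}
```

For example, the three bytes E5 9B BD decode to the single code point U+56FD. Reading the same three bytes through a single-byte or double-byte code page would produce different characters, which is the kind of damage a wrong encoding assumption causes.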

Hope this helps (and I hope I know what I'm talking about :)

-MerkX
 
K

My project is compiled as a UNICODE build, and my XML begins with the
<?xml ... ?> line, but the problem still persists.

After reading the node in MSXML, can I use the OLE2T macro and then assign
the result to a CString?

What does CString store internally? I'm using VS.NET to compile my
projects.

I can see and edit the XML file in Dreamweaver, so the fonts must be
supported by my system. However, after loading the XML file with MSXML,
getting the node, assigning it to a CString, and displaying it, the problem
happens: some Simplified Chinese characters become "?", but some are okay.
 
Mihai N.

K said:
After reading in the node in MSXML, can I use the macro OLE2T then
assign it to a CString ??

What does CString store internally ?? I'm using VS.NET to compile my
projects.

CString stores ANSI in an ANSI application and Unicode in a UNICODE app.
If your app is Unicode, there is no need to use

But question marks are usually the result of bad code page conversions.
Are you sure there are no conversions happening
(maybe in m_wordingList.insert, or in MessageWordingListPair)?
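
One way to narrow down where a conversion happens is to dump the stored code units at each stage (right after reading the node, after the insert, just before display). The following is a minimal sketch using standard C++ only; `dumpCodeUnits` is a hypothetical helper, and feeding it from a CString (for instance via its wide character buffer in a Unicode build) is an assumption about the surrounding code.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Render each stored character as a 4-digit (or longer) hex value so that a
// real '?' (003F) in the data can be told apart from a display problem.
std::string dumpCodeUnits(const std::wstring& text)
{
    std::ostringstream out;
    out << std::hex << std::uppercase << std::setfill('0');
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i != 0) {
            out << ' ';
        }
        out << std::setw(4) << static_cast<unsigned long>(text[i]);
    }
    return out.str();
}
```

If the dump already shows 003F where a Chinese character should be, the data was lost in a conversion before that point. If it shows the expected value (for example 56FD) and the list control still draws "?", the problem is more likely in the display path.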

Mihai
 
