UTF-8 encoding problem

shreshth.luthra

Hi All,

I have a GUI that accepts a Unicode string and searches a given
set of XML files for that string.

Now, I have two XML files, both saved in UTF-8 format and containing
characters from different languages.

Both of them start with a UTF-8 BOM, but only the first file also
declares UTF-8 in the XML declaration at the top of the file.

When I search that directory for a character from another language
using a third-party desktop search GUI, it reports that the character
exists in the first file (the one that also has the XML declaration),
but not in the second file (which has only the BOM).

Initially I thought the problem was mainly because UTF-8 supports
both multibyte and Unicode, but I could not find much on it.

Please help.

Regards,
Shreshth
 
Jay B. Harlow

Shreshth,
> Although both of them are having UTF-8 as BoM, but only first file is
> having UTF-8 defined in XML declration at the top of the XML file as
> well.
What does the second file have in its XML declaration (what specifically
does its declaration look like)?

Sounds like you have a bug in the application that wrote the second Xml
file.

I suspect (hope) that when that application created the Xml (the XmlWriter), it
encoded the characters per what the Xml declaration states. I would then
expect (but not hope) that when it (the underlying text writer) wrote the file,
it "transposed" (read: mangled) the correctly encoded characters into UTF-8.
I consider this double transposition to be bad, very bad.
 
shreshth.luthra

By the XML declaration at the beginning of the file, I mean the XML
declaration that has the "encoding" attribute (encoding = UTF-8; I do
not remember the exact format). It is the same as MSDN shows.

Do you still mean the same thing in that case?
Actually, I am not able to understand completely what exactly you want
to say.

By the way, the XML writer here is Notepad.

Thanks for your reply.

Shreshth
 
Jay B. Harlow

Shreshth
> By xml declaration at the beginning of the file,i mean to say the XML
> Declaration having the "encoding" attribute at the begining of file
> (Encoding = UTF-8, do not remeber the exact format). It is the same as
> MSDN says.
Yes, but what specifically does your file say (cut & paste the one from your
file into your response to this message)... Alternatively email them to me.
> By the way, XML write here is Notepad.
Ah! There's the rub!

What I am saying is that the "encoding" of your physical file (the one on disk)
is different from that of the logical file (the xml itself). (My example may have
been backwards, but the net effect is the same: the characters are not
encoded the way you think they are.)

It sounds like your physical file is UTF-8, while I'm concerned your logical
file is whatever, where whatever is the text you blindly copied from an MSDN
article.
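
A minimal sketch of how the two encodings could be compared, in Python: it reads the BOM (the physical encoding) and the "encoding" attribute of the XML declaration (the logical encoding) from the raw bytes. The helper names `sniff_bom` and `declared_encoding` are hypothetical, and the regular expression is a rough approximation of the declaration syntax, not a full XML parser.

```python
import codecs
import re

# BOM signatures, longest first so a UTF-32 LE BOM is not mistaken for UTF-16 LE.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) from a leading BOM, or (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def declared_encoding(data: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    bom_enc, bom_len = sniff_bom(data)
    # Only the first bytes matter; assume UTF-8 when there is no BOM.
    head = data[bom_len:bom_len + 200].decode(bom_enc or "utf-8", errors="replace")
    m = re.match(r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']', head)
    return m.group(1).lower() if m else None
```

Calling both helpers on each file's raw bytes (for example `open(path, "rb").read()`) shows whether the BOM and the declaration agree, or whether the declaration is missing altogether.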
 
shreshth.luthra

Hi Jay,

<?xml version="1.0" encoding="UTF-8" ?>
This is the XML declaration I was speaking about.

The rest of the file is the same as a normal XML file.

I will try what you have told me in the office tomorrow, but one thing I
can tell you right now is that I have already tried the same file
(the one with only the BOM and no XML declaration),
saving it as UTF-16 LE and as UTF-16 BE.

My third-party desktop search works with both of them.

The only problem is with the UTF-8 format.
Thanks.

Shreshth
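
A minimal Python sketch of the kind of mismatch being discussed. It is only an illustration under stated assumptions: the thread does not say how the desktop search tool reads files, so treating it as a reader that ignores the UTF-8 BOM and falls back to a legacy code page (cp1252 here) is a guess. The helper `decode_xml_bytes` and the sample text are hypothetical.

```python
import codecs


def decode_xml_bytes(data: bytes, fallback: str = "utf-8") -> str:
    """Decode using the BOM when present, else the fallback encoding."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the utf-16 codec consumes the BOM itself
    return data.decode(fallback)


# Hypothetical sample content: a short Devanagari word inside an element.
text = "<note>\u0928\u092e\u0938\u094d\u0924\u0947</note>"
needle = "\u0928\u092e"

utf8_file = codecs.BOM_UTF8 + text.encode("utf-8")
utf16_file = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# A BOM-aware reader finds the search string in both files.
found_utf8 = needle in decode_xml_bytes(utf8_file)
found_utf16 = needle in decode_xml_bytes(utf16_file)

# Assumed failure mode: the BOM is ignored and a legacy code page is used,
# so the UTF-8 bytes turn into unrelated characters and the search misses.
found_wrong = needle in utf8_file.decode("cp1252", errors="replace")
```

If the real tool behaves like the last line for UTF-8 files without a declaration, that would be consistent with the UTF-16 files working and the BOM-only UTF-8 file failing, but this remains unverified.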
 
