XML file and UTF8

Peter Holschbach

Hi,

I have a UTF-8 encoded XML file in which I have to translate some text and
then save it under another file name. The result must also be UTF-8 encoded.
Here is what I did:

XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load("xml_input.xml");
.....
XmlNode node = ..... // find the right node
// This is the short version; the translated text is read out of an Excel file.
node.InnerText = "°C";
.....
StreamWriter sw = new StreamWriter("xml_output.xml", false,Encoding.UTF8);
xmlDoc.Save(sw);
sw.Close();

The result is an XML file with the "°" symbol stored as a 3-byte ANSI-coded
value (EF BF BD), not as a 2-byte UTF-8 value (C2 B0) as in the original file.

What can I do to store the XML file in UTF-8?
 
Martin Honnen

Peter said:
> I have a UTF-8 encoded XML file in which I have to translate some text
> and then save it under another file name. The result must also be UTF-8
> encoded. Here is what I did:
>
> XmlDocument xmlDoc = new XmlDocument();
> xmlDoc.Load("xml_input.xml");
> ....
> XmlNode node = ..... // find the right node
> // This is the short version; the translated text is read out of an Excel file.
> node.InnerText = "°C";

If the original XML is UTF-8 encoded, why don't you simply call

xmlDoc.Save("xml_output.xml");

That way the encoding is certainly not changed.

> StreamWriter sw = new StreamWriter("xml_output.xml", false, Encoding.UTF8);
> xmlDoc.Save(sw);
> sw.Close();

That should also save as UTF-8.

> The result is an XML file with the "°" symbol stored as a 3-byte
> ANSI-coded value (EF BF BD), not as a 2-byte UTF-8 value (C2 B0) as in
> the original file.

ANSI coded? Which ANSI code page would encode "°" with those three bytes?
Are you sure that you actually insert the character '°' when you set
InnerText? Maybe the decoding already goes wrong when you read from Excel.
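
One way to test this suspicion is to look at the code units of the string just before it is assigned to InnerText. The following is only a minimal sketch: Describe is a hypothetical helper, "\u00B0C" stands in for the value read from Excel, and the top-level statement style assumes C# 9 or later.

```csharp
using System;
using System.Linq;

// Hypothetical helper: lists the UTF-16 code units of a string as U+XXXX.
static string Describe(string text) =>
    string.Join(" ", text.Select(c => "U+" + ((int)c).ToString("X4")));

// "\u00B0C" stands in for the translated text read from Excel (assumption).
string translated = "\u00B0C";
Console.WriteLine(Describe(translated)); // U+00B0 U+0043
```

If the real value prints U+FFFD instead of U+00B0, the character was already lost before the document was saved, and no choice of output encoding can bring it back.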
 
Dude

Peter Holschbach said:
> What can I do to store the XML file in UTF-8?

xmlDoc.Save("xml_output.xml");

The default is UTF-8.
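
To confirm which bytes are actually produced, the document can be serialized to memory and dumped as hex. This is a minimal sketch rather than the code from the thread: SaveAsUtf8 and the element name "unit" are made up for illustration, and the top-level statement style assumes C# 9 or later.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper (not from the thread): serializes the document to
// UTF-8 bytes in memory so they can be inspected before touching the disk.
static byte[] SaveAsUtf8(XmlDocument doc)
{
    var settings = new XmlWriterSettings
    {
        // UTF-8 without a byte order mark; pass true to emit one.
        Encoding = new UTF8Encoding(false)
    };
    using (var stream = new MemoryStream())
    {
        using (var writer = XmlWriter.Create(stream, settings))
        {
            doc.Save(writer);
        }
        return stream.ToArray();
    }
}

var doc = new XmlDocument();
var root = doc.CreateElement("unit");   // placeholder element name
root.InnerText = "\u00B0C";             // escape avoids source-file encoding issues
doc.AppendChild(root);

// The dump should contain C2-B0-43 for "°C" and no EF-BF-BD.
Console.WriteLine(BitConverter.ToString(SaveAsUtf8(doc)));
```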
 
Peter Holschbach

Hi Martin,

> If the original XML is UTF-8 encoded, why don't you simply call
> xmlDoc.Save("xml_output.xml");
> That way the encoding is certainly not changed.

That is what I did first :). It gives the same result as the StreamWriter
version.

> That should also save as UTF-8.

Yes, that is what I expected.

> ANSI coded? Which ANSI code page would encode "°" with those three bytes?
> Are you sure that you actually insert the character '°' when you set
> InnerText? Maybe the decoding already goes wrong when you read from Excel.

The text in Excel is certainly not in UTF-8.
I have also tried it with the example code (using the literal string "°").
As far as I understand, a string in C# is not encoded in UTF-8.

thx
Peter
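
On the last point: a .NET string holds UTF-16 code units in memory, and a byte encoding is only applied when the text is written out. A small sketch of the difference follows; Hex is a hypothetical helper, and the "\u00B0" escape is used so the result does not depend on how the source file itself is saved.

```csharp
using System;
using System.Text;

// Hypothetical helper: formats a byte array as space-separated hex.
static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

string degree = "\u00B0C";

// In-memory representation (UTF-16, little endian, no byte order mark).
Console.WriteLine(Hex(Encoding.Unicode.GetBytes(degree))); // B0 00 43 00

// What a UTF-8 writer produces for the same string.
Console.WriteLine(Hex(Encoding.UTF8.GetBytes(degree)));    // C2 B0 43
```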
 
Bjørn Brox

Peter Holschbach wrote:
> Hi,
> ....
>
> The result is an XML file with the "°" symbol stored as a 3-byte
> ANSI-coded value (EF BF BD), not as a 2-byte UTF-8 value (C2 B0) as in
> the original file.

Isn't "EF BF BD" the UTF-8 encoding of U+FFFD, the Unicode REPLACEMENT
CHARACTER (displayed as a question mark in a black diamond), which some
systems use to signal that they could not decode multibyte-encoded text
correctly or that the text is damaged?

It sounds more like the tool you use to check the result is wrong or is
configured to expect another encoding.
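
The byte values can be checked directly. The sketch below also shows one way such a sequence can arise, namely a single-byte "°" (0xB0 in Windows-1252 or Latin-1) being decoded as if it were UTF-8. Whether that is what happened in this thread is an assumption, and Hex is a hypothetical helper.

```csharp
using System;
using System.Text;

// Hypothetical helper: formats a byte array as space-separated hex.
static string Hex(byte[] bytes) => BitConverter.ToString(bytes).Replace("-", " ");

// U+FFFD encoded as UTF-8 is exactly the sequence reported in the thread.
Console.WriteLine(Hex(Encoding.UTF8.GetBytes("\uFFFD"))); // EF BF BD

// A lone 0xB0 byte is not valid UTF-8, so the decoder substitutes U+FFFD.
string decoded = Encoding.UTF8.GetString(new byte[] { 0xB0, 0x43 });

// Re-encoding the damaged string as UTF-8 writes the replacement character.
Console.WriteLine(Hex(Encoding.UTF8.GetBytes(decoded)));  // EF BF BD 43
```

In that scenario the output file is valid UTF-8; the original character was lost at the earlier decoding step.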
 
