Encoding problem

Demon News

I'm trying to do a transform (using the XslTransform class in C#), and in the
stylesheet I'm specifying the output element below:

<xsl:output method="xml" encoding="UTF-8" indent="no"/>

the resulting xml ends up with the following declaration:

<?xml version="1.0" encoding="utf-16"?>

so the encoding has been changed to UTF-16. Is there something I'm doing
wrong? Is it possible to make the resulting XML declaration omit the encoding
altogether?

thanks
 
Martin Honnen

Demon said:
I'm trying to do a transform (using the XslTransform class in C#), and in the
stylesheet I'm specifying the output element below:

<xsl:output method="xml" encoding="UTF-8" indent="no"/>

the resulting xml ends up with the following declaration:

<?xml version="1.0" encoding="utf-16"?>

It depends on where you transform to. If you transform to a stream, the
Transform method should use the encoding that the <xsl:output> element
specifies. If you transform to a string, the encoding is UTF-16.
See
<http://msdn.microsoft.com/library/d...ide/html/cpconInputsOutputsToXslTransform.asp>
for details.
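
To make the stream versus string difference concrete, here is a minimal sketch. It is not the code from this thread: it swaps in the newer XslCompiledTransform class for XslTransform, and the inline stylesheet, the "<doc/>" input and the class and method names are all made up for illustration. A TextWriter target such as StringWriter reports its own encoding (UTF-16 for .NET strings), while a Stream target should honor the encoding attribute of xsl:output.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class OutputTargetDemo
{
    // Hypothetical inline stylesheet that asks for UTF-8 output.
    private const string Stylesheet =
        "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>" +
        "<xsl:output method='xml' encoding='UTF-8' indent='no'/>" +
        "<xsl:template match='/'><test/></xsl:template>" +
        "</xsl:stylesheet>";

    private static XslCompiledTransform LoadTransform()
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader xslReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(xslReader);
        }
        return xslt;
    }

    // Transform to a TextWriter: the writer's own encoding is used for the
    // declaration, so the xsl:output encoding is not what ends up in it.
    public static string TransformToTextWriter()
    {
        XslCompiledTransform xslt = LoadTransform();
        var writer = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader("<doc/>")))
        {
            xslt.Transform(input, null, writer);
        }
        return writer.ToString();
    }

    // Transform to a Stream: the bytes and the declaration should follow
    // the encoding requested by xsl:output.
    public static string TransformToStream()
    {
        XslCompiledTransform xslt = LoadTransform();
        var memStream = new MemoryStream();
        using (XmlReader input = XmlReader.Create(new StringReader("<doc/>")))
        {
            xslt.Transform(input, null, memStream);
        }
        memStream.Position = 0;
        // StreamReader defaults to UTF-8 and skips a byte order mark if present.
        using (var reader = new StreamReader(memStream))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        Console.WriteLine("TextWriter target: " + TransformToTextWriter());
        Console.WriteLine("Stream target:     " + TransformToStream());
    }
}
```

On the other part of the original question: if the aim is to have no XML declaration at all, omit-xml-declaration="yes" on xsl:output is the standard XSLT 1.0 attribute for that. It drops the whole declaration, not just the encoding part.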
 
Demon News

Thanks for your help and the link. I've changed my code to use a
MemoryStream and it works perfectly.
 
Demon News

Actually, now that I'm playing with it using a MemoryStream, I can't get it
to come out as anything other than UTF-8. Even if I change the encoding to
UTF-16 or anything else, it still comes out as UTF-8. Any idea what's
happening?
 
Pascal Schmitt

Demon said:
Actually, Now I'm playing with it, using a Memory stream I can't get it to
come out as anything other than UTF-8. Even if I change the encoding to
UTF-16 or anything else it still comes out as UTF-8 now. Any idea what's
happening?

The generated XML won't change when you change the Encoding after the
transform...
It may be written to disk as UTF-16, but the string that is written
contains the characters "encoding='utf-8'"...
 
Martin Honnen

Demon said:
Actually, Now I'm playing with it, using a Memory stream I can't get it to
come out as anything other than UTF-8. Even if I change the encoding to
UTF-16 or anything else it still comes out as UTF-8 now. Any idea what's
happening?

Not really, you need to show some code and explain in detail where you
change the encoding, where you transform to, how you look at the result
then. For instance if you transform to a stream and then want to have a
string to look at you obviously need to read out the bytes in the stream
and convert them to a string.

Here is a short C# example:

using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public class Test2005090601 {
  public static void Main (string[] args) {
    string[] encodings = { "UTF-8", "UTF-16", "ISO-8859-1" };
    XmlDocument xslStylesheet = new XmlDocument();
    xslStylesheet.Load(@"test2005090601Xsl.xml");
    XmlNamespaceManager namespaceManager =
      new XmlNamespaceManager(xslStylesheet.NameTable);
    namespaceManager.AddNamespace("xsl",
      "http://www.w3.org/1999/XSL/Transform");
    // Locate the encoding attribute of xsl:output so it can be changed per run
    XmlAttribute encodingAttribute =
      xslStylesheet.SelectSingleNode(
        "/xsl:stylesheet/xsl:output/@encoding",
        namespaceManager) as XmlAttribute;
    XslTransform xsltProcessor = new XslTransform();
    foreach (string encoding in encodings) {
      MemoryStream memStream = new MemoryStream();
      // Change the encoding in the stylesheet first, then load and transform
      encodingAttribute.Value = encoding;
      xsltProcessor.Load(xslStylesheet);
      // The stylesheet also serves as the input document
      xsltProcessor.Transform(xslStylesheet, null, memStream, null);
      // Read the raw bytes back out of the stream
      memStream.Position = 0;
      byte[] resultBytes = new byte[memStream.Length];
      memStream.Read(resultBytes, 0, (int)memStream.Length);
      // Decode the bytes with the same encoding the stylesheet asked for
      string resultString =
        System.Text.Encoding.GetEncoding(encoding).GetString(resultBytes);
      Console.WriteLine("Transformation result is: {0}.", resultString);
      Console.WriteLine();
    }
  }
}

For simplicity I have chosen a simple stylesheet producing a static
result and run it against itself:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:output method="xml" encoding="UTF-8" indent="yes" />

<xsl:template match="/">
  <test xml:lang="de">abcäöü€</test>
</xsl:template>

</xsl:stylesheet>

Now when I run that program in a Windows console the output is

Transformation result is: ?<?xml version="1.0" encoding="utf-8"?>
<test xml:lang="de">abcäöü?</test>.

Transformation result is: ?<?xml version="1.0" encoding="utf-16"?>
<test xml:lang="de">abcäöü?</test>.

Transformation result is: <?xml version="1.0" encoding="iso-8859-1"?>
<test xml:lang="de">abcäöü?</test>.

So the encoding in the XML declaration written to the stream is as
specified by the stylesheet in the <xsl:output> element.
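
A side note on the "?" printed before the UTF-8 and UTF-16 results: that is most likely the byte order mark written at the start of the stream, which Encoding.GetString keeps as the character U+FEFF and which the console cannot display (the ISO-8859-1 result has none). The sketch below does not use the transform at all. It only shows, with made-up class and method names, how the same UTF-8 bytes decode with and without that leading character.

```csharp
using System;
using System.IO;
using System.Text;

public static class BomDemo
{
    // Builds UTF-8 bytes preceded by the UTF-8 byte order mark (EF BB BF).
    public static byte[] EncodeWithBom(string text)
    {
        var utf8 = new UTF8Encoding(true);
        byte[] preamble = utf8.GetPreamble();
        byte[] body = utf8.GetBytes(text);
        byte[] all = new byte[preamble.Length + body.Length];
        Buffer.BlockCopy(preamble, 0, all, 0, preamble.Length);
        Buffer.BlockCopy(body, 0, all, preamble.Length, body.Length);
        return all;
    }

    // Encoding.GetString decodes every byte, so the mark survives as U+FEFF.
    public static string DecodeRaw(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // StreamReader recognises the mark and leaves it out of the result.
    public static string DecodeWithReader(byte[] bytes)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }

    public static void Main()
    {
        byte[] bytes = EncodeWithBom("<test/>");
        string raw = DecodeRaw(bytes);
        string clean = DecodeWithReader(bytes);
        Console.WriteLine("Raw decode starts with U+FEFF: " + (raw[0] == '\uFEFF'));
        Console.WriteLine("StreamReader result: " + clean);
    }
}
```

So if the result is going to be inspected as a string, reading the stream through a StreamReader is a simple way to avoid the stray leading character.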
 
Demon News

I don't quite understand; you seem to be changing the encoding after the
transform. Surely the point of specifying the encoding attribute in the
output element is that the transform writes its output in that encoding and
sets the encoding attribute in the resulting XML declaration to match. If
not, why bother with an encoding attribute in the xsl:output element of the
stylesheet?

The code I've been using is:

// Namespaces this method needs:
// using System.IO; using System.Xml; using System.Xml.XPath; using System.Xml.Xsl;

// sXMLDoc is the XML string to transform
// sXSLPath is the path to the XSLT stylesheet file
public string Transform(string sXMLDoc, string sXSLPath)
{
  // Read the XML, creating the XPathDocument
  XmlTextReader trXML = new XmlTextReader(new StringReader(sXMLDoc));
  XPathDocument xpDoc = new XPathDocument(trXML);

  // Create the transform and resolver objects for later use
  XslTransform xslt = new XslTransform();
  XmlUrlResolver xur = new XmlUrlResolver();

  // Load the XSL from the given path
  // NB: it has to be loaded from a path, otherwise resolving include
  // statements won't work
  XmlDocument xslDoc = new XmlDocument();
  xslDoc.Load(sXSLPath);

  // Load the stylesheet into the transform class, using the resolver
  // to resolve any includes
  xslt.Load(xslDoc, xur, this.GetType().Assembly.Evidence);

  // Create the output stream (it must be a stream for the xsl:output
  // attributes to be taken into account)
  System.IO.Stream str = new MemoryStream();

  // Transform, writing to the output stream and using the resolver
  // again for any extra external references
  xslt.Transform(xpDoc, null, str, xur);

  // Make sure to flush and reset the position before reading
  str.Flush();
  str.Position = 0;
  // StreamReader assumes UTF-8 unless it finds a byte order mark; any
  // other encoding would have to be passed to it explicitly
  StreamReader sr = new StreamReader(str);
  string xmlOut = sr.ReadToEnd();

  // Return the result
  return xmlOut;
}





Demon News

Typical: as soon as I post, I find the problem... My XSLT had an include,
and the included XSLT had an output element specifying an encoding of UTF-8,
which was overriding whatever I put into the parent XSLT.

Thanks for the help.
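
For anyone hitting the same thing: XSLT 1.0 merges all xsl:output elements of a stylesheet and its includes, and when two of them at the same import precedence disagree on an attribute, the processor may either report an error or use the one that comes last. A small diagnostic sketch for finding every encoding setting is below. The class and method names are made up, and it assumes that every include or import href is a relative file path rather than a URL.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class OutputEncodingScanner
{
    private const string XslNs = "http://www.w3.org/1999/XSL/Transform";

    // Returns one "path: encoding" entry for every xsl:output/@encoding found
    // in the given stylesheet and in anything it includes or imports.
    public static List<string> FindOutputEncodings(string stylesheetPath)
    {
        var results = new List<string>();
        var visited = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        Scan(Path.GetFullPath(stylesheetPath), results, visited);
        return results;
    }

    private static void Scan(string path, List<string> results, HashSet<string> visited)
    {
        // Guard against stylesheets that include each other.
        if (!visited.Add(path))
        {
            return;
        }

        var doc = new XmlDocument();
        doc.Load(path);
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("xsl", XslNs);

        // "/*" matches both xsl:stylesheet and xsl:transform root elements.
        foreach (XmlNode attr in doc.SelectNodes("/*/xsl:output/@encoding", ns))
        {
            results.Add(path + ": " + attr.Value);
        }

        // Assumption: hrefs are file paths relative to the current stylesheet.
        string baseDir = Path.GetDirectoryName(path);
        foreach (XmlNode href in doc.SelectNodes(
            "/*/xsl:include/@href | /*/xsl:import/@href", ns))
        {
            Scan(Path.GetFullPath(Path.Combine(baseDir, href.Value)), results, visited);
        }
    }

    public static void Main(string[] args)
    {
        if (args.Length != 1)
        {
            Console.Error.WriteLine("usage: OutputEncodingScanner <stylesheet path>");
            return;
        }
        foreach (string entry in FindOutputEncodings(args[0]))
        {
            Console.WriteLine(entry);
        }
    }
}
```

If the list shows more than one encoding, keeping a single xsl:output in the top-level stylesheet is probably the easiest way to stay in control of the result.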



Martin Honnen

Demon said:
I don't quite understand, you seem to be changing the encoding after the
transform.

No, I first change the encoding in the stylesheet and then do the
transform, and I simply do that for several example encodings to
demonstrate that the encoding in the stylesheet document passed to
XslTransform is what determines the encoding in the XML declaration of
the output written to the stream.

It is a simple test case which keeps the demonstration short, otherwise
I would have to include three different stylesheets for that demonstration.
 
