MSXML leaves out encoding when using .NET

Jeroen

We're using MSXML to transform an XML document into an XHTML file
using an XSLT stylesheet. The problem is that our .NET implementation
behaves subtly differently from the command-line call to MSXML: the
.NET variant leaves out the charset information, so the browser falls
back to a default encoding instead of the intended UTF-8.

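// The declaration of "source" was missing from the posted snippet;
// "xml" is an assumed variable holding the path of the input document.
MSXML2.DOMDocument40Class source = new MSXML2.DOMDocument40Class();
source.async = false;
source.validateOnParse = false;
source.load(xml);

MSXML2.DOMDocument40Class stylesheet = new MSXML2.DOMDocument40Class();
stylesheet.async = false;
stylesheet.load(xsls);
string s = source.transformNode(stylesheet);
System.IO.TextWriter file = System.IO.File.CreateText("path.html");
file.Write(s);
file.Close();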


Note that the xslt has a line:
<xsl:output method="html" indent="yes" encoding="UTF-8" />

This code creates a meta tag different from the commandline version:
<META http-equiv="Content-Type" content="text/html">

Whereas the commandline version of MSXML nicely outputs:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">

Anyone have a clue how to do this? Do I need a
CreateProcessingInstruction for the stylesheet?
 
sloan

I think you have another option in .NET, rather than the MSXML2 library.

Here is some code I found as a starter:


using System;
using System.Xml.Xsl;

public class XMLtoXSLTransformWrapper
{
    string debugMsg = null;

    public XMLtoXSLTransformWrapper()
    {
        //
        // TODO: Add constructor logic here
        //
    }

    // Transforms xmlFile with the stylesheet in xslFile and writes
    // the result to outputFile.
    public void DoTranslation(string xmlFile, string xslFile, string outputFile)
    {
        try
        {
            // Create a new XslTransform object.
            XslTransform xslt = new XslTransform();

            // Load the stylesheet.
            xslt.Load(xslFile);

            // Alternative: load an XPathDocument and send the output to
            // the console (needs System.Xml and System.Xml.XPath):
            // XPathDocument mydata = new XPathDocument(xmlFile);
            // XmlWriter writer = new XmlTextWriter(Console.Out);
            // xslt.Transform(mydata, null, writer, null);

            // Transform the XML file and write the result to the output file.
            xslt.Transform(xmlFile, outputFile);
        }
        catch (Exception ex)
        {
            debugMsg = ex.Message;
            Console.WriteLine(debugMsg);
        }
    }
}
 
Martin Honnen

Jeroen said:
We're using MSXML to transform the XML document we have to an XHTML
file using an XSLT.

Why do you use MSXML with a managed .NET application? With .NET 1.x you
should use System.Xml.Xsl.XslTransform, with .NET 2.0 you should use
System.Xml.Xsl.XslCompiledTransform for XSLT transformations.

Jeroen said:
string s = source.transformNode(stylesheet);

Note that the xslt has a line:
<xsl:output method="html" indent="yes" encoding="UTF-8" />

This code creates a meta tag different from the commandline version:
<META http-equiv="Content-Type" content="text/html">

You get a string result with transformNode, and a string is simply a
sequence of Unicode characters that does not have an encoding. Encoding
matters on the byte level. With a COM application using MSXML you could
use transformNodeToObject and transform to a stream; that way MSXML
writes out a charset parameter as needed. But with .NET you should not
use MSXML at all; I doubt its transformNodeToObject will work with a
.NET stream implementation. You can simply run the transformation with
XslTransform or XslCompiledTransform, where the Transform method has
various overloads that write directly to a file or stream.
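
As a minimal sketch of the stream-based approach described above, assuming .NET 2.0 or later and XslCompiledTransform: the class name StreamTransformSketch, the method name TransformToFile and the path parameters are hypothetical, not taken from the original code. The point is that the processor serializes the result to bytes itself, so it can apply the encoding declared in xsl:output instead of handing back an encoding-less string.

```csharp
using System.IO;
using System.Xml.Xsl;

public static class StreamTransformSketch
{
    // Transforms the document at xmlPath with the stylesheet at xslPath
    // and writes the serialized result to outputPath.
    public static void TransformToFile(string xmlPath, string xslPath, string outputPath)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        xslt.Load(xslPath);

        // Passing a Stream (rather than a TextWriter or taking a string)
        // leaves the byte-level encoding to the XSLT processor, which
        // reads it from the stylesheet's xsl:output element.
        using (FileStream output = File.Create(outputPath))
        {
            xslt.Transform(xmlPath, null, output);
        }
    }
}
```

A call might look like StreamTransformSketch.TransformToFile("data.xml", "stylesheet.xslt", "path.html"). On .NET 1.x the same idea should apply to XslTransform, whose Transform method also has overloads that take an output file or stream.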
 
Jeroen

Martin Honnen said:
Why do you use MSXML with a managed .NET application? With .NET 1.x you
should use System.Xml.Xsl.XslTransform, with .NET 2.0 ...

(We do .net 1.x) Unfortunately, we had serious performance issues with
the dotnet xslt processing libraries. When we encountered those
problems we found through some searching that we could use MSXML
instead. It has been working fine and fast, only the encoding problem
remains.

The weird thing is that MSXML called from the commandline to run the
xslt does something different (and better) than when called with the
code posted above. The commandline call looks like this:

msxsl.exe data.xml stylesheet.xslt
 
Marc Gravell

Yeah; the compiled transforms in 2.0 are quite a bit better...

Without more info, I wouldn't presume to say for sure... but in a number of
cases I *have* seen, the reported performance differences between 1.1, 2.0 and
MSXML were actually more a case of a band-aid, meaning that the xslt itself
simply wasn't written very well, and the different implementations just
highlighted / exacerbated the problem. Reworking some of the xslt to
include e.g. Muenchian grouping can make a huge difference.

Marc
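
For readers who have not met Muenchian grouping, here is a small sketch of the technique, assuming .NET 2.0 or later. The stylesheet, the items/item/@category input shape, and the names MuenchianGroupingSketch and CountByCategory are all made up for illustration; nothing here comes from the original poster's stylesheet. The idea is to index nodes once with xsl:key and then pick the first node of each group via generate-id(), instead of rescanning siblings for every node.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class MuenchianGroupingSketch
{
    // Hypothetical stylesheet: groups <item> elements by @category and
    // prints "category=count;" for each distinct category.
    private const string Stylesheet = @"<xsl:stylesheet version='1.0'
    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='text'/>
  <!-- Index every item by its category once. -->
  <xsl:key name='by-category' match='item' use='@category'/>
  <xsl:template match='/items'>
    <!-- Select only the first item of each category. -->
    <xsl:for-each select='item[generate-id() = generate-id(key(""by-category"", @category)[1])]'>
      <xsl:value-of select='@category'/>
      <xsl:text>=</xsl:text>
      <xsl:value-of select='count(key(""by-category"", @category))'/>
      <xsl:text>;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>";

    // Runs the grouping stylesheet over the given XML text.
    public static string CountByCategory(string xml)
    {
        XslCompiledTransform xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        using (XmlReader input = XmlReader.Create(new StringReader(xml)))
        using (StringWriter output = new StringWriter())
        {
            xslt.Transform(input, null, output);
            return output.ToString();
        }
    }
}
```

Whether this helps in a given case depends on the stylesheet; it mainly pays off where the xslt currently finds groups by comparing each node against its preceding siblings.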
 
Jeroen

Thanks Marc, that gives hope and more incentive to switch to
studio2005/dotnet2.

As a followup on the original problem; I have been trying some new ways
to get msxml to include the charset option in one way or the other. My
latest attempt was to add this line of code...

stylesheet.createProcessingInstruction("xml", "version=\"1.0\" encoding=\"UTF-8\"");

...which did not solve my problem but still seems like the right
direction to look in. So here's a new (rather noob) subquestion, which
might help me in my current quest:

*Does anyone know of a good overview of these processing instructions?*
 
