Non-ASCII characters in VS.NET web service

Mike Schilling

I've created a simple .NET 1.1 web service using VS.NET 2003: it has one
method that takes a string parameter. It iterates through the input string,
turning each character into hex and appending it to an output string, and
returns the result.

I now send this service SOAP messages containing non-ASCII characters in the
field that becomes the input string. Each SOAP message has an XML header
that correctly describes the format of the non-ASCII characters. (I've
tried both iso-8859-1 and utf-8).

For some reason, each XML character that's non-ASCII has been turned into a
question mark "?" in the input string. Actually, a UTF-8 character that
contains two bytes becomes two question marks. This happens before any of
my code (or any code VS.NET is willing to show me) runs, so I'm at a loss to
know how to investigate it. Question marks are often generated when trying
to represent a character in a character set that doesn't contain it, but in
this case the target is a C# string, which can represent any Unicode
character.
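The "two question marks per two-byte character" symptom is what you would expect if the UTF-8 bytes were decoded as ASCII somewhere before the method runs. Here is a minimal sketch of that effect. The class and method names are made up for illustration. It relies on .NET's ASCII encoding, whose default replacement fallback turns every byte above 0x7F into '?':

```csharp
using System;
using System.Text;

// Hypothetical helper: shows what happens when UTF-8 bytes are
// decoded with the wrong (ASCII) encoding.
public static class MisdecodeDemo
{
    public static string Utf8BytesReadAsAscii(string text)
    {
        // "é" (U+00E9) becomes two bytes in UTF-8: 0xC3 0xA9.
        byte[] utf8 = Encoding.UTF8.GetBytes(text);

        // ASCII cannot represent bytes above 0x7F, so the default
        // decoder fallback replaces each such byte with '?'.
        return Encoding.ASCII.GetString(utf8);
    }
}
```

For example, `MisdecodeDemo.Utf8BytesReadAsAscii("a\u00e9f")` returns `"a??f"`: the single character 'é' is two bytes in UTF-8, and each byte becomes its own '?'.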

I'd appreciate any insights about this.
 
Jon Skeet [C# MVP]

Mike Schilling said:
> I've created a simple .NET 1.1 web service using VS.NET 2003: it has one
> method that takes a string parameter. It iterates through the input string,
> turning each character into hex and appending it to an output string, and
> returns the result.

How is it turning the character into hex?

> I now send this service SOAP messages containing non-ASCII characters in the
> field that becomes the input string. Each SOAP message has an XML header
> that correctly describes the format of the non-ASCII characters. (I've
> tried both iso-8859-1 and utf-8).

What do you mean by "an XML header"? It should just be in the XML
declaration.

> For some reason, each XML character that's non-ASCII has been turned into a
> question mark "?" in the input string. Actually, a UTF-8 character that
> contains two bytes becomes two question marks.

That suggests that whatever's producing the XML file is wrong, *or*
that you're looking at the XML in an inappropriate editor. How are you
looking at the XML?
 
Mike Schilling

Jon Skeet said:
> How is it turning the character into hex?

ret += "0x" + ((int)s[i]).ToString("X");

> What do you mean by "an XML header"? It should just be in the XML
> declaration.

Exactly. Each SOAP message specifies the correct encoding in its XML
declaration, as shown below.

> That suggests that whatever's producing the XML file is wrong, *or*
> that you're looking at the XML in an inappropriate editor. How are you
> looking at the XML?

<?xml version="1.0" encoding="iso-8859-1"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xmlns:xsd="http://www.w3.org/2001/XMLSchema"
               xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <ToHex xmlns="http://tempuri.org/">
      <s>aéìæf</s>
    </ToHex>
  </soap:Body>
</soap:Envelope>

NNTP is likely to garble the non-ASCII characters, but the bytes of the
string inside the <s> tags, shown in octal by od -b, are

141 351 354 346 146

(0x61 0xE9 0xEC 0xE6 0x66 in hex).

In iso-8859-1, these are respectively

a, e with an acute accent, i with a grave accent, ae, f
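As a quick cross-check of that mapping, here is a small sketch that parses the octal values printed by od -b (three-digit base-8 numbers) and decodes them as ISO-8859-1. In that encoding each byte value is also the Unicode code point. The class and method names are hypothetical:

```csharp
using System;
using System.Text;

// Hypothetical helper: turns the octal byte values printed by `od -b`
// back into bytes and decodes them as ISO-8859-1.
public static class OctalDump
{
    public static string DecodeLatin1(string octalBytes)
    {
        string[] parts = octalBytes.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        byte[] bytes = new byte[parts.Length];
        for (int i = 0; i < parts.Length; i++)
        {
            // Convert.ToByte(string, 8) parses a base-8 number: "351" -> 0xE9.
            bytes[i] = Convert.ToByte(parts[i], 8);
        }
        // ISO-8859-1 maps each byte to the code point with the same value.
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }
}
```

`OctalDump.DecodeLatin1("141 351 354 346 146")` gives `"a\u00e9\u00ec\u00e6f"`, i.e. a, é, ì, æ, f.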

And I can parse the same file locally and observe that it's correct (e.g.
with the program below). It only acts oddly when processed by the web
service.

using System;
using System.Xml;

namespace XMLParser
{
    class ParseXML
    {
        static void Main(string[] args)
        {
            XmlDocument doc = new XmlDocument();
            doc.Load("c:\\java\\toHex.xml");
            dumpStrings(doc);
            Console.WriteLine("<Done>");
        }

        private static void dumpStrings(XmlNode node)
        {
            if (node is XmlCharacterData)
            {
                Console.Out.WriteLine(node.Value);
            }
            else
            {
                for (XmlNode child = node.FirstChild;
                     child != null;
                     child = child.NextSibling)
                {
                    dumpStrings(child);
                }
            }
        }
    }
}
 
Jon Skeet [C# MVP]

> And I can parse the same file locally and observe that it's correct (e.g.
> with the program below). It only acts oddly when processed by the web
> service.

That's pretty odd :(

I've passed non-ASCII characters in web services before with no
problems... this is very odd.

Do you have a solution with just a web service and just a test app that
I could have a look at?
 
Mike Schilling

Jon Skeet said:
> That's pretty odd :(
>
> I've passed non-ASCII characters in web services before with no
> problems... this is very odd.
>
> Do you have a solution with just a web service and just a test app that
> I could have a look at?

Here's the web service:

using System;
using System.Collections;
using System.ComponentModel;
using System.Data;
using System.Diagnostics;
using System.Web;
using System.Web.Services;

namespace HexString
{
    public class Service1 : System.Web.Services.WebService {
        public Service1() {
            InitializeComponent();
        }

        private IContainer components = null;

        private void InitializeComponent() { }

        protected override void Dispose( bool disposing ) {
            if(disposing && components != null) {
                components.Dispose();
            }
            base.Dispose(disposing);
        }

        [WebMethod]
        public string ToHex(String s) {
            String ret = "";
            for (int i = 0; i < s.Length; i++) {
                // Format the i-th character's UTF-16 code unit as hex.
                ret += "0x" + ((int)s[i]).ToString("X");
                if (i < s.Length - 1)
                    ret += ", ";
            }
            return ret;
        }
    }
}
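To test the formatting outside ASP.NET, the same loop can be pulled into a plain static method. This is a sketch with a hypothetical class name, using StringBuilder instead of repeated string concatenation. The cast must be applied to s[i], the i-th character. Casting the whole string with (int)s does not compile.

```csharp
using System;
using System.Text;

// Hypothetical stand-alone version of the ToHex web method, so the
// formatting logic can be exercised without hosting it in ASP.NET.
public static class HexFormatter
{
    public static string ToHex(string s)
    {
        StringBuilder ret = new StringBuilder();
        for (int i = 0; i < s.Length; i++)
        {
            // Cast the i-th UTF-16 code unit (not the whole string) to int.
            ret.Append("0x").Append(((int)s[i]).ToString("X"));
            if (i < s.Length - 1)
                ret.Append(", ");
        }
        return ret.ToString();
    }
}
```

With the intended input, `HexFormatter.ToHex("a\u00e9\u00ec\u00e6f")` returns `"0x61, 0xE9, 0xEC, 0xE6, 0x66"`. The garbled `"a???f"` gives `"0x61, 0x3F, 0x3F, 0x3F, 0x66"`, so a hex dump makes the '?' substitution unambiguous.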

And here is the client:

using System;
using System.Text;
using System.Net;

namespace HexStringClient {
    class Client {
        [STAThread]
        static void Main(string[] args) {
            WebClient wc = new WebClient();
            byte[] bytes = Encoding.GetEncoding("iso-8859-1").GetBytes(doc);

            try {
                wc.Headers.Add("SOAPAction", "\"http://tempuri.org/ToHex\"");
                wc.Headers.Add("content-type", "text/xml");
                byte [] response =
                    wc.UploadData("http://localhost/HexString/Service1.asmx",
                                  "POST", bytes);
                Console.Out.WriteLine(Encoding.ASCII.GetString(response));
            }
            catch (Exception ex) {
                Console.Out.WriteLine(ex);
            }
        }

        static String doc =
            "<?xml version='1.0' encoding='iso-8859-1'?>\n" +
            "<soap:Envelope xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'\n" +
            "xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n" +
            "xmlns:soap='http://schemas.xmlsoap.org/soap/envelope/'>\n" +
            "<soap:Body>\n" +
            "<ToHex xmlns='http://tempuri.org/'>\n" +
            " <s>a\u00e9\u00ec\u00e6f</s>\n" +
            "</ToHex>\n" +
            "</soap:Body>\n" +
            "</soap:Envelope>";
    }
}
 
Jon Skeet [C# MVP]

Mike Schilling said:
> Here's the web service:

Got it :)

The SOAP handler is using the content type at the HTTP level to decode
the data. If you change your client content type line to:

wc.Headers.Add("content-type", "text/xml; charset=ISO-8859-1");

then it works.
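One way to keep the HTTP header and the encoded bytes from disagreeing is to derive both from the same Encoding object. Here is a minimal sketch with hypothetical helper names. Encoding.WebName is the standard .NET property that holds the IANA charset name:

```csharp
using System;
using System.Text;

// Hypothetical helpers: build the Content-Type value and the request body
// from one Encoding instance so the declared charset matches the bytes.
public static class SoapRequest
{
    public static string ContentType(Encoding enc)
    {
        // WebName is the IANA name, e.g. "iso-8859-1" or "utf-8".
        return "text/xml; charset=" + enc.WebName;
    }

    public static byte[] Body(string xml, Encoding enc)
    {
        return enc.GetBytes(xml);
    }
}
```

In the client above, this would mean `Encoding enc = Encoding.GetEncoding("iso-8859-1");`, then `wc.Headers.Add("content-type", SoapRequest.ContentType(enc));` and `byte[] bytes = SoapRequest.Body(doc, enc);`. The encoding named in the XML declaration inside `doc` still has to be kept in step by hand.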
 
Mike Schilling

Jon Skeet said:
> Got it :)
>
> The SOAP handler is using the content type at the HTTP level to decode
> the data. If you change your client content type line to:
>
> wc.Headers.Add("content-type", "text/xml; charset=ISO-8859-1");
>
> then it works.

So it does. Thanks.

Now, how did you figure this out?
 
Jon Skeet [C# MVP]

Mike Schilling said:
> So it does. Thanks.
>
> Now, how did you figure this out?

With the help of your app, I put a break point in the method. Go up the
stack a few levels, have a look at the HttpRequest involved, look at
what it thinks the content encoding is, and hope :)
 
Mike Schilling

Jon Skeet said:
> With the help of your app, I put a break point in the method. Go up the
> stack a few levels, have a look at the HttpRequest involved, look at
> what it thinks the content encoding is, and hope :)

Odd. Using VS.NET 2003, the only call stack I get is

hexstring.dll!HexString.Service1.ToHex(string s = "a???f") Line 57 C#
<non-user code>

But I can get the HttpRequest from the current stack frame and see that its
encoding is UTF-8. OK, let's change the client to send a UTF-8 string but
leave the content encoding unspecified. No, that still fails.

Trying a few more things gives these results:

Column 1: Encoding specified in the Content-Type header
Column 2: Value of HttpRequest.ContentEncoding
Column 3: Apparent effective encoding

<none>        UTF8          ASCII
UTF8          UTF8          UTF8
ISO-8859-1    ISO-8859-1    ISO-8859-1
ASCII         ASCII         ASCII
I don't entirely understand line 1, but I do know how to solve the problem.
Thanks!
 
Jon Skeet [C# MVP]

Mike Schilling said:
> Odd. Using VS.NET 2003, the only call stack I get is
>
> hexstring.dll!HexString.Service1.ToHex(string s = "a???f") Line 57 C#
> <non-user code>

You need to show the non-user code in order to get further up the
stack.

> But I can get the HttpRequest from the current stack frame and see that its
> encoding is UTF-8. OK, let's change the client to send a UTF-8 string but
> leave the content encoding unspecified. No, that still fails.
>
> Trying a few more things gives these results:
>
> Column 1: Encoding specified in the Content-Type header
> Column 2: Value of HttpRequest.ContentEncoding
> Column 3: Apparent effective encoding
>
> <none>        UTF8          ASCII
> UTF8          UTF8          UTF8
> ISO-8859-1    ISO-8859-1    ISO-8859-1
> ASCII         ASCII         ASCII
>
> I don't entirely understand line 1, but I do know how to solve the problem.
> Thanks!

That's very odd...
 
