XmlTextReader.ReadString doesn't parse \x0095 character correctly

Guest

I have an xml file that, for example, contains the following element: -

<data>\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4</data>

If I use XmlTextReader.ReadString() to read this data into a string, the \x0095 sequences are read literally, as six separate characters each.

However, the following code works fine: -

string a = "\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4";

Can anyone please explain what I'm doing wrong? The point of all this is that \x0095 is a delimiter and I want to use the Split function to break the string up into an array, like this: -

string[] b = a.Split(new char[] {'\x0095'});

Thanks in advance.

David.
 
Jon Skeet [C# MVP]

David said:
Can anyone please explain what I'm doing wrong?

Sure - you're assuming that C# source code escape sequences and XML
escape sequences are the same. They're not.

If you want to include Unicode character 0x95 in your XML, use &#x95;
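
A minimal sketch of the difference, assuming the XML is held in a string and read with XmlTextReader as in the question. The class name EscapeDemo and the helper ReadData are made up for illustration and are not from the thread.

```csharp
using System;
using System.IO;
using System.Xml;

static class EscapeDemo
{
    // Hypothetical helper: returns the text content of the root element.
    public static string ReadData(string xml)
    {
        using (XmlTextReader reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.MoveToContent();      // position on the <data> element
            return reader.ReadString();  // read its text content
        }
    }

    static void Main()
    {
        // XML gives the backslash no special meaning, so \x0095 arrives
        // as six ordinary characters. (Verbatim string: C# does not
        // process the escape here either.)
        string literal = ReadData(@"<data>\x0095 blah1</data>");

        // A character reference is decoded by the parser into the single
        // character U+0095.
        string decoded = ReadData("<data>&#x95; blah1</data>");

        Console.WriteLine(literal.Length);            // 12
        Console.WriteLine(decoded.Length);            // 7
        Console.WriteLine(decoded[0] == '\x0095');    // True
    }
}
```

The C# compiler processes "\x0095" in a regular string literal before the program ever runs, which is why the hard-coded string in the question splits correctly while the text read from the file does not.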
 
Nick Malik

The "\x" hex designator is C# syntax, not XML syntax.

To place hex 95 into XML, you will need to use an XML character reference:
&#x95;

However, I don't know if that character is valid in XML. You may need to
encode the entire string in Base64.

--- Nick
 
Guest

Thanks Jon/Nick - that's very helpful. Unfortunately the xml file is being passed to me from another system, so I can't control what's in it. All I want to do is break this string into its designated parts. Unless either of you has any other suggestions, I'll probably look at Regex.Split to see if that will allow me to split on a multi-character delimiter (although I've a feeling the backslash might complicate matters...).

Thanks again.

David.

 
Jon Skeet [C# MVP]

David said:
Thanks Jon/Nick - that's very helpful. Unfortunately the xml file is
being passed to me from another system, so I can't control what's in
it. All I want to do is break this string into its designated parts.
Unless either of you have any other suggestions, I'll probably look
at Regex.Split to see if that will allow me to split on a
multi-character delimiter (although I've a feeling the backslash
might complicate matters...).

Backslash will complicate it, but shouldn't do so *that* much.

One option would be to convert the file first, turning \x0095 into
&#x95; everywhere, and *then* load it in.
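
A rough sketch of that convert-then-load idea, assuming the whole document fits in memory as a string. The names ConvertThenLoad, ToCharacterReferences and ReadParts are hypothetical, chosen only for this example.

```csharp
using System;
using System.IO;
using System.Xml;

static class ConvertThenLoad
{
    // Rewrites each literal backslash sequence \x0095 as the XML
    // character reference &#x95; before any parsing takes place.
    public static string ToCharacterReferences(string rawXml)
    {
        return rawXml.Replace(@"\x0095", "&#x95;");
    }

    // Parses the converted XML and splits the element text on U+0095.
    public static string[] ReadParts(string rawXml)
    {
        string fixedXml = ToCharacterReferences(rawXml);
        using (XmlTextReader reader = new XmlTextReader(new StringReader(fixedXml)))
        {
            reader.MoveToContent();
            string text = reader.ReadString();
            return text.Split(new char[] { '\x0095' });
        }
    }

    static void Main()
    {
        string raw = @"<data>\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4</data>";
        foreach (string part in ReadParts(raw))
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

Because the sample data starts with the delimiter, the first array element is an empty string, and the remaining parts keep their surrounding spaces. A blanket text replacement like this also touches any \x0095 that happens to sit inside a CDATA section, where the reference would not be decoded, so it only suits files known not to contain one.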
 
Guest

I went for the Regex.Split option in the end. It was quite straightforward: I just needed to escape the backslash, so the pattern was @"\\x0095".
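
For anyone finding this later, a small sketch of that approach, assuming the text has already been read from the XML with the \x0095 sequences still literal. The class and method names are invented for the example.

```csharp
using System;
using System.Text.RegularExpressions;

static class SplitOnLiteralDelimiter
{
    // Splits on the six-character sequence \x0095. In the verbatim
    // string the two backslashes reach the regex engine unchanged,
    // where \\ means "one literal backslash".
    public static string[] SplitParts(string text)
    {
        return Regex.Split(text, @"\\x0095");
    }

    static void Main()
    {
        // Verbatim string, so this holds the same literal text the
        // XML reader would return.
        string text = @"\x0095 blah1 \x0095 blah2 \x0095 blah3 \x0095 blah4";
        foreach (string part in SplitParts(text))
        {
            Console.WriteLine("[" + part.Trim() + "]");
        }
    }
}
```

The first element is empty because the text begins with the delimiter. On .NET 2.0 and later, text.Split(new string[] { @"\x0095" }, StringSplitOptions.None) should give the same result without a regex, since that overload accepts multi-character separators.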
 
