XmlTextReader with NetworkStream bug?

Matt Stephens

I am trying to use the XmlTextReader class with a NetworkStream to help
simplify parsing xml nodes. My app listens for a TCP connection on port
6000. I am using HyperTerminal to connect and manually type the xml for
testing purposes. However, after 2 or 3 characters the XmlTextReader throws
an XmlException with its Message property saying 'The data at the root
level is invalid. Line 1, position 1.'.

If I substitute a FileStream for the NetworkStream then everything works
fine. I then wrote a MyStream class derived from Stream which, in all my
overrides, just delegates to a Stream object passed into the constructor and
writes out to the console what methods the XmlTextReader is calling and what
the parameters and return codes are. The only difference was that the
FileStream object's Read method can read the entire file in one go, whereas
the NetworkStream only gets 1 character per read. The bytes returned are
exactly the same up until the XmlException is thrown. Is this a bug with
XmlTextReader or am I just missing something?

The following is the code I am experimenting with:

IPAddress address = Dns.Resolve("localhost").AddressList[0];
TcpListener listener = new TcpListener(address,6000);
listener.Start();
TcpClient client = listener.AcceptTcpClient(); // Blocks until connection made

NetworkStream netstream = client.GetStream();

XmlTextReader xmlreader = new XmlTextReader(netstream);
try
{
    while(xmlreader.EOF==false)
    {
        if(xmlreader.Read()==true)
        {
            Console.WriteLine("Line: {0} Pos: {1} Name: {2}",xmlreader.LineNumber,xmlreader.LinePosition,xmlreader.Name);
        }
    }
}
catch(XmlException e)
{
    Console.WriteLine("woops! Line:{0} Pos:{1} Msg:{2}",e.LineNumber,e.LinePosition,e.Message);
}

Thanks in advance for any help or ideas anyone can give,

Matt Stephens
 
Jon Skeet [C# MVP]

Matt Stephens said:
I am trying to use the XmlTextReader class with a NetworkStream to help
simplify parsing xml nodes. My app listens for a TCP connection on port
6000. I am using HyperTerminal to connect and manually type the xml for
testing purposes. However, after 2 or 3 characters the XmlTextReader throws
an XmlException with its Message property saying 'The data at the root
level is invalid. Line 1, position 1.'.

It's possible that HyperTerminal is sending telnet negotiation stuff
down its stream, which would certainly confuse the XmlTextReader. I
suggest you get rid of the XmlTextReader bit for the moment and just
see what HyperTerminal is spitting down the pipe by just reading from
the NetworkStream and printing out the byte values of what it receives.
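A minimal sketch of that check, assuming nothing beyond the standard Stream API. The names ByteDump and Dump are made up for illustration, and a MemoryStream stands in for client.GetStream() so the example runs without a socket:

```csharp
using System;
using System.IO;
using System.Text;

static class ByteDump
{
    // Reads the stream until it ends and writes one line per Read() call,
    // listing the numeric value of every byte that call returned.
    public static void Dump(Stream input, TextWriter output)
    {
        byte[] buffer = new byte[4096];
        int n;
        while ((n = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            StringBuilder line = new StringBuilder();
            line.Append("read ").Append(n).Append(" byte(s):");
            for (int i = 0; i < n; i++)
            {
                line.Append(' ').Append(buffer[i]);
            }
            output.WriteLine(line.ToString());
        }
    }
}

static class ByteDumpDemo
{
    static void Main()
    {
        // Stand-in for the NetworkStream returned by client.GetStream().
        Stream input = new MemoryStream(Encoding.ASCII.GetBytes("<a/>"));
        ByteDump.Dump(input, Console.Out);
    }
}
```

Telnet negotiation, if present, would show up as byte values of 255 followed by option codes rather than the printable ASCII values of the typed characters.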
 
Matt Stephens

Thanks Jon, I did try exactly what you suggested and saw exactly the ASCII
bytes that I typed being sent from HyperTerminal to my application's
NetworkStream.

Following that I wrote a Stream derived class which just delegated all
method/property access to the original stream implementation but output to
the console what was being called and what the parameters/return codes were.
The first thing that XmlTextReader does is get the CanSeek property. For
FileStreams this returns True, so it then gets the file length, and for my
small xml file it reads the whole lot in one go. For the NetworkStream
CanSeek returns False, and so XmlTextReader tries to read blocks of 4096
bytes at a time. I can type the first 3 bytes into HyperTerminal, for which I
can see that 3 calls to Stream.Read() occur and the bytes read match the
first 3 bytes from the file, but an exception occurs! Incidentally, I also
tried pasting the data into HyperTerminal and I still only get 3 bytes in
before the exception occurs, but it does it with 2 reads from the
NetworkStream.
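
A delegating wrapper of the kind described here might look roughly like the sketch below. The class name LoggingStream, the Log helper and the log format are illustrative assumptions, not the code that was actually used in the experiment:

```csharp
using System;
using System.IO;

// Hypothetical wrapper: forwards every member to an inner stream and logs the call.
class LoggingStream : Stream
{
    private readonly Stream _inner;
    private readonly TextWriter _log;

    public LoggingStream(Stream inner, TextWriter log)
    {
        _inner = inner;
        _log = log;
    }

    public override bool CanRead  { get { return Log("CanRead", _inner.CanRead); } }
    public override bool CanSeek  { get { return Log("CanSeek", _inner.CanSeek); } }
    public override bool CanWrite { get { return Log("CanWrite", _inner.CanWrite); } }
    public override long Length   { get { return Log("Length", _inner.Length); } }

    public override long Position
    {
        get { return Log("Position", _inner.Position); }
        set { _log.WriteLine("Position: set to {0}", value); _inner.Position = value; }
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = _inner.Read(buffer, offset, count);
        _log.WriteLine("Read(offset={0}, count={1}): returns {2}", offset, count, n);
        return n;
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        return Log("Seek", _inner.Seek(offset, origin));
    }

    public override void Flush()
    {
        _log.WriteLine("Flush");
        _inner.Flush();
    }

    public override void SetLength(long value)
    {
        _log.WriteLine("SetLength({0})", value);
        _inner.SetLength(value);
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
        _log.WriteLine("Write(offset={0}, count={1})", offset, count);
        _inner.Write(buffer, offset, count);
    }

    // Logs "<member>: returns <value>" and hands the value back unchanged.
    private T Log<T>(string member, T value)
    {
        _log.WriteLine("{0}: returns {1}", member, value);
        return value;
    }
}

static class LoggingStreamDemo
{
    static void Main()
    {
        // A MemoryStream stands in for the FileStream or NetworkStream being observed.
        Stream s = new LoggingStream(new MemoryStream(new byte[] { 60, 97 }), Console.Out);
        s.Read(new byte[4096], 0, 4096);
    }
}
```

Passing such a wrapper to the XmlTextReader constructor in place of the raw stream shows the sequence of calls the reader makes and how many bytes each Read() actually delivers.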
 
Jon Skeet [C# MVP]

Matt Stephens said:
Thanks Jon, I did try exactly what you suggested and saw exactly the ASCII
bytes that I typed being sent from HyperTerminal to my application's
NetworkStream.

Good - that's a good start.

Following that I wrote a Stream derived class which just delegated all
method/property access to the original stream implementation but output to
the console what was being called and what the parameters/return codes were.
The first thing that XmlTextReader does is get the CanSeek property. For
FileStreams this returns True, so it then gets the file length, and for my
small xml file it reads the whole lot in one go. For the NetworkStream
CanSeek returns False, and so XmlTextReader tries to read blocks of 4096
bytes at a time. I can type the first 3 bytes into HyperTerminal, for which I
can see that 3 calls to Stream.Read() occur and the bytes read match the
first 3 bytes from the file, but an exception occurs! Incidentally, I also
tried pasting the data into HyperTerminal and I still only get 3 bytes in
before the exception occurs, but it does it with 2 reads from the
NetworkStream.

What exception occurs? Still the same one you had before?
 
Matt Stephens

Yeah, same XmlException with the message 'The data at the root
level is invalid. Line 1, position 1.'.

I'm going to try changing my Stream derived class to hardcode the bytes it
returns and try returning them a byte at a time or blocks of bytes at a time
just to see if that has any effect on how XmlTextReader behaves. If you have
any more ideas I'd be glad to hear them.
 
Jon Skeet [C# MVP]

Matt Stephens said:
Yeah, same XmlException with the message 'The data at the root
level is invalid. Line 1, position 1.'.

I'm going to try changing my Stream derived class to hardcode the bytes it
returns and try returning them a byte at a time or blocks of bytes at a time
just to see if that has any effect on how XmlTextReader behaves. If you have
any more ideas I'd be glad to hear them.

If you manage to do that and get a reproducible test case, I'll
certainly have a look at that - beyond that, I'm a bit stuck I'm afraid
:(
 
Matt Stephens

Well, I think it's a bug with XmlTextReader! I now have a stream class that
I've written, and I pass it some xml in its constructor.
It's implemented so that the Read() method tries to copy as many characters
as possible to the buffer passed in, and this works fine as the entire xml
string is read by XmlTextReader in one go. A simple change to the Read()
method so that it will only return 1 byte for each read operation reproduces
the problem I had with the NetworkStream class: the XmlTextReader gets 3
bytes and then throws an exception. Simply changing the number of bytes
returned on each read from 1 to 2 makes the XmlTextReader happy, at least in
this simple test case.

If anyone wants to see the code I can post it.

What's the best way of reporting this to Microsoft?
 
Jon Skeet [C# MVP]

Matt Stephens said:
Well, I think it's a bug with XmlTextReader! I now have a stream class that
I've written, and I pass it some xml in its constructor.
It's implemented so that the Read() method tries to copy as many characters
as possible to the buffer passed in, and this works fine as the entire xml
string is read by XmlTextReader in one go. A simple change to the Read()
method so that it will only return 1 byte for each read operation reproduces
the problem I had with the NetworkStream class: the XmlTextReader gets 3
bytes and then throws an exception. Simply changing the number of bytes
returned on each read from 1 to 2 makes the XmlTextReader happy, at least in
this simple test case.

If anyone wants to see the code I can post it.

Yes please - or if it's fairly large, feel free to mail it to me.

What's the best way of reporting this to Microsoft?

In theory, now that I'm an MVP I'm in a slightly better position to
report bugs. I have yet to test that theory, but I'm happy to use this
as a test case if you like (when I've had a look for myself, of
course).
 
Matt Stephens

Ok well here's the code, simple 1 file console app. Be glad to hear your
thoughts.

using System;
using System.Xml;
using System.IO;
using System.Xml.Serialization;
using System.Net;
using System.Net.Sockets;
using System.Text;

namespace XMLStreamParse
{
    class MyStream : System.IO.Stream
    {
        private byte[] _xml;   // XML formatted as ascii bytes
        private long _pos;     // Current position in the 'stream'
        private int _maxbytes; // Max bytes to transfer in any one call to the Read() method

        public MyStream(string somexml,int maxbytes)
        {
            // Copy the xml string but as bytes since that's what the stream functions use
            _xml = Encoding.ASCII.GetBytes(somexml);
            _pos=0;             // Initialise the position to the start
            _maxbytes=maxbytes; // Initialise the max bytes to transfer in one go
        }
        override public bool CanRead
        {
            get
            {
                Console.WriteLine("CanRead");
                return true;
            }
        }
        override public bool CanWrite
        {
            get
            {
                Console.WriteLine("CanWrite");
                return false;
            }
        }
        override public bool CanSeek
        {
            get
            {
                Console.WriteLine("CanSeek");
                return false;
            }
        }
        override public void Flush()
        {
            Console.Write("Flush");
        }
        override public long Length
        {
            get
            {
                long length = _xml.Length;
                Console.Write("Length: returns {0}",length);
                return length;
            }
        }
        public override long Position
        {
            get
            {
                Console.WriteLine("Position: returns {0}",_pos);
                return _pos;
            }
            set
            {
                Console.WriteLine("Position: set to {0}",value);
                _pos = value;
            }
        }
        override public int Read(byte[] buffer,int offset,int count)
        {
            Console.WriteLine("Read() offset={0} count={1}",offset,count);
            if(count > _maxbytes)
            {
                // Limit the number of bytes that can be transferred in any one call to this method
                count = _maxbytes;
            }

            // Copy the data to the output buffer...
            int nread;
            for(nread=0 ; nread<count && _pos < _xml.Length ; nread++)
            {
                buffer[offset+nread] = _xml[_pos++];
            }

            // Output to the console the actual bytes transferred...
            int n;
            for(n=0 ; n<nread ; n++)
            {
                Console.Write("\t{0}",buffer[offset+n]);
            }
            Console.WriteLine();
            return nread;
        }
        public override long Seek(long offset, SeekOrigin origin)
        {
            Console.WriteLine("Seek(offset {0}, origin {1}): returns {2}",offset,origin,_pos);
            return _pos;
        }
        public override void SetLength(long value)
        {
            Console.WriteLine("SetLength( {0} )",value);
        }
        public override void Write(byte[] buffer, int offset, int count)
        {
            Console.WriteLine("Write(buf, offset {0}, count {1})",offset,count);
        }
        public override int ReadByte()
        {
            Console.WriteLine("ReadByte");
            int n = base.ReadByte();
            Console.WriteLine("\treturns {0}",n);
            return n;
        }
        public override void WriteByte(byte value)
        {
            Console.WriteLine("WriteByte");
            base.WriteByte(value);
            Console.WriteLine("\tvalue={0}",value);
        }

    }
    /// <summary>
    /// Summary description for XMLStreamParseApp.
    /// </summary>
    class XMLStreamParseApp
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            string s = "<?xml version=\"1.0\" ?>\r\n"
                + "<rootnode><innernode fred=\"boo\"><innerinnernode attrib=\"x\"/></innernode>"
                + "<innernode fred=\"hello\"/></rootnode>";

            // *** Change the second parameter from 1 to 2 or higher for the
            // XmlTextReader to work correctly
            MyStream mystream = new MyStream(s,1);

            // Give XmlTextReader my stream class to work with
            XmlTextReader xmlreader = new XmlTextReader(mystream);

            try
            {
                // Simple loop through the nodes in the xml stream..
                while(xmlreader.EOF==false)
                {
                    if(xmlreader.Read()==true)
                    {
                        Console.WriteLine("Line: {0} Pos: {1} Name: {2}",xmlreader.LineNumber,xmlreader.LinePosition,xmlreader.Name);
                    }
                }
            }
            catch(XmlException e)
            {
                Console.WriteLine("woops! Line:{0} Pos:{1} Msg:{2}",e.LineNumber,e.LinePosition,e.Message);
            }
        }
    }
}
 
Jon Skeet [C# MVP]

Matt Stephens said:
Ok well here's the code, simple 1 file console app. Be glad to hear your
thoughts.

Gosh. Very odd. Well, I've certainly reproduced it. I'll have a closer
look before submitting it, but it does definitely look like a bug.

It should be possible to work round it by putting your own buffering
stream (which never returns only a single byte unless it's at the end
of the stream) in between the XmlTextReader and the NetworkStream, but
it's a bit unsatisfactory to say the least :(
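
A rough sketch of such a buffering wrapper follows. The class names MinReadStream and TrickleStream are hypothetical, and the sketch only addresses the observed symptom (single-byte reads); nothing in this thread confirms that it avoids every way the reader can be tripped up:

```csharp
using System;
using System.IO;

// Hypothetical wrapper: keeps reading from the inner stream until it has at
// least minBytes (or the caller's count, if smaller), or the stream ends.
class MinReadStream : Stream
{
    private readonly Stream _inner;
    private readonly int _minBytes;

    public MinReadStream(Stream inner, int minBytes)
    {
        _inner = inner;
        _minBytes = minBytes;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        int want = Math.Min(_minBytes, count);
        int total = 0;
        while (total < want)
        {
            int n = _inner.Read(buffer, offset + total, count - total);
            if (n == 0) break; // end of stream: a short final read is allowed
            total += n;
        }
        return total;
    }

    // Read-only and forward-only: everything else is trivial or unsupported.
    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

// Test double standing in for a NetworkStream that delivers one byte per Read().
class TrickleStream : MemoryStream
{
    public TrickleStream(byte[] data) : base(data) { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return base.Read(buffer, offset, Math.Min(count, 1));
    }
}

static class MinReadDemo
{
    static void Main()
    {
        byte[] data = System.Text.Encoding.ASCII.GetBytes("<a/>x");
        Stream s = new MinReadStream(new TrickleStream(data), 2);
        byte[] buffer = new byte[4096];
        int n;
        while ((n = s.Read(buffer, 0, buffer.Length)) > 0)
        {
            Console.WriteLine("Read returned {0} byte(s)", n);
        }
    }
}
```

One trade-off to keep in mind: on a real socket this wrapper blocks until a second byte arrives, so a reader sitting on top of it sees typed input one keystroke late.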
 
Matt Stephens

Yeah, I had the same thoughts, but since I don't really know what the problem
with XmlTextReader is, who's to say it won't exhibit similar issues depending
on where it is in its processing logic and how many bytes it gets?!

This is the first thing I've looked at doing in .NET and I've got to get
asynchronous socket handling with multiple connections and Services figured
out. I know that .NET is the way forward but I feel like it will be easier
to drop back to C++ and ATL in the short term!!

I'll be very interested to know what response you get from Microsoft about
this though. Thanks for taking the time to look.

Matt Stephens
 
Tian Min Huang

Hello Matt,

Thanks for your post.

Matt Stephens said:
I'm going to try changing my Stream derived class to hardcode the bytes
it returns and try returning them a byte at a time or blocks of bytes at a
time just to see if that has any effect on how XmlTextReader behaves.

Please let us know the result and post a sample project which is able to
reproduce the problem. I will be glad to check it on my side and report it
to our Developer Team if confirmed.

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

Get Secure! -- www.microsoft.com/security
This posting is provided "as is" with no warranties and confers no rights.
 
Jon Skeet [C# MVP]

Matt Stephens said:
Yeah, I had the same thoughts, but since I don't really know what the problem
with XmlTextReader is, who's to say it won't exhibit similar issues depending
on where it is in its processing logic and how many bytes it gets?!

Absolutely - it would be an ugly and unsatisfactory workaround at best.

This is the first thing I've looked at doing in .NET and I've got to get
asynchronous socket handling with multiple connections and Services figured
out. I know that .NET is the way forward but I feel like it will be easier
to drop back to C++ and ATL in the short term!!

:)

I'll be very interested to know what response you get from Microsoft about
this though. Thanks for taking the time to look.

No problem - I'll let you know if I get anywhere. (I see MS is now
taking an interest in the thread themselves anyway - if there's an
appropriate response in the near future I'll leave it there rather than
end up with potentially two items in their tracking system.)
 
Tian Min Huang

Dear Matt,

Thanks a lot for your information. I reproduced the problem on my side, and
did not find any report on this issue. I am now contacting our Developer
Team to check it.

Please feel free to let me know if you have any problems or concerns.

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

 
Tian Min Huang

Dear Matt,

After further research on this issue, I'd like to share the following
information with you:

By default, XmlTextReader's encoding is UTF8. In this case, the stream uses
encoding ASCII. So, you should use ASCII as the encoding for XmlTextReader.
When I use the following code, it now works properly without any exception.

//--------------code snippet-----------------------
NameTable nt = new NameTable();
XmlNamespaceManager nsmanager = new XmlNamespaceManager(nt);
XmlParserContext context = new XmlParserContext(nt,nsmanager,null,XmlSpace.None,System.Text.Encoding.ASCII);
XmlTextReader xmlreader = new XmlTextReader(mystream,XmlNodeType.Document,context);
//-----------------end of----------------------------

Please check it on your side.

Have a nice day!

Regards,

HuangTM
Microsoft Online Partner Support
MCSE/MCSD

 
Jon Skeet [C# MVP]

Tian Min Huang said:
After further research on this issue, I'd like to share the following
information with you:

By default, XmlTextReader's encoding is UTF8. In this case, the stream uses
encoding ASCII. So, you should use ASCII as the encoding for XmlTextReader.

But ASCII is entirely compatible with UTF-8, so it should work fine
anyway.

When I use the following code, it now works properly without any exception.

//--------------code snippet-----------------------
NameTable nt = new NameTable();
XmlNamespaceManager nsmanager = new XmlNamespaceManager(nt);
XmlParserContext context = new XmlParserContext(nt,nsmanager,null,XmlSpace.None,System.Text.Encoding.ASCII);
XmlTextReader xmlreader = new XmlTextReader(mystream,XmlNodeType.Document,context);
//-----------------end of----------------------------

That works, but it also works fine if you specify Encoding.UTF8 instead
of Encoding.ASCII - in other words, it's not *which* encoding you
specify, but that you've already specified the encoding at all, so it
doesn't need to try to find it.

This still definitely looks like a bug to me.
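
If the trouble really is in the reader's own encoding detection, another way to take that step out of the picture is to decode the bytes yourself and hand XmlTextReader characters instead. The sketch below is an assumption layered on the observation above, not something tested in this thread against the affected framework version. It uses the XmlTextReader(TextReader) constructor, assumes the sender really is using the encoding you pass in, and the names OneByteStream, DecodedXml and ElementNames are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Test double standing in for a NetworkStream that delivers one byte per Read().
class OneByteStream : MemoryStream
{
    public OneByteStream(byte[] data) : base(data) { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        return base.Read(buffer, offset, Math.Min(count, 1));
    }
}

static class DecodedXml
{
    // Decodes the bytes with an explicitly chosen encoding, then lets
    // XmlTextReader parse characters rather than raw bytes.
    public static List<string> ElementNames(Stream input, Encoding encoding)
    {
        List<string> names = new List<string>();
        XmlTextReader xml = new XmlTextReader(new StreamReader(input, encoding));
        try
        {
            while (xml.Read())
            {
                if (xml.NodeType == XmlNodeType.Element)
                {
                    names.Add(xml.Name);
                }
            }
        }
        finally
        {
            xml.Close(); // also closes the underlying StreamReader
        }
        return names;
    }
}

static class DecodedXmlDemo
{
    static void Main()
    {
        byte[] data = Encoding.ASCII.GetBytes("<rootnode><innernode fred=\"boo\"/></rootnode>");
        foreach (string name in DecodedXml.ElementNames(new OneByteStream(data), Encoding.UTF8))
        {
            Console.WriteLine(name);
        }
    }
}
```

The cost of this approach is that any encoding declared inside the document no longer drives the decoding, so it only suits cases where the encoding is agreed in advance.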