XML Document memory usage

not_a_commie

Here is my goal:

1. Read in a (really large) UTF8 XML document.
2. Put a certain element in that document into a byte[] still encoded
in UTF8.
3. Avoid ever converting the document or any of its significant
children into UTF16.

If I load my document using an XmlReader passed into XDocument.Load
(or XmlDocument.Load) does it leave it in the original format until I
call some method that returns a string? Is there some type of reader
that would have this behavior?

As far as extracting the element, my first thought was to use an
XmlWriter into a MemoryStream (with the above assumption that my
document hadn't already been converted into UTF16). However, I can't
get an appropriately-sized byte[] out of it that way without making a
copy after I've written the data. Hence, I've used at least 2x memory
anyway with that approach. Any ideas?
 
Martin Honnen

not_a_commie said:
Here is my goal:

1. Read in a (really large) UTF8 XML document.
2. Put a certain element in that document into a byte[] still encoded
in UTF8.
3. Avoid ever converting the document or any of its significant
children into UTF16.

If I load my document using an XmlReader passed into XDocument.Load
(or XmlDocument.Load) does it leave it in the original format until I
call some method that returns a string? Is there some type of reader
that would have this behavior?

XmlReader exposes .NET strings (e.g. the Value property), as does XmlNode
(e.g. InnerText or Value), as do XElement/XDocument (e.g. the Value
property). You are not dealing with bytes at all that way.
So whether the original XML document is UTF-8, UTF-16 or another
encoding, all those APIs give you .NET strings (which are sequences of
UTF-16 code units). If you want the original bytes then you need to read
them directly from a FileStream.
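
For the extraction step itself, here is a minimal sketch that at least avoids building a whole DOM: it streams through the input with an XmlReader, copies the first matching element into a MemoryStream through a UTF-8 XmlWriter, and returns a view over the stream's internal buffer instead of calling ToArray(), so no second copy of the bytes is made. The class name Utf8ElementExtractor and the method Extract are made up for this illustration. As far as I know the reader still decodes to UTF-16 characters in its own working buffer and the writer re-encodes them, so this limits the UTF-16 footprint to what is in flight rather than eliminating it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper, not part of the .NET framework.
public static class Utf8ElementExtractor
{
    // Streams through the XML in `input` and returns the first element named
    // `elementName` re-encoded as UTF-8, or an empty segment if it is absent.
    public static ArraySegment<byte> Extract(Stream input, string elementName)
    {
        var buffer = new MemoryStream();
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8 without a byte order mark
            OmitXmlDeclaration = true           // emit only the element itself
        };

        using (XmlReader reader = XmlReader.Create(input))
        {
            // Advance to the first start tag with the requested name.
            if (!reader.ReadToFollowing(elementName))
                return new ArraySegment<byte>(new byte[0]);

            // CloseOutput defaults to false, so disposing the writer flushes
            // it but leaves the MemoryStream open.
            using (XmlWriter writer = XmlWriter.Create(buffer, settings))
            {
                // Copies the current element and all of its descendants.
                writer.WriteNode(reader, false);
            }
        }

        // GetBuffer() exposes the stream's backing array without copying;
        // only the first buffer.Length bytes of it are valid data.
        return new ArraySegment<byte>(buffer.GetBuffer(), 0, (int)buffer.Length);
    }
}
```

The caller has to respect Offset and Count on the returned segment, because the backing array is usually larger than the data written into it. If an exactly sized byte[] is a hard requirement, one copy is unavoidable with this approach.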
 
not_a_commie

Yes, I recognize that everything is exposed as strings in the
XML classes. I'm asking: do the XML classes store the data
internally as strings as well? In other words, do they build up UTF-16
strings as they parse the input?
 
Martin Honnen

not_a_commie said:
Yes, I recognize that everything is exposed as strings in the
XML classes. I'm asking: do the XML classes store the data
internally as strings as well? In other words, do they build up UTF-16
strings as they parse the input?

I have never looked at the implementation, but I am sure the DOM
(XmlDocument/XmlElement/XmlNode) and LINQ to XML
(XDocument/XElement/XNode) store .NET strings. It would not make any
sense to store bytes in different encodings and then decode them every
time a Value or Name property is accessed.
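
To put a rough number on what that means for memory, the sketch below parses a small element with XElement and compares the size of its text as a .NET string (two bytes per UTF-16 code unit) with the size of the same text encoded as UTF-8. The class StringStorageDemo and the method CompareSizes are names invented for this example, and the figures ignore per-object overhead, so treat them as an approximation only.

```csharp
using System;
using System.Text;
using System.Xml.Linq;

// Hypothetical demo class, for illustration only.
public static class StringStorageDemo
{
    // Returns { UTF-16 byte count, UTF-8 byte count } for the text content
    // of the element parsed from `xml`.
    public static int[] CompareSizes(string xml)
    {
        XElement element = XElement.Parse(xml);

        // Value is an ordinary .NET string, i.e. UTF-16 code units.
        string value = element.Value;

        int utf16Bytes = value.Length * sizeof(char);       // 2 bytes per code unit
        int utf8Bytes = Encoding.UTF8.GetByteCount(value);  // size if re-encoded

        return new[] { utf16Bytes, utf8Bytes };
    }
}
```

For mostly-ASCII content the string form comes out at about twice the UTF-8 size, which matches the 2x concern raised at the start of the thread.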
 
