Optimizations to Reduce Memory Consumption and Minimize Paging

Mike S

I'm working on a .NET application that was under development by a
previous developer, who is no longer working with the company. The
program basically extracts records from a source database and outputs
them in an XML document, mapping the source records into a third-party
XML schema. The mapping process is handled by C# code that was
auto-generated by Altova MapForce -- code was added to extract a subset
of records from the source database into a temporary database, and the
MapForce code is then executed against this subset, to prevent
unnecessary processing. When this subset contains tens of thousands of
records (which is common for the databases the application will run
against), the performance of the auto-generated mapping code is
horrendous. The developer who had originally worked on this application
knew this was unacceptable and he modified parts of the auto-generated
code to increase performance -- this sped things up considerably.
However, when I look at the untouched auto-generated code and compare
it to his optimized version, I'm wondering how it actually runs better.
The (much simplified) original auto-generated code did something like
this:

// code starts here
using System.Xml;

// this method kicks off the mapping process
public void Run(string targetFilename)
{
    XmlDocument targetDoc;

    targetDoc = new XmlDocument();
    targetDoc.LoadXml("<root></root>");

    // call functions to do mapping
    loopCustomers(targetDoc);
    loopOrders(targetDoc);
    // ...corresponding loop*() functions for the other DB tables...
}

public void loopCustomers(XmlDocument doc)
{
    // CustomerNodeSet is a collection of XmlNodes filled by a DB query
    CustomerNodeSet cs = new CustomerNodeSet("SELECT * FROM [Customers]");

    while (cs.hasNextNode())
    {
        doc.DocumentElement.AppendChild(cs.getNextNode());
    }
}

public void loopOrders(XmlDocument doc)
{
    // same business as loopCustomers()
}

In the optimized version, he changed the loop*() methods. For example,
loopCustomers() was modified as follows:

public void loopCustomers(XmlDocument doc)
{
    // CustomerNodeSet is a collection of XmlNodes filled by a DB query
    CustomerNodeSet cs = new CustomerNodeSet("SELECT * FROM [Customers]");

    // keep track of records we've seen
    int numRecords = 0;
    // for naming temp files
    int tempFileID = 0;

    while (cs.hasNextNode())
    {
        doc.DocumentElement.AppendChild(cs.getNextNode());

        // where the magic happens
        ++numRecords;
        if (numRecords == 2000)
        {
            // save what's in the XmlDocument so far to a temp file...
            ++tempFileID;
            doc.Save(@"C:\tmpCustomer" + tempFileID + ".xml");

            // ...and try to reclaim the memory those 2000 records were occupying
            doc = null;
            GC.Collect();

            // reset numRecords so that this can happen again after the
            // next 2000 records are read
            numRecords = 0;
        } // end if (numRecords == 2000)
    } // end while (cs.hasNextNode())
}

Now, from my understanding of how garbage collection works in .NET, I
would expect that the lines which attempt to reclaim memory (doc = null;
GC.Collect();) will never actually reclaim the memory the document was
using. I base this on the simple observation that the parameter "doc" is
passed by value. Because of this, the call

loopCustomers(targetDoc);

that appears in Run() will pass a copy of the reference to targetDoc as
the "doc" argument of loopCustomers(). So when doc is set to null in
loopCustomers(), only the *copy* of the reference (the one on the stack)
is set to null; targetDoc itself is still non-null. Thus, when a
collection is forced with GC.Collect(), the document will not be
collected, and in fact can't be collected at any time inside
loopCustomers(), because targetDoc in Run() still refers to it.

The obvious solution is to pass the XmlDocument to loopCustomers() by
reference instead of by value, so that only one reference to it will
exist while loopCustomers() is running, thereby allowing it to be
collected inside loopCustomers().
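
Here's a tiny stand-alone sketch of what I mean (the names are made up,
it's not the real code): nulling a by-value parameter leaves the caller's
variable untouched, while a ref parameter really does clear it.

// Hypothetical demo, not the MapForce code: by-value vs. by-ref
// behavior of a reference-type parameter.
using System;
using System.Xml;

class RefDemo
{
    static void NullByValue(XmlDocument doc)
    {
        doc = null;   // only the local copy of the reference is cleared
    }

    static void NullByRef(ref XmlDocument doc)
    {
        doc = null;   // the caller's variable itself is cleared
    }

    static void Main()
    {
        XmlDocument target = new XmlDocument();
        target.LoadXml("<root/>");

        NullByValue(target);
        Console.WriteLine(target == null);   // False -- target still roots the document

        NullByRef(ref target);
        Console.WriteLine(target == null);   // True -- nothing roots the document now
    }
}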

However... since my colleague's code *did* in fact use pass-by-value
semantics, shouldn't his "optimized" version have run just as slowly as
(or slower than) the unoptimized version, since his XmlDocument object
would never be collected while inside one of the loop*() functions?
The XmlDocument would keep growing and growing with each successive
call to AppendChild(), just as it did in the unoptimized version, but
it would never be freed since targetDoc in Run() would still be holding
a reference.

Am I missing something here? I would definitely expect his code to run
at least as poorly as the unoptimized code, if not worse -- I simply
can't see how it turns out to be faster...
 

Barry Kelly

Mike S said:
However, when I look at the untouched auto-generated code and compare
it to his optimized version, I'm wondering how it actually runs better.

Without a compiling sample, it's hard to investigate properly. Have you
tried profiling the two side by side (even simply Performance Monitor
will do), and checking how much time is being spent in the GC?

The most obvious "optimizations" appear to be the calls to GC.Collect(),
which are not recommended unless you know the memory size of the whole
application and you know that you've just made a significant fraction of
the total object graph unreachable. To find out if that actually did the
trick, you'd need to profile.
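
For what it's worth, on the rare occasions where a forced collection is
defensible -- i.e. immediately after a large object graph has genuinely
become unreachable -- the usual pattern looks something like this
(illustrative only; "largeGraph" is a made-up variable):

// Illustrative only: force a collection after dropping the *last*
// reference to a large object graph.
largeGraph = null;               // make the graph unreachable first
GC.Collect();                    // collect it
GC.WaitForPendingFinalizers();   // let any finalizers run
GC.Collect();                    // reclaim objects freed by those finalizers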

I've never had to use GC.Collect() to get good performance, myself; I
have, however, sometimes used object pooling for large objects (i.e.
objects allocated on the LOH, the Large Object Heap).

-- Barry
 

Mike S

Barry said:
Without a compiling sample, it's hard to investigate properly. Have you
tried profiling the two side by side (even simply Performance Monitor
will do), and checking how much time is being spent in the GC?

The auto-generated code comprises around 420,000 lines of code, and I
had trouble whittling it down to a smaller compilable program that
still exhibits the behavior, but I'm going to see if I can't come up
with something. As for profiling the code, I'm not sure how to approach
this, at least as it concerns the full running application - on a
high-end machine the optimized version (I've been told) took 6 hours to
run, whereas the unoptimized version was *so* inefficient that it
wouldn't even run to completion - instead it consistently crashed
Windows because it would consume all available physical and virtual
memory. Because the "optimized" version takes so long to run and still
wastes a lot of memory and the other version would just crash the OS, I
don't see how I can accurately profile them.
The most obvious "optimizations" appear to be the calls to GC.Collect(),
which are not recommended unless you know the memory size of the whole
application and you know that you've just made a significant fraction of
the total object graph unreachable. To find out if that actually did the
trick, you'd need to profile.

It seems the optimizations he added work, but I can't understand why.
My big thing is that the calls to GC.Collect() probably aren't freeing
much, if any, memory, because the object that he intends to collect
with that call still has a reference pointing to it (the problem is
that the object's reference is passed to a function by value, so that
when he sets it to null in the function and calls GC.Collect(), there
is still a strong reference to that object in the calling function, so
the object can't be collected). That makes me think that ultimately,
the memory usage for his version should be roughly the same as the
unoptimized version - it seems like magic that it runs faster and
doesn't eat up all available memory.

In the unoptimized version, the XML document in memory has a node
appended to it for every corresponding record in the source database.
This amounts to tens of thousands of records being appended to the XML
document object, and AFAIK it's impossible for the garbage collector to
reclaim any of that memory, because the object that contains the XML
document is not set to null until after *every* record has been
appended to it and the XML data has been saved to disk. The XML
document just grows and grows in memory.

To prevent the XML document from constantly growing without any memory
being freed in the meantime, he modified the algorithm so that for
every 2000 records appended to the XML document, the program would
immediately write out those 2000 records to a temporary file, set the
reference to the XML document object to null and call GC.Collect() in
hopes of collecting the object (i.e. removing those 2000 records from
memory). Then he re-allocated a new XML document object on the heap and
appended records to it until it filled up with 2000 records again. The
problem is that the reference to the XML document that he sets to null
is a *copy* of the original reference, because it was passed by value
to the function which sets it to null and calls GC.Collect().
Therefore, in effect, the program should never be able to reclaim the
memory occupied by the last 2000 records (because there is still a
reference to it somewhere); instead it should behave like the
unoptimized version - where the XML document would just keep growing.

Sorry for the long-windedness and the lack of a compilable example. I
will try to whip up a minimal example, but I hope that the explanations
above can somewhat clarify the situation.
 

Marc Gravell

I'm assuming doc does actually get set back to something? It isn't shown, so
it's hard to see what the given code is actually doing.

If it is reassigned, then the answer is probably that there are a whole pile
of XmlDocuments kicking around; the calling function can only see the first
one, so the rest can be GC'd every 2000 rows.

If it isn't reassigned (i.e. it actually just calls .LoadXml() with a small
fragment), then this causes the XmlDocument to de-reference a pile of nodes,
which can then be GC'd.
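
To sketch the first scenario (hypothetical code, reusing the names from your
posted snippet, since the real loop isn't shown): if the loop re-points doc
at a fresh XmlDocument after the save/null/Collect, only the *first*
document stays reachable from the caller, and every later 2000-row chunk
becomes garbage as soon as doc moves on.

// Hypothetical version of loopCustomers() with the reassignment in place --
// not the actual generated code, just the scenario described above.
public void loopCustomers(XmlDocument doc)
{
    CustomerNodeSet cs = new CustomerNodeSet("SELECT * FROM [Customers]");
    int numRecords = 0;
    int tempFileID = 0;

    while (cs.hasNextNode())
    {
        doc.DocumentElement.AppendChild(cs.getNextNode());

        if (++numRecords == 2000)
        {
            ++tempFileID;
            doc.Save(@"C:\tmpCustomer" + tempFileID + ".xml");

            // The caller still roots the *first* document, but chunks 2, 3, ...
            // lose their last root as soon as doc is re-pointed, so they can be GC'd.
            doc = new XmlDocument();
            doc.LoadXml("<root></root>");
            GC.Collect();

            numRecords = 0;
        }
    }
}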

Personally, I think that there are some pretty severe design issues here
handling that amount of data in an XmlDocument; if you really want to look
at performance, I would recommend seeing if any of the xml handling would
lend itself to XmlReader and XmlWriter; trickier to use, but vastly more
efficient. http://thedailywtf.com/ would describe this as very
"enterprisey", but the simple fact is that in-memory DOM trees are not
designed for this volume of data.
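
As a rough illustration of the streaming style (the table, column and
element names here are invented, and I'm assuming a SQL Server source),
something along these lines never holds more than one record in memory at
a time:

// Sketch only: stream records straight from the database to an XML file.
using System.Xml;
using System.Data.SqlClient;

class StreamingExportSketch
{
    public static void ExportCustomers(string connectionString, string targetFilename)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = true;

        using (XmlWriter writer = XmlWriter.Create(targetFilename, settings))
        using (SqlConnection conn = new SqlConnection(connectionString))
        using (SqlCommand cmd = new SqlCommand("SELECT * FROM [Customers]", conn))
        {
            conn.Open();
            writer.WriteStartElement("root");

            using (SqlDataReader reader = cmd.ExecuteReader())
            {
                while (reader.Read())
                {
                    // one element per record; nothing accumulates in memory
                    writer.WriteStartElement("Customer");
                    writer.WriteElementString("Id", reader["CustomerID"].ToString());
                    writer.WriteElementString("Name", reader["Name"].ToString());
                    writer.WriteEndElement();
                }
            }

            writer.WriteEndElement(); // </root>
        }
    }
}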

Marc
 

Mike S

Marc said:
I'm assuming doc does actually get set back to something? It isn't shown, so
it's hard to see what the given code is actually doing.

My fault. It does reassign doc after calling GC.Collect().
If it is reassigned, then the answer is probably that there are a whole pile
of XmlDocuments kicking around; the calling function can only see the first
one, so the rest can be GC'd every 2000 rows.

I'm thinking that since doc was passed by value, the GC.Collect() can't
collect the memory doc was referring to anyway, since the calling
function still holds a reference to what doc pointed to before the
reassignment. The way the code is set up, you don't return back to the
calling function until after the XmlDocument has been fully constructed
in memory, so I would think the node tree before the assignment never
gets freed, and additionally the reassignment causes a new tree to be
allocated, so if anything it should be wasting more memory than code
that would just blindly append new nodes to a single XmlDocument
instance....Then again, the disparities between what I think should
happen and what does happen could be due to something in the other
half-a-million lines of code ;-)
If it isn't reassigned (i.e. it actually just calls .LoadXml() with a small
fragment), then this causes the XmlDocument to de-reference a pile of nodes,
which can then be GC'd.

Personally, I think that there are some pretty severe design issues here
handling that amount of data in an XmlDocument; if you really want to look
at performance, I would recommend seeing if any of the xml handling would
lend itself to XmlReader and XmlWriter; trickier to use, but vastly more
efficient. http://thedailywtf.com/ would describe this as very
"enterprisey", but the simple fact is that in-memory DOM trees are not
designed for this volume of data.

I agree that the approach we've taken was poorly devised. It
(thankfully) wasn't my idea, but now because of time constraints and
so on and so forth I'm stuck with it. To rework the auto-generated code
to deal with XmlReaders and XmlWriters would take too long at this
point, as would doing away with the auto-generated stuff altogether and
writing the code from scratch (which I would actually prefer, but it
ain't gonna happen). Since I am stuck with the current algorithm, I've
been wondering how I can optimize it (if at all) without falling way
behind schedule; however, these posts were more of a question of why
the original optimized code ran faster than the unoptimized code, when
on closer inspection, it would seem the optimized code wouldn't be
freeing any more memory than the unoptimized version (i.e. the
pass-by-value thing). On that note, it would seem the least I should do
is make the formal parameter "doc" pass-by-reference, so that when it
gets set to null in the loop*() functions, the GC.Collect() will
actually be able to collect it.
 

Barry Kelly

Mike S said:
The auto-generated code comprises around 420,000 lines of code, and I
had trouble whittling it down to a smaller compilable program that
still exhibits the behavior, but I'm going to see if I can't come up
with something.

The thing is, with the code you posted, 'doc' never gets assigned a new
value, so the next time around the loop it should throw a
NullReferenceException. So, the code has been whittled a little too
much.

Also, with a save/null/GC.Collect() cycle in that function, it is as you
say. The reference to the object is passed by value, so the whole tree
can't get collected. Looking at it, it looks awfully like the given
functions discard all but the first 2000 rows of data from the tables.
As for profiling the code, I'm not sure how to approach
this, at least as it concerns the full running application - on a
high-end machine the optimized version (I've been told) took 6 hours to
run, whereas the unoptimized version was *so* inefficient that it
wouldn't even run to completion - instead it consistently crashed
Windows because it would consume all available physical and virtual
memory. Because the "optimized" version takes so long to run and still
wastes a lot of memory and the other version would just crash the OS, I
don't see how I can accurately profile them.

Can you try and take out this whittled bit, and write your own harness
around it, that just works with this DB & XML loading? That's what I'd
try and do.

Like Marc says, I'd recommend checking out XmlReader and XmlWriter if
you need forward-only access. However, if you're interacting with a
third-party XML processor for .NET, you probably can't take that option.

-- Barry
 
