help me use less memory!


roger_27

I've got a problem. I load a huge datatable with 111 columns. This
datatable throws a System.OutOfMemoryException if it's bigger than about
280,000 rows.

that's fine. I've come to accept that.


when I fill the datatable up to near that max limit of 280k rows, the
program's memory use goes from about 100 MB to 1 GB!

that's also fine. I don't really have much choice there.

but then I go through a loop like this:

//dtResults is the HUGE datatable.

//I clone the datatable a few times. Clone() makes new, empty datatables
//with the same columns but none of the rows.
DataTable dtRejectedResults = dtResults.Clone();
DataTable dtExcludedResults = dtResults.Clone();
DataTable dtExcludedResults2 = dtResults.Clone();
DataTable dtExcludedResults3 = dtResults.Clone();
DataTable dtREGS = dtResults.Clone();
DataTable dtREGSNCOABad = dtResults.Clone();

//now I loop through the HUGE datatable, and place the rows in separate,
//smaller datatables.
//***this is where the memory usage jumps from 1 gig, to 1.3 gigs.
//***why does it grow 300 MB in here????
for (int k = 0; k < dtResults.Rows.Count; k++)
{
    if (dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "C" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "F" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "G" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "K")
    {
        DataRow row3 = dtREGSNCOABad.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtREGSNCOABad.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-1")
    {
        DataRow row3 = dtRejectedResults.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtRejectedResults.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-2")
    {
        DataRow row3 = dtExcludedResults.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-3")
    {
        DataRow row3 = dtExcludedResults2.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults2.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-4")
    {
        DataRow row3 = dtExcludedResults3.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults3.Rows.Add(row3);
    }
    else
    {
        DataRow row3 = dtREGS.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtREGS.Rows.Add(row3);
    }
}

//this doesn't free up anything.
dtResults.Clear();







so I tried removing rows from the huge table as they get copied out, like this:


//dtResults is the HUGE datatable.

//I clone the datatable a few times. Clone() makes new, empty datatables
//with the same columns but none of the rows.
DataTable dtRejectedResults = dtResults.Clone();
DataTable dtExcludedResults = dtResults.Clone();
DataTable dtExcludedResults2 = dtResults.Clone();
DataTable dtExcludedResults3 = dtResults.Clone();
DataTable dtREGS = dtResults.Clone();
DataTable dtREGSNCOABad = dtResults.Clone();

//now I loop through the HUGE datatable, and place the rows in separate,
//smaller datatables based on certain columns. just a bunch of ifs.
//***this is where the memory usage jumps from 1 gig, to 1.3 gigs.
//***why does it grow 300 MB in this loop ?????
for (int k = 0; k < dtResults.Rows.Count; k++)
{
    if (dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "C" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "F" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "G" ||
        dtResults.Rows[k]["Flag"].ToString().TrimEnd(' ') == "K")
    {
        DataRow row3 = dtREGSNCOABad.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtREGSNCOABad.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-1")
    {
        DataRow row3 = dtRejectedResults.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtRejectedResults.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-2")
    {
        DataRow row3 = dtExcludedResults.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-3")
    {
        DataRow row3 = dtExcludedResults2.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults2.Rows.Add(row3);
    }
    else if (dtResults.Rows[k]["PACKAGE"].ToString().TrimEnd(' ') == "-4")
    {
        DataRow row3 = dtExcludedResults3.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtExcludedResults3.Rows.Add(row3);
    }
    else
    {
        DataRow row3 = dtREGS.NewRow();
        row3.ItemArray = dtResults.Rows[k].ItemArray;
        dtREGS.Rows.Add(row3);
    }

    //***added this here
    dtResults.Rows.Remove(dtResults.Rows[k]);
    k = k - 1;
}

//memory usage doesn't change.






Are there any ideas, or tips, or anything that someone can provide to make
this more memory-efficient?

I know what I am doing that is making memory grow: I am taking a huge
datatable, looping through it all, making copies of those EXACT rows and
putting them into new datatables.

So yeah, memory usage would grow, but then I call .Clear() and it stays
the same. I've tried .Dispose() and GC.Collect(); nothing seems to change that.

So is there some other way I can do this that uses less memory?

thanks.

Roger.
 

Peter Duniho

roger_27 said:
[...]
Are there any ideas, or tips, or anything that someone can provide to make
this more memory-efficient?

I know what I am doing that is making memory grow: I am taking a huge
datatable, looping through it all, making copies of those EXACT rows and
putting them into new datatables.

So yeah, memory usage would grow, but then I call .Clear() and it stays
the same. I've tried .Dispose() and GC.Collect(); nothing seems to change that.

So is there some other way I can do this that uses less memory?

It is basically impossible to offer specific advice, especially for a
question like this, without a concise-but-complete code example that
reliably reproduces the problem.

However, some general points can be made:

-- You need to keep in mind how you are measuring the memory usage.
Stating memory usage without specifics about how that usage is being
determined is useless (see the measurement sketch after these points).

-- If, as I suspect, you are simply looking at memory usage figures in
the Task Manager program, you have to keep in mind that that number has
only a very tenuous connection to the memory your program is actually
using. .NET is managing unmanaged memory usage on your behalf, and just
because you've freed up space, perhaps even LOTS of space, in the
managed heap, that doesn't mean .NET will give it back to the OS right away.

-- In the same way that .NET manages memory independently of your
specific code, so too does a class like DataTable manage its internal
data structures independently of your specific code. I haven't looked
at the implementation of DataTable, but removing rows from a table may
or may not actually result in lower memory usage. It's quite common, in
fact, for dynamically-variable data structures like that to only ever
grow, and never contract, especially if the underlying storage is
array-based.
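
For what it's worth, here is a minimal sketch of measuring the managed heap
directly instead of relying on Task Manager; where exactly you wrap the loop,
and the variable names, are just assumptions on my part:

// Sketch: measure managed allocations around the partitioning loop.
// GC.GetTotalMemory(true) forces a collection first, so the numbers
// reflect live objects rather than garbage waiting to be collected.
long before = GC.GetTotalMemory(true);

// ... run the loop that copies rows into the smaller tables ...

long after = GC.GetTotalMemory(true);
Console.WriteLine("Managed heap grew by {0:N0} bytes", after - before);

If the managed numbers stay roughly flat while Task Manager climbs, the growth
you are seeing is memory the runtime is holding onto, not memory your objects
are actually using.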

Note that the real issue is whether the memory consumption you're seeing
has any real impact on real-world performance. Your own process will be
limited to a specific amount of virtual address space regardless, and
assuming your consumption of memory is almost entirely through .NET, how
.NET manages the unmanaged memory allocations won't matter with respect
to that question.

In terms of overall system performance, your process having 1.3GB (for
example) of committed memory allocation, even if your own code has
released a significant portion of that (assuming it has), may or may not
be a real problem. As long as your own application doesn't wind up
increasing memory usage back to that number, the physical memory pages
that were used to represent that portion of your virtual address space
will wind up swapped out to the disk, and unless you're running low on
disk space, won't have any effect at all on other processes running on
the system.

If you want more specific details on your exact scenario, then you'll
need to look more closely at how your process is using memory, as well
as use the right tools to do so. In particular, you need a memory
profiler to look at exactly what your _managed_ allocation state
actually looks like. If it turns out that the storage for the original
DataTable isn't released even after you call Clear(), it may be you need
to discard the object altogether. But of course, there's no point in
bothering with that until you've verified that a) the DataTable object
doesn't give up allocations it's made earlier (you'll have to examine
the source code, or the IL via Reflector, to determine that), and b)
that the DataTable object is in fact the source of all of the extra
memory usage (you need a memory profiler to tell you that…I hear the Red
Gate one is good, though I haven't used it myself).

Oh, and as far as your original question about how to "make this more
memory-efficient", there may be things that can be done. But the first
step is to identify the problem. Looking at the code, there does not
appear to be any obvious low-hanging fruit. Things like calls to
ToString() and Trim() are definitely adding to the problem, since they
create new objects for each method call, but it's not clear there's any
point in trying to come up with alternatives to those, even if that's
possible at all (well, for Trim() there definitely is…but the call to
ToString() seems likely to be required, without a lot of additional work).
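
As one example of trimming that churn, here is a rough sketch (reusing the
table and column names from the original post, and assuming the same
bucketing logic) that reads each field once per row instead of re-running
ToString().TrimEnd() for every comparison:

// Sketch only: read each field once per row instead of four-plus times,
// and pick the destination table before doing the copy.
for (int k = 0; k < dtResults.Rows.Count; k++)
{
    DataRow source = dtResults.Rows[k];
    string flag = source["Flag"].ToString().TrimEnd(' ');
    string package = source["PACKAGE"].ToString().TrimEnd(' ');

    DataTable target;
    if (flag == "C" || flag == "F" || flag == "G" || flag == "K")
        target = dtREGSNCOABad;
    else if (package == "-1")
        target = dtRejectedResults;
    else if (package == "-2")
        target = dtExcludedResults;
    else if (package == "-3")
        target = dtExcludedResults2;
    else if (package == "-4")
        target = dtExcludedResults3;
    else
        target = dtREGS;

    // DataRowCollection.Add(object[]) copies the values into a new row.
    target.Rows.Add(source.ItemArray);
}

It won't change the fact that every row's data still ends up duplicated in
the new tables, but it does cut out most of the temporary strings.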

Pete
 

Gregory A. Beamer

roger_27 said:
[...]
Are there any ideas, or tips, or anything that someone can provide to make
this more memory-efficient?

Why is all of this work being done in .NET? It looks like much of the
filtering you are attempting could be done with a handful of queries,
populating the separate tables on the database side rather than returning
280K rows and then rejecting and excluding over and over again (see the
sketch below).

Another question: Why do you have to hold all of the rejects and exclusions
in a separate table? Can you not get them when the user wants to look at
them?

Unless this is a utility app, which could be run in firehose mode most
likely, you have far more data than a user could EVER interact with.
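
To make that first suggestion concrete, here is a rough sketch of filling
one of the buckets with its own query; the connection string, the "Results"
table name, and the use of SqlDataAdapter are all assumptions based on what
the original code appears to be doing:

// Sketch only: let the database do the partitioning instead of copying
// rows around in memory. Requires System.Data and System.Data.SqlClient.
using (SqlConnection conn = new SqlConnection("...your connection string..."))
using (SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT * FROM Results WHERE Flag IN ('C','F','G','K')", conn))
{
    DataTable dtREGSNCOABad = new DataTable();
    adapter.Fill(dtREGSNCOABad);   // Fill opens and closes the connection itself
}
// ...repeat with "WHERE PACKAGE = '-1'", "WHERE PACKAGE = '-2'", and so on,
// so only the rows each bucket actually needs ever leave the server.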

Peace and Grace,

--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 

roger_27

Thanks for the response.

This is actually part of a utility billing program I wrote, and all of the
separate data tables are necessary. They get written out as data to be
transformed into PDFs and then printed as utility bills for quite a few
cities in the California area.

So I definitely need them all.

I guess I should have explained that all I was really doing was watching
the memory use in Task Manager. I'm pretty sure it's that loop, because
when I insert a breakpoint I see that before the loop I am still at 1 gig,
and when the loop is over I'm at 1.3. But I guess, as you have stated, that
doesn't necessarily mean it's using more, correct?

I tried calling GC.Collect() to no avail.

Thank you for all that input.

I guess I will just look around for some kind of memory profiler to see
where the hangup is.

Peter Duniho said:
[...]
If you want more specific details on your exact scenario, then you'll
need to look more closely at how your process is using memory, as well
as use the right tools to do so. In particular, you need a memory
profiler to look at exactly what your _managed_ allocation state
actually looks like.
[...]
 

roger_27

This helps, but doesn't exactly fix everything. I did this:


//perform a datatable select statement
if (dtResults.Rows.Count > 0)
{
    dtCityModestoNCOABad = GetDTFromFilter(dtResults,
        "flag = 'C' OR flag = 'F' OR flag = 'G' OR flag = 'K'");
    dtRejectedResults = GetDTFromFilter(dtResults, "PACKAGE = '-1'");
    dtExcludedResults = GetDTFromFilter(dtResults, "PACKAGE = '-2'");
    dtExcludedResults2 = GetDTFromFilter(dtResults, "PACKAGE = '-3'");
    dtExcludedResults3 = GetDTFromFilter(dtResults, "PACKAGE = '-4'");
    dtCityModesto = GetDTFromFilter(dtResults,
        "PACKAGE <> '-1' AND PACKAGE <> '-2' AND PACKAGE <> '-3' AND PACKAGE <> '-4' " +
        "AND flag <> 'C' AND flag <> 'F' AND flag <> 'G' AND flag <> 'K'");
}


//this method filters the source table and returns the matching rows as a new table
private DataTable GetDTFromFilter(DataTable dtSource, string filterExp)
{
    DataRow[] resultRows = dtSource.Select(filterExp);

    DataSet returnDataSet = new DataSet();
    returnDataSet.Merge(resultRows);

    //if nothing matched, return an empty table with the same columns
    if (returnDataSet.Tables.Count == 0)
    {
        returnDataSet.Tables.Add(dtSource.Clone());
    }

    return returnDataSet.Tables[0];
}


After viewing Task Manager, I see this uses about 50-60 MB less memory.
I think this is because the objects created inside the method become
unreachable once the call completes, so the garbage collector can reclaim
them. It didn't reduce the usage by half or 3/4 like I wanted, but this
might work out for me.
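
One more thing I'm considering, to avoid holding duplicate copies of the row
data at all, is wrapping the one big table in DataViews with a RowFilter,
since a view references the existing rows instead of copying them. This is
just a sketch, reusing the names from above:

// Sketch only: views over dtResults, so no row data is duplicated.
DataView badView = new DataView(dtResults,
    "flag = 'C' OR flag = 'F' OR flag = 'G' OR flag = 'K'",
    null,                               // no sort
    DataViewRowState.CurrentRows);

DataView rejectedView = new DataView(dtResults, "PACKAGE = '-1'",
    null, DataViewRowState.CurrentRows);

foreach (DataRowView rowView in rejectedView)
{
    // rowView["SomeColumn"] reads straight from the underlying row in dtResults.
}

The trade-off is that a DataView can't simply be handed to code that expects
its own DataTable (calling ToTable() would copy the rows again), so whether
this helps depends on what the PDF step actually needs.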
 

Peter Duniho

roger_27 said:
[...]
After viewing Task Manager, I see this uses about 50-60 MB less memory.
I think this is because the objects created inside the method become
unreachable once the call completes, so the garbage collector can reclaim
them. It didn't reduce the usage by half or 3/4 like I wanted, but this
might work out for me.

As I said before, you can't use Task Manager to know what your actual
memory usage is. Just because the objects are no longer reachable, and
just because they have been garbage-collected, that doesn't mean .NET is
immediately going to give that memory back to Windows.

The real question is whether you have any specific performance or
allocation-failure issues. If you don't, then there's nothing to worry
about. If you do, then you need to quantify those issues so that you
can effectively measure attempts to solve those issues.

Pete
 

Gregory A. Beamer

Peter Duniho said:
As I said before, you can't use Task Manager to know what your actual
memory usage is. Just because the objects are no longer reachable, and
just because they have been garbage-collected, that doesn't mean .NET is
immediately going to give that memory back to Windows.
[...]

Memory in .NET is a hard subject to teach someone who comes from a
different background, as the paradigm is so different. When I first read
Richter's treatise on how memory was collected, I thought it was daft.
After some time working with .NET, and better understanding the mechanism,
it makes sense. But it is still a major mental hurdle for most people I
teach it to.

Peace and Grace,

--
Gregory A. Beamer (MVP)

Twitter: @gbworld
Blog: http://gregorybeamer.spaces.live.com

*******************************************
| Think outside the box! |
*******************************************
 
