Improving DataSet.Merge() performance

G

Guest

Hi. My users want to be able to be compare data. The data for each object to
be compared is in identical DataSets so we decided to use DataSet.Merge() and
then bind the resultant DataSet to a grid. By appending a number to the
column name in each DataSet and nominating one column to be the "key" column
we can rely on the Merge functionality to create a single DataTable with a
column for each property of each object (see code below).

The problem is the Merge is very slow. I am wondering if this is because we
are using keyed data with completely different schemas (as far as Merge is
concerned). Is there anything I can do to speed things up?

There is quite a lot of data (~500 rows, 50 columns per DataTable BEFORE
merging) but I have profiled the code and the cost of the Merge is a lot
higher than I would have expected.

Thanks

kh

// code example (snipped for clarity)
public static DataSet CombineDataSets( string keyField, DataSet[] dataSets)
{
for( int dsIndex = 0; dsIndex < dataSets.Length; ++dsIndex )
{
// get reference to table
DataTable currentDataTable = dataSets[ dsIndex ].Tables[ 0 ];

// add a primary key so merging works correctly
currentDataTable.PrimaryKey = new DataColumn[] {
currentDataTable.Columns[ keyField ] };

// number columns so they are unique when the tables are merged
for( int colIndex = 0; colIndex < currentDataTable.Columns.Count;
++colIndex )
if( !currentDataTable.Columns[ colIndex ].ColumnName.Equals(
keyField ) )
currentDataTable.Columns[ colIndex ].ColumnName =
currentDataTable.Columns[ colIndex ].ColumnName + (dsIndex + 1).ToString();
}

// merge
DataSet ds = new DataSet();
for( int dsIndex = 0; dsIndex < dataSets.Length; ++dsIndex )
ds.Merge( dataSets[ dsIndex ] );

}
 
K

Kevin Yu [MSFT]

Hi kh,

As far as I know, there is nothing we can do to spped up the merge process,
if you have totally different schemas for two DataSets. So what I suggest
is to write a method which compares the two DataSets row by row and column
by column according to the key column value. It might be faster than merge
two different DataSets.

HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
G

Guest

Thanks Kevin. Actually I was using DataSet.Merge() to solve performance
problems previously observed using the method you suggest, but it looks as
though my original code was poorly written in terms of performance (see the
post "Performance: searching and updating" on this channel).

kh
 
S

Sahil Malik

Though there might be an alternate way to skin this cat.

So what do u have - 2 datasets with no PK in them, but same structures, and
you wanna compare them .. right? Lemme know and we can try and work out a
better solution.


- Sahil Malik
http://codebetter.com/blogs/sahil.malik/
 
G

Guest

Hi Sahil. I suppose an example is in order....

I have two tables, each with a PK and each with identical schemas. However,
they may or may not contain the same key values. Where they have matching key
values the user needs the data displayed side by side. Where there is no
matching key the data with the missing key is null.

Consider the following simplistic example (best viewed in fixed width font):

Table1:

| Name* | DataA | DataB |
|--------|---------|---------|
| Fred | 100 | 25 |
| Jane | 80 | 42 |

Table2:

| Name* | DataA | DataB |
|--------|---------|---------|
| Fred | 99 | 24 |
| Anne | 183 | 13 |

Output (column order is unimportant):

| Name* | DataA1 | DataA2 | DataB1 | DataB2 |
|--------|----------|----------|----------|----------|
| Fred | 100 | 99 | 25 | 24 |
| Jane | 80 | <null> | 42 | <null> |
| Anne | <null> | 183 | <null> | 13 |

So in the Output table the set of values in the Name (PK) column is the
union of the entries in the Name column in the source tables, and if there
are [n] non-PK columns in each source table then there will be [(n * m) + 1]
columns in the output table (where [m] is the number of source tables).

Does this make sense? The output table does not share a schema with the
source tables but they are related. I need a single output table so that it
can be bound to a suitable control in the UI.

Feel free to ask more questions if the problem is not clear.

Thanks

kh
 
K

Kevin Yu [MSFT]

Hi kh,

Based on the structure you have provided, I think it hard to optimize any
more. So we can keep your current solution. Or let wait to see if some
community members have better solutions.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
G

Guest

Sahil/Kevin

Thanks for your interest. After several iterations and plenty of profiling I
have a handcrafted solution that performs well enough. I will come back to it
when I have more time.

One additional requirement was that if a column does not vary at all between
all source DataSets it should be collapsed to a single column. This has
proved more challenging!

kh
 
K

Kevin Yu [MSFT]

Yes, kh.

To collapsing the column that does not vary at all between all source
DataSet, we have to write more complex logic to achieve this.

Thanks for sharing your experience with all the people here. If you have
any questions, please feel free to post them in the community.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top