Move Data from table to Array

P

Peter

Is there a fast way to move data from DataTable into double array, or do I have to spin through every record and column?

I have five tables and I need to merge these tables column wise, each table will have to same amount of records but might have different amount of columns.

So I need the first record form table1 and table2 and table3 and table4 and table5 to be in record 1 in my merged array or table.
Second record form table1 and table2 and table3 and table4 and table5 to be in record 2 in my merged array or table.
....
....
....

the final array or table might have 2000 rows and 7000 columns.

Thanks


Peter
 
B

Bruce Wood

So I need the first record form table1 and table2 and table3 and table4 and table5 to be in record 1 in my merged array or table.
Second record form table1 and table2 and table3 and table4 and table5 to be in record 2 in my merged array or table.
...

Where is your data coming from?

Can't you just specify this on the SELECT clause when you fetch the
data, e.g.:

SELECT * FROM TABLE1, TABLE2, TABLE3, TABLE4, TABLE5
WHERE TABLE1.KEY = TABLE2.KEY AND TABLE2.KEY = TABLE3.KEY AND ...

or something like that?

If not, I would propose that the time it takes you to populate your
DataTable / Array in memory will probably pale in comparison to the
time it takes to get the stuff out of the database in the first place.
The only way to know for sure is to write the simple-minded algorithm
(loops within loops) and then run your program through a profiler to
see where it's spending its time.
 
K

Kevin Yu [MSFT]

Hi Peter,

Like Bruce said, it's better to do the merge when you select from the data
source. If you really need to move them to an array, you have to go through
each row in the DataTable. The DataRow has an ItemArray property which
returns an array of object that contains the values for each column.

HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
P

Peter

Kevin Yu said:
Hi Peter,

Like Bruce said, it's better to do the merge when you select from the data
source. If you really need to move them to an array, you have to go
through
each row in the DataTable. The DataRow has an ItemArray property which
returns an array of object that contains the values for each column.

HTH.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."

Data is coming from a Excel spreadsheet
each worksheet within the spreadsheet can have variable number of columns
and the column names can be anything, so basically I have to merge each
spreadsheet to the main spreadsheet, each Row1 from each worksheet will be
in the Row1 of the main worksheet. But I can not use Excel because I might
end up with up to 7000 columns.

Somehow I have to get the all of the worksheets data moved into an array
very fast.
 
B

Bruce Wood

Data is coming from a Excel spreadsheet
each worksheet within the spreadsheet can have variable number of columns
and the column names can be anything, so basically I have to merge each
spreadsheet to the main spreadsheet, each Row1 from each worksheet will be
in the Row1 of the main worksheet. But I can not use Excel because I might
end up with up to 7000 columns.

Somehow I have to get the all of the worksheets data moved into an array
very fast.

Well, everything is relative. In your case, I suspect that getting the
data out of the existing spreadsheets and writing it to wherever it has
to go will be (relatively) much slower than your merge operation.

The only way to know for sure is to take a shot at writing the merge
and, as I said, run your program under a profiler to see where it
spends most of its time. If the merge turns out to be a bottleneck,
then tune it, but I suspect that it won't be.
 
P

Peter

Bruce Wood said:
Well, everything is relative. In your case, I suspect that getting the
data out of the existing spreadsheets and writing it to wherever it has
to go will be (relatively) much slower than your merge operation.

The only way to know for sure is to take a shot at writing the merge
and, as I said, run your program under a profiler to see where it
spends most of its time. If the merge turns out to be a bottleneck,
then tune it, but I suspect that it won't be.
Retreiving data from the Excel spreadsheet is relatively very fast, but
merging data into the array takes abut 10 minutes, compared to about 10
secods to retreive the data from the spreadsheet.

Here's my merge code.

for(int n = 0; i < importData.Count; n++)
{
mergedTable = (DataTable)importData.Item[n];

for(int i = 1; i < maxRecords; i++)
{
col = holdCol;
for(int c = 1; c < mergedTable.Columns.Count - 1; i++)
{
val = Convert.ToDouble(mergedTable.Rows[c]);
dataArray[i - 1, col] = val;
col++; // next column
}
}
}

So I need code that will move data from a table into an array very fast.
 
K

Kevin Yu [MSFT]

Hi Peter,

From the code, I can see that you're combining all the records in the
DataTables like the following:

data from DataTable1
data from DataTable2
data from DataTable3
........

In this case, if the schema of the tables are the same, when filling the
DataTable, you can simply use three DataAdapters to fill the same
DataTable. For example, filling it 3 times, like

DA1.Fill(dt);
DA2.Fill(dt);
DA3.Fill(dt);

will put data from 3 tables to the same DataTable.

Kevin Yu
=======
"This posting is provided "AS IS" with no warranties, and confers no
rights."
 
P

Peter

Unfortunately the schema is different for each table, number of columns and
column names are different and on top of if the Master table will contain
only numbers, but the sheets might contain text columns which I have to
remove.
(Data comes from a spreadsheet supplied by a user)

So I think I need a routine that can move data from DataTable to an array
very fast.
 
B

Bruce Wood

The only thing I see in your code that would account for the 10 minutes
is the Convert.ToDouble() call. In what form are you getting the data
from the Excel spreadsheets?
 
P

Peter

Bruce Wood said:
The only thing I see in your code that would account for the 10 minutes
is the Convert.ToDouble() call. In what form are you getting the data
from the Excel spreadsheets?

Here's sample data (delimited by spaces)

7/1/04 0:00 80 80.4143219 41.76826859 3620 85.99505615 4.173601627 722.7902832 674.1519775
7/1/04 1:00 80 80.41130066 41.76562119 3619.800049 85.99694824 4.63732338 777.8297119 730.9456177
7/1/04 2:00 80 79.91899872 41.76296997 3620 85.99486542 4.051301479 724.458313 654.1055908
7/1/04 3:00 80 80.48442078 41.76031876 3619.857178 85.99700165 4.282075405 720.6036987 700.1008301
7/1/04 4:00 80 80.36474609 41.75767136 3620 85.99711609 4.218424797 724.3544312 676.8447876
7/1/04 5:00 80 80.23025513 41.75502014 3620 85.99019623 4.297987461 723.5835571 691.6959839
7/1/04 6:00 80 80.43131256 41.75236893 3620 85.98691559 4.190071583 721.6431885 673.8577881
 
B

Bruce Wood

Sorry, my question was too terse. What I was wondering was this: I
assume that you're calling Excel's objects in order to get the values
of the cells. How do those objects deliver the values? Are they strings
(internally)? Or are they already doubles and so the Convert isn't
necessary?

In other words, in the line

val = Convert.ToDouble(mergedTable.Rows[c]);

what is the data type of mergedTable.Rows[c]?

To find out whether the Convert call is a problem, do a little
experiment: change the Convert.ToDouble call from this:
val = Convert.ToDouble(mergedTable.Rows[c]);
to simply this:
val = 0.0;
and try running your program. If the merge operation completes very
quickly, you know that the slow bit was the Convert. If it still takes
a long time, then let us know here and we'll try to figure out why.
 
P

Peter

Bruce Wood said:
Sorry, my question was too terse. What I was wondering was this: I
assume that you're calling Excel's objects in order to get the values
of the cells. How do those objects deliver the values? Are they strings
(internally)? Or are they already doubles and so the Convert isn't
necessary?

In other words, in the line

val = Convert.ToDouble(mergedTable.Rows[c]);

what is the data type of mergedTable.Rows[c]?

To find out whether the Convert call is a problem, do a little
experiment: change the Convert.ToDouble call from this:
val = Convert.ToDouble(mergedTable.Rows[c]);
to simply this:
val = 0.0;
and try running your program. If the merge operation completes very
quickly, you know that the slow bit was the Convert. If it still takes
a long time, then let us know here and we'll try to figure out why.


After I changed val = 0.0 the merge took less than 1 second.

couple other things I've tried:
instead of val = Convert.ToDouble(mergedTable.Rows[c]);
changed it to:
mergedTable.Rows[c] = mergedTable.Rows[c]
it took 4 mintes almost 50% improvement, but still slow

I also tried:
Object o;

instead of val = Convert.ToDouble(mergedTable.Rows[c]);
changed it to:
o = mergedTable.Rows[c];
it took 4 minutes.


Most of the time the values in the spreadsheet are doubles, but sometimes
cells have blanks or text, plus the spreadsheet comes from users so the
spreadsheet potentialy could have anything.
 
B

Bruce Wood

OK, here's a really nasty way to do it. See if it at least takes you
down to 4 minutes like the other assignments. Given that even assigning
to an object takes 4 minutes, I doubt that you're going to do better
than that. As you can see, the bottleneck is something in the
assignment, or maybe the indexing into the mergeTable. Either way,
there's not much you can do to eliminate those things. Try this:

double val = 0.0;
object o = mergedTable.Rows[c];
try
{
val = (double)o;
}
catch (InvalidCastException)
{
val = Convert.ToDouble(o);
}

This is, technically speaking, bad programming practice: using an
exception for normal program flow. However, it does have the advantage
that the normal case (the object is already a double) passes with
minimum fuss, while the exceptional case (there is something else in
the object that needs to be converted) takes longer, but then it's
exceptional.

The Microsoft-sanctioned way to do this is to test first, like this:

double val = 0.0;
object o = mergedTable.Rows[c];
if (o is double)
{
val = (double)o;
}
else
{
val = Convert.ToDouble(o);
}

You might want to try this (as a preferred solution), but I suspect
that it will be slower because your program has to check twice to see
if the object is a double (once on the "if" and once on the cast).

Another thing you might try is to cut down on the amount of indexing
you're doing. Right now, you're finding the i'th rows in the table
again and again for every column. You could do this:

for (int i = 0; i < ... )
{
DataRow row = mergedTable.Rows;
for (int c = 0; c < ... )
{
object o = row[c];
...
}
}

At least then you cut down on indexing into the row list over and over
again. That might cut a minute or so off your time.

Any way you cut it, you're not going to get this running in a few
seconds, but you can do better than ten minutes.
 
P

Peter

The following statement shaved off about 30 seconds of the time,
DataRow row = mergedTable.Rows;

Plus with your other suggestion the process went from 10 minutes to 3.17
minutes, this is with 623 columns and 201 records. Should be interesting
with 2000 records and 7000 columns. Event at 3 minutes it's still not very
good, but a lot better than 10 minutes. I wonder if there's a faster
solution using unmanaged C++, I just don't know how to pass the tables from
..NET to C++ function.

Thank You for all of your help.
 
B

Bruce Wood

I doubt that you will get it much faster. At this point I would start
looking at the highest-level design of the process: Is this something
that you have to do over and over? Can you batch it up and run it
overnight to have the results ready in the morning? Can you do it in
the background while the user is doing something else? Stuff like that.
I think you've squeezed all you can out of the process itself, or at
least determined that, no matter what, the user is going to have to
wait, so maybe it's time to think at the grand architectural scale
about how you can fit this in with minimum inconvenience to the user.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top