DataAdapter.Update performance?

Guest

Please redirect me to a better newsgroup if this one isn't correct.

I have a major performance problem using a SqlDataAdapter's Update() to
record a datatable into our database. And it's no easy feat. We basically
record data for about 2-3 minutes, generating about 4000 records on a
particular table. When it comes time to update the dataset, it can take
between 1-2 minutes for the Update() routine to run. The table contains only
three fields (unique id of parent, position, and data value). In the
database, we can already have on the order of 4 million records or so,
easily. The parent ID field has an index on it.

Basically, I'm stuck on how to solve this. One option would be to log each
piece of data as it appears, but we're optimizing that to the millisecond
because this is dealing with radiation detection and our process loop needs
to be as spry as possible. Without an index on the parent ID, the update
time noticeably decreases, but then retrieving the data becomes problematic
when a query has to search through 5 million records for a set of 36 matching
a particular ID. We need it quick on the processing/reporting end too.

So, I'm stuck. Does anybody have any suggestions on how to optimize
something like this? We'd like it to be under 10 seconds, and even that may
be annoying in the field. Appreciate any help I can get.
 
LaserMan

David -

You are bottlenecking on your index. Try sorting the records before
inserting them into the database. This could improve the cache
coherency of the index's B+-tree pages.

It sounds to me like you're having lots of cache misses on the index. You
probably need a lot more memory so that the database can cache more
index pages.

Look at how B+-trees are implemented, because in all likelihood this is
what your index is. At 5 million records, you've got several levels of
keys. At some point, each add requires several physical disk seeks
which take a few milliseconds to complete. If your cache is so small
that it has to seek over 5 levels, which is entirely likely, do the
math: 4000 * 6ms * 5 = 120 seconds.
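
In ADO.NET 1.1 that might look something like the sketch below (untested; I'm
assuming column names like ParentID and Position and a SqlDataAdapter that's
already wired up with an InsertCommand):

using System.Data;
using System.Data.SqlClient;

public class SortedFlush
{
    // Sketch only: sort the pending rows on the indexed column(s) before
    // handing them to Update(), so the inserts walk the index roughly in
    // order instead of jumping around the B+-tree.
    public static void Flush(DataTable readings, SqlDataAdapter adapter)
    {
        // DataTable.Select returns the rows in the requested sort order,
        // and the DataRow[] overload of Update sends them in that order.
        DataRow[] sorted = readings.Select("", "ParentID ASC, Position ASC");
        adapter.Update(sorted);
    }
}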
 
Guest

I'm sorry, I should have specified. We're forced to use the 1.1 framework for
now. We only have Visual Studio 2003. We may be upgrading in about 5-6 months,
but this one can't wait that long...
 
Carl Daniel [VC++ MVP]

David Harris said:
Please redirect me to a better newsgroup if this one isn't correct.

I have a major performance problem using a SqlDataAdapter's Update() to
record a datatable into our database. And it's no easy feat. We basically
record data for about 2-3 minutes, generating about 4000 records on a
particular table. When it comes time to update the dataset, it can take
between 1-2 minutes for the Update() routine to run. The table contains only
three fields (unique id of parent, position, and data value). In the
database, we can already have on the order of 4 million records or so,
easily. The parent ID field has an index on it.

Basically, I'm stuck on how to solve this. One option would be to log each
piece of data as it appears, but we're optimizing that to the millisecond
because this is dealing with radiation detection and our process loop needs
to be as spry as possible. Without an index on the parent ID, the update
time noticeably decreases, but then retrieving the data becomes problematic
when a query has to search through 5 million records for a set of 36 matching
a particular ID. We need it quick on the processing/reporting end too.

So, I'm stuck. Does anybody have any suggestions on how to optimize
something like this? We'd like it to be under 10 seconds, and even that may
be annoying in the field. Appreciate any help I can get.

Forget about DataAdapter.Update and write your own code.

I've found that the best performance comes from using SQL like this:

-- assume the following table definition
-- create table t(c1 int not null primary key, c2 int not null, c3 int not
null);

insert into t (c1,c2,c3)
exec('
select 1 as c1, 2 as c2, 3 as c3;
select 4,5,6;
select 7,8,9;
-- ...
');

insert into t(c1,c2,c3)
exec('
-- ...
');

Batch up 10-20 rows in each insert/exec block, and 10-20 insert/exec blocks
into a single SQL batch. Commit the transaction after each batch, unless
you need the entire insert to go in one transaction. Experiment with
different block/batch sizes to find the one that gives best performance for
your table.
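
If it helps, building those batches from your DataTable might look roughly
like this (just a sketch, not tested; the Readings table name, the
ParentID/Position/DataValue columns, and an open SqlConnection are
assumptions on my part):

using System.Data;
using System.Data.SqlClient;
using System.Text;

public class BatchInserter
{
    // Sketch only: 20 rows per insert/exec block, 20 blocks per SQL batch,
    // each batch sent in one round trip and committed in its own transaction.
    public static void Insert(DataTable readings, SqlConnection conn)
    {
        const int rowsPerExec = 20;
        const int execsPerBatch = 20;

        StringBuilder sql = new StringBuilder();
        int rowsInExec = 0;
        int execsInBatch = 0;

        foreach (DataRow row in readings.Rows)
        {
            if (rowsInExec == 0)
                sql.Append("insert into Readings (ParentID, Position, DataValue) exec('");

            // The values sit inside a string literal, so single quotes in
            // ParentID have to be doubled twice over.
            string parent = ((string)row["ParentID"]).Replace("'", "''''");
            sql.AppendFormat("select ''{0}'', {1}, {2};", parent, row["Position"], row["DataValue"]);

            if (++rowsInExec == rowsPerExec)
            {
                sql.Append("');\n");
                rowsInExec = 0;
                if (++execsInBatch == execsPerBatch)
                {
                    RunBatch(conn, sql);
                    execsInBatch = 0;
                }
            }
        }

        // Flush any partial block and any remaining batch.
        if (rowsInExec > 0) sql.Append("');\n");
        if (sql.Length > 0) RunBatch(conn, sql);
    }

    // One transaction per batch, as described above.
    private static void RunBatch(SqlConnection conn, StringBuilder sql)
    {
        SqlTransaction tx = conn.BeginTransaction();
        SqlCommand cmd = new SqlCommand(sql.ToString(), conn, tx);
        cmd.ExecuteNonQuery();
        tx.Commit();
        sql.Length = 0;
    }
}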

You should easily be able to insert 4000 small records in just a few
seconds.

You ARE running on a full-strength version of SQL Server, right? (Standard
Ed. or better over a network connection).

-cd
 
Guest

Well, I redid the "with index, without index" test, with different results.
With an index, it takes about 3 more seconds on average than without. The
data is already sorted as it is inserted by nature of how the data is taken.

Thanks for the post though, it inspired me to go through and optimize all
the indexes we currently have in the DB, which has given us several extra ms
in speed in the data acquisition process.

David
 
Guest

Carl Daniel said:
Forget about DataAdapter.Update and write your own code.
<snip>

This may well be the route I take if Greg's links don't pan out. My first
language was C, so loop unrolling was an obvious optimization. I'd much
rather have a "maintainable" solution though, if possible. Or rather, code
that someone else could look at and know what it's doing easily. Or I might
suck it up and comment.
-- create table t(c1 int not null primary key, c2 int not null, c3 int not
null);

Ok, the table I am using has no primary key. The table definition is
(ParentID varchar(60) not null, Position int not null, DataValue int not
null). Would putting a primary key speed anything up at all? I'm mainly
concerned with space, because this can easily grow to several hundred million
records, which would be a lot of BigInts (scared to use ints in case we
exceed 2 billion records). Our hard drive size is limited to 10GB total (see
below for why).
You ARE running on a full-strength version of SQL Server, right? (Standard
Ed. or better over a network connection).

Unfortunately, no. We're stuck using SQL Express 2005, which has proven to be
fairly fast but obviously not full strength. The reason for this is that we
have 12-15 machines, each of which needs to be able to enter secure government
facilities (we deal with radiation detection). Thus, they need their own data
storage on the box itself. Our computers for it are handhelds currently
limited to 10GB. I would love to have a network and a central repository, but
most sites do not allow wireless or even infrared or Bluetooth, much less
actual networking. I currently have to use a sneakernet with a USB drive to
sync data between the central server and the handheld machines. So, we don't
have a lot working for us in all this. :) Hence half my frustration in
trying to make a decent app under those conditions... :)

Thanks though, if nothing else works I'll definitely be making custom batches.
 
Carl Daniel [VC++ MVP]

David Harris said:
This may well be the route I take if Greg's links don't pan out. My first
language was C, so loop unrolling was an obvious optimization. I'd much
rather have a "maintainable" solution though, if possible. Or rather, code
that someone else could look at and know what it's doing easily. Or I might
suck it up and comment.


Ok, the table I am using has no primary key. The table definition is
(ParentID varchar(60) not null, Position int not null, DataValue int not
null). Would putting a primary key speed anything up at all?

Adding a primary key wouldn't help at all. At best it would have no effect.
At worst it would significantly slow things down.
I'm mainly
concerned with space, because this can easily grow to several hundred million
records, which would be a lot of BigInts (scared to use ints in case we
exceed 2 billion records). Our hard drive size is limited to 10GB total (see
below for why).


Unfortunately, no. We're stuck using SQL Express 2005, which has proven to be
fairly fast but obviously not full strength. The reason for this is that we
have 12-15 machines, each of which needs to be able to enter secure government
facilities (we deal with radiation detection). Thus, they need their own data
storage on the box itself. Our computers for it are handhelds currently
limited to 10GB. I would love to have a network and a central repository, but
most sites do not allow wireless or even infrared or Bluetooth, much less
actual networking. I currently have to use a sneakernet with a USB drive to
sync data between the central server and the handheld machines. So, we don't
have a lot working for us in all this. :) Hence half my frustration in
trying to make a decent app under those conditions... :)

Ouch! One thing you might want to check into (or likely have already?) -
abandon the database entirely for the data collection app and just write out
flat files. You can then load the files into your central repository using
a tool like bcp.
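
On the collection side that could be as simple as the sketch below (the names
and file layout are just placeholders); the file can then be loaded on the
repository machine with bcp in character mode:

using System.IO;

public class FlatFileLogger
{
    private StreamWriter writer;

    public FlatFileLogger(string path)
    {
        // Append, so a restart part-way through a survey doesn't clobber
        // data that's already on disk.
        writer = new StreamWriter(path, true);
    }

    public void Log(string parentId, int position, int dataValue)
    {
        // One tab-delimited line per reading; bcp -c (character mode)
        // defaults to tab field terminators, so something along the lines of
        //   bcp YourDb.dbo.Readings in readings.dat -c -T
        // should load it directly.
        writer.WriteLine("{0}\t{1}\t{2}", parentId, position, dataValue);
    }

    public void Close()
    {
        writer.Flush();
        writer.Close();
    }
}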
Thanks though, if nothing else works I'll definitely be making custom batches.

Good luck!

-cd
 
