DataTable Bizarreness

Mike Labosh

I have this giant table in our database that represents a dumping ground for
file imports. It's 44 columns wide and nominally about 30 million records,
indexed in blocks by the file the records came from.

I also have this incredibly elaborate class library that cleans the data in
various columns (you would not believe the trash that comes in on the
PhoneNumber field).

One of the things that has been bugging me for a long time is this issue
here:

In my cleaner app, I have dozens of methods that do really heavy string
manipulation. They all pretty much go like this (rough skeleton below):

1. Get a DataTable from the raw-data dumping-ground table via a
SqlDataAdapter
2. Loop over the Rows collection cleaning the values based on business rules
3. Put the values into the actual entity tables where they belong, via a
SqlDataAdapter
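
In code, the skeleton of one of those methods is roughly this (table,
column, and helper names invented for the example):

    ' Needs Imports System.Data and Imports System.Data.SqlClient.
    Dim cn As New SqlConnection(connectionString)
    Dim da As New SqlDataAdapter( _
        "SELECT * FROM RawImport WHERE FileID = @FileID", cn)
    da.SelectCommand.Parameters.Add("@FileID", SqlDbType.Int).Value = fileID

    Dim dt As New DataTable()
    da.Fill(dt)                     ' 1. pull one file's block of raw rows

    Dim row As DataRow
    For Each row In dt.Rows         ' 2. clean in memory per business rules
        row("PhoneNumber") = CleanPhoneNumber(CStr(row("PhoneNumber")))
    Next

    Dim daEntity As New SqlDataAdapter() ' writes to the real entity table;
    daEntity.Update(dt)             ' 3. its commands are the trouble below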

But here's the whammy:

Since the DataTable was populated by a DataAdapter, the DataRows all think
they came from the database, and rightly so. But once my cleaning
algorithms run, the cleaned values represent INSERTs into their respective
entity tables. When I say da.Update(dt), the DataAdapter dispatches on each
row's RowState, sees *modified* values rather than *new* ones, and
instinctively runs its UpdateCommand. But I'm not doing updates, I'm doing
inserts. So I keep finding myself having to LIE to the DataAdapter by
wiring a stored procedure that does INSERTs into its UpdateCommand.
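
Concretely, the lie looks something like this (proc and parameter names
invented):

    ' An INSERT proc masquerading as the entity adapter's UpdateCommand.
    Dim insertProc As New SqlCommand("usp_InsertCleanedContact", cn)
    insertProc.CommandType = CommandType.StoredProcedure
    insertProc.Parameters.Add( _
        "@PhoneNumber", SqlDbType.VarChar, 20, "PhoneNumber")

    daEntity.UpdateCommand = insertProc ' the lie
    daEntity.Update(dt)                 ' Modified rows fire it; it INSERTs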

It works, but I don't like it. It seems like a perverse filthy hack.

Anyone have a better idea?

--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"
 
Ken Tucker [MVP]

Hi,

Sounds like you need to merge the records into the other table.
http://www.windowsformsdatagridhelp.info/default.aspx
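
If merging here means copying the cleaned rows into a fresh DataTable, the
useful detail is that rows you Add arrive with a RowState of Added, so
Update fires the InsertCommand instead of the UpdateCommand. An untested
sketch against your names:

    Dim clean As DataTable = dt.Clone()   ' same schema, zero rows
    Dim row As DataRow
    For Each row In dt.Rows
        clean.Rows.Add(row.ItemArray)     ' new row => RowState.Added
    Next
    daEntity.InsertCommand = insertProc   ' the INSERT proc, wired honestly
    daEntity.Update(clean)                ' Added rows fire the InsertCommand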

Ken
 
Cor Ligthert

Mike,

Reading this, my first thought is: why is Mike using a DataTable, and not
just a DataReader, processing the rows one by one and inserting/updating
them with a Command.ExecuteNonQuery?

At the very least, it will be faster.
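
Something like this (untested, names invented; the reader holds its
connection open, so reading and writing need separate connections):

    Dim readCmd As New SqlCommand( _
        "SELECT PhoneNumber FROM RawImport WHERE FileID = @FileID", cnRead)
    readCmd.Parameters.Add("@FileID", SqlDbType.Int).Value = fileID

    Dim insertCmd As New SqlCommand( _
        "INSERT INTO Phone (PhoneNumber) VALUES (@PhoneNumber)", cnWrite)
    insertCmd.Parameters.Add("@PhoneNumber", SqlDbType.VarChar, 20)

    cnRead.Open()
    cnWrite.Open()
    Dim dr As SqlDataReader = readCmd.ExecuteReader()
    While dr.Read()
        insertCmd.Parameters("@PhoneNumber").Value = _
            CleanPhoneNumber(dr.GetString(0))
        insertCmd.ExecuteNonQuery()
    End While
    dr.Close()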

Cor
 
Mike Labosh

> Reading this, my first thought is: why is Mike using a DataTable, and
> not just a DataReader, processing the rows one by one and
> inserting/updating them with a Command.ExecuteNonQuery?

Because mikey doesn't like holding onto the server while chewing on several
hundred thousand string manipulations. I prefer to select the stuff, chew
on it, and then batch-update it.

We don't do any OLTP stuff here at all. We're a batch-shop. Every single
thing we do here is a giant heap of several hundred thousand rows. Doing
little transactions here would just fill the network with peanut butter.
--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"
 
Cor Ligthert

Mike,

> Because mikey doesn't like holding onto the server while chewing on
> several hundred thousand string manipulations. I prefer to select the
> stuff, chew on it, and then batch-update it.

Using a DataAdapter or whatever gives you at least the same amount of
network traffic as a DataReader plus a Command.ExecuteNonQuery.

The DataAdapter uses both of those internally.

The only difference will probably be that on an Ethernet network, the
approach I described will give you fewer collisions (and with that higher
performance), because right now you are sending all the data in one large
batch, which is always bad for that kind of LAN.

I hope this helps,

Cor
 
