Dataset. Most efficient approach.

B

bob

Hi,
My app needs to read a text file, compare the data to that already
stored in the DB and generate a text file of the differences.

The UI displays the text file data and the db data in a series of
Datagridviews.
Piclk a row in the text file master table dgv and the other dgvs move
to the appropriate records so you can eyeball the differences before
generating the difference file.

I have used a dataset that contains two 'arms'
The text file arm and the database arm.
BindingSources are used to make the dataset relationships available
to the code.

Works OK but when I feed it the full text file it takes a long time to
do the comparison work.
Profiling shows the most expensive part is moving through the dataset
with the bindingSource.Movenext.

So I figured that maybe using the threadpool to do the iteration and
comparison would speed things up.

Sort of
while(position + 1 < bindingSource.Count)
{
assign all the bindingsources and the dataset to a helperclass object
'u'
ThreadPool.QueueUserWorkItem(new WaitCallback (MyComparisionFunction),
u);
MasterBindingSource.MoveNext();
position++;
}

However the comparision code fails when it tries to get one of the
datarelaionships back from one of the bindingsources. "Relation not
found"
Intellisense indicates it is there but you can't argue with the
executing code.
Maybe it is lost in the casting from object to helper class .

Anyway I started to get the feeling that maybe my whole approach is
inefficient.

So before I go completely off into the weeds what is the general
opinion on how to handle this task?

I am starting to think that maybe two collections of widgets maybe the
way to go. Then use tree views instead of dgvs

textfile -> Collection A.
db -> Collection B. (one B widget for each widget in A)
for( int i =0;i<A.Count;i++)
{
if (A.items != B.items
AddtoDiffFile(A.Items)
}

Thanks
Bob
 
N

Nicholas Paldino [.NET/C# MVP]

Bob,

Are you using a database server for this? You might want to consider
uploading the contents of the text file into a temp tabl on the server and
then find the differences using a query. I would say that it has the
potential of being much faster than you doing all the parsing and comparison
on the client side.

This would especially be the case if the text file is a delimited or
positional file of some sort, as you can use the bulk loader on SQL Server
(if that is what you are using) to import the data.
 
M

mark4asp

Hi,
My app needs to read a text file, compare the data to that already
stored in the DB and generate a text file of the differences.

The UI displays the text file data and the db data in a series of
Datagridviews.
Piclk a row in the text file master table dgv and the other dgvs move
to the appropriate records so you can eyeball the differences before
generating the difference file.

I have used a dataset that contains two 'arms'
The text file arm and the database arm.
BindingSources are used to make the dataset relationships available
to the code.

Works OK but when I feed it the full text file it takes a long time to
do the comparison work.
Profiling shows the most expensive part is moving through the dataset
with the bindingSource.Movenext.

So I figured that maybe using the threadpool to do the iteration and
comparison would speed things up.

Sort of
while(position + 1 < bindingSource.Count)
{
assign all the bindingsources and the dataset to a helperclass object
'u'
ThreadPool.QueueUserWorkItem(new WaitCallback (MyComparisionFunction),
u);
MasterBindingSource.MoveNext();
position++;
}

However the comparision code fails when it tries to get one of the
datarelaionships back from one of the bindingsources. "Relation not
found"
Intellisense indicates it is there but you can't argue with the
executing code.
Maybe it is lost in the casting from object to helper class .

Anyway I started to get the feeling that maybe my whole approach is
inefficient.

So before I go completely off into the weeds what is the general
opinion on how to handle this task?

I am starting to think that maybe two collections of widgets maybe the
way to go. Then use tree views instead of dgvs

textfile -> Collection A.
db -> Collection B. (one B widget for each widget in A)
for( int i =0;i<A.Count;i++)
{
if (A.items != B.items
AddtoDiffFile(A.Items)
}

Thanks
Bob


I've never found datasets to be very efficient in using resources. I
prefer to use a custom list like a List<T>, but there are better
Generics such as DataBindingList<T> (but I think these are all custom).

I can't understand Nicholas' solution as you specifically say the
differences need to be eye-balled by the client.

How big are these datasets anyway? You could do it entirely on the
client by downloading the data to javascript arrays or objects and
posting back only the differences using AJAX.
 
B

bob

Hi Mark,
My take on Nicholas's solution is that the database engine is more
efficent at set based comparisions than client code iterating through
the dataset. The result set could be displayed giving an ordinary list
of the differences.
On the client side I too lean towards lists of custom objects.

This project started life as a quick and dirty proof of concept.
The main concern was to see if the resultant 'difference' files would
be accepted by another app that maintains the database further
downstream.

The idea of the client being able to view the differences was based on
the notion that there wouldn't be too many differences. It was a bit
of a shock to to find a large disparity between the two sets.
So the point and click advantages of the linked Datagridviews, while
pretty, is not much use.
30000 parent records, approx 70% differences.

Thanks for your thoughts.
Regards
Bob
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top