Retrieving Rows Related to an In-Memory DataTable

Danny Shisler

I'm trying to work out the most efficient method of returning rows related
to an in-memory DataTable and could do with some advice.

The situation is that I am loading records in from external files directly
into an ADO.NET DataTable. These files are frequently very large (c. 100,000
records).

Once loaded, I perform a certain amount of preprocessing and then retrieve
related values from lookup tables in a SQL Server 2000 database. The lookup
tables themselves can be very large (in one case running to millions of
rows) and often have validity dates on individual rows.

The original DataTable is stored in a DataSet, and I'd like to be able to
retrieve related rows from the lookup tables and store them in a related
DataTable in the DataSet so I can perform some further processing (and often
then retrieve further information). Obviously, because of the size of some
of the lookup tables, I don't want to retrieve them in their entirety.

I can't work out an efficient way of achieving this. I do have complete
control over the SQL Server, so if something like SQLXML would help then I
could install it.

Any help would be gratefully received.

Thanks
 
William Ryan

Danny, if you have to use huge recordsets all at once, speed is going to be
an issue. The best advice I can give is to use highly normalized data (so
you neither pull over nor store anything you don't need), and try to work
with as little of it as possible at any given time. If you look at the way
streaming works over the net, or at the buffered streams you may work with,
you pull little pieces of what you need as you need them. The alternative
may be to download the whole file, but that takes a lot longer.

For huge recordsets, XML isn't going to automagically solve anything, and in
virtually any instance I can think of, it'd be much slower than a well-designed
db with small tables and solid, contiguous indices.

Break up what you can and really focus on getting only what you need. This
will cause you to connect more frequently than if you just connected once
and got everything you need, but millions of records will take a while to
pull (your network admin probably won't talk to you anymore), and may take
up so much space on the client machine that performance becomes a deal
breaker. Filter, sift, index, tune, normalize and write the most precise
queries you can... if you have to deal with that much data. And buy a lot
of RAM and strong processors.

HTH,

Bill
 
Bernie Yaeger

Hi Danny,

There may be a few ways to handle this, depending on the circumstances.

Have you looked at filtered rows, DataTable Selects, and filtered DataViews?
Also, as Bill mentioned, gather as little as you need - select only the
columns you need to work with.
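
For example, a minimal sketch of both in-memory approaches - the table and
column names ("purchases", "ComponentId") are placeholders, not from your
data:

Imports System.Data

Module FilterSketch
    ' In-memory filtering over an already-loaded DataTable.
    Sub ShowFilters(ByVal purchases As DataTable)
        ' Select returns a snapshot array of the matching rows.
        Dim matches As DataRow() = purchases.Select("ComponentId = 'C100'")
        Console.WriteLine("Select matched {0} rows", matches.Length)

        ' A DataView is a live, filterable, sortable window on the table.
        Dim view As New DataView(purchases)
        view.RowFilter = "ComponentId = 'C100'"
        view.Sort = "ComponentId ASC"
        Console.WriteLine("DataView shows {0} rows", view.Count)
    End Sub
End Module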

Also, consider creating temp tables with SELECT INTO sp's and joins - perhaps
even first creating a temp table from your DataTable in SQL Server so that a
join can be fashioned on the server.
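
A rough sketch of that last idea - push the keys up into a temp table over an
open connection, then join on the server. All table and column names here are
illustrative, and oconn is assumed to be an open SqlConnection:

Imports System.Data
Imports System.Data.SqlClient
Imports System.Collections

Module TempTableSketch
    ' The #keys table lives only as long as this connection does.
    Sub FetchRelated(ByVal keys As ArrayList, ByVal oconn As SqlConnection, ByVal results As DataTable)
        Dim create As New SqlCommand("CREATE TABLE #keys (id varchar(20) PRIMARY KEY)", oconn)
        create.ExecuteNonQuery()

        ' Push each key up; one round trip per key, but only a few thousand of them.
        Dim ins As New SqlCommand("INSERT INTO #keys (id) VALUES (@id)", oconn)
        ins.Parameters.Add("@id", SqlDbType.VarChar, 20)
        For Each key As Object In keys
            ins.Parameters("@id").Value = key
            ins.ExecuteNonQuery()
        Next

        ' Join the big lookup table against the keys on the server and
        ' pull back only the related rows.
        Dim da As New SqlDataAdapter( _
            "SELECT c.* FROM components c INNER JOIN #keys k ON c.id = k.id", oconn)
        da.Fill(results)
    End Sub
End Module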

HTH,

Bernie Yaeger
 
Danny Shisler

Thanks for your responses - I agree with the general sentiments: filter, then
retrieve, etc. In a sense that's what my original question was about.

If I had a newly imported DataTable of 100,000 rows containing references to
purchases of 5,000 distinct components out of a list of 1,000,000 possible
components (this isn't the exact type of data, but it's a useful analogy),
what I'm trying to do is retrieve only those 5,000 components and their
associated reference data from the lookup table. Ideally I'd like to be able
to do a join between the in-memory DataTable and the SQL Server lookup table,
which isn't possible using current technology (that I'm aware of). Creating a
temp table is something I've considered, although I had hoped there was a
better way of doing it. The application is server-based, so network usage
is less of an issue than if it were a client application; however, it's still
very inefficient to retrieve the complete table.
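
For what it's worth, the first step - pulling the 5,000 distinct ids out of
the imported DataTable before going anywhere near the server - is easy
enough. A rough sketch (the "ComponentId" column name is just for
illustration):

Imports System.Data
Imports System.Collections

Module DistinctKeysSketch
    ' Collect the distinct key values from the 100,000-row import;
    ' a Hashtable gives fast duplicate checks.
    Function DistinctKeys(ByVal purchases As DataTable) As ArrayList
        Dim seen As New Hashtable()
        Dim result As New ArrayList()
        Dim r As DataRow
        For Each r In purchases.Rows
            Dim key As String = CStr(r("ComponentId"))
            If Not seen.ContainsKey(key) Then
                seen.Add(key, Nothing)
                result.Add(key)
            End If
        Next
        Return result
    End Function
End Module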

I could put all the logic in SQL Server, but this is the 2nd version of this
application (the 1st of which was written purely in SQL) and I'm keen to be
able to take advantage of .net features for key parts of the data-processing
and application configuration. Unfortunately, I can't wait for Yukon.

There may not be another answer to my question, but in any case thanks again
for your help.

Danny
 
Bernie Yaeger

Hi Danny,

OK, let's talk about the practical need you currently have. Here's how I do
it:
I create an array or arraylist of the 5000, let's say, 'id numbers'; I then
read those into a long string (any ol' string - it can be so big you needn't
worry about it) with delimiters in a loop:

Dim toarraystring As String = "" ' start empty so the concatenation has something to append to
Dim acount As Integer = 0
Dim i As Integer

For i = 0 To toarray.Length - 1 ' toarray holds the id numbers
    acount += 1
    toarraystring &= Chr(39) & Trim(toarray(i)) & Chr(39) & "," ' Chr(39) is a single quote
    If i <> toarray.Length - 1 Then ' ie, it isn't the last item
        If acount > 20 Then ' start a new line every 20 ids to keep the string readable
            acount = 0
            toarraystring &= vbCrLf
        End If
    End If
Next

' Trim the trailing comma and close the IN (...) list; the opening
' parenthesis is supplied by the select statement further down.
toarraystring = Mid(toarraystring, 1, toarraystring.Length - 1) & ")"

Then I create an sp (after deleting it in code) like this:

Dim dcmd As New SqlCommand("sp_dropsp_passarray", oconn) ' helper sp that drops sp_passarray if it exists
dcmd.CommandType = CommandType.StoredProcedure

Try
    dcmd.ExecuteNonQuery()
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try

Dim creationstring As String
creationstring = "CREATE PROCEDURE sp_passarray AS " _
    & "if exists (select * from information_schema.tables where table_name = " _
    & "'a_rsumtable')" & vbCrLf & "drop table a_rsumtable" & vbCrLf _
    & "select imcacct, pubcode, invnum, inv_dt, brname, " _
    & "(case when inv_dt + 31 > getdate() then balance else 0 end) as under31, " _
    & "(case when inv_dt + 61 > getdate() and inv_dt + 31 <= getdate() then balance else 0 end) as over30, " _
    & "(case when inv_dt + 91 > getdate() and inv_dt + 61 <= getdate() then balance else 0 end) as over60, " _
    & "(case when inv_dt + 121 > getdate() and inv_dt + 91 <= getdate() then balance else 0 end) as over90, " _
    & "(case when inv_dt + 121 <= getdate() then balance else 0 end) as over120" _
    & " into a_rsumtable from a_r where imcacct" _
    & " in (" & toarraystring & vbCrLf _
    & "order by pubcode, imcacct" ' toarraystring already carries the closing parenthesis

sqladapt.SelectCommand = New SqlCommand(creationstring, oconn)

Try
    sqladapt.SelectCommand.ExecuteNonQuery() ' create the sp
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try

Dim ocmd As New SqlCommand("sp_passarray", oconn) ' execute the sp
ocmd.CommandType = CommandType.StoredProcedure

Try
    ocmd.ExecuteNonQuery()
Catch ex As Exception
    MessageBox.Show(ex.Message)
End Try

Now this is what I've done: I've created a string that passes the contents of
an array into an sp, so that the where clause ends up looking like "imcacct in
('1234','5456')" and so on; in effect I now capture only those rows whose keys
are in that group - like filtering on the results of the array. If you're
wondering, both the string variable and the sp size can accommodate almost
anything up to several gig, so size is never an issue.

Let me know if this helps.

Bernie
 
Danny Shisler

Bernie,

Thanks very much for that.

That's certainly a much better solution than anything I've come up with so
far.

One question: Is it necessary to create the sp and the temp table? Is there
any reason why I shouldn't just create the select statement, execute it and
read the results directly into a DataTable?

Thanks again

Danny
 
Danny Shisler

William,

Thanks for your response. I've actually responded to both your and Bernie's
responses in the other thread.

Danny
 
Bernie Yaeger

Hi Danny,

No; no reason to do anything but create the select. The sp and/or the temp
table are simply different ways of persisting the data, but they are not
necessary to achieve your goals.
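
In other words, something like this minimal sketch - the "components" table
and "id" column are placeholders, and toarraystring is the IN list built
earlier (it already carries the closing parenthesis):

Imports System.Data
Imports System.Data.SqlClient

Module DirectSelectSketch
    ' Skip the sp and the temp table: run the select with the IN list
    ' and Fill the results straight into a DataTable in the DataSet.
    Sub FillLookup(ByVal toarraystring As String, ByVal oconn As SqlConnection, ByVal ds As DataSet)
        Dim sql As String = "select * from components where id in (" & toarraystring
        Dim da As New SqlDataAdapter(sql, oconn)
        da.Fill(ds, "ComponentLookup") ' becomes a related DataTable in the DataSet
    End Sub
End Module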

Bernie
 
