Searching a table

  • Thread starter Thread starter Geoff Jones
  • Start date Start date
Hi Bernie

I hope I'm not putting you to too much trouble. You've been invaluable
already!

At the moment, I'm getting a list of approximately 10,000 ID's i.e. primary
keys from another process which give me the rows in the 200,000 total. As I
said previously, I was originally getting these rows by using a
Table.Rows.Find. From these 10,000 or so rows, I'm trying to filter them
down to those which satisfy a PRODUCT field criteria e.g. PRODUCT IN
('3454', '324', '7855', etc.)

Indeed, my next step may be to do something like (forgive the pseudo code):

WHERE PRODUCT = '2345' AND COST = '3.45'

where COST is another field.

I had considered the possibility of using multiple tables i.e. searching a
table for rows, creating a new table, searching this new table with a new
criteria etc. etc. in the same way that Cor has suggested. I'm interested in
both techniques i.e. cloning and SQL type coding. What are the advantages
and disadvantages of both?

Thanks again for your continuing help.

Geoff
 
Hi Geoff,

Not putting me out at all; glad to help.

But I'm still confused. You originally said this:
I have a list of about 100 numbers which correspond to the ID field
and also another 40 say numbers corresponding to the numbers in the
PRODUCT field.
Now you mentioned 10000 - quite a difference.

In any event, that may not be an issue. You say now you have these 10,000
rows - in a dataset/datatable? I have a code sample below that's very fast
and might serve as a template for you. It fills a dataset and then makes a
dataview from the datatable and then filters the rows as necessary, using 2
criteria (replace these with product and cost).

Let me know if we're getting somewhere.

Regards,

Bernie
Dim oconn As New SqlConnection("data source=d5z0071;initial
catalog=imc;integrated security=sspi;")

Dim ocmd As New SqlCommand("select * from prod order by bipad", oconn)

Dim oda As New SqlDataAdapter(ocmd)

Dim ods As New DataSet("prod")

Try

oconn.Open()

Catch ex As Exception

MessageBox.Show(ex.Message)

End Try

oda.Fill(ods, "prod")

Dim vue As New DataView(ods.Tables(0))

vue.Sort = "bipad"

vue.RowFilter = "bipad in ('18772','18770','01043') and issuecode =
'200304'"

Dim irow As DataRow

Dim irows As DataRow()

Dim i As Integer

For i = 0 To vue.Count - 1

MessageBox.Show(vue(i)("bipad"))

MessageBox.Show(vue(i)("issuecode"))

MessageBox.Show(vue(i)("usprice"))

Next

oconn.Close()
 
Hiya Bernie

The difference i.e. 100 as opposed to 10000 was merely the range of
possibilities. Sorry for the confusion.

The rows are in a DataSet; but also in a table - which is stored in the
DataSet.

The code looks excellent. I'll give it a try tomorrow.

Geoff
 
HI Bernie

Thanks for the code. I've learned a lot in the last few days with your and
others help. I do like the idea of using "IN" but unfortunately it does seem
to be slower than my original idea. I've written some code based on Cor's
which demonstrates the difference:

The first bit of code filters the dataset using "IN" and the last bit does
it using a "Find". When I ran it this morning, the SQL code took 41 seconds
whilst the Find took 0.03 seconds. Quite a difference.

Obviously, I couldn't use the same technique i.e. "Find", from the outset,
using a non-primary key (because it needs a primary key to work).

Maybe the best approach is a mixture of both techniques?

Geoff

P.S. Code is below:

Dim dt As New DataTable

Private Sub Form1_Load(ByVal sender _

As System.Object, ByVal e As System.EventArgs) _

Handles MyBase.Load

dt.Columns.Add("ID")

dt.Columns.Add("PRODUCT")

dt.Columns.Add("COST")

For i As Integer = 0 To 200000

dt.Rows.Add(dt.NewRow)

dt.Rows(i)(0) = i.ToString

Dim rg As Random = New Random(i)

Dim rn As Integer = CInt(rg.Next(0, 200000))

dt.Rows(i)(1) = rn.ToString

rg = New Random(rn)

rn = CInt(rg.Next(0, 200000))

dt.Rows(i)(2) = rn.ToString

Next

End Sub

Private Sub Button1_Click(ByVal sender As System.Object, _

ByVal e As System.EventArgs) Handles Button1.Click

Dim start As DateTime = DateTime.Now

Dim numOfRows As Integer = dt.Rows.Count

Dim vue As DataView = New DataView(dt)

vue.Sort = "ID"

vue.RowFilter = "ID IN
('139211','139215','139214','139213','139212','139211','139210','139259','139258','139257','139251','139255','139254','139253','139252','139251','139250','139249','139248','139247','139241','139245','139244','139243','139242','139241','139240','139239','139238','139237','139231','139235','139234','139233','139232','139231','139230','139229','139228','139227','139221','139225','139224','139223','139222','139221','139220','139219','139218','139217','139211','139215','139214','139213','139212','139211','139210','139209','139208','139207','139201','139205','139204','139203','139202','139201','139200','139199','139198','139197','139191','139195','139194','139193','139192','139191','139190','139189','139188','139187','139181','139185','139184','139183','139182','139181','139180','139179','139178','139177','139171','139175','139174','139173','139172','139171','139170','139119','139118','139117','139111','139115','139114','139113','139112','139111','139110','139159','139158','139157','139151','139155','139154','139153','139152','139151','139150','139149','139148','139147','139141','139145','139144','139143','139142','139141','139140','139139','139138','139137','139131','139135','139134','139133','139132','139131','139130','139129','139128','139127','139121','139125','139124','139123','139122','139121','139120','139119','139118','139117','139111','139115','139114','139113','139112','139111','139110','139109','139108','139107','139101','139105','139104','139103','139102','139101','139100','139099','139098','139097','139091','139095','139094','139093','139092','139091','139090','139089','139088','139087','139081','139085','139084','139083','139082','139081','139080','139079','139078','139077','139071','139075','139074','139073','139072','139071','139070','139019','139018','139017','139011','139015','139014','139013','139012','139011','139010','139059','139058','139057','139051','139055','139054','139053','139052','139051','139050','139049','139048','139047','139041','139045','139044','139043','139042','139041','139040','139039','139038','139037','139031','139035','139034','139033','139032','139031','139030','139029','139028','139027','139021','139025','139024','139023','139022','139021','139020','139019','139018','139017','139011','139015','139014','139013','139012','139011','139010','139009','139008','139007','139001','139005','139004','139003','139002','139001','139000','138999','138998','138997','138991','138995','138994','138993','138992','138991','138990','138989','138988','138987','138981','138985','138984','138983','138982','138981','138980','138979','138978','138977','138971','138975','138974','138973','138972','139275','139274','139273','139272','139271','139270','139219','139218','139217')"

DataGrid1.DataSource = dt

Dim finish As DateTime = DateTime.Now

Dim time As TimeSpan = finish.Subtract(start)

Debug.WriteLine("Time taken = " & time.TotalSeconds)

DataGrid1.DataSource = dt

dt.PrimaryKey = New DataColumn() {dt.Columns("ID")}

start = DateTime.Now

Dim newTable As DataTable = dt.Clone

Dim str1() As String = {"139211", "139215", "139214", "139213", "139212",
"139211", "139210", "139259", "139258", "139257", "139251", "139255",
"139254", "139253", "139252", "139251", "139250", "139249", "139248",
"139247", "139241", "139245", "139244", "139243", "139242", "139241",
"139240", "139239", "139238", "139237", "139231", "139235", "139234",
"139233", "139232", "139231", "139230", "139229", "139228", "139227",
"139221", "139225", "139224", "139223", "139222", "139221", "139220",
"139219", "139218", "139217", "139211", "139215", "139214", "139213",
"139212", "139211", "139210", "139209", "139208", "139207", "139201",
"139205", "139204", "139203", "139202", "139201", "139200", "139199",
"139198", "139197", "139191", "139195", "139194", "139193", "139192",
"139191", "139190", "139189", "139188", "139187", "139181", "139185",
"139184", "139183", "139182", "139181", "139180", "139179", "139178",
"139177", "139171", "139175", "139174", "139173", "139172", "139171",
"139170", "139119", "139118", "139117", "139111", "139115", "139114",
"139113", "139112", "139111", "139110", "139159", "139158", "139157",
"139151", "139155", "139154", "139153", "139152", "139151", "139150",
"139149", "139148", "139147", "139141", "139145", "139144", "139143",
"139142", "139141", "139140", "139139", "139138", "139137", "139131",
"139135", "139134", "139133", "139132", "139131", "139130", "139129",
"139128", "139127", "139121", "139125", "139124", "139123", "139122",
"139121", "139120", "139119", "139118", "139117", "139111", "139115",
"139114", "139113", "139112", "139111", "139110", "139109", "139108",
"139107", "139101", "139105", "139104", "139103", "139102", "139101",
"139100", "139099", "139098", "139097", "139091", "139095", "139094",
"139093", "139092", "139091", "139090", "139089", "139088", "139087",
"139081", "139085", "139084", "139083", "139082", "139081", "139080",
"139079", "139078", "139077", "139071", "139075", "139074", "139073",
"139072", "139071", "139070", "139019", "139018", "139017", "139011",
"139015", "139014", "139013", "139012", "139011", "139010", "139059",
"139058", "139057", "139051", "139055", "139054", "139053", "139052",
"139051", "139050", "139049", "139048", "139047", "139041", "139045",
"139044", "139043", "139042", "139041", "139040", "139039", "139038",
"139037", "139031", "139035", "139034", "139033", "139032", "139031",
"139030", "139029", "139028", "139027", "139021", "139025", "139024",
"139023", "139022", "139021", "139020", "139019", "139018", "139017",
"139011", "139015", "139014", "139013", "139012", "139011", "139010",
"139009", "139008", "139007", "139001", "139005", "139004", "139003",
"139002", "139001", "139000", "138999", "138998", "138997", "138991",
"138995", "138994", "138993", "138992", "138991", "138990", "138989",
"138988", "138987", "138981", "138985", "138984", "138983", "138982",
"138981", "138980", "138979", "138978", "138977", "138971", "138975",
"138974", "138973", "138972", "139275", "139274", "139273", "139272",
"139271", "139270", "139219", "139218", "139217"}

For Each st As String In str1

Dim row As DataRow = dt.Rows.Find(st)

If newTable.Rows.Find(st) Is Nothing Then

newTable.ImportRow(row)

End If

Next

finish = DateTime.Now

time = finish.Subtract(start)

Debug.WriteLine("Time taken = " & time.TotalSeconds)

DataGrid1.DataSource = newTable

End Sub
 
Geoff,
Obviously, I couldn't use the same technique i.e. "Find", from the outset,
using a non-primary key (because it needs a primary key to work).
Actually you could have!

You should be able to use DataView.Find or DataView.FindRows on non-primary
keys with about the same performance as DataRowCollection.Find.

When you create the DataView, set the DataView.Sort property to the
non-primary key columns you are searching on, do not need to set the
RowFilter property.

Then rather then using DataTable.Rows.Find, use DataView.Find or
DataView.FindRows! DataView.Find returns an integer index into the DataView,
while DataView.FindRows returns DataRowView objects...

Something like:
Dim newTable As DataTable = dt.Clone

For Each st As String In str1

Dim row As DataRow = dt.Rows.Find(st)

If newTable.Rows.Find(st) Is Nothing Then

newTable.ImportRow(row)

End If

Next

dt.PrimaryKey = Nothing

Dim newTable As DataTable = dt.Clone
newTable.PrimaryKey = New DataColumn() {newTable.Columns("ID")}

Dim view As New DataView(dt)
view.Sort = "ID"

For Each key As String In str1
For Each row As DataRowView In view.FindRows(key)
If Not newTable.Rows.Contains(key) Then
newTable.ImportRow(row.Row)
End If
Next
Next

I don't believe the newTable.Rows.Contains is really needed. I left it in as
you had it in your original code.

Hope this helps
Jay
 
Hi Bernie and Jay

Many thanks for all your help. As always it has been very invaluable to
learn from you guys.

Jay - the reason I used Contains was because otherwise I got an error saying
that the row already existed. If you comment it out, you'll see that it
causes an exception in the code I posted.

Geoff
 
Geoff,
The values in your keys array (str1) are not unique, hence the duplicates...

I would consider ensuring that your keys array is distinct before creating
the newTable. Alternatively I would only process unique values in str1 (by
sorting it first)...

Array.Sort(str1)
Dim lastkey As String = Nothing

For Each key As String In str1
If key <> lastkey Then
For Each row As DataRowView In view.FindRows(key)
newTable.ImportRow(row.Row)
Next
lastkey = key
End If
Next

However! I am not saying using Contains is a bad thing either :-) Which
method to use would require profiling to decide, naturally it may change
over the life of your program.

Of course if you can ensure that your keys array is unique to begin with,
then that might be "best"...

Hope this helps
Jay

Geoff Jones said:
Hi Bernie and Jay

Many thanks for all your help. As always it has been very invaluable to
learn from you guys.

Jay - the reason I used Contains was because otherwise I got an error
saying that the row already existed. If you comment it out, you'll see
that it causes an exception in the code I posted.

Geoff
<<snip>>
 
Bernie,

I have the idea that you did not understand why my first message was for in
this thread and why it was so slow with the first attempts of Geoff.

There is no dataview *select* function.

What probably (that deep did I not look for it) is happening in the mix from
Geoff and your first code, (it is not that consistent that I know where it
is) is that there is created a dataview (defaultview) with a rowfilter.

That is a rowfilter for a datatable that is referenced in that dataview.

So when you filter 100 rows you can loop through that very fast.

However, when you use this.
Dataview.table.select you are selecting again from that datatable from
200.000 rows.

Therefore I showed it with that cloned and copied table.

I hope this makes it more clear why I wrote this in this thread.

Cor
 
Hi Jay

Good advice as always.

You mention "profiling". Is this some way of deciding which routines are the
quickest? If so, I'd be most interested in knowing how the experts do this!
I, in my naive way, have probably been doing it the slow way i.e. I wrap a
routine in between two time variables and use TimeSpan to work out how many
seconds has passed. Is there an easier way to do this using the Visual
Studio environment?

Geoff
 
Hi Cor,

It is more clear to me now and I do understand that the 'double' select is
the cause of the delay.

Tx,

Bernie
 
Geoff,
I wrap a routine in between two time variables and use TimeSpan to work
out how many seconds has passed.
For general stuff like this that is how I normally do it.

For more specific profiling I use CLR Profiler (or other similiar tools).


Just Remember the 80/20 rule. That is 80% of the execution time of your
program is spent in 20% of your code. I will optimize (worry about
performance, memory consumption) the 20% once that 20% has been identified &
proven to be a performance problem via profiling (CLR Profiler is one
profiling tool).

For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's
article "Yet Another Optimization Article" at
http://martinfowler.com/ieeeSoftware/yetOptimization.pdf

Info on the CLR Profiler:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenethowto13.asp

http://msdn.microsoft.com/library/d...y/en-us/dndotnet/html/highperfmanagedapps.asp

Hope this helps
Jay
 
Excellent. Thanks Jay

Geoff

Jay B. Harlow said:
Geoff,
For general stuff like this that is how I normally do it.

For more specific profiling I use CLR Profiler (or other similiar tools).


Just Remember the 80/20 rule. That is 80% of the execution time of your
program is spent in 20% of your code. I will optimize (worry about
performance, memory consumption) the 20% once that 20% has been identified
&
proven to be a performance problem via profiling (CLR Profiler is one
profiling tool).

For info on the 80/20 rule & optimizing only the 20% see Martin Fowler's
article "Yet Another Optimization Article" at
http://martinfowler.com/ieeeSoftware/yetOptimization.pdf

Info on the CLR Profiler:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnpag/html/scalenethowto13.asp

http://msdn.microsoft.com/library/d...y/en-us/dndotnet/html/highperfmanagedapps.asp

Hope this helps
Jay
 
Back
Top