Table Lookup vs Array Search?

P

PeteCresswell

Has anybody taken the time to compare how fast something can be found
by searching an array with how fast it can be found using a
Table.Seek?

On one hand, it seems like array searches can be really fast - even
faster if one takes the time to do it binarily instead of just from
top-to-bottom.

On the other hand, if the table is on a user's local drive, one would
think that some really smart people have optimized the .Seek routines
and maybe it could be as fast or faster with less developer hours
invested.

Specifically, I'm thinking of identifying business days. Saturday/
Sunday is a no-brainer, but the holidays currently involve disk I/O
for every determination.

I was thinking of copying the holiday table into a static/persistant
array in the routine, which gets re-built every N ticks (just to catch
the occasional user update to the table) and searched instead of doing
the disk I/O for each business day determination.

We're currently doing that disk I/O using Dynaset.FindFirst instead of
Table.Seek and it's kicking the brains out of whoever's PC is doing a
lot of business day determinations.

The question, then, seems to be whether to take the easy path and just
change over to Table.Seek or invest the man hours in the array
approach.
 
A

Albert D. Kallal

It seems to me the problem with the array approaches that you have to load
up that array and therfore pull a whole bunch of data out of a table.

if you have several thousand types of dates, and you pull them all into some
array, then you just pulled several thousand wrecks across your network and
created a lot of eye out when you really don't need to.

it would seem to me the best approach who would be to fill the array up with
a possible date conflicts, and then use a findfirst on that.

eg:

dim strSql as string
dim dtStartDate as string
dim dtEndDate as string
dim rstHolidays as dao.recordset

dtStartDate = #some mm/dd/yyyy start Date#
dtEndDate = #some mm/dd/yyyy end Date#

strSql = select * from tblHolidays where HolDate is between " & _
dtStartDate & " and " & dtEndDate

set rstHolidays = currentdb.OpenRecordSet(strSql).

You now have a very small in memory "array" of dates to test/scan.

In other words, the additional code and time to load up some array of dates,
then starting writing code to scan it is likely to take more time, more
processing, and actually wind up taking more resources than the above simple
records said that has a reduced amount of records in it, and also can be
indexed by high speed indexing on the date.

Seems to me the simple approach here is that for a given date range, simply
pull in the list of possible holidays into that recordset. AT that point
then you can do your record by record processing using findfirst. I can
hardly imagine an array of 12, or 15 reocrds being much slower then scanning
an array of those values....
 
D

David W. Fenton

:
We're currently doing that disk I/O using Dynaset.FindFirst
instead of Table.Seek and it's kicking the brains out of whoever's
PC is doing a lot of business day determinations.

I found for recordsets having a natural order on the field you're
searching with .FindFirst, it's possible to optimize your find to
choose .FindNext or .FindPrevious depending on whether or not the
value being sought is greater or lesser than the value of the
current recordset pointer.

So, if your last find landed you on Jan. 20, 2009, and you're now
looking for the first date after, say, Feb. 1, 2009, you'd use
..FindNext. If you were on May 1, 2009 and you were looking for the
first date after, you would use .FindPrevious. Since all the
recordset find operators can use > and < and >= and <= as comparison
operators, you can easily find the value by traversing the recordset
the least distance required.

The reason this is more efficient than .FindFirst is that .FindFirst
always starts at the beginning of the index.

And the reason I know this is because I programmed exactly this kind
of persistent recordset used for lookups and found that selecting
..FindNext or .FindPrevious depending on the current record pointer
vastly improved performance.

Here's code that I just used to test it:

Public Sub TestFindFirst(dteFind As Date)
Dim strSQL As String

strSQL = "SELECT InventoryID, LastUpdated FROM tblInventory"
strSQL = strSQL & " ORDER BY LastUpdated"
If rsDates Is Nothing Then
Set rsDates = dbLocal.OpenRecordset(strSQL)
rsDates.MoveFirst
End If
If dteFind > rsDates!LastUpdated Then
rsDates.FindNext "[LastUpdated]>#" & dteFind & "#"
Else
rsDates.FindPrevious "[LastUpdated]<=#" & dteFind & "#"
End If
Debug.Print "Next Date: " & rsDates!LastUpdated
End Sub

I didn't benchmark this particular code -- I'm posting it only to
show how it's done. But this is the method I used in the application
where I tested it and implemented it before, and it vastly improved
performance, since it wasn't making the trip through the whole index
each time it did the search.

I was quite surprised about this -- I assumed that .FindFirst would
be smarter than that.

I don't know how this approach compares to Seek, as I have never
considered Seek to be usable in any circumstances (I always need to
navigate through recordsets drawn from more than one table, so Seek
is never an option). I would think that Seek would be faster than
even my optimized .FindFirst. I don't see what you think you'd be
gaining with an array, though.

It seems to me that if you're navigating a single table, and it's a
Jet table, then that Seek would be the obvious winner. If it's not a
single table, the optimized .FindFirst ought to speed things up
significantly.
 
D

David W. Fenton

Seems to me the simple approach here is that for a given date
range, simply pull in the list of possible holidays into that
recordset. AT that point then you can do your record by record
processing using findfirst. I can hardly imagine an array of 12,
or 15 reocrds being much slower then scanning an array of those
values....

I think that you're ignoring the performance hit of populating the
recordset every time. A persistent recordset of all the possible
values is, I think, going to be a better "array" for lookup than
copying that same data into a VBA array. And recordset find
operations can be optimized (i.e., using .FindNext and .FindPrevious
according to the location of the recordset pointer in the persistent
recordset).

But as I said in my other post, if it's a single-table lookup and
it's a Jet table, I doubt that .Seek can be beat by any method,
either copying the data into an array, or optimizing your find
operation.
 
P

PeteCresswell

I think that you're ignoring the performance hit of populating the
recordset every time. A persistent recordset of all the possible
values is, I think, going to be a better "array"

But as I said in my other post, if it's a single-table lookup and
it's a Jet table, I doubt that .Seek can be beat by any method,
either copying the data into an array, or optimizing your find
operation.

I think I was jumping to cause on my OP.

Looking at the part of the code that determines Holiday, I see that
the recordset is not persistant.

My money's on the opening of the recordset every time the routine
fires being the real culprit.

I'm going to make it a persistant pointer to a dbOpenTable, do a .Seek
and see what happens to the benchmark numbers.
 
D

David W. Fenton

m:
I think I was jumping to cause on my OP.

Looking at the part of the code that determines Holiday, I see
that the recordset is not persistant.

My money's on the opening of the recordset every time the routine
fires being the real culprit.

I'm going to make it a persistant pointer to a dbOpenTable, do a
.Seek and see what happens to the benchmark numbers.

In terms of the issue of newly-added records, the .RecordCount for
the TableDef object should be accurate, seems to me, so you should
only have to requery when that doesn't match your persistent
recordset's recordcount.

Of course, that assumes that records are only ever added/deleted and
never updated. My guess is that a holiday table is not going to
changed too often.
 
P

PeteCresswell

But as I said in my other post, if it's a single-table lookup and
it's a Jet table, I doubt that .Seek can be beat by any method,
either copying the data into an array, or optimizing your find
operation.

Well, I re-wrote and benchmarked it.

Old Way:
- .FindFirst across the LAN
- Opened a recordset and closed it with each call.
- 1,000 iterations took 32,859 ticks

New Way:
- .Seek against a locally-cached copy of the holiday table
- dbOpenTable recordset is persistant. Gets created on the
first call, then stays there forever. Might add a self-reloading
feature every N minutes.... might not....
- 1,000 iterations took 16 ticks.

I'm a little suspicious of my tick count methodology bc sometimes
100-500 iterations comes up with zero ticks (i.e. < a thousanth of
a second - but the logic is so simple.....

To wit:
-----------------------------------------------------------------------------
Public Function IsHoliday_Benchmark(ByVal theIterations As Long) As
Long
' PURPOSE: To get a performance number for IsBizDay so we can
' see if making it's recordset persistant and changing
' .FindFirst to .Seek makes a significant improvement


' Old Routine:
' 100 => 3,297
' 1,000 => 32,859

' New Routine:
' 100 => 16
' 1,000 => 16
' 10,000 => 63
' 100,000 => 719
' 1,000,000 => 7,109

Dim i As Long
Dim s As String

Dim tickCount_Begin As Long
Dim tickCount_End As Long
Dim tickCount_Elapsed As Long
Dim gotBizDay As Boolean

Const myDate As Variant = #1/19/2009#
Const myTickLen As Long = 8
Const myIterationLen As Long = 11

' Debug.Print GetTickCount()
tickCount_Begin = GetTickCount()

For i = 1 To theIterations
gotBizDay = IsHoliday(myDate, s)
' Debug.Print s
Next i

' Debug.Print GetTickCount()
tickCount_End = GetTickCount()

tickCount_Elapsed = tickCount_End - tickCount_Begin

' Debug.Print tickCount_Begin
' Debug.Print tickCount_End
IsHoliday_Benchmark = tickCount_Elapsed
Debug.Print Space(myIterationLen - Len(Format$(theIterations,
"#,##0"))) & Format$(theIterations, "#,##0") & ": " & Space(myTickLen
- Len(Format$(tickCount_Elapsed, "#,##0"))) & Format$
(tickCount_Elapsed, "#,##0") & String(5, ".") & s
End Function
-----------------------------------------------------------------------------
 
D

David W. Fenton

m:
Old Way:
- .FindFirst across the LAN
- Opened a recordset and closed it with each call.
- 1,000 iterations took 32,859 ticks

Did you test this with a persistent recordset?

And then with optimized Find operations? I'd be really interested in
how the latter compares to Seek.
 
P

PeteCresswell

Did you test this with a persistent recordset?

And then with optimized Find operations? I'd be really interested in
how the latter compares to Seek.

No. Just did the whole enchilada instead of stage-by-stage.

Maybe later I'll go back and revisit, but right now I'm totally
overwhelmed with user demands.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top