how to speed up VLOOKUP?

K

kane

hello!

I am currently working with large worksheets with lots of data and I
need to use VLOOKUP (or similar) quite often.

I know about complexity theory and thus I already make use of MATCH
and INDEX to reduce the number of calculated lookups. My lookups work
with exact data, so I can't use the approximate version of VLOOKUP.
But I can provide sorted lists.

an approximated lookup in a sorted list should take O(log n) time, an
exact lookup does not expect the lsit to be sorted and thus should
take O(n) time.

So i thought of sorting the list, making an approximated search via
MATCH and then compare the search value with INDEX(...;MATCH...;).
that's 2 O(log n) operations. if they match, i am happy, otherwise
nothing can be found.
Fortunately, this solution is extremely fast.

I then wanted to shorten this big expression from above using user
defined functions. i did not make the mistake to access the values of
the ranges, i only used "Application.Match" and "Application.Index"
and such inside this UDF so that there should not be any problems
here. anyway ... this personal function is still quite slow...

i come to think that there is some kind of conversion happening when
excel switches to vba and then back to excel, which slows down speed.

so... the best idea i have right now is to program some replacement-
macro, which replaces the imaginary function "=SPEEDVLOOKUP(...)" with
the IF-MATCH-INDEX-monster I mentioned before. but that's not a nice
solution.....

any ideas?
 
P

Pete_UK

If your data is sorted then you could program your own binary search
to look for a match, and this would speed things up. However, UDFs
execute more slowly than built-in functions (which are compiled), so
you may not notice much of an improvement.

Pete
 
N

Niek Otten

If your data is sorted:

Table in A1:B5, lookup value in C1.

In D1:
=MATCH(C1,A1:A5,1)
In E1:
=IF(INDEX(A1:A5,D1)=C1,INDEX(B1:B5,D1),NA())

Lightning fast and returns #NA if no exact match.

--
Kind regards,

Niek Otten
Microsoft MVP - Excel

| hello!
|
| I am currently working with large worksheets with lots of data and I
| need to use VLOOKUP (or similar) quite often.
|
| I know about complexity theory and thus I already make use of MATCH
| and INDEX to reduce the number of calculated lookups. My lookups work
| with exact data, so I can't use the approximate version of VLOOKUP.
| But I can provide sorted lists.
|
| an approximated lookup in a sorted list should take O(log n) time, an
| exact lookup does not expect the lsit to be sorted and thus should
| take O(n) time.
|
| So i thought of sorting the list, making an approximated search via
| MATCH and then compare the search value with INDEX(...;MATCH...;).
| that's 2 O(log n) operations. if they match, i am happy, otherwise
| nothing can be found.
| Fortunately, this solution is extremely fast.
|
| I then wanted to shorten this big expression from above using user
| defined functions. i did not make the mistake to access the values of
| the ranges, i only used "Application.Match" and "Application.Index"
| and such inside this UDF so that there should not be any problems
| here. anyway ... this personal function is still quite slow...
|
| i come to think that there is some kind of conversion happening when
| excel switches to vba and then back to excel, which slows down speed.
|
| so... the best idea i have right now is to program some replacement-
| macro, which replaces the imaginary function "=SPEEDVLOOKUP(...)" with
| the IF-MATCH-INDEX-monster I mentioned before. but that's not a nice
| solution.....
|
| any ideas?
 
K

kane

If your data is sorted:

Table in A1:B5, lookup value in C1.

In D1:
=MATCH(C1,A1:A5,1)
In E1:
=IF(INDEX(A1:A5,D1)=C1,INDEX(B1:B5,D1),NA())

Lightning fast and returns #NA if no exact match.

--
Kind regards,

Niek Otten
Microsoft MVP - Excel

That's exactly what I meant with "match-index-monster". I would like
to have a simple function that also my colleagues could use.


@Pete: I assumed that "match" does a binary search if you use it on a
sorted list (with parameter FALSE), so no need to program this search
by myself. I also read that UDFs are much more slowly, pitily.

Right now I wonder whether match does really do a binary search or
whether vlookup does sort large lists hidden in memory to speed up
things.
i created 3 excel test sheets. in all three there are 40.000 values to
look up and the vector to be searched is of the same size. hence, a
normal vlookup should normally use about 40.000*(a*40.000+d)
operations. dobule-match-index-lookups should have about
40.000*(b*2*(lb 40.000)+c+d) operations. a is the time vlookup needs
to check one cell, d is the time to transfer cell value, set flags and
so on, b is the time match needs for one calculation step, c the time
for index to extract a certain cell value. considering different
code... well there should still be at least 99% less time needed if
double-match-index-lookup is used.
i tested vlookup vs. my code vs. double-match-index-lookup and those
are the times:
52 sec / 13 sec / 12 sec

the good message in here: my own code works almost as fast as the
match-index-solution which is 4 times as fast as vlookup.
the bad message: either vlookup or match does not work as expected (or
both, of course).

so... problem solved, but things are still bad :-(
 
P

Pete_UK

I think if you use a FALSE parameter in MATCH or VLOOKUP (i.e. looking
for an exact match), then Excel does a sequential search as the data
does not need to be sorted and there is no way to tell Excel that the
data is sorted (I've often thought that it would be useful to have a
5th parameter in VLOOKUP to inform Excel that the data is sorted).
When you use a TRUE value then the data MUST be sorted beforehand, and
Excel uses a binary search algorithm, which is much quicker.

You can speed things up if you have a lot of VLOOKUPs getting data
from the same matching row by having a MATCH formula in a helper
column to find the matching row, and then have an INDEX formula which
uses the value from the match, so that the searching for the matching
row is only done once.

Hope this helps.

Pete
 
C

Charles Williams

You can only make Excel MATCH and VLOOKUP use binary search by telling them
that the data is sorted: there is no automatic detection of sorted data.
The drawback to using binary search as implemented by Excel is that you get
the nearest match for missing data rather than #N/A, so you have to use the
trick outlined by Niek (note the 1 as final Match argument) to trap data not
found.
See also http://www.decisionmodels.com/optspeede.htm

There is a long-standing bug associated with VBA UDFs that refreshes the VBE
title bar once for each calculated instance of a UDF, unless calculation is
initiated from VBA (using something like Application.Calculate).
This bug drastically slows down Excel calculations if you have large numbers
of UDFs in formulae, unless you bypass by using Manual Calc mode and
trapping F9 etc.
See also http://www.decisionmodels.com/calcsecretsj.htm

Assuming that a UDF LOOKUP/MATCH is correctly programmed to use Range
objects etc, thus avoiding the Excel to VBA data transfer overhead, and
avoids the VBE refresh bug, it will be very nearly as fast as a native
LOOKUP/MATCH, and faster if programmed to avoid the double LOOKUP or using
better algorithms.

Also worth noting that using Application.WorksheetFunction.Match is faster
that using Application.Match.


Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com
 
D

Don Guillett

Charles, Please share why is this so?

Also worth noting that using Application.WorksheetFunction.Match is faster
that using Application.Match.
 
C

Charles Williams

Don,

The only apparent difference is the way errors are handled (returning an
error value as opposed to raising an error), so I don't know why
WorksheetFunction is faster, it just is (20-30%).

Charles
__________________________________________________
The Excel Calculation Site
http://www.decisionmodels.com
 
N

Niek Otten

Hi Kane,

UDFs are slow, but that's only relative.
I did a few tests on my almost 5 years old Pentium.

I copy my VBA UDF below. It's about a 100 times faster than a workssheet VLOOKUP with FALSE as a 4th argument.
I also include an XLM version (not to be confused with XML); it is again twice as fast.
Both solutions have a simple, understandable User Interface.

A combination of MATCH and INDEX is again 10 times faster, but is a bit more tedious to implement ("a monster", as you chose to
call it).

But what are we talking about?
My example (which I can mail you if you wish), has a 64k table and 1000 VLOOKUP formulas, with random search arguments.
VLOOKUP/FALSE solution: 5 seconds.
VBA solution: 0.11 seconds
XLM solution: .055 seconds
MATCH/INDEX solution: .0043 seconds

Even on my oldtimer it's impossible to notice the difference between the last 3 solutions. Of course, if you would have 64K
formulas, it could be noticed.

There are also differences if the number of matches varies. One would think that more matches mean more double VLOOKUPs so require
more time, but that does not seem to be the case. Apparently returning #NA requires more time.

The XLM solution could be almost twice as fast again if I could find a way to imitate the MATCH/INDEX approach. But unless I
require the lookup table to be passed as an argument twice (once only the fisrt column, once the whole table) I don't know how to
isolate the first column for the MATCH function. My XLM knowledge has faded away quite a bit!

Any suggestions are welcome!

--
Kind regards,

Niek Otten
Microsoft MVP - Excel

=================================================================
Function VLookupSort(SearchArgument As Range, SearchTable As Range, _
ColumnNo As Long, Optional SortDirection, Optional NotFound)
' Works as Vlookup, exact match (4th argument = FALSE)
' But takes advantage of the fact that a table is sorted
' and thus is much faster
' Also permits table to be sorted descending (Sortdirection -1)
' Optional argument for return value if item not found, defaults to #NA
Dim ItemFound

If IsMissing(SortDirection) Then SortDirection = 1

ItemFound = Application.Match(SearchArgument, Intersect(SearchTable, SearchTable.Cells(1).EntireColumn), _
SortDirection)
If SearchTable(ItemFound, 1) <> SearchArgument Then
If IsMissing(NotFound) Then
VLookupSort = CVErr(xlErrNA)
Else
VLookupSort = NotFound
End If
Else
VLookupSort = _
SearchTable(ItemFound, ColumnNo)
End If
End Function
=================================================================
The XLM solution. Create an XLM sheet; CTRL+F11. Paste the code below into A1 and down.
Insert a defined name (VlookupSortXLM) for A1.
=================================================================
VlookupSortXLM
=ARGUMENT("SearchArgument",1)
=ARGUMENT("SearchTable",8)
=ARGUMENT("ColumnNo",1)
=VLOOKUP(SearchArgument,SearchTable,1)
=IF(A5=SearchArgument,VLOOKUP(SearchArgument,SearchTable,ColumnNo),NA())
=RETURN(A6)
=================================================================




| > If your data is sorted:
| >
| > Table in A1:B5, lookup value in C1.
| >
| > In D1:
| > =MATCH(C1,A1:A5,1)
| > In E1:
| > =IF(INDEX(A1:A5,D1)=C1,INDEX(B1:B5,D1),NA())
| >
| > Lightning fast and returns #NA if no exact match.
| >
| > --
| > Kind regards,
| >
| > Niek Otten
| > Microsoft MVP - Excel
| >
|
| That's exactly what I meant with "match-index-monster". I would like
| to have a simple function that also my colleagues could use.
|
|
| @Pete: I assumed that "match" does a binary search if you use it on a
| sorted list (with parameter FALSE), so no need to program this search
| by myself. I also read that UDFs are much more slowly, pitily.
|
| Right now I wonder whether match does really do a binary search or
| whether vlookup does sort large lists hidden in memory to speed up
| things.
| i created 3 excel test sheets. in all three there are 40.000 values to
| look up and the vector to be searched is of the same size. hence, a
| normal vlookup should normally use about 40.000*(a*40.000+d)
| operations. dobule-match-index-lookups should have about
| 40.000*(b*2*(lb 40.000)+c+d) operations. a is the time vlookup needs
| to check one cell, d is the time to transfer cell value, set flags and
| so on, b is the time match needs for one calculation step, c the time
| for index to extract a certain cell value. considering different
| code... well there should still be at least 99% less time needed if
| double-match-index-lookup is used.
| i tested vlookup vs. my code vs. double-match-index-lookup and those
| are the times:
| 52 sec / 13 sec / 12 sec
|
| the good message in here: my own code works almost as fast as the
| match-index-solution which is 4 times as fast as vlookup.
| the bad message: either vlookup or match does not work as expected (or
| both, of course).
|
| so... problem solved, but things are still bad :-(
 
K

kane

Hi all again!

First I want to say sorry that I didn't respond until now, but I've
been away for a business trip the last days and had no internet. I
also want to apologize right now for my bad english - its not my
native language. But it seems until now you all understood my
problems :)

I coded a bit at home and used another testing environment (an old pc)
and could improve time differences significantly (65000 lookups with
vlookup / vba-code / double-match-index-thingy: 200s / 24s / <1s - how
did you measure your calculation times, niek?).
but you see, my own solution is still much worse than the ugly code.

@charles: this information about whether VBE is open or F9 is used is
really interesting. I will try a Application.Calculate button.
I did transfer the SearchRange as Range object, but not the argument -
maybe there is also a slowdown there. Need to find time to test it.
The same for Application.WorksheetFunction.***

@niek:
your solution looks quite nice. I hopefully will test it soon.
Since you say that returning #NA needs a lot of time, maybe an error
handler can help here. I think I read some time ago that transfering
errors in VBA is time consuming. Maybe worth a try.
defining the searchargument as range might be a key element to speed
up my own code which extracts the value of the cell right now.
Also your XLM solution I need to try first, but I have to admit that I
never came in contact with XLM before (at least I don't remember such
an occasion). Might be a bit harder for me to do more than copy&paste
here :)


Another idea of mine is to create a macro which replaces all commands
with name "fastvlookup" (or any other custom name) and some parameters
with the already well known double-match-index-monster. fastvlookup
needs not to be known to excel for this to work, of course. But first
I will try nieks solution :)



PS: It would be a great thing if excel has some kind of "issorted"
flag for columns/rows, which has positions "true/false/don't know".
every time you ask for this flag it is recalculated and if the column/
row is changed it is resetted to "don't know". But that's only wishful
thinking....
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top