searching a large database with a long list of search terms

  • Thread starter Thread starter joe_d_builder
  • Start date Start date
Comments/suggestions welcome!

My compliments on your effort, Biff. Amazing.
Sorting the data can speed up calc time significantly in "large" files.
The use of 1000's of volatile functions should be avoided at all costs!
Monster formulas aren't all bad!

I was unable to copy/drag in a single operation.

FWIW, my m/c was able to fill 6 C x 3000 R, over here.
(no freezing)

Btw, think there was a slight interp difference in that I presumed what the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.
 
I probably would've gone with what I did in my test 2.

... the super complex array <g> ?
filled down only in 1 col x 3000 rows
(and which also contains the same volatile INDIRECT ?)

2 observations:

1. The returns are different, ref explanation in my response to you earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets. Nothing wrong there, just
different interps.
Btw, think there was a slight interp difference in that I presumed what the
OP would like to have was to match & extract separate returns from each of
the 6 sheets for all the 3000 lines. Then he could assess further what he
wants done.
I didn't assume any "precedence" order in the 6 data sheets.
As you can see, the calc times were pretty much the same ..

2. Wondering whether the calc times would be any different if say, your
array formula were to be modified to return similarly as mine the results in
6 cols ?

Just some thoughts, Biff <g>
 
Better to have 6 similar formulas pointing to each of the 6 sheets

What's meant is simply ..

In D1:
=IF($A1="","",IF(ISNA(MATCH($A1,'1'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'1'!$A:$B,2,0)))

In E1:
=IF($A1="","",IF(ISNA(MATCH($A1,'2'!$A:$A,0)),"NO
ENTRY",VLOOKUP($A1,'2'!$A:$B,2,0)))

and so on in F1 till I1 (the same formula essentially but with the
sheetnames changed accordingly to '3', '4', '5', and '6')

D1:I1 is copied down 3000 rows
 
In self-retrospect, I should have suggested the foregoing simpler formulas
(w/o the INDIRECT). It only takes less than a minute to manually edit the
other 5 sheetnames ! urgh ..
 
Biff,

my congratulations also on a mammoth effort!

My interpretation of the OP's original request was that he wanted one
result - I had imagined that he had one massive lookup table (of 393000
rows) which had to be split into 6 because of Excel's row limit of 64k.
If this were the case, I'm not sure if he (or you or Max) ensured that
there were no duplicates between the sheets.

Very interesting - well done!

Pete
 
1. The returns are different, ref explanation in my response to you earlier
(pasted below), because of different interps of what the OP wanted. My
suggestion simply lines up all the returns from the 6 sheets for each of
the
3000 lines in 6 cols. Yours return results in 1 col, with an implicit
precedence order assumed in the 6 data sheets.

I'm not following you on this????

My interp is that on the summary are lookup values that may or may not be on
any one of 6 sheets. Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????
(and which also contains the same volatile INDIRECT ?)

See my "conclusion" !

The problem with this is convincing people (maybe even ones' self) that the
use of a monster nested IF/VLOOKUP is the best way to go!

Biff
 
Hi Biff

Thank you for this mammoth effort, and for sharing the results with us.
They make fascinating reading.
Whilst for some while now I have tried to avoid Indirect functions when
there are lots of formulae and/or data involved, I always used to use
Vlookup, but more recently I have made much more use of INDEX(),
MATCH().

I wondered whether, with this mass of data, there would be any
significant difference in calculation time if one used the format
=INDEX(Sheet2!A:B,MATCH(A2,Sheet1!A:A,0),2)
in place of
=VLOOKUP(A2,Sheet1!A:B,2,0)
throughout the formulae.
The formulae would be longer, and look more horrendous, but I wonder
whether there would be any speed difference.

If you had the time (and inclination) to carry out this test with the
data you already have set up, I would be most interested to see the
results.
 
My interp is that on the summary are lookup values
that may or may not be on any one of 6 sheets.
Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????

My presumption was that there could be multiple "city" returns for the same
lookup value in col A within the 6 sheets. And that the OP might want to
see all of it before deciding next steps.

An illustration ..
For eg: for "aaa", there could be the data
for "aaa" in sheets: 1,2,3,4 such as:

aaa new york (in sheet: 1)
aaa chicago (in sheet: 2)
aaa miami (in sheet: 3)
aaa houston (in sheet: 4)
[ No "aaa" in sheets 5 & 6 (say) ]

Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
formula would return only the "1st" matching value, depending on how the
nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order: Check
sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in your
array, depending on the order that the sheets are listed in WSList. If I
list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
Change the top to: 2, it'll return: chicago. "3" will return: miami. And so
on.
(Btw, it was a nice array, Biff.)

Hope the above clarifies a little better what I meant by "precedence".
Just slightly different interps on the OP's needs, nothing wrong either way.
 
Hope the above clarifies a little better what I meant by "precedence".

OK, now I see.

Biff

Max said:
My interp is that on the summary are lookup values
that may or may not be on any one of 6 sheets.
Basically, it's just a lookup across multiple sheets. I
don't understand your use of "precedence" ???????

My presumption was that there could be multiple "city" returns for the
same
lookup value in col A within the 6 sheets. And that the OP might want to
see all of it before deciding next steps.

An illustration ..
For eg: for "aaa", there could be the data
for "aaa" in sheets: 1,2,3,4 such as:

aaa new york (in sheet: 1)
aaa chicago (in sheet: 2)
aaa miami (in sheet: 3)
aaa houston (in sheet: 4)
[ No "aaa" in sheets 5 & 6 (say) ]

Using nested IF(ISNA(VLOOKUP1),IF(ISNA(VLOOKUP2), ... ), or, your array
formula would return only the "1st" matching value, depending on how the
nested "IF(ISNA(VLOOKUP.." is structured (i.e. the "precedence" order:
Check
sheet: 1 first, then check sheet; 2, then sheet: 3, and so on). Or, in
your
array, depending on the order that the sheets are listed in WSList. If I
list: 1 as the 1st sheet (at the top in WSList), it returns: new york.
Change the top to: 2, it'll return: chicago. "3" will return: miami. And
so
on.
(Btw, it was a nice array, Biff.)

Hope the above clarifies a little better what I meant by "precedence".
Just slightly different interps on the OP's needs, nothing wrong either
way.
 
... I'm not sure if he (or you or Max) ensured that
there were no duplicates between the sheets.

Pete,

I followed the OP's response (below)
to David (the first responder) as the "state-of-events":

David asked:
.. The other 6 sheets, where the lookup is taking place, there are not
duplicate lookup values, ie aaa, bbb, ccc would only have a single entry
somewhere on the six other sheets and only one city associated with each
lookup value?

OP's reply to David:
Yes, these are unique.

Also, as clarified in my responses since:
 
Roger,

I still have my test files. I'll tinker around with this sometime Sunday
night.

The thing that surprised me the most was the file size. I knew it would be
big but did not think it would be 27 - 28 Mb big!

Biff
 
Hmmm........

This is turning into a "major" undertaking!

Although the calc times I reported in my other post are correct and
"accurate", I failed to disclose that the calc times do not reflect the fact
that no changes were made to any of the lookup table data. This means that
the lookup formulas did not have to recalculate (excluding those with
volatile functions) with any subsequent workbook calculations. I just
assumed that at this point the formulas and tables would remain "static" and
the calc times I reported were for any new data or formulas entered that
were not associated with the lookups!

That was a major gaff on my part!

So, that ~1 second calc time for the monster IF/VLOOKUP only pertains to
calc time for new data/formula entry.

To thoroughly test these operations/concepts in this context takes a lot of
time and patience. I think that it has already been demonstrated that under
these parameters Excel is going to be "very" slow. For a non-static,
actively used spreadsheet to take upwards of 3 to 4 minutes (at the minimum
end) to calculate, is to me, unacceptable.

I guess this would be "easier" to do if one had benchmarking software!

And with those observations I think I'll end my "tests".

Biff
 
Hi Biff

Thank you for your efforts and observations.
I must admit I was very surprised at the ~ 1 second time, especially for
such a large file size. Thankfully I have never had to deal with
anything of that size but one client's data file does grow to around 9
MB by the end of each year, and that is where I am always looking to see
if I can make changes to improve the existing performance, which does at
times border on the limits of patience!!!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top