How to find the most common numbers?



I have 300 number groups and each group has 22 different numbers (from 1
to 80).

I want to find 5 (or more) numbers that occur together the most.
Can I find these numbers by excel, macro, VBA or any program?

for example, let us suppose that 4-15-23-36-45 are the most common
numbers and these numbers are in 8 groups from 300.

Is it possible , I added my file to explain my question in detail,

|Filename: find common |

Bernie Deitrick

It is possible, but you would need to check 80*79*78*77*76 different combinations of 5 numbers
against 300 groups, for a total of 4.3 TRILLION comparisons. It would be very easy to program, but
take a verrrrrrry long to execute - well, unless you have access to a really fast computer. Cutting
down the number of groups, the combination length, or the highest allowable number would decrease
the calculation time, to the point were you might be able to do it.

MS Excel MVP

Bernie Deitrick

Actually, upon reflection, you would need to only check 22*21*20*19*18
*1500/5!, or 39 Million calculations, more doable... but still a long


Joe User

Bernie Deitrick said:
you would need to only check 22*21*20*19*18 *1500/5!, or 39 Million

I compute a whole lot more. But counting has never been my strong suit; so
I would appreciate constructive comments about my analysis.

First, I think you are considering only combinations of 5. The OP said "5
or more".

But even for 5 combinations, I compute about 31.103E+12 (trillion US)
operations worst case. Here is I count that:

There are 26,334 ways to choose 5 of 22 from one group. So:

26334*300 operations to generate combinations from each group
26334*26334 comparisons to add combinations from 2nd group
2*26334*26334 comparisons to add combinations from 3rd group
299*26334*26334 comparisons to add combinations from 300th group

That can be expressed as:


The SUMPRODUCT expression can be replaced by 300*299/2.

Quibble: If you don't want to count the operations to generate combinations
in general, that's okay with me. It's an insignificant delta. But as a
nitpick, I would include at least the cost to generate the first group,
which must be added to the overall list of combinations; nonetheless, an
even less significant delta.

For "5 or more", I believe the formula would be [1]:

+ SUMPRODUCT(COMBIN(22,ROW($5:$22))^2)*300*299/2

That is about 94.366E+15 (quadrillion US).

This assumes an efficient algorithm that realizes that since the 26,334
combinations in each group are unique (per problem specification), we only
need to compare with combinations from all previous groups.

It also assumes the optimization that we only compare combinations of N with
other combinations of N.

Finally, I reiterate that this assumes a "worst case" scenario where all
combinations of N are unique. This is feasible in all cases of combinations
of 5 to 22 numbers out of 80. For example, at the lowest end, there are
COMBIN(80,5) = 24,040,016 5-tuple combinations -- 3 times the number of
combinations in 300 groups of 22 numbers.

I don't even what to think about the "typical" (aka "expected") scenario.
It hurts my head :).


[1] Replace ROW($5:$22) with ROW(INDIRECT("$5:$22")) to avoid range changes
when inserting rows above.

----- original message -----

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question