Adjusting alogarithm

D

Daka

I am in need of help to adjust the code posted below from an Excel
file. The code generate every possible combination from the value
supplied in the input boxes. Now, I do not want every combination. For
example if I want to generate the combination between 1 and 24 numbers
in subsets of 8., the first few rows of the output should look like
this:
12,3,4,5,6,7,8
1,2,3,4,9,10,11,12
1,2,3,4,13.1.4.15.16
1,2,3,4,17,18,19,20
1,2,3,4,21,22,23,24
1,2,3,5,9.13.17.21
1,2,3,5,10,14,18,22
This works on the concept that the values in each subset must not be
repeated more than four times when matched against the preceeding
subsets. This would be more easiky understood with a copy of the excel
file.
Here is the code:
Dim NFavorites As Byte 'Number of Favoritess
Dim NElements As Byte 'Number of elements in one subset
Dim maxLen As Variant
Dim SubsetCount As Variant
Dim Elements() As Integer
Dim outPut() As Integer
Dim subset As Variant
Dim NumRng As Range
Dim chkNum As Byte
Dim Favorites() As Integer
Dim rowNum As Long
Dim rngNum As Range

Sub SubSets()
Set NumRng = Sheets("The Numbers").Range("A1:A180")
Set rngNum = Sheets("Tabelle").Range("F7")
chkNum = Application.WorksheetFunction.CountA(NumRng)
On Error GoTo Terminate

NFavorites = InputBox("Please give the number of favorites",
"Selective Records", chkNum)

NElements = InputBox("Please give the number of elements of one
subset", "Selective Records", 8)
maxLen = Application.WorksheetFunction.Combin(NFavorites, NElements)
rowNum = 9
Application.StatusBar = ""
Range("A7") = maxLen
Application.EnableEvents = True
'Const Num = 1500000
ReDim Elements(1 To NElements) As Integer
ReDim Favorites(1 To NFavorites) As Integer
ReDim outPut(1, 1 To NElements) As Integer

'Fill favorites from values on worksheet
For N = 1 To NFavorites
Favorites(N) = NumRng(N)
Next N
For E = 1 To NElements
Elements(E) = E
Next E
Elements(NElements) = Elements(NElements) - 1
subset = 1
SubsetCount = subset
N = 0
mark:
Elements(NElements - N) = Elements(NElements - N) + 1
For m = NElements - N + 1 To NElements
Elements(m) = Elements(m - 1) + 1
Next m
If Elements(NElements - N) = NFavorites - N + 1 Then
If N = NElements - 1 Then
endstring = Chr(13) & Chr(13) & "The
calculation is finished."
Exit Sub
End If
N = N + 1
GoTo mark
End If
For E = 1 To NElements
outPut(subset, E) = Favorites(Elements(E))
Next E
N = 0

'Place subset on worksheet
Range(Cells(rowNum, 1),
Cells(rowNum, NElements)) = outPut()
rowNum = rowNum + 1

Range("A8").Value = rowNum - 9

cv = 0
NextMove:
maxLen = maxLen - 1
SubsetCount = SubsetCount + 1
Application.StatusBar =
Format(maxLen, "#,##0") & " Complete : " & Format(SubsetCount /
Range("A7"), "0.0000%") & " ," & outPut(1, 1) & "," & outPut(1, 2) &
"," & outPut(1, 3) & " ," & outPut(1, 4) & "," & outPut(1, 5)
r = 0
If maxLen = 0 Then
Application.EnableEvents =
True
Application.ScreenUpdating
= True
Application.Calculation =
xlCalculationAutomatic
ThisWorkbook.Save
Exit Sub
End If
cv = 0
GoTo mark
Terminate:
Exit Sub
End Sub
 
J

joeu2004

Daka said:
This would be more easiky understood with a copy
of the excel file.

You can upload the Excel file to a file-sharing website and post the URL
(link; http://...) in a response here. Be sure the uploaded file is marked
shared or sharable. The following are some free file-sharing websites.

Windows Live Skydrive: http://skydrive.live.com
MediaFire: http://www.mediafire.com
FileFactory: http://www.filefactory.com
FileSavr: http://www.filesavr.com
FileDropper: http://www.filedropper.com
RapidShare: http://www.rapidshare.com
Box.Net: http://www.box.net/files
I am in need of help to adjust the code posted
below from an Excel file. The code generate every
possible combination from the value supplied in the
input boxes. Now, I do not want every combination.
For example if I want to generate the combination
between 1 and 24 numbers in subsets of 8

This description is unclear to me, even with the example. Do you simply
want all the combinations of 8 from a set of 24 numbers, for example?
For example if I want to generate the combination
between 1 and 24 numbers in subsets of 8., the
first few rows of the output should look like this:
12,3,4,5,6,7,8
1,2,3,4,9,10,11,12
1,2,3,4,13.1.4.15.16
1,2,3,4,17,18,19,20
1,2,3,4,21,22,23,24
1,2,3,5,9.13.17.21
1,2,3,5,10,14,18,22

Does it really need to be in that order? The more natural order is:

1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,9
.....
1,2,3,4,5,6,7,24
1,2,3,4,5,6,8,9
1,2,3,4,5,6,8,10
.....
1,2,3,4,5,6,8,24

This works on the concept that the values in each
subset must not be repeated more than four times
when matched against the preceeding subsets.

Huh?! In your example, 1, 2, and 3 are all "repeated more than four times"
in preceding subsets. Are you just trying to suggest an algorithm based on
ignorance of how to generate all combinations of 8 from a set of 24 numbers
(for example)?

The posted code is junk, an obvious hack of something. It did nothing
useful when I tried it.

-----

The following macro outputs all combinations of K out of N values from
Input!A:A starting in A1 as you did, writing the combinations into
Output!F:F starting in F7 as you did. I kept the variable names similar to
yours.

The macro intended to be working starting point. It might even be exactly
what you want. If not and if you cannot can make the necessary
modifications, let me know what you need (with examples), and perhaps I can
make the changes.

I suggest that you start by executing the macro as-is. Use a list of 7
numbers or strings in Input!A:A (nFavorites), and enter 3 or 4 for the size
of the subset (nElements).

That keeps the output and runtime manageable. Note that I artificially slow
down the statusbar update so that you can see that in operation. For longer
runs, set #Const slowStatus to False. Eventually, you can remove the code
between #If slowStatus and #EndIf.

***Caveat*** COMBIN(N,K) grows quite large very quickly. For example,
COMBIN(24,8) is 735,471. That exceeds the limits of XL2003. COMBIN(180,90)
is about 9E+52, which exceeds the limits of anything. So this approach is
not practical for most sets of "favorites".

-----

Option Explicit

#Const slowStatus = True

Sub combinKofN()
Dim favRng As Range, outRng As Range
Dim chkNum As Long, nFavorites As Long
Dim nElements As Long, maxLen As Long
Dim ofMaxLen As String, s As String
Dim i As Long, j As Long, rowNum As Long
Dim prevPct As Long

On Error GoTo terminate
Application.StatusBar = ""

' column A of sheet "Input" must contain
' data starting in A1, with no interstitial
' empty cells.
' output goes into column F of sheet "Output"

With Sheets("input")
Set favRng = .Range("a1", .Range("a1").End(xlDown))
End With
chkNum = favRng.Count

Set outRng = Sheets("output").Range("f7")
outRng.Resize(1, chkNum + 1).EntireColumn.Clear

' allow user to see clear output if sheet
' "output" is active
Application.ScreenUpdating = False

' generate all combinations of nElements of nFavorites

nFavorites = _
InputBox("Enter number of favorites", "", chkNum)
If nFavorites <= 0 Or nFavorites > chkNum _
Then GoTo terminate

nElements = _
InputBox("Enter size of subset", "", nFavorites)
If nElements <= 0 Or nElements > nFavorites _
Then GoTo terminate

maxLen = WorksheetFunction.Combin(nFavorites, nElements)
If outRng.Row + maxLen - 1 > _
Range("a1").SpecialCells(xlLastCell).End(xlDown).Row _
Then GoTo terminate
ofMaxLen = " of " & maxLen & ": " ' for status

ReDim favorites(1 To nFavorites)
For i = 1 To nFavorites: favorites(i) = favRng(i): Next

ReDim outData(1 To 1, 1 To nElements)
ReDim elements(1 To nElements) As Long
For i = 1 To nElements: elements(i) = i: Next

i = 1: rowNum = 0: prevPct = 0
Do
For i = i To nElements
outData(1, i) = favorites(elements(i))
Next
rowNum = rowNum + 1
outRng.Cells(rowNum).Resize(1, nElements) = outData
If rowNum = maxLen Then GoTo terminate

' update Excel status bar by each integer percentage

If Int(rowNum / maxLen * 100) > prevPct Then
prevPct = Int(rowNum / maxLen * 100)
s = prevPct & "% complete, " & _
Format(rowNum, "#,##0") & _
ofMaxLen & outData(1, 1)
For j = 2 To nElements
s = s & "," & outData(1, j)
Next
Application.StatusBar = s
DoEvents
#If slowStatus Then
Dim x As Double
x = Timer
Do: DoEvents: Loop Until Timer - x >= 0.1
#End If
End If

' next combination

i = nElements: j = 0
While elements(i) = nFavorites - j
i = i - 1: j = j + 1
Wend
elements(i) = elements(i) + 1
For j = i + 1 To nElements
elements(j) = elements(j - 1) + 1
Next
Loop

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub
 
J

joeu2004

Some minor comments....

joeu2004 said:
outRng.Resize(1, chkNum + 1).EntireColumn.Clear

The use of chkNum+1 instead of chkNum is an aesthetic choice. If the
previous run use a large chkNum, I preferred to have one column separation
between new and old data. Of course, it would be better to clear everything
down from and to the right of outRng. But I was not sure if that might wipe
out other important data.


I wrote
rowNum = rowNum + 1
outRng.Cells(rowNum).Resize(1, nElements) = outData

This can be made more efficient. For example (untested):

Set outRng = outRng.resize(1,nElements) ' above Do loop
[....]
outRng.Offset(rowNum,0) = outData ' inside Do loop
rowNum = rowNum + 1
 
D

Daka

You can upload the Excel file to a file-sharing website and post the URL
(link; http://...) in a response here.  Be sure the uploaded file is marked
shared or sharable.  The following are some free file-sharing websites.

Windows Live Skydrive:http://skydrive.live.com
MediaFire:http://www.mediafire.com
FileFactory:http://www.filefactory.com
FileSavr:http://www.filesavr.com
FileDropper:http://www.filedropper.com
RapidShare:http://www.rapidshare.com
Box.Net:http://www.box.net/files


This description is unclear to me, even with the example.  Do you simply
want all the combinations of 8 from a set of 24 numbers, for example?


Does it really need to be in that order?  The more natural order is:

1,2,3,4,5,6,7,8
1,2,3,4,5,6,7,9
....
1,2,3,4,5,6,7,24
1,2,3,4,5,6,8,9
1,2,3,4,5,6,8,10
....
1,2,3,4,5,6,8,24


Huh?!  In your example, 1, 2, and 3 are all "repeated more than four times"
in preceding subsets.  Are you just trying to suggest an algorithm based on
ignorance of how to generate all combinations of 8 from a set of 24 numbers
(for example)?

The posted code is junk, an obvious hack of something.  It did nothing
useful when I tried it.

-----

The following macro outputs all combinations of K out of N values from
Input!A:A starting in A1 as you did, writing the combinations into
Output!F:F starting in F7 as you did.  I kept the variable names similar to
yours.

The macro intended to be working starting point.  It might even be exactly
what you want.  If not and if you cannot can make the necessary
modifications, let me know what you need (with examples), and perhaps I can
make the changes.

I suggest that you start by executing the macro as-is.  Use a list of 7
numbers or strings in Input!A:A (nFavorites), and enter 3 or 4 for the size
of the subset (nElements).

That keeps the output and runtime manageable.  Note that I artificiallyslow
down the statusbar update so that you can see that in operation.  For longer
runs, set #Const slowStatus to False.  Eventually, you can remove the code
between #If slowStatus and #EndIf.

***Caveat***  COMBIN(N,K) grows quite large very quickly.  For example,
COMBIN(24,8) is 735,471.  That exceeds the limits of XL2003.  COMBIN(180,90)
is about 9E+52, which exceeds the limits of anything.  So this approachis
not practical for most sets of "favorites".

-----

Option Explicit

#Const slowStatus = True

Sub combinKofN()
Dim favRng As Range, outRng As Range
Dim chkNum As Long, nFavorites As Long
Dim nElements As Long, maxLen As Long
Dim ofMaxLen As String, s As String
Dim i As Long, j As Long, rowNum As Long
Dim prevPct As Long

On Error GoTo terminate
Application.StatusBar = ""

' column A of sheet "Input" must contain
' data starting in A1, with no interstitial
' empty cells.
' output goes into column F of sheet "Output"

With Sheets("input")
   Set favRng = .Range("a1", .Range("a1").End(xlDown))
End With
chkNum = favRng.Count

Set outRng = Sheets("output").Range("f7")
outRng.Resize(1, chkNum + 1).EntireColumn.Clear

' allow user to see clear output if sheet
' "output" is active
Application.ScreenUpdating = False

' generate all combinations of nElements of nFavorites

nFavorites = _
    InputBox("Enter number of favorites", "", chkNum)
If nFavorites <= 0 Or nFavorites > chkNum _
    Then GoTo terminate

nElements = _
    InputBox("Enter size of subset", "", nFavorites)
If nElements <= 0 Or nElements > nFavorites _
    Then GoTo terminate

maxLen = WorksheetFunction.Combin(nFavorites, nElements)
If outRng.Row + maxLen - 1 > _
    Range("a1").SpecialCells(xlLastCell).End(xlDown).Row _
    Then GoTo terminate
ofMaxLen = " of " & maxLen & ": "  ' for status

ReDim favorites(1 To nFavorites)
For i = 1 To nFavorites: favorites(i) = favRng(i): Next

ReDim outData(1 To 1, 1 To nElements)
ReDim elements(1 To nElements) As Long
For i = 1 To nElements: elements(i) = i: Next

i = 1: rowNum = 0: prevPct = 0
Do
    For i = i To nElements
        outData(1, i) = favorites(elements(i))
    Next
    rowNum = rowNum + 1
    outRng.Cells(rowNum).Resize(1, nElements) = outData
    If rowNum = maxLen Then GoTo terminate

    ' update Excel status bar by each integer percentage

    If Int(rowNum / maxLen * 100) > prevPct Then
        prevPct = Int(rowNum / maxLen * 100)
        s = prevPct & "% complete, " & _
            Format(rowNum, "#,##0") & _
            ofMaxLen & outData(1, 1)
        For j = 2 To nElements
            s = s & "," & outData(1, j)
        Next
        Application.StatusBar = s
        DoEvents
        #If slowStatus Then
            Dim x As Double
            x = Timer
            Do: DoEvents: Loop Until Timer - x >= 0.1
        #End If
    End If

    ' next combination

    i = nElements: j = 0
    While elements(i) = nFavorites - j
        i = i - 1: j = j + 1
    Wend
    elements(i) = elements(i) + 1
    For j = i + 1 To nElements
        elements(j) = elements(j - 1) + 1
    Next
Loop

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub

The code you provided worked very well. However you misunderstood what
I am trying to do. The code produces every possible combination. But i
do not want every combination.

Lets work with 24 numbers in sets of 8. We know every possible
combination would be 735,471.

I was able to achieve what I am trying to do by ajusting the code to
loop thru the subsets on the outputsheet but this was extremely slow
approach.
The first line would be:
1,2,3,4,5,6,7,8

For the second line or subset the code would compare the second subset
with whats already on the output sheet. 1,2,3,4,5, 6, 7, 9 would be
inelligible because becuse 7 of the nubers would be repeated, we only
want 4. So the next elligible subset would be 1,2,3,4,9,10,11,12 the
next would be 1,2,3,4,13,14,15,16.
As I stated before I was able to achieve this by looping thru the code
on the output sheet but it was extremely slow. It produced
approximately 759 subset.
Having the alogarithm adjusted to do this would be far more efficient
if it is possible to have it done this way.
Thanks
Derick
 
D

Daka

Some minor comments....

joeu2004 said:
outRng.Resize(1, chkNum + 1).EntireColumn.Clear

The use of chkNum+1 instead of chkNum is an aesthetic choice.  If the
previous run use a large chkNum, I preferred to have one column separation
between new and old data.  Of course, it would be better to clear everything
down from and to the right of outRng.  But I was not sure if that mightwipe
out other important data.

I wrote
rowNum = rowNum + 1
outRng.Cells(rowNum).Resize(1, nElements) = outData

This can be made more efficient.  For example (untested):

Set outRng = outRng.resize(1,nElements)  ' above Do loop
[....]
outRng.Offset(rowNum,0) = outData      ' inside Do loop
rowNum = rowNum + 1

This is the link to the file I have and modified to do what I wanted.
https://skydrive.live.com/redir.aspx?cid=25ebc7bdffe769ab&resid=25EBC7BDFFE769AB!161
 
M

Martin Brown

The code you provided worked very well. However you misunderstood what
I am trying to do. The code produces every possible combination. But i
do not want every combination.

Lets work with 24 numbers in sets of 8. We know every possible
combination would be 735,471.

I was able to achieve what I am trying to do by ajusting the code to
loop thru the subsets on the outputsheet but this was extremely slow
approach.
The first line would be:
1,2,3,4,5,6,7,8

For the second line or subset the code would compare the second subset
with whats already on the output sheet. 1,2,3,4,5, 6, 7, 9 would be
inelligible because becuse 7 of the nubers would be repeated, we only
want 4. So the next elligible subset would be 1,2,3,4,9,10,11,12 the
next would be 1,2,3,4,13,14,15,16.

I can't see why on earth you would want to do this, but once you have a
root with the first four values determined the only possible solutions
are of the form:

1,2,3,4,5,6,7,8
1,2,3,4,5+4n,6+4n,7+4n, 8+4n

Then moving to 5 on digit 4
1,2,3,5,6+4n+p,9+4n+p, 12+4n+p, 15+4n+p

*BUT* some of these will now have more than four values in common with
the earlier patterns. The closed forms will allow you to short circuit
generating the patterns that cannot possibly match your constraint.
As I stated before I was able to achieve this by looping thru the code
on the output sheet but it was extremely slow. It produced
approximately 759 subset.
Having the alogarithm adjusted to do this would be far more efficient
if it is possible to have it done this way.
Thanks
Derick

It might help if you explained why you want to do this. It seems to me
like there is no real merit at all in this weird subset. Unless that is
you are trying to beat some very badly designed national lottery. See
for example the famous Irish Lottery syndicate which hammered them on a
double rollover by buying up almost all possible permutations.

http://en.wikipedia.org/wiki/Lottery_Wheeling

Regards,
Martin Brown
 
J

joeu2004

Daka said:
Lets work with 24 numbers in sets of 8. We know every
possible combination would be 735,471. [....]
The first line would be: 1,2,3,4,5,6,7,8
For the second line or subset the code would compare
the second subset with whats already on the output
sheet. 1,2,3,4,5, 6, 7, 9 would be inelligible because
7 of the nubers would be repeated, we only want 4.
So the next elligible subset would be 1,2,3,4,9,10,11,12

Okay, I believe I understand your requirement now. For example, you want to
select combinations of 8 from a set of 24, but only the subset of
combinations with 4 or fewer matches.

(Note that I am using "subset" differently than you do. You use "subset"
where I would use the term "combination".)
I was able to achieve this by looping thru the code
on the output sheet but it was extremely slow. It
produced approximately 759 subset.

Actually, 759 is exactly the correct number for 24 choose 8 with 4 or fewer
matches.

The ideal algorithm would generate the 759 combinations directly, without
having to generate all 735,471 combinations. But I am having trouble
"counting" (i.e. computing the count of) the number of the subset of
combinations that fit the requirement. And for me, knowing how to "count"
is key to understanding the simplest algorithm.

The macro below is a brute force approach. It consumes a significant amount
of memory. But it should run significantly faster than processing all of
the combinations stored in a worksheet, if that is what you mean by "looping
thru the code on the output sheet". For example, on my computer, the
algorithm takes about 77 sec to produce the 759 combinations that meet the
requirements of your example.

***CAVEAT*** That is based on the assumption that the set of 24 data are
type Long. If the data might be something else, set the #Const is
dataIsLong to False. But in that case, certain variables are type Variant,
and the algorithm run significantly longer; for example, about 120 sec to
produce the 759 combinations.

Is that the sort of algorithm you were hoping for?

PS: I was unable to open or download the Excel file that you uploaded to
Skydrive. That is, I refused to open it. I got a download block message
indicating that the file had some "suspicious" requirements; a DLL.
Probably innocuous; but I did not want to take any chances. I wonder if the
file is simply incompatible with my XL2003/WinXP system.

------

Option Explicit

' set to False if Input!A:A is not integer
#Const dataIsLong = True

Sub combinKofN()
Dim nData As Long, nSelect As Long
Dim maxCombin As Long, nCombin As Long
Dim maxSubset As Long, nSubset As Long
Dim maxMatch As Long, nMatch As Long
Dim i As Long, j As Long, k As Long
Dim chkNum As Long
Dim inRng As Range, outRng As Range
Dim st0 As Double, st As Double
#If dataIsLong Then
Dim x As Long
#Else
Dim x
#End If

Application.StatusBar = ""
On Error GoTo terminate

With Sheets("input")
Set inRng = _
.Range("a1", .Range("a1").End(xlDown))
End With
chkNum = inRng.Count

Set outRng = Sheets("output").Range("f7")

nData = InputBox("Enter size of data set", _
"", chkNum)
If nData <= 0 Or nData > chkNum _
Then GoTo terminate

nSelect = _
InputBox("Enter size of combination", _
"", nData)
If nSelect <= 0 Or nSelect > nData _
Then GoTo terminate

maxMatch = _
InputBox("Enter max number of matches", _
"", nSelect)
If maxMatch <= 0 Or maxMatch > nSelect _
Then GoTo terminate

st0 = Timer

maxCombin = WorksheetFunction.Combin(nData, nSelect)
maxSubset = _
Range("a1").SpecialCells(xlLastCell).End(xlDown).Row _
- outRng.Row + 1
If maxSubset > maxCombin Then maxSubset = maxCombin

' clear one more column in case nSelect for previous
' run was larger
outRng.Resize(maxSubset, nSelect + 2).Clear

#If dataIsLong Then
ReDim allCombin(1 To maxCombin, 1 To nSelect) As Long
ReDim mySubset(1 To maxSubset, 1 To nSelect) As Long
ReDim myData(1 To nData) As Long
#Else
ReDim allCombin(1 To maxCombin, 1 To nSelect)
ReDim mySubset(1 To maxSubset, 1 To nSelect)
ReDim myData(1 To nData)
#End If
For i = 1 To nData: myData(i) = inRng(i): Next

ReDim idx(1 To nSelect) As Long
For i = 1 To nSelect: idx(i) = i: Next

nCombin = 0: nSubset = 0: nMatch = 0: st = 0
Do
' generate next combination

nCombin = nCombin + 1
For i = 1 To nSelect
allCombin(nCombin, i) = myData(idx(i))
Next

' be sure it matches maxMatch or less.
' if so, add to mySubset

For i = 1 To nSubset
nMatch = 0
For j = 1 To nSelect
x = allCombin(nCombin, j)
For k = 1 To nSelect
If x = mySubset(i, k) _
Then nMatch = nMatch + 1: Exit For
Next
Next
If nMatch > maxMatch Then Exit For
Next
If nMatch <= maxMatch Then
nSubset = nSubset + 1
For j = 1 To nSelect
mySubset(nSubset, j) = allCombin(nCombin, j)
Next
End If

' update status every 1 sec

If Timer - st >= 1 Then
st = Timer
Application.StatusBar = _
Round(nCombin / maxCombin * 100) & _
"%, " & Round(st - st0) & _
" sec, " & nCombin & " of " & _
maxCombin & ", " & nSubset
DoEvents
End If
If nSubset = maxSubset Then GoTo showResults
If nCombin = maxCombin Then GoTo showResults

' next combination index

i = nSelect: j = 0
While idx(i) = nData - j
i = i - 1: j = j + 1
Wend
idx(i) = idx(i) + 1
For j = i + 1 To nSelect
idx(j) = idx(j - 1) + 1
Next
Loop

showResults:

Application.ScreenUpdating = False
With outRng
.Cells(1, 1) = nData
.Cells(2, 1) = nSelect
.Cells(3, 1) = maxMatch
.Cells(4, 1) = nCombin
.Cells(5, 1) = nSubset
.Cells(1, 2).Resize(nSubset, nSelect) = mySubset
End With

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
outRng.Cells(6, 1) = Format(Timer - st0, "0.000")
End Sub
 
J

joeu2004

Errata....

joeu2004 said:
.Cells(1, 2).Resize(nSubset, nSelect) = mySubset
End With

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
outRng.Cells(6, 1) = Format(Timer - st0, "0.000")

More reliable....


.Cells(1, 2).Resize(nSubset, nSelect) = mySubset
.Cells(6, 1) = Format(Timer - st0, "0.000")
End With

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
 
J

joeu2004

Errata....
The macro below is a brute force approach.
It consumes a significant amount of memory.

Oops. What I posted had some vestiges of a design that permitted me to see
all of the combinations. It is not necessary to retain all combinations in
memory. That results in a __huge__ savings in memory.

The following implementation also includes some other improvements.

1. With limSubset, I put a cap on the memory for the number of qualified
combinations. This constant is currently set to 10,000. That is probably
more than sufficient for any reasonable maxCombin.

2. With promptUser, I make it optional to use the Input worksheet and prompt
the user for nData, nSelect and maxMatch. Alternatively, these values are
read from cells(1,1) of outRng, which works well if the data set is always
the integers from 1 to nData.

-----

Option Explicit

' set to False if inRng is not integer
#Const dataIsLong = True

' set to False if data is always 1 to N and
' nData, nSelect, nMatch come from outRng
#Const promptUser = False

Sub combinKofN()
Const limSubset As Long = 10000
Dim nData As Long, nSelect As Long
Dim maxCombin As Long, nCombin As Long
Dim maxSubset As Long, nSubset As Long
Dim maxMatch As Long, nMatch As Long
Dim i As Long, j As Long, k As Long
Dim inRng As Range, outRng As Range
Dim st0 As Double, st As Double
#If dataIsLong Then
Dim x As Long
#Else
Dim x
#End If

Application.StatusBar = ""
'On Error GoTo terminate

Set outRng = Sheets("output").Range("f7")

#If Not promptUser Then
With outRng
nData = .Cells(1, 1)
nSelect = .Cells(2, 1)
maxMatch = .Cells(3, 1)
End With
#Else
Dim chkNum As Long
With Sheets("input")
Set inRng = _
.Range("a1", .Range("a1").End(xlDown))
End With
chkNum = inRng.Count

nData = InputBox("Enter size of data set", _
"", chkNum)
If nData <= 0 Or nData > chkNum _
Then GoTo terminate

nSelect = _
InputBox("Enter size of combination", _
"", nData)
If nSelect <= 0 Or nSelect > nData _
Then GoTo terminate

maxMatch = _
InputBox("Enter max number of matches", _
"", nSelect)
If maxMatch <= 0 Or maxMatch > nSelect _
Then GoTo terminate
#End If

st0 = Timer

maxCombin = WorksheetFunction.Combin(nData, nSelect)
maxSubset = _
Range("a1").SpecialCells(xlLastCell).End(xlDown).Row _
- outRng.Row + 1
If maxSubset > maxCombin Then maxSubset = maxCombin
If maxSubset > limSubset Then maxSubset = limSubset

' clear one more column in case nSelect for previous
' run was larger. do not clear column 1
outRng.Offset(0, 1).Resize(maxSubset, nSelect + 1).Clear

#If dataIsLong Then
ReDim allcombin(1 To 1, 1 To nSelect) As Long
ReDim mySubset(1 To maxSubset, 1 To nSelect) As Long
ReDim myData(1 To nData) As Long
#Else
ReDim allcombin(1 To 1, 1 To nSelect)
ReDim mySubset(1 To maxSubset, 1 To nSelect)
ReDim myData(1 To nData)
#End If

#If Not promptUser Then
For i = 1 To nData: myData(i) = i: Next
#Else
For i = 1 To nData: myData(i) = inRng(i): Next
#End If

ReDim idx(1 To nSelect) As Long
For i = 1 To nSelect: idx(i) = i: Next

nCombin = 0: nSubset = 0: nMatch = 0: st = 0
i = 1
Do
' generate next combination

nCombin = nCombin + 1
For i = i To nSelect
allcombin(1, i) = myData(idx(i))
Next

' be sure it matches maxMatch or less.
' if so, add to mySubset

For i = 1 To nSubset
nMatch = 0
For j = 1 To nSelect
x = allcombin(1, j)
For k = 1 To nSelect
If x = mySubset(i, k) _
Then nMatch = nMatch + 1: Exit For
Next
Next
If nMatch > maxMatch Then Exit For
Next
If nMatch <= maxMatch Then
nSubset = nSubset + 1
For j = 1 To nSelect
mySubset(nSubset, j) = allcombin(1, j)
Next
If nSubset = maxSubset Then GoTo showResults
End If
If nCombin = maxCombin Then GoTo showResults

' update status every 1 sec

If Timer - st >= 1 Then
st = Timer
Application.StatusBar = _
Round(nCombin / maxCombin * 100) & _
"%, " & Round(st - st0) & _
" sec, " & nCombin & " of " & _
maxCombin & ", " & nSubset
DoEvents
End If

' next combination index

i = nSelect: j = 0
While idx(i) = nData - j
i = i - 1: j = j + 1
Wend
idx(i) = idx(i) + 1
For j = i + 1 To nSelect
idx(j) = idx(j - 1) + 1
Next
Loop

showResults:

Application.ScreenUpdating = False
With outRng
#If promptUser Then
.Cells(1, 1) = nData
.Cells(2, 1) = nSelect
.Cells(3, 1) = maxMatch
#End If
.Cells(4, 1) = nCombin
.Cells(5, 1) = nSubset
.Cells(1, 2).Resize(nSubset, nSelect) = mySubset
.Cells(6, 1) = Format(Timer - st0, "0.000")
End With

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub
 
D

Daka

Errata....


Oops.  What I posted had some vestiges of a design that permitted me tosee
all of the combinations.  It is not necessary to retain all combinations in
memory.  That results in a __huge__ savings in memory.

The following implementation also includes some other improvements.

1. With limSubset, I put a cap on the memory for the number of qualified
combinations.  This constant is currently set to 10,000.  That is probably
more than sufficient for any reasonable maxCombin.

2. With promptUser, I make it optional to use the Input worksheet and prompt
the user for nData, nSelect and maxMatch.  Alternatively, these values are
read from cells(1,1) of outRng, which works well if the data set is always
the integers from 1 to nData.

-----

Option Explicit

' set to False if inRng is not integer
#Const dataIsLong = True

' set to False if data is always 1 to N and
' nData, nSelect, nMatch come from outRng
#Const promptUser = False

Sub combinKofN()
Const limSubset As Long = 10000
Dim nData As Long, nSelect As Long
Dim maxCombin As Long, nCombin As Long
Dim maxSubset As Long, nSubset As Long
Dim maxMatch As Long, nMatch As Long
Dim i As Long, j As Long, k As Long
Dim inRng As Range, outRng As Range
Dim st0 As Double, st As Double
#If dataIsLong Then
   Dim x As Long
#Else
   Dim x
#End If

Application.StatusBar = ""
'On Error GoTo terminate

Set outRng = Sheets("output").Range("f7")

#If Not promptUser Then
    With outRng
        nData = .Cells(1, 1)
        nSelect = .Cells(2, 1)
        maxMatch = .Cells(3, 1)
    End With
#Else
    Dim chkNum As Long
    With Sheets("input")
        Set inRng = _
           .Range("a1", .Range("a1").End(xlDown))
    End With
    chkNum = inRng.Count

    nData = InputBox("Enter size of data set", _
        "", chkNum)
    If nData <= 0 Or nData > chkNum _
       Then GoTo terminate

    nSelect = _
        InputBox("Enter size of combination", _
        "", nData)
    If nSelect <= 0 Or nSelect > nData _
        Then GoTo terminate

    maxMatch = _
        InputBox("Enter max number of matches", _
        "", nSelect)
    If maxMatch <= 0 Or maxMatch > nSelect _
        Then GoTo terminate
#End If

st0 = Timer

maxCombin = WorksheetFunction.Combin(nData, nSelect)
maxSubset = _
    Range("a1").SpecialCells(xlLastCell).End(xlDown).Row _
    - outRng.Row + 1
If maxSubset > maxCombin Then maxSubset = maxCombin
If maxSubset > limSubset Then maxSubset = limSubset

' clear one more column in case nSelect for previous
' run was larger.  do not clear column 1
outRng.Offset(0, 1).Resize(maxSubset, nSelect + 1).Clear

#If dataIsLong Then
    ReDim allcombin(1 To 1, 1 To nSelect) As Long
    ReDim mySubset(1 To maxSubset, 1 To nSelect) As Long
    ReDim myData(1 To nData) As Long
#Else
    ReDim allcombin(1 To 1, 1 To nSelect)
    ReDim mySubset(1 To maxSubset, 1 To nSelect)
    ReDim myData(1 To nData)
#End If

#If Not promptUser Then
    For i = 1 To nData: myData(i) = i: Next
#Else
    For i = 1 To nData: myData(i) = inRng(i): Next
#End If

ReDim idx(1 To nSelect) As Long
For i = 1 To nSelect: idx(i) = i: Next

nCombin = 0: nSubset = 0: nMatch = 0: st = 0
i = 1
Do
    ' generate next combination

    nCombin = nCombin + 1
    For i = i To nSelect
        allcombin(1, i) = myData(idx(i))
    Next

    ' be sure it matches maxMatch or less.
    ' if so, add to mySubset

    For i = 1 To nSubset
        nMatch = 0
        For j = 1 To nSelect
            x = allcombin(1, j)
            For k = 1 To nSelect
                If x = mySubset(i, k) _
                    Then nMatch = nMatch + 1: Exit For
            Next
        Next
        If nMatch > maxMatch Then Exit For
    Next
    If nMatch <= maxMatch Then
        nSubset = nSubset + 1
        For j = 1 To nSelect
            mySubset(nSubset, j) = allcombin(1, j)
        Next
        If nSubset = maxSubset Then GoTo showResults
    End If
    If nCombin = maxCombin Then GoTo showResults

    ' update status every 1 sec

    If Timer - st >= 1 Then
        st = Timer
        Application.StatusBar = _
            Round(nCombin / maxCombin * 100) & _
            "%, " & Round(st - st0) & _
            " sec, " & nCombin & " of " & _
            maxCombin & ", " & nSubset
        DoEvents
    End If

    ' next combination index

    i = nSelect: j = 0
    While idx(i) = nData - j
        i = i - 1: j = j + 1
    Wend
    idx(i) = idx(i) + 1
    For j = i + 1 To nSelect
        idx(j) = idx(j - 1) + 1
    Next
Loop

showResults:

Application.ScreenUpdating = False
With outRng
    #If promptUser Then
        .Cells(1, 1) = nData
        .Cells(2, 1) = nSelect
        .Cells(3, 1) = maxMatch
    #End If
    .Cells(4, 1) = nCombin
    .Cells(5, 1) = nSubset
    .Cells(1, 2).Resize(nSubset, nSelect) = mySubset
    .Cells(6, 1) = Format(Timer - st0, "0.000")
End With

terminate:

Application.StatusBar = ""
Application.ScreenUpdating = True
End Sub

I posted a message thanking you for your help but i dont see it
showing up here. I however have had a chance to try the code doulbling
the values but I appears to be using up too much memory and gets
sluggish or even stops altogether. Can you try writitng the output to
the worksheet as it is processed. I think that way It will use less
memory.
Derick
 
J

joeu2004

Daka said:
I posted a message thanking you for your help
but i dont see it showing up here.

Of course, the most likely explanation is that you made a mistake during
submission. However, it seems to me that Google Groups has been very
unreliable lately. So I switched to using a free newsgroup server,
news.eternal-september.org. I am using Outlook Express as my newsreader.
I however have had a chance to try the code
doulbling the values but I appears to be using
up too much memory and gets sluggish or even
stops altogether. Can you try writitng the output
to the worksheet as it is processed. I think
that way It will use less memory.

Yes, that can be done easily. But the performance would be so terrible to
the point that the VBA subroutine would be useless for "large" numbers
anyway.

I don't know what you mean by "doubling the values". It would be helpful if
you would be specific.

Originally, you were using nData=24, nSelect=8 and maxMatch=4. If you
changed nData to 48, the algorithm would take about 513 times longer. On my
computer, that would take nearly 11 hours(!).

As I mentioned before, these kinds of algorithms (combinatorial) are
extremely limited by the parameters of COMBIN (n-choose-k). It is
ill-advised to it with "large" numbers. And you can see that "large" is not
very large at all.

In any case, I doubt that the problem is memory usage. The constant
limSubset limits the amount of memory to 10,000 combinations regardless of
nData and nSelect.

To confirm that memory usage is not the issue, try the following experiment.

1. Open the Excel file with the macro. Do not start the macro yet.

2. Start Task Manager by pressing ctrl+alt+Delete. In the Application tab,
click on Excel, then right-click and click on GoTo. That will highlight
Excel in the Process tab.

3. In the Process tab, be sure that you see columns labeled Mem Usage, Peak
Mem Usage and VM Size. If not, click View > Select Columns and select the
appropriate columns.

4. In VBA, put a breakpoint on the Do statement. Alternative, enter a Stop
statement just before the Do statement.

5. Run the VBA subroutine.

If it gets to the breakpoint or Stop statement, your problem is not memory
usage. All memory will have been allocated by that point.

If you have any further questions for me about memory usage, please give me
some specific information from Task Manager, namely:

* For Excel on the Process tab: Mem Usage, Peak Mem Usage and VM Size.

* On the Performance tab, Total and Available under Physical Memory.

Another potential problem that might create the illusion of sluggishness or
stopping: Excel does not always update the status bar reliably. I took
steps to make it more reliable -- using DoEvents and updating only once per
second. But the point is, perhaps the VBA subroutine is still chugging
away, and Excel is simply not displaying the progress in the status bar.
 
D

Daka

Of course, the most likely explanation is that you made a mistake during
submission.  However, it seems to me that Google Groups has been very
unreliable lately.  So I switched to using a free newsgroup server,
news.eternal-september.org.  I am using Outlook Express as my newsreader.


Yes, that can be done easily.  But the performance would be so terribleto
the point that the VBA subroutine would be useless for "large" numbers
anyway.

I don't know what you mean by "doubling the values".  It would be helpful if
you would be specific.

Originally, you were using nData=24, nSelect=8 and maxMatch=4.  If you
changed nData to 48, the algorithm would take about 513 times longer.  On my
computer, that would take nearly 11 hours(!).

As I mentioned before, these kinds of algorithms (combinatorial) are
extremely limited by the parameters of COMBIN (n-choose-k).  It is
ill-advised to it with "large" numbers.  And you can see that "large" is not
very large at all.

In any case, I doubt that the problem is memory usage.  The constant
limSubset limits the amount of memory to 10,000 combinations regardless of
nData and nSelect.

To confirm that memory usage is not the issue, try the following experiment.

1. Open the Excel file with the macro.  Do not start the macro yet.

2. Start Task Manager by pressing ctrl+alt+Delete.  In the Application tab,
click on Excel, then right-click and click on GoTo.  That will highlight
Excel in the Process tab.

3. In the Process tab, be sure that you see columns labeled Mem Usage, Peak
Mem Usage and VM Size.  If not, click View > Select Columns and select the
appropriate columns.

4. In VBA, put a breakpoint on the Do statement.  Alternative, enter a Stop
statement just before the Do statement.

5. Run the VBA subroutine.

If it gets to the breakpoint or Stop statement, your problem is not memory
usage.  All memory will have been allocated by that point.

If you have any further questions for me about memory usage, please give me
some specific information from Task Manager, namely:

* For Excel on the Process tab:  Mem Usage, Peak Mem Usage and VM Size.

* On the Performance tab, Total and Available under Physical Memory.

Another potential problem that might create the illusion of sluggishness or
stopping:  Excel does not always update the status bar reliably.  I took
steps to make it more reliable -- using DoEvents and updating only once per
second.  But the point is, perhaps the VBA subroutine is still chugging
away, and Excel is simply not displaying the progress in the status bar.

The VBA routine got to the break point without any problem. It appears
to me that as you said Excel may not be updating the status bar. The
system I am running it on has 3.35gig of Ram.

When I say doubling the values I am refering to for example we ran 24
in subsets/Combinations of 8 with a repeat value of four. So I want to
see the result if I use 48 in subsets/combinations of 16 with the
repeat value of 8 esentially doubling all the values.

The highest value I would probabaly want to experiment with is 80 in
subsets of 16 repeat value 8. Any hing else would be below this with
even smaller subsets.
 
J

joeu2004

Daka said:
I want to see the result if I use 48 in subsets/combinations
of 16 with the repeat value of 8 esentially doubling all
the values.
The highest value I would probabaly want to experiment
with is 80 in subsets of 16 repeat value 8.

The algorithm that you started with and I embellished would need to look at
COMBIN(48,16) combinations in the first case and COMBIN(80,16) in the second
case. That is 2.25E12 (trillions) and 27E15 (quadrillions) respectively.
On my computer, I estimate that would take more than 7 __years__ and 89497
__years__ respectively.

So you cannot use this algorithm for combinations of that magnitude.

I have made this point several times already. Am I finally getting the
point across?

As it happens, it appears that the size of the subset (e.g. of all
combinations that match 8 or fewer) might be computable with an algorithm
that needs to look at only about Sum(COMBIN(16,m),m=1,...,8)*80/16 in the
second case -- about 200,000 combinations.

That is not an accurate count; I am still trying figure out how to compute.
But it is probably the right order of magnitude. For example, for the
original example of 24-choose-8 combinations that match 4 or fewer,
Sum(COMBIN(8,m),m=1,...,4)*24/8 is 1296, which is close to the correct
answer of 1379.

However, I do not know if or when I will get around to developing such an
algorithm.

Good luck!
 
J

joeu2004

Errata....
As it happens, it appears that the size of the subset (e.g. of all
combinations that match 8 or fewer) might be computable with an algorithm
that needs to look at only about Sum(COMBIN(16,m),m=1,...,8)*80/16 in the
second case -- about 200,000 combinations.

That is not an accurate count; I am still trying figure out how to
compute. But it is probably the right order of magnitude. For example,
for the original example of 24-choose-8 combinations that match 4 or
fewer, Sum(COMBIN(8,m),m=1,...,4)*24/8 is 1296, which is close to the
correct answer of 1379.

I should have written (Sum(COMBIN(8,m),m=1,...,4)+1)*24/8, which is 1328.
Still not correct.
 
J

joeu2004

Errata #2....
I should have written (Sum(COMBIN(8,m),m=1,...,4)+1)*24/8

Arrgghh! I blew the syntax. That should be:

Sum(COMBIN(8,m)+1,m=1,...,4)*24/8

And again, that still is not correct.
 
J

joeu2004

Errata #3....
For example, for the original example of 24-choose-8
combinations that match 4 or fewer,
Sum(COMBIN(8,m),m=1,...,4)*24/8 is 1296, which is close
to the correct answer of 1379.

I should have written (Sum(COMBIN(8,m),m=1,...,4)+1)*24/8,
which is 1328. Still not correct.

Okay, let me try to get this right this time.

I am talking about a slightly different example which is faster to compute,
namely: 24-choose-8 combinations that match __5__ or fewer.

The formula for an __approximate__ count is
Sum(COMBIN(8,m)+1,m=1,...,5)*24/8.

That results in 1328, whereas the correct answer is 1379.

Sorry for the incessant postings. Too late after a long day :-(.
 
J

joeu2004

Errata #4....
Okay, let me try to get this right this time.

Oops! I guess I don't realize just how tired I am.

First, I am talking about a slightly different example which is faster to
compute,
namely: the subset of 24-choose-5 combinations that match __3__ or fewer.

Second, my formula does not work at all. That is,
Sum(COMBIN(5,m)+1,m=1,...,3)*INT(24/5) is not even close to 1379.

Klunk! Back to the drawing board....

Over and out!
 
D

Daka

Errata #4....



Oops!  I guess I don't realize just how tired I am.

First, I am talking about a slightly different example which is faster to
compute,
namely:  the subset of 24-choose-5 combinations that match __3__ or fewer.

Second, my formula does not work at all.  That is,
Sum(COMBIN(5,m)+1,m=1,...,3)*INT(24/5) is not even close to 1379.

Klunk!  Back to the drawing board....

Over and out!

This is the second time I have posted and it has not shown up here.
I wrote to let you know that there is no need to go back to the
drawing board. Your last full routine is 100% effective in producing
the desired result.

I think I found the problem why the status bar is not updating. The
following statement does not execute after a while:
If Timer - st >= 1 Then
st = Timer
Application.StatusBar = _
Round(nCombin / maxCombin * 100) & _
"%, " & Round(st - st0) & _
" sec, " & nCombin & " of " & _
maxCombin & ", " & nSubset
DoEvents
End If
This is because "Timer - st" eventaully produces a negative value so
the condition will always be false.

Is it possible to set the start time as a constant and subtract it
from the current time to produce the elapsed time?

Can it be displayed in the status bar as hh:mm:ss? No need for the
text 12:10:01 will be fine.

I have noticed that when the status bar is not being updated the
routine runs much faster; maybe ten time or more faster.
Having the timer to updated every one or two seconds is fine because
you like to know the the routine is running.
 
J

joeu2004

Daka said:
I think I found the problem why the status bar is not updating. [....]
This is because "Timer - st" eventaully produces a negative
value so the condition will always be false.

Right, if you run across midnight. A simple fix....

Dim y as double
..... some code ....
y = Timer - st
If y >= 1 or y < 0 Then
st = Time
.... etc ...
Endif

D said:
I have noticed that when the status bar is not being updated
the routine runs much faster; maybe ten time or more faster.

In my experience, the difference was nothing that large. But yes, it would
run faster with fewer updates to the status bar.

Having the timer to updated every one or two seconds is fine
because you like to know the the routine is running.

The expression ``Timer - st`` is simply computing time difference in
seconds. So change ``y >= 1`` to ``y >= 2`` for 2 seconds, etc.

It would be even better we compute ``Timer - st`` less often. The best
solution is an adaptive algorithm based on nCombin. But frankly, I do not
have any more time work on this.

Perhaps someone else will help you if you post a new inquiry specifically
about reducing the frequency of executing ``Timer - st``.

Good luck!
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top