taking average of each set of all possible combinations

T

Tom Mort

I'd like to take the average of all possible combinations of an array
and get the average of each set of permutations. I will specify how
many permutions per set. Ultimately I will then look at the the
percentage of these averages above or below a certain number.

I've got some code below that will list all the permutations, however
I have to add them together and then divide by the number of each set.
I don't really know VBA, maybe someone can help with this.

Also this code puts the results in a new worksheet. I would like this
to allways put the results in the same worksheet, say worksheet two
after first deleting any existing data.

Again thanks in advance.

The code is below:

Option Explicit

Dim vAllItems As Variant
Dim Buffer() As String
Dim BufferPtr As Long
Dim Results As Worksheet
'
' Myrna Larson, July 25, 2000, Microsoft.Public.Excel.Misc

Sub ListPermutationsOrCombinations()
Dim Rng As Range
Dim PopSize As Integer
Dim SetSize As Integer
Dim Which As String
Dim n As Double
Const BufferSize As Long = 4096

Worksheets("Sheet1").Range("A1").Select
Set Rng = Selection.Columns(1).Cells
If Rng.Cells.Count = 1 Then
Set Rng = Range(Rng, Rng.End(xlDown))
End If

PopSize = Rng.Cells.Count - 2
If PopSize < 2 Then GoTo DataError

SetSize = Rng.Cells(2).Value
If SetSize > PopSize Then GoTo DataError

Which = UCase$(Rng.Cells(1).Value)
Select Case Which
Case "C"
n = Application.WorksheetFunction.Combin(PopSize, SetSize)
Case "P"
n = Application.WorksheetFunction.Permut(PopSize, SetSize)
Case Else
GoTo DataError
End Select
If n > Cells.Count Then GoTo DataError

Application.ScreenUpdating = False

Set Results = Worksheets.Add

vAllItems = Rng.Offset(2, 0).Resize(PopSize).Value
ReDim Buffer(1 To BufferSize) As String
BufferPtr = 0

If Which = "C" Then
AddCombination PopSize, SetSize
Else
AddPermutation PopSize, SetSize
End If
vAllItems = 0

Application.ScreenUpdating = True
Exit Sub

DataError:
If n = 0 Then
Which = "Enter your data in a vertical range of at least 4 cells."
_
& String$(2, 10) _
& "Top cell must contain the letter C or P, 2nd cell is the
Number" _
& "of items in a subset, the cells below are the values from
Which" _
& "the subset is to be chosen."

Else
Which = "This requires " & Format$(n, "#,##0") & _
" cells, more than are available on the worksheet!"
End If
MsgBox Which, vbOKOnly, "DATA ERROR"
Exit Sub
End Sub

Private Sub AddPermutation(Optional PopSize As Integer = 0, _
Optional SetSize As Integer = 0, _
Optional NextMember As Integer = 0)

Static iPopSize As Integer
Static iSetSize As Integer
Static SetMembers() As Integer
Static Used() As Integer
Dim i As Integer

If PopSize <> 0 Then
iPopSize = PopSize
iSetSize = SetSize
ReDim SetMembers(1 To iSetSize) As Integer
ReDim Used(1 To iPopSize) As Integer
NextMember = 1
End If

For i = 1 To iPopSize
If Used(i) = 0 Then
SetMembers(NextMember) = i
If NextMember <> iSetSize Then
Used(i) = True
AddPermutation , , NextMember + 1
Used(i) = False
Else
SavePermutation SetMembers()
End If
End If
Next i

If NextMember = 1 Then
SavePermutation SetMembers(), True
Erase SetMembers
Erase Used
End If

End Sub 'AddPermutation

Private Sub AddCombination(Optional PopSize As Integer = 0, _
Optional SetSize As Integer = 0, _
Optional NextMember As Integer = 0, _
Optional NextItem As Integer = 0)

Static iPopSize As Integer
Static iSetSize As Integer
Static SetMembers() As Integer
Dim i As Integer

If PopSize <> 0 Then
iPopSize = PopSize
iSetSize = SetSize
ReDim SetMembers(1 To iSetSize) As Integer
NextMember = 1
NextItem = 1
End If

For i = NextItem To iPopSize
SetMembers(NextMember) = i
If NextMember <> iSetSize Then
AddCombination , , NextMember + 1, i + 1
Else
SavePermutation SetMembers()
End If
Next i

If NextMember = 1 Then
SavePermutation SetMembers(), True
Erase SetMembers
End If

End Sub 'AddCombination

Private Sub SavePermutation(ItemsChosen() As Integer, _
Optional FlushBuffer As Boolean = False)

Dim i As Integer, sValue As String
Static RowNum As Long, ColNum As Long

If RowNum = 0 Then RowNum = 1
If ColNum = 0 Then ColNum = 1

If FlushBuffer = True Or BufferPtr = UBound(Buffer()) Then
If BufferPtr > 0 Then
If (RowNum + BufferPtr - 1) > Rows.Count Then
RowNum = 1
ColNum = ColNum + 1
If ColNum > 256 Then Exit Sub
End If

Results.Cells(RowNum, ColNum).Resize(BufferPtr, 1).Value _
= Application.WorksheetFunction.Transpose(Buffer())
RowNum = RowNum + BufferPtr
End If

BufferPtr = 0
If FlushBuffer = True Then
Erase Buffer
RowNum = 0
ColNum = 0
Exit Sub
Else
ReDim Buffer(1 To UBound(Buffer))
End If

End If

'construct the next set
For i = 1 To UBound(ItemsChosen)
sValue = sValue & ", " & vAllItems(ItemsChosen(i), 1)
Next i

'and save it in the buffer
BufferPtr = BufferPtr + 1
Buffer(BufferPtr) = Mid$(sValue, 3)
End Sub 'SavePermutation
 
H

Harlan Grove

Tom Mort said:
I'd like to take the average of all possible combinations of an array
and get the average of each set of permutations. I will specify how
many permutions per set. Ultimately I will then look at the the
percentage of these averages above or below a certain number.

You're being sloppy with terminology. {1,2,3} and {3,1,2} represent
different permutations of the same combination, and all permutations of the
same combination have the same average (and sum and count and variance . .
..), so kinda pointless to average different permutations.

A set with N elements has 2^N total subsets, and 2^N-2 nontrivial proper
subsets, so a set with, say, 32 elements has over 4 billion (US - 10^9)
subsets, or combinations of 1 to 32 elements.

If you have an array of 32 positive numbers, and you want to know the
percentage of all of its subsets with averages less than X, there are better
approaches to take than brute force. Obviously all subsets containing only
elements each less than X will have averages less than X, and all subsets
containing only elements greater than X will have averages greater than X.
More generally, all subsets of n < 32 elements which sum to less than n*X
will have averages less than X, and all that sum to more than n*X will have
averages greater than X.

These sorts of optimizations could be done manually, and they'll be a lot
faster than code implementing brute force algorithms.
 
T

Tom Mort

I did mean perumutions. I realize that the same numbers in a different
order would have the same average, but, I'm interested in what
percentage of the total permutations would meet a criteria.

I guess I'd consider optimization, but I don't want to do it manually as
this is intended to be used over and over and distributed to others.
Ultimately all the remaining permutions would still have to be crunched
so the initial question remains whether this code can be adapted to
provide the average.

A typical array for this would be <100 datapoints with the number of
items in a set 5 or less
 
T

Tom Ogilvy

That code generates all the combinations/permutations of a subset of n items
taken from a set of m items where n must be less than or equal to m.

It doesn't do all subsets of any size from an array of m items. Not sure
what you really want to do.
 
H

Harlan Grove

Tom Mort said:
I did mean perumutions. I realize that the same numbers in a different
order would have the same average, but, I'm interested in what
percentage of the total permutations would meet a criteria.
....

And what do you mean by 'percentage of the total permutations would meet a
criteria'? If these criteria were any of the typical descriptive statistics,
then either 0% or 100% of *permutations* would satisfy them. That is, *all*
permutations of the same combination of elements would have the same
descriptive statistics. Or are you interested in order-dependent criteria?
A typical array for this would be <100 datapoints with the number of
items in a set 5 or less

I should asked this direct question before: what are you trying to do?

Note that SUMPRODUCT(COMBIN(90,{5;4;3;2;1})) returns 46,626,033, which is
rather a lot of individual combinations to analyze using brute force
techniques. Unless you intend to provide these users with a run-over-night
or run-over-the-weekend application, you need to return the initial design
stage and rethink what you're trying to do. It's a dead certainty your
application as you've described it can't be interactive.
 
T

Tom Mort

I thought I sent a reply yesterday but don't see it.

What I'm trying to do is set up a program to help companies make an
informed decesion as to how many wastewater sample to take for
determining extra user charges and to compare with a monthly average
limit.

Certain business like bakeries etc have stronger sewage than regular
households. To make up the difference in costs to treat this sewage,
cites normally charge extra for concentrations above a certain amount.
Sampling is required to determine the concentration.

Also, at least in our community, there will be individual upper
concentration limits (a monthly average) that companies are not to
exceed. This is to insure that the municipal wasteater plant treatment
capacity is not exceeded.

In the past just a few samples per quarter were taken to assess the
extra charge. As an upper limit comes into play businesses will be
urged to perform self monitoring to assess compliance. That leads to
the question of how frequently to do this.

Each individual business has concentrations within a renge unique to
that business. The upper limits (at least intially) are based on the
90th percentile or so of sampling results for the past couple years. On
the one hand if a business doesn't have much variation they might be
able to only take one sample per month and be reasonably sure they have
a representative sample and that this one sample would meet the average
limit. On the other extreme to account for variation a sample could be
taken every day. That way the most representative sample would be taken
and any high anomolies would be averaged out.

What I't like is to have something that the businesses could enter their
past results for a period of time and the number of samples they
proposed to take per month and it would tell them what the odds
(assuming current conditions and variability are within the range of
past) that the average concentraion from this many samples would be
under the average limit.
 
H

hgrove

Tom Mort wrote...
...
What I't like is to have something that the businesses could
enter their past results for a period of time and the number of
samples they proposed to take per month and it would tell them
what the odds (assuming current conditions and variability are
within the range of past) that the average concentraion from
this many samples would be under the average limit.

I was afraid of this. How much stats background do you have?

When, exactly, would businesses sample themselves? When would the wast
water facility or agency sample their output? Does the latter sampl
each business on site?

Your individual businesses are going to need to establish baselin
samples, which means a few days of sampling at specific, relativel
frequent intervals. From that, they may be able to establish relativel
homogeneous periods within the day when they can take one of tw
samples going forward.

Anyway, your sampling procedure should *NOT* involve combinatorics. Yo
can use standard statisitcal techniques to set sample sizes based o
normal distribution assumptions (unless the sample concentration value
show pronounced asymmetry) along with regional overall and specifi
industry descriptive statistics to supplement individual busines
sample values.

This isn't my line of work, so I'm not comfortable enough to go int
the statistical details. However, I'd be very surprised if there wasn'
something available through SIAM that didn't address your needs closel
if not exactly (there are one heck of a lot of engineers
mathematicians and statisticians working in the chemical an
electronics industries, and post-Super Fund, they devote A LOT of tim
and attention to matters of managing discharges from their factories)
 
T

Tom Mort

Well for virtually all of the business there is at least many years
worth of pretty much randomly taken samples. The samples are 24 hour
composite samples and are set to take a sample every 15 minutes. If
there is no flow for a period of time there is no sample. We would
sample 3 days per quarter. So there is background data already.

I have a little statistics background. What is SIAMs?

I can get a suggested sample frequency by divding standard deviation by
variance, but it doesn't really address averages of sets of samples.

Tell me more
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top