Randomize three variables subject to sum always equal to 1

sulprobil · Sep 1, 2004

Tom,

I do not think your solution is wrong. I think it produces
(relatively) more values in the [0.1,0.9]-intervall than
in (0,0.1) and (0.9,1), for example.

This is because if your algorithm creates a first extreme
value of 0.9999, for example, then a high likelihood
exists that it will be shifted into direction 0.5 due to
your sum- and 1/sum-calculation.

A quick simulation shows that my solution generates about
1% numbers < 0.01, 8% numbers < 0.1 and 14% numbers < 0.2;
where your algorithm generates about 0.5% numbers < 0.01,
3% numbers < 0.1 and 8% numbers < 0.2.

The stated problem introduced two restrictions (on all
three numbers, not the third or any other):
1. All numbers are > 0.
2. All numbers sum to 1.

With my algorithm I try to simulate this problem
efficiently without introducing further restrictions or
making further assumptions.

Kind regards,
sulprobil
Reverse(moc.liborplus.www)

Tom Ogilvy · Sep 1, 2004

With my algorithm I try to simulate this problem

efficiently without introducing further restrictions or
making further assumptions

I believe I could make the same statement.

Your algorithm produces less than 15% of your numbers at > .5 so naturally
you will have a higher percentage in the lower half of the numbers, but I am
not sure how that achieves all the above.

You stated:
I think it produces

(relatively) more values in the [0.1,0.9]-intervall than
in (0,0.1) and (0.9,1), for example.

which seems to be true for your algorithm as well, although your
distribution is decreasing exponential like from the left, so you have a
large amount in [0,0.1] from what I could see.

A quick simulation shows that my solution generates about
1% numbers < 0.01, 8% numbers < 0.1 and 14% numbers < 0.2;

A quick simulation for me showed
your solution generates about
4.1% numbers < 0.01, 25.4% numbers < 0.1 and 41% numbers < 0.2;

based on 10000 calls to your function (30000 numbers generated).

Public Function TRAT() As Variant
Dim vTemp As Variant
Dim i1, i2, i3 As Integer
Randomize
Application.Volatile
ReDim vTemp(0 To 2)

i1 = Int(Rnd * 3)
i2 = (i1 + 1) Mod 3
i3 = (i1 + 2) Mod 3
vTemp(i1) = Round(Rnd, 15)
vTemp(i2) = Round(Rnd * (1 - vTemp(i1)), 15)
vTemp(i3) = 1 - vTemp(i1) - vTemp(i2)
' TRAT = Application.Transpose(vTemp)
TRAT = vTemp
End Function

Sub Tester1()
For i = 1 To 10000
varr1 = TRAT
For j = LBound(varr1) To UBound(varr1)
If varr1(j) <= 0.1 Then
m = m + 1
ElseIf varr1(j) <= 0.9 Then
n = n + 1
Else
o = o + 1
End If
Next j
Next i
Range("C1") = m
Range("C2") = n
Range("C3") = o
End Sub
--
Regards,
Tom Ogilvy

sulprobil said:
Tom,

I do not think your solution is wrong. I think it produces
(relatively) more values in the [0.1,0.9]-intervall than
in (0,0.1) and (0.9,1), for example.

This is because if your algorithm creates a first extreme
value of 0.9999, for example, then a high likelihood
exists that it will be shifted into direction 0.5 due to
your sum- and 1/sum-calculation.

A quick simulation shows that my solution generates about
1% numbers < 0.01, 8% numbers < 0.1 and 14% numbers < 0.2;
where your algorithm generates about 0.5% numbers < 0.01,
3% numbers < 0.1 and 8% numbers < 0.2.

The stated problem introduced two restrictions (on all
three numbers, not the third or any other):
1. All numbers are > 0.
2. All numbers sum to 1.

With my algorithm I try to simulate this problem
efficiently without introducing further restrictions or
making further assumptions.

Kind regards,
sulprobil
Reverse(moc.liborplus.www)

grinder62 · Sep 1, 2004

I think the following will work. Actually, you don't need VBA at all fo
this. To get around the problem of negative values for the third number
I'd do the following (this uses the Excel RAND function in th
pseudocode as opposed to the Rnd VBA function).

DesiredSum = 1 (or whatever you want)
r1 = RAND()
r2 = RAND()
r3 = RAND()
SumRand = r1 + r2 + r3
d1 = (r1/SumRand)*DesiredSum
d2 = (r2/SumRand)*DesiredSum
d3 = (r3/SumRand)*DesiredSum

d1, d2, and d3 are the "random" values that now sum to DesiredSum. The
are simply scaled versions of r1, r2, and r3. Note these value
(individually) are not uniformly distributed between 0 and 1. Nor ar
they independent (they are negatively correlated with one another).

Roger Grind

sulprobil · Sep 1, 2004

I get the following distribution for 10,000 runs (example):

Likelihood(0,0.1) 0.2471 0.2615 0.2513 0.1087
0.1085 0.107
Likelihood(0.1,0.2) 0.1614 0.1578 0.1593 0.143
0.1407 0.1427
Likelihood(0.2,0.3) 0.1195 0.1276 0.1288 0.1761
0.1764 0.1821
Likelihood(0.3,0.4) 0.1114 0.1022 0.1045 0.2195
0.2294 0.2222
Likelihood(0.4,0.5) 0.0886 0.0905 0.0897 0.1849
0.1784 0.1829
Likelihood(0.5,0.6) 0.0701 0.0738 0.0705 0.0973
0.0926 0.0903
Likelihood(0.6,0.7) 0.0639 0.0602 0.0617 0.0409
0.0431 0.044
Likelihood(0.7,0.8) 0.0578 0.0521 0.0501 0.0175
0.0207 0.0205
Likelihood(0.8,0.9) 0.0443 0.0402 0.0465 0.0105
0.0084 0.0069
Likelihood(0.9,1) 0.0359 0.0341 0.0376 0.0016
0.0018 0.0014

First 3 numbers are produced by my algorithm, last 3 by
yours. You are right: my algorithm gets 25% numbers < 0.1
and 41% < 0.2. But numbers > 0.5 are not below 15%, they
are about 26.5% (yours about 16.5%).

Now that's what I meant: We should ask the original poster
what he intends to simulate - your distribution is
concentrated in the (0.2,0.5)-intervall while mine is
decreasing strictly.

And please do not misunderstand me: I think my
distribution reflects the stated problem with given
restrictions in a canonic way while your distribution is
influenced substantially by the 1/sum-calculation.

Kind regards,
sulprobil

sulprobil · Sep 2, 2004

Yes. That reflects the distribution of Tom's VB program.

Another Worksheet-solution would be:
A1: DesiredSum (1 for example)
B1: =RAND()*$A$1
C1: =RAND()*($A$1-B1)
D1: =$A$1-B1-C1
E1: =INT(RAND()*3)
F1: =OFFSET($B1,0,MOD($E1+COLUMN(), 3))
G1, H1: same as F1

F1:H1 are the - not uniformly distributed, not
independend - random numbers, distribution like my VB
program.

Regards,
sulprobil

John Coleman · Sep 2, 2004

emsfeld said:
Hi guys

I need a hand on this one, cuz I have absolutely no clue:

I need a code that randomly selects three real numbers between 0 and 1
and assigns these to three variables. I defined the variables as w(0),
w(1), w(2). The problem is that the sum has to equal 1! Always! So
that:

w(0) + w(1) + w(2) = 1

Help is very much appreciated!!!

Thanks

Emsfeld

Greetings
As other posters have pointed out, there in more than 1 way to answer
depending on what you are looking for. A very easy way is to imagine
the 3 numbers as being the result of two random cuts, where the random
cuts are uniformly distributed in [0,1]:
something like

cut1 = rnd()
cut2 = rnd()
if cut1 > cut2 then
temp = cut1
cut1 = cut2
cut2 = temp
end if
w(0) = cut1
w(1) = cut2 - cut1
w(2) = 1 - cut2

works.

On the other hand, what you are asking for might be the following:
pick a point (x,y,z) at random which satisfies x,y,z >= 0 and x + y +
z = 1. Then you are talking about picking a point at random
(presumably uniformly) from an equilateral triangle with sides sqrt(2)
which is situated in 3-D space. A crude algorithm would be to first do
this for a triangle situated in the plane (perhaps by a hit-miss
approach - embed the triangle in a square with sides sqrt(2) and
select points from the square at random (easy) until you get 1 that
hits the triangle, on average 2 or 3 attempts will suffice) and then
apply a linear transformation to get your point in 3-space. I'm sure
that better algorithms exist but this will work and won't be too hard
to code, though not worth the effort if it isn't what you want. If you
want - I can try to code it and might do so tomorrow for fun anyway.

Hope this helps

-John Coleman

John Coleman · Sep 3, 2004

My last post was way too complicated. I still think that the best
solution to the problem is to generate points (x,y,z) uniformly in the
region detrmined by the constraints, but there is a simpler way to do
so:

Note that the region in question is the graph of the linear function z
= 1-x-y
over the region in the x,y-plane determined by the constraints x,y

=0, x+y <=1.

This later region is exactly one half of the unit square, for which
x+y = 1 is a diagonal. It is clearly enough to pick a point uniformly
distributed in the triangle in the x-y plane and then just evaluate z.
If x and y are independent uniform variables in [0,1], then (x,y) is
uniformly distributed in the unit square. If it is below the diagonal
(if x+y <= 1) - ok. If not, then it is uniformly distributed in the
triangle above the square. But these two triangles are congruent, so
reflecting over the line x+y = 1 puts you in the right triangle (with
the right distribution) after all. Here is code which implemements
this:

Function pickthree() As Variant

'This function returns a variant array consisting of 3
'pseudorandom numbers uniformly distributed over the region
'x,y,z >= 0, x+y+z = 1

Dim num1 As Double, num2 As Double, num3 As Double

Randomize
num1 = Rnd()
num2 = Rnd()
If num1 + num2 > 1 Then 'reflect over line x+y = 1
num1 = 1 - num1
num2 = 1 - num2
End If

'x,y ok now - so let z = 1-x-y and return

num3 = 1 - num1 - num2
pickthree = Array(num1, num2, num3)

End Function

A simple driver to test the function:

Sub testit()
Dim i As Integer, j As Integer
Dim w As Variant
For i = 1 To 20
w = pickthree()
For j = 0 To 2
Range("A1").Offset(i, j).Value = w(j)
Next j
Next i
End Sub

Hope this helps

-John Coleman

kcc · Sep 4, 2004

The problem, or it may be a problem depending on the original poster's need,
is that most of the solutions do not produce 3 RV's with the same
distribution.
Another way to think of the problem is to start with a pie. Generate 3 RV's
where each defines the location of where to make a cut. 0 is at 12 o'clock,
..5 at 6 o'clock, etc. These values would be transformed by defining 3 new
RV's as the size of the slices. By definition, the sum will equal the
whole.

No VB is needed. Just use the following formulas:
Row 1 has the uniformly dist RV's
A1, B1 & C1=rand()
Row 2 is the RV's sorted
A2 = min(A1:C1), B2=large(A1:C1,2), C2=max(A1:C1)
Then the desired RV's would be
A3=B2-A2, B3=C2-B2, C3=1-C2+A2

As with all the solutions, the RV's are not independent.
It's been too many years for me to do the math, but the distribution
appears to be exponential in shape with the highest density at 0
and lowest at 1.
If I could do the math, I also might prove a suspicion that the result is
effectively the same as Tom's.

John Coleman said:
My last post was way too complicated. I still think that the best
solution to the problem is to generate points (x,y,z) uniformly in the
region detrmined by the constraints, but there is a simpler way to do
so:

Note that the region in question is the graph of the linear function z
= 1-x-y
over the region in the x,y-plane determined by the constraints x,y

=0, x+y <=1.

Click to expand...

This later region is exactly one half of the unit square, for which
x+y = 1 is a diagonal. It is clearly enough to pick a point uniformly
distributed in the triangle in the x-y plane and then just evaluate z.
If x and y are independent uniform variables in [0,1], then (x,y) is
uniformly distributed in the unit square. If it is below the diagonal
(if x+y <= 1) - ok. If not, then it is uniformly distributed in the
triangle above the square. But these two triangles are congruent, so
reflecting over the line x+y = 1 puts you in the right triangle (with
the right distribution) after all. Here is code which implemements
this:

Function pickthree() As Variant

'This function returns a variant array consisting of 3
'pseudorandom numbers uniformly distributed over the region
'x,y,z >= 0, x+y+z = 1

Dim num1 As Double, num2 As Double, num3 As Double

Randomize
num1 = Rnd()
num2 = Rnd()
If num1 + num2 > 1 Then 'reflect over line x+y = 1
num1 = 1 - num1
num2 = 1 - num2
End If

'x,y ok now - so let z = 1-x-y and return

num3 = 1 - num1 - num2
pickthree = Array(num1, num2, num3)

End Function

A simple driver to test the function:

Sub testit()
Dim i As Integer, j As Integer
Dim w As Variant
For i = 1 To 20
w = pickthree()
For j = 0 To 2
Range("A1").Offset(i, j).Value = w(j)
Next j
Next i
End Sub

Hope this helps

-John Coleman

Bernd Plumhoff · Sep 4, 2004

We got three different distributions so far.

Option Explicit

'*********************************************************
'Purpose: produce n random numbers which sum up to 1
'Inputs: Explicitly: Integer parameter to specify desired
' distribution, i.e.
' 1 = reduce degree of freedom linearly
' 2 = divide n random numbers by their sum
' 3 = n-1 random cuts of (0,1)-intervall
' 4 = 2 random cuts of (0,1)-intervall
' Implicitly: n (by selecting n cells in worksheet)
'Outputs: n numbers
'*********************************************************
Public Function RandSum1(ByVal distribution As Long) As Variant
'Suggestions for distributions:
' 1: J.E. McGimpsey
' 2: Tom Ogilvy
' 3,4: John Coleman
Dim vArr As Variant
Dim vResult, temp As Variant
Dim nCount, i, j As Long
Dim sum As Double
Application.Volatile
If TypeName(Application.Caller) <> "Range" Then Exit Function
With Application.Caller
ReDim vResult(1 To .Rows.Count, 1 To .Columns.Count)
nCount = .Count
If nCount = 1 Then
RandSum1 = 1
Exit Function
End If
End With
ReDim vArr(1 To nCount) As Variant
Select Case distribution
Case 1
ReDim nRand(1 To nCount) As Long
For i = 1 To nCount
nRand(i) = i
Next i
For i = 1 To nCount - 1
j = Int(Rnd * (nCount - i + 1)) + i
vArr(nRand(j)) = Rnd * (1# - sum)
sum = sum + vArr(nRand(j))
nRand(j) = nRand(i)
Next i
vArr(nRand(nCount)) = 1# - sum
Case 2
For i = 1 To nCount
vArr(i) = Rnd
sum = sum + vArr(i)
Next i
For i = 1 To nCount
vArr(i) = vArr(i) / sum
Next i
Case 3
For i = 1 To nCount - 1
vArr(i) = Rnd
j = i - 1
While j > 0
If vArr(j) > vArr(j + 1) Then
temp = vArr(j + 1)
vArr(j + 1) = vArr(j)
vArr(j) = temp
End If
j = j - 1
Wend
Next i
vArr(nCount) = 1# - vArr(nCount - 1)
i = nCount - 1
While i > 1
vArr(i) = vArr(i) - vArr(i - 1)
i = i - 1
Wend
Case 4 'Special case of 3
If nCount <> 3 Then
Exit Function
End If
vArr(1) = Rnd
vArr(2) = Rnd
If vArr(1) + vArr(2) > 1# Then
vArr(1) = 1# - vArr(1)
vArr(2) = 1# - vArr(2)
End If
vArr(3) = 1# - vArr(1) - vArr(2)
Case Else
End Select
nCount = 1
For i = 1 To UBound(vResult, 1)
For j = 1 To UBound(vResult, 2)
vResult(i, j) = vArr(nCount)
nCount = nCount + 1
Next j
Next i
RandSum1 = vResult
End Function

JE McGimpsey · Sep 4, 2004

And I suspect from his lack of participation that we've left the OP far
behind...

Lars-Åke Aspelin · Sep 4, 2004

The problem, or it may be a problem depending on the original poster's need,
is that most of the solutions do not produce 3 RV's with the same
distribution.

And the one suggested below doesn't do that either.
We don't know if that is a problem or not as the original problem was
to vaguely defined when it comes to what "randomly selects" mean,

Another way to think of the problem is to start with a pie. Generate 3 RV's
where each defines the location of where to make a cut. 0 is at 12 o'clock,
.5 at 6 o'clock, etc. These values would be transformed by defining 3 new
RV's as the size of the slices. By definition, the sum will equal the
whole.

No VB is needed. Just use the following formulas:
Row 1 has the uniformly dist RV's
A1, B1 & C1=rand()
Row 2 is the RV's sorted
A2 = min(A1:C1), B2=large(A1:C1,2), C2=max(A1:C1)
Then the desired RV's would be
A3=B2-A2, B3=C2-B2, C3=1-C2+A2

As with all the solutions, the RV's are not independent.
It's been too many years for me to do the math, but the distribution
appears to be exponential in shape with the highest density at 0
and lowest at 1.
If I could do the math, I also might prove a suspicion that the result is
effectively the same as Tom's.

No, it is not the same as the "divide by the sum" result given by Tom
where the "RV's" do have the same distribution (for what it's worth).
Actually your "A2", "B2", and "C2" variables have the averages of
0.25, 0.5 and 0.75 respectively. That means that your
"A3", "B3" and "C3" variables have the averages of
0.25, 0.25 and 0.5 respectively. Thus the three "RV's" do not have
the same distrubution.

Lars-Åke

Tom Ogilvy · Sep 5, 2004

He/She probably ran with yours and has their grade back already <g>

kcc · Sep 5, 2004

Lars-Åke Aspelin said:
And the one suggested below doesn't do that either.
We don't know if that is a problem or not as the original problem was
to vaguely defined when it comes to what "randomly selects" mean,

No, it is not the same as the "divide by the sum" result given by Tom
where the "RV's" do have the same distribution (for what it's worth).
Actually your "A2", "B2", and "C2" variables have the averages of
0.25, 0.5 and 0.75 respectively. That means that your
"A3", "B3" and "C3" variables have the averages of
0.25, 0.25 and 0.5 respectively. Thus the three "RV's" do not have
the same distrubution.

Lars-Åke

I hate being wrong. The pie analogy was fine, but the implementation
was flawed. I don't know why it didn't occur to me that sorting would
effect the distribution. If I skip the extra step and make
A2==IF(A1=MIN($A1:$C1),LARGE($A1:$C1,2)-A1,IF(A1=LARGE($A1:$C1,2),MAX($A1:$C
1)-A1,1-A1+MIN($A1:$C1)))
and copy to B2 and C2, row 2 will have the distribution I was shooting for.
This time I tested all three RV's rather than one and assume they where the
same. This time, each has a mean of 1/3, as expected.
Ken

Lars-Åke Aspelin · Sep 5, 2004

I hate being wrong. The pie analogy was fine, but the implementation
was flawed. I don't know why it didn't occur to me that sorting would
effect the distribution. If I skip the extra step and make
A2==IF(A1=MIN($A1:$C1),LARGE($A1:$C1,2)-A1,IF(A1=LARGE($A1:$C1,2),MAX($A1:$C
1)-A1,1-A1+MIN($A1:$C1)))
and copy to B2 and C2, row 2 will have the distribution I was shooting for.
This time I tested all three RV's rather than one and assume they where the
same. This time, each has a mean of 1/3, as expected.
Ken

Yes, the sorting is obviously not as innocent as one would imagine.
Without the sorting your three distributions are equal as you state
above.

I really liked your pie analogy and thats why I started fiddling about
with it. My first thought was that once you had selected your three
cuts there would be no changes to the sizes of the pie slices if you
"rotated" the pie to get one of the cuts at "12 o'clock".
The other two cuts will still be uniformly distributed over the pice.

With that "predefined cut" you just have to generate two random
cuts and this will be the same as cutting the interval [0,1] with two
cuts (which someone else might already have proposed).

So with the two randoms in A1 and B1 the following formulas give the
same "common" distribution to the three "RV's" as your example above:

In A2: "=MIN(A1:B1)" (the first slice)
In B2: "=ABS(A1-B1)" (the second slice)
In C2: " =1-MAX(A1:B1)" (the third slice)

The distribution of all these "RV's" is not exponential but triangular
with the probability density function (pdf) equal to 2*(1-x).
(0<=x<=1)

You can obtain the same result with your formulas above and just set
one of the three random numbers in A1:C1 to any constant value in
[0,1] e.g. a 0 in C1. I just chose to skip it from the calculations.
So there is no need for more than two random numbers if you are
satisfied with the triangular distribution for the three "RV's".
The "divide by the sum" proposal gives a "more balanced distribution"
but that involves three random numbers in the calculations.

Lars-Åke

Randomize three variables subject to sum always equal to 1

sulprobil

Tom Ogilvy

grinder62

sulprobil

sulprobil

John Coleman

John Coleman

kcc

Bernd Plumhoff

JE McGimpsey

Lars-Åke Aspelin

Tom Ogilvy

kcc

Lars-Åke Aspelin