Bias in rand for excel 07

J

Jerry W. Lewis

(R825001C3:R865000C3=7)
is an array of logical values, which AVERAGE would ignore. Multiplying by
one coerces TRUE to 1 and FALSE to 0, so that you average an indicator
variable of whether 7 occurred in a cell or not.

This type of thing is probably discussed in some 'tips and tricks' books,
but I don't have access to any and so can't give you a recommendation.

Jerry

yttrias said:
Jerry,

Thanks again for your code suggestion. Although the array formula works
fine, I don't understand the syntax. How does the "*1" function? Can I
substitute a variable for the "7" delineator? (When I try to do that, I get
error.)

Do you know of any published reference to this, and other, helpful array
formula expressions that are (for me) beyond the ordinary?
--
yttrias


Jerry W. Lewis said:
The code was very helpful. I agree that the expcted value in F1 should be
1/6=0.666... and that 0.1587 is a sigificant departure from expectation (the
missing decimal points hindered my understanding).

I am surprised that CountSevens() worked at all, since i is declared an
integer, but takes values that are outside the valid range for an
integer--that may have been the problem.

There is a slight bias built into using either the INT() or TRUNC()
functions. For example
=((2-2^-52)-2)
is <0, so =INT(2-2^-52) should be 1, but MS, in trying to hide the binary
underpinnings, returns 2. Therefore there will be a slight reduction in the
number of 6's and a slight increase in the number of 1's in your pseudo
random integers, but these slight biases should not be statistically
detectable in an experiment of your size. If you are concerned, you can
avoid them by using an intermediate cell to hold the =RAND() values which you
would then transform with the slightly more complicated formula
=IF(((x*6)-INT(x*6))<0,INT(x*6)-1,INT(x*6))

Some recalculations of the random integers occur during the execution of
CountSevens(). It is not clear to me whether these might be occurring during
the execution of the For/Next loop; regardless, I don't see how it could
cause an undercount of 7's.

Here is an alternate VBA code, that simplifies and combines the work of both
OneMillionEvents() and CountSevens(). It also replaces the static value in
F1 with a dynamic formula that will update whenever the RAND() functions
recalculate. I tried about a dozen reps in Excel 2007, and got only one rep
that was statistically different from expectation.

Sub SimplerVersion()
With Range("A1:C1000000")
.ClearContents
.NumberFormat = "0"
End With
Range("A1:B1000000").FormulaR1C1 = "=1+INT(RAND()*6)"
Range("C1:C1000000").FormulaR1C1 = "=RC[-2]+RC[-1]"
Range("F1").FormulaArray = "=AVERAGE((R825001C3:R865000C3=7)*1)"
End Sub

Jerry

yttrias said:
I am grateful to those of you who have responded so quickly to my OP. I
hope you will be able to point out the errors of my ways. The following is
the code sequence for the generation of two sets of 1-million integers
between 1 and 6, and their sum. Then, the subsequent code counts the number
of sevens in a sample of 40,000 sums (divided by 40,000) to give decimal
representation to be compared with ideal ratio of 1/6. I ran the latter code
20 times at arbitrary locations, and averaged those results.

Sub OneMillionEvents()
Range("A1:C1000000").Select
Selection.ClearContents
Selection.NumberFormat = "0"
Range("A1").Select
ActiveCell.FormulaR1C1 = "=1+int(RAND()*6)"
Range("B1").Select
ActiveCell.FormulaR1C1 = "=1+int(RAND()*6)"
Range("C1").Select
ActiveCell.FormulaR1C1 = "=RC[-2]+RC[-1]"
Range("A1:C1000000").Select
Selection.FillDown
Calculate
End Sub

Sub CountSevens()
Dim i, n As Integer
Range("A1").Select
ActiveCell(1, 5) = 40000
ActiveCell(1, 6) = 0
n = 0
For i = 825001 to 865000
If ActiveCell(i, 3) = 7 Then
n = n + 1
End If
Next i
ActiveCell(1, 6) = n / ActiveCell(1, 5)
End Sub

Regards,

Yttrias

--
yttrias


:

You are still providing too little information for us to understand what you
are doing.

There is no RAND() function in VBA. In VBA, you either used the VBA Rnd()
function, or the worksheet RAND() function via Evaluate("RAND()").
As noted in my earlier post, in 2003 the worksheet RAND() function was far
superior to the VBA Rnd(). I have seen no evidence that either function
changed between 2003 and 2007.

Your original post discussed the sum of two (presumably independent) random
integers between 1 and 6 (such as the total from rolling two fair dice). The
expected value of a single such sum would be 7. It is unclear how that
relates to either your observed average of 1587 or expected average of 1667
for an "average of 20 windows of 40,000 consecutive values.

It sounds as though your VBA would involve only a few lines of code. If you
paste that code into your reply, we can see exactly what you did.

Jerry

:

I ran 1 million samples using RAND in VBA in excel 07. Then I sampled 40,000
consecutive values at 20 different locations throughout the range of 1M. The
average of those 20 windows in the 1M was 1587, where the expected average
should be 1667.
--
yttrias


:

And what was your sample size?
--
Bernard V Liengme
Microsoft Excel MVP
www.stfx.ca/people/bliengme
remove caps from email

I'm testing random numbers for games using excel 07 generator, but there
is
too much bias away from statistical expectation. For example, when two
integer numbers are generated between 1 and 6, and subsequently added, the
results differ from the easily predictable numbers. Sevens are
consistently
in short supply by about 1.5%. That's a lot! Any suggestions for better
randomization in excel would be very much appreciated.
 
R

Rick Rothstein \(MVP - VB\)

I presume your question is dealing with this line of code..

Because the text string is being assigned to the FormulaArray property of
the Range, it means that the text will be evaluated as a spreadsheet
formula, so it must meet the rules for those type of functions. The *1 is
multiplying the logical expression, which will be a TRUE or FALSE result, by
a number in order to turn the TRUE or FALSE result into a number (which the
AVERAGE function needs in order to do its calculation). Sometimes you see
this conversion handled by a double unary affixed to the beginning of the
logical expression (in essence, forcing the calculation by multiply by minus
one times minus one....

Range("F1").FormulaArray = "=AVERAGE(--(R825001C3:R865000C3=7))"

As for making the "7" a variable instead of a constant, something like this
should work...

Dim Num As Long
Num = 7
Range("F1").FormulaArray = "=AVERAGE((R825001C3:R865000C3=" & CStr(Num) &
")*1)"

Rick
 
Y

yttrias

Very helpful. Thank you.
--
yttrias


Rick Rothstein (MVP - VB) said:
I presume your question is dealing with this line of code..


Because the text string is being assigned to the FormulaArray property of
the Range, it means that the text will be evaluated as a spreadsheet
formula, so it must meet the rules for those type of functions. The *1 is
multiplying the logical expression, which will be a TRUE or FALSE result, by
a number in order to turn the TRUE or FALSE result into a number (which the
AVERAGE function needs in order to do its calculation). Sometimes you see
this conversion handled by a double unary affixed to the beginning of the
logical expression (in essence, forcing the calculation by multiply by minus
one times minus one....

Range("F1").FormulaArray = "=AVERAGE(--(R825001C3:R865000C3=7))"

As for making the "7" a variable instead of a constant, something like this
should work...

Dim Num As Long
Num = 7
Range("F1").FormulaArray = "=AVERAGE((R825001C3:R865000C3=" & CStr(Num) &
")*1)"

Rick
 
Y

yttrias

Very helpful. Thank you.
--
yttrias


Jerry W. Lewis said:
(R825001C3:R865000C3=7)
is an array of logical values, which AVERAGE would ignore. Multiplying by
one coerces TRUE to 1 and FALSE to 0, so that you average an indicator
variable of whether 7 occurred in a cell or not.

This type of thing is probably discussed in some 'tips and tricks' books,
but I don't have access to any and so can't give you a recommendation.

Jerry

yttrias said:
Jerry,

Thanks again for your code suggestion. Although the array formula works
fine, I don't understand the syntax. How does the "*1" function? Can I
substitute a variable for the "7" delineator? (When I try to do that, I get
error.)

Do you know of any published reference to this, and other, helpful array
formula expressions that are (for me) beyond the ordinary?
--
yttrias


Jerry W. Lewis said:
The code was very helpful. I agree that the expcted value in F1 should be
1/6=0.666... and that 0.1587 is a sigificant departure from expectation (the
missing decimal points hindered my understanding).

I am surprised that CountSevens() worked at all, since i is declared an
integer, but takes values that are outside the valid range for an
integer--that may have been the problem.

There is a slight bias built into using either the INT() or TRUNC()
functions. For example
=((2-2^-52)-2)
is <0, so =INT(2-2^-52) should be 1, but MS, in trying to hide the binary
underpinnings, returns 2. Therefore there will be a slight reduction in the
number of 6's and a slight increase in the number of 1's in your pseudo
random integers, but these slight biases should not be statistically
detectable in an experiment of your size. If you are concerned, you can
avoid them by using an intermediate cell to hold the =RAND() values which you
would then transform with the slightly more complicated formula
=IF(((x*6)-INT(x*6))<0,INT(x*6)-1,INT(x*6))

Some recalculations of the random integers occur during the execution of
CountSevens(). It is not clear to me whether these might be occurring during
the execution of the For/Next loop; regardless, I don't see how it could
cause an undercount of 7's.

Here is an alternate VBA code, that simplifies and combines the work of both
OneMillionEvents() and CountSevens(). It also replaces the static value in
F1 with a dynamic formula that will update whenever the RAND() functions
recalculate. I tried about a dozen reps in Excel 2007, and got only one rep
that was statistically different from expectation.

Sub SimplerVersion()
With Range("A1:C1000000")
.ClearContents
.NumberFormat = "0"
End With
Range("A1:B1000000").FormulaR1C1 = "=1+INT(RAND()*6)"
Range("C1:C1000000").FormulaR1C1 = "=RC[-2]+RC[-1]"
Range("F1").FormulaArray = "=AVERAGE((R825001C3:R865000C3=7)*1)"
End Sub

Jerry

:


I am grateful to those of you who have responded so quickly to my OP. I
hope you will be able to point out the errors of my ways. The following is
the code sequence for the generation of two sets of 1-million integers
between 1 and 6, and their sum. Then, the subsequent code counts the number
of sevens in a sample of 40,000 sums (divided by 40,000) to give decimal
representation to be compared with ideal ratio of 1/6. I ran the latter code
20 times at arbitrary locations, and averaged those results.

Sub OneMillionEvents()
Range("A1:C1000000").Select
Selection.ClearContents
Selection.NumberFormat = "0"
Range("A1").Select
ActiveCell.FormulaR1C1 = "=1+int(RAND()*6)"
Range("B1").Select
ActiveCell.FormulaR1C1 = "=1+int(RAND()*6)"
Range("C1").Select
ActiveCell.FormulaR1C1 = "=RC[-2]+RC[-1]"
Range("A1:C1000000").Select
Selection.FillDown
Calculate
End Sub

Sub CountSevens()
Dim i, n As Integer
Range("A1").Select
ActiveCell(1, 5) = 40000
ActiveCell(1, 6) = 0
n = 0
For i = 825001 to 865000
If ActiveCell(i, 3) = 7 Then
n = n + 1
End If
Next i
ActiveCell(1, 6) = n / ActiveCell(1, 5)
End Sub

Regards,

Yttrias

--
yttrias


:

You are still providing too little information for us to understand what you
are doing.

There is no RAND() function in VBA. In VBA, you either used the VBA Rnd()
function, or the worksheet RAND() function via Evaluate("RAND()").
As noted in my earlier post, in 2003 the worksheet RAND() function was far
superior to the VBA Rnd(). I have seen no evidence that either function
changed between 2003 and 2007.

Your original post discussed the sum of two (presumably independent) random
integers between 1 and 6 (such as the total from rolling two fair dice). The
expected value of a single such sum would be 7. It is unclear how that
relates to either your observed average of 1587 or expected average of 1667
for an "average of 20 windows of 40,000 consecutive values.

It sounds as though your VBA would involve only a few lines of code. If you
paste that code into your reply, we can see exactly what you did.

Jerry

:

I ran 1 million samples using RAND in VBA in excel 07. Then I sampled 40,000
consecutive values at 20 different locations throughout the range of 1M. The
average of those 20 windows in the 1M was 1587, where the expected average
should be 1667.
--
yttrias


:

And what was your sample size?
--
Bernard V Liengme
Microsoft Excel MVP
www.stfx.ca/people/bliengme
remove caps from email

I'm testing random numbers for games using excel 07 generator, but there
is
too much bias away from statistical expectation. For example, when two
integer numbers are generated between 1 and 6, and subsequently added, the
results differ from the easily predictable numbers. Sevens are
consistently
in short supply by about 1.5%. That's a lot! Any suggestions for better
randomization in excel would be very much appreciated.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top