RAND and RANDBETWEEN Workaround........An A Comment

D

David Byrne

Hi All

I know the Hot Fix is out, but this is a Provable Workaround

The following is based on a VERY large battery of tests using up to
130,000 cells at a time.

1. The percentage of negative numbers generated by Rand() varies
from less than 1 to over 90.

2. This is a bimodal distribution, with peak frequencies of around
16% and 84%.

3. Across around a hundred samples of 100,000 cells, the effect of
adding 1 where the number generated is negative, produces a
distribution indistinguishable from ones where all values are
positive.

4. In the "16%" and "84%" examples, "creative" plotting of the
frequencies at 0.1 intervals from -1 to + 1 produces a graph not
unlike a Normal Distribution. From these graphs, and the similar ones
for other percentages, the result (3) above is almost intuitive.

5. RANDBETWEEN(a,b) produces distributions of a basically similar
form. That is, for a given percentage of results outside the expected
range, there is a predictable set of "out of range" values, and
predictable frequencies for all values within and outside the range.

6. Once again, a (simple) addition to any values outside the
expected range, produces a distribution of values indistinguishable
from one where all values produced are within the expected range.

(email me direct for copies of RAND() results)

Some personal observations

Having formerly been a frequent contributor to these forums, I was
somewhat saddened to see the change in tone that seems to have crept
in more recently.

One regular correspondent appears renowned for public fault-picking.

Just this once I wish to do the same in an attempt to redress the
issue, and will not debate the matter further publicly.

I find this sort of behaviour irritating, distracting, inappropriate,
unprofessional, and demeaning to all concerned.

Those who give freely of their own time deserve better than this.

Perhaps more importantly, on the RAND() issue, we have recently seen a
sweeping "proof" that if the use of ABS(RAND()) produces a skewed
distribution, so "MUST" the use of x+1. This was presumably based on
some ill-founded assumption.

I find it incomprehensible that someone of that author's background,
experience, qualifications and personal beliefs could make such a
statement in a global forum without checking the facts.

For this "foul" alone I would like to wish that contributor well on
his (hopefully) self imposed "time out", and trust that if/when he
returns it will be with a new spirit of goodwill and co-operation. I
feel sure that in the next month or two, problems will continue to be
answered in a timely manner.



"Discovery is seeing what everyone else has seen
And thinking what no-one else has thought"



David Byrne
Melbourne Australia
 
H

Harlan Grove

...
...
4. In the "16%" and "84%" examples, "creative" plotting of the
frequencies at 0.1 intervals from -1 to + 1 produces a graph not
unlike a Normal Distribution. From these graphs, and the similar ones
for other percentages, the result (3) above is almost intuitive.

By your own statement, the samples are bimodal. Normal distributions are
unimodal. How do you define 'creative' or 'unlike'?
5. RANDBETWEEN(a,b) produces distributions of a basically similar
form. That is, for a given percentage of results outside the expected
range, there is a predictable set of "out of range" values, and
predictable frequencies for all values within and outside the range.

Not unexpected if RANDBETWEEN is implemented as INT(a+(b-a+1)*RAND()).
6. Once again, a (simple) addition to any values outside the
expected range, produces a distribution of values indistinguishable
from one where all values produced are within the expected range.

But does that mean this resulting distribution is symmetric or (stronger)
uniform? You haven't answered that yet.
(email me direct for copies of RAND() results)

You have my e-mail address at the top of this posting. Want to send your results
to me?

...
Just this once I wish to do the same in an attempt to redress the
issue, and will not debate the matter further publicly.

So you want to cast barbs but not suffer them yourself? Quite brave!
Perhaps more importantly, on the RAND() issue, we have recently seen a
sweeping "proof" that if the use of ABS(RAND()) produces a skewed
distribution, so "MUST" the use of x+1. This was presumably based on
some ill-founded assumption.
...

You're right. I was too loose in specifying what I has assumed was skewed.

It's possible that if a skewed sampling of deviates from a symmetric population
on (0,1) were transformed to negative deviates via either -RAND() or RAND()-1,
then either ABS(RAND()) or RAND()+1 applied to only the negative deviates could
reproduce an overall population of symmetric positive deviates. Note that in
this case ABS(RAND()) is just -RAND(). Then both of these transformations are
linear transformations of the negative deviates. FWLIW, linear transformations
of skewed random variables produce skewed random variables (presumably you
recall the proof of this from your introductory probability textbook). If the
initial subpopulation of *positive* deviates were symmetrically distributed on
its own (I don't know if it is or isn't), then adding to it a linearly
transformed skewed subpopulation produces a resulting skewed overall population.
Do you need a formal proof of that?

The only way adding a linearly transformed skewed subpopulation to another
subpopulation produces a symmetric distribution is when the former is either a
mirror image or just a shifted image of a 'hole' in the latter that would render
it symmetric. Perhaps this is the case. I dunno. But it's a necessary condition.
Also, FWIW, there are bit manipulations being made throughout Excel, and it's
easier to imagine such a manipulation erroneously flipping the sign bit of a
register than it is to imagine an unnecessary subtraction of 1, especially given
others' reports of this bug in which it takes many thousands of RAND() calls to
generate the first negative deviate. If there were a -1 somewhere in the code,
negative deviates should have shown up earlier.
For this "foul" alone I would like to wish that contributor well on
his (hopefully) self imposed "time out", and trust that if/when he
returns it will be with a new spirit of goodwill and co-operation. I
feel sure that in the next month or two, problems will continue to be
answered in a timely manner.

Wouldn't you be happier if it was you taking the time out?
 
E

Earl Takasaki

Hi!

I am one of the people who discovered the "bug" and the person who
suggested the workaround of adding 1 to Woody's website:

COMMENTS:

1) Someone suggested what I believe to be a mathematically equivalent,
but implementationally superior, alternative to adding 1 to negative
RAND() results. It is the following:

=MOD(RAND(),1)

This takes away the requirement of using a scratch column. I was
surprised that the MOD function worked this way with negative floating
point numbers.


2) I got a lot of email from "very smart" people (mostly math and stats
professors) who told me that there was NO WAY my "naive" workaround
could work. ONe claimed that the fix did not pass the "independence"
test at the 99.99999% confidence level. I asked each of them 4 questions:

1. did you analyze the data? (if you did, it would suggest the fix)
2. did you implement the fix and analyze the results?
3. did you perform statistical (uniformity, correlation, independence)
tests on the "fixed" results?
4. did you compare the results of the test on the original RAND()
function to determine whether the fix was "at least" as good as the
original?

Not one person could reply yes to all 4 questions.


3) To Microsoft: While I know that I was one of the people to origianlly
report this bug, I did not find out about the fix until someone else
told me. Even so, it is not easy to figure out what I need to do in
order to get the fix. Seems I need to call or email you, but I need to
have a PASSPORT account and be registered, and then go through a series
of menu choices none of which lead to a simply phone number or email
address. I ended up at a site that implied that I would need to use up
my ONE precious free call just to get this fix. That is bulls---.

Why can't you simply put this fix on the autoupdate site for Office?
Why can't you send this fix to all registered users rather than put it
out there for us to find on our own? And once we find it, why must we go
through so many hoops just to download it? (I still can't figure out
what to do and I am supposedly smart).
Why wasn't I, as a person who specifically reported this bug, given
notice of acknowledgement of the problem, then the fix?
Why is it so hard to report a bug? Why can't you just have a menu item
under "help" that says "report a bug/problem" that would send an email
directly to a supprot group. The adresss could be stored in the registry
and updated as needed.
Why do you have to be registered and have a passport account? What if I
am a student using an academic copy, or using a corporate version? Even
if I was a crook, it would still be in your best interest to accept bug
reports.

Earl Takasaki
EXCELent Consulting
www.excelentconsulting.com
 
D

David Byrne

Harlan wrote


"The only way adding a linearly transformed skewed subpopulation to
another
subpopulation produces a symmetric distribution is when the former is
either a
mirror image or just a shifted image of a 'hole' in the latter that
would render it symmetric. Perhaps this is the case. I dunno. But it's
a necessary condition."


OK Harlan,

You've had my data for a few days now.

Why don't you just say it??


"Earl and David are right after all. Adding one to the negative values
actually produces a symmetric distribution"


David
 
H

Harlan Grove

David Byrne said:
Harlan wrote ....
mirror image or just a shifted image of a 'hole' in the latter that
would render it symmetric. Perhaps this is the case. I dunno. But it's
a necessary condition."

This is still the case. If the data you provided shows this, fine.
You've had my data for a few days now.

I received your data late Friday night (yesterday!). You may have nothing
better to do on your weekends than check the results from RAND(), but I
usually do. As they say over here, "Get a life!"
"Earl and David are right after all. Adding one to the negative values
actually produces a symmetric distribution"

Congratulations. This is the first time YOU've said the result is symmetric.
 
H

Harlan Grove

Harlan Grove said:
I received your data late Friday night (yesterday!). You may have nothing
better to do on your weekends than check the results from RAND(), but I
usually do. As they say over here, "Get a life!"

I finally looked at your data. You sent me two files. One contained just two
date/time stamps in col A and percentages in col B. Presumably the col B
percentages were relative frequencies of negative values in 65,536 separate
runs of 100,000 RAND() calls each.

The other appears to be one (and only one) of those separate runs. A single
such sampling isn't very useful on its own. Be that as it may, the skewness
of the 100K values was reduced to nearly zero by adding 1 to negatives. This
shows that for this particular run, adding one to negative values makes the
resulting distribution of deviates symmetric and close to uniform, but it
doesn't prove this would always be the case.

I've added some additional calculations and a chart to the second workbook,
and it's available in a Zip file on my AOL ftp space so anyone interested
can download it (be warned, even Zipped with max compression, it's 1.16MB).
I added 4 formulas to calculate initial skewness for all, positive only, and
negative only values as well as skewness of the adjusted (+1 for <0) values.
Seeing what those are over many RAND() samples would be nice.

Anyway, the Zip file's url:

ftp://members.aol.com/hrlngrv/xl2k3_rand_bug.zip
 
D

David Byrne

Hi Harlan


Trust me on this one.

How many more sets would you like me to send??

BTW, my records show that I sent 3 sets. Two of those should be
100,000 cell samples: one with a very low % -ve, the other around 84%,
and the third the results from 65536 consective tests.

As you note, 100,000 cells even when zipped is over a 1Mb.

Probably easier if send you a CD, unless someone nearer to you can do
so.

Please advise. I have hundreds. One of the neat features of 2003 is
its capacity to handle larger files. Automated capture/save of sets
containing negative results gave me files pushing 200Mb, and still
recalculating freely.

Just set the range of percentage negative values you would like to
see.

As indicated and you will have seen from the data, peak frequencies
around 16 & 84.

Can set to capture as you decide.

David
 
H

Harlan Grove

David Byrne said:
BTW, my records show that I sent 3 sets. . . .

Possible, but I received only two e-mails with attachments from you.
. . . Two of those should be
100,000 cell samples: one with a very low % -ve, the other around 84%,
and the third the results from 65536 consective tests.

OK, I never received the second of these.

As for other data, the SKEW functions I mentioned in my previous response
would be more meaningful than the percentage of negative values from each
run.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top