density dependent random draw without replacement

  • Thread starter Thread starter Niels
  • Start date Start date
N

Niels

Wow, this works indeed as I described. And thank you for introducing arrays
(-; The only thing is that I did not completely describe my problem as I had
in mind. I should have included density dependent. If I again refer to the
list below, I want to randomly draw 5 unique values but the chance of
drawing 4 should be 3 times as high as drawing 1 etc. Something like a
density dependent random draw of unique values. I hope you or anybody else
can help me out, best,

Niels
 
Yes, could if this hypothetical distribution was all I was using. In reality
I am dealing with over 100 000 records (still have to solve that one in
Excel). But, thanks anyway,

Niels
 
Hello Bernd,

I would love to be able to understand the scripts you are referring to, but
at this stage I don't unfortunately. Let me clarify the problem.

I have 1837 gridcells in environmental space. Each of these cells contain
plant collections. Some many, others only few. What I need is a random draw
from these cells, where cells with many collections having a proportionately
higher chance of being selected. However, each cell can only be represented
once. I use this to contruct a null distribution. Meaning I need 999x5
cells, 999x6 cells etc. up to 999x100 cells. I do now how to write a macro
to repeat the process 999 times, but making this density dependent random
draw without replacement not. If there is an easy solution would be great,

Regards,

Niels
 
Niels,

Here is a unique-value, density-dependent solution that will work for up to about 65000 samples -
more if you upgrade to XL2007.

Assume that your values are in column A. In column B, use the formula

=RAND()

and copy to match your values in column A.

In cell C2 (I assume that you have headers in row 1), array-enter the formula (enter using
Ctrl-Shift-Enter)

=IF(B2=MAX(IF($A$2:$A$XXXXX=A2,$B$2:$B$XXXXX)),B2,"")

(Replace the XXXXXs with the row number of your last formula in column B)

This formula will highlight the highest random number for the family of values in column A. (The
more members of the family, the higher the probability that one of those members will have a higher
random number, thus a better chance of being drawn...)

In Cell D2, enter the formula

=IF(C2<>"",RANK(C2,C:C),"")

Starting in cell E2, enter the number series 1,2,3....X down column E, where X is the number of
unique values you need.

Then in F2, enter the formula

=INDEX(A:A,MATCH(E2,D:D,FALSE))

and copy down to match your numbers in column E.

This will produce a random draw of values from column A, with those values appearing more often
having a better chance of being drawn.

You can experiment by recalculating and seeing which values are drawn - but of course, don't be
surprised if a low frequency value is drawn - that is randomness....

IF you need to work with more than 65000 or so values, use a randomized selection routine to select
65000 values from your larger population - certainly a valid technique, if not ideal.

HTH,
Bernie
MS Excel MVP
 
Hello Niels,

Would it be a solution to simplify the drawing process like this:
1. Put the number of plant collections for each gridcells into
A1:A1837. If for example gridcell number 17 has 47 plant collections
then enter 47 into A17.
2. Then use a modified redw version which uses an input range (and not
a vararray):
B1:
=redw(A1:A1837)
B2 (showing the gridcell):
=INT(1+1837*B1))
B3 (showing the plant number within the gridcell):
=INT(1+OFFSET(A1,B2-1,0)*RAND())

If this approach would help and you need further help to modify my UDF
tell me, pls.

Regards,
Bernd
 
Back
Top