Generate Correlated variable to an existing data set

H

hdf

I would like to generate a column of random variables (-1 to 1) that
are correlated to an existing column of values that are not -1 to 1.

I know how to generate two sets of random correlated variables, but I
can't seem to find a way to generate a column of random variables that
is correlated to an existing column of numbers as opposed to one that
was generated at the same time. It is probably something very simple
or impossible, but I'm no stats expert and thus need help in
identifying a function or formula that will allow me to do this.

To be clear, I have a column of values ranging from 100 to 175. I
want to generate a second column of random variables (-1 to 1) which
are correlated with this column, to whatever correlation coefficient I
choose.

I hope this makes sense.
 
J

joeu2004

I know how to generate two sets of random correlated
variables

What do you mean by that? Do you mean that you know how to generate
two random sets of data and compute their correlation? Or do you mean
that you know how to generate one random set of data and generate
another data set over a different range of numbers that is exactly
correlated (non-random)?

An example would help clarify your meaning.
To be clear, I have a column of values ranging from
100 to 175.  I want to generate a second column of random
variables (-1 to 1) which are correlated with this column,
to whatever correlation coefficient I choose.

Few things are "impossible" to do with computers. It depends on how
long you are willing to wait for a solution.

Off-hand, if I understand your requirements correctly, I believe this
is a very difficult task to accomplish in a reasonable finite amount
of time. The troubleshome requirement is "whatever correlation
coefficient I choose".

You could generate n-1 random values in the range [-1,1], then solve
for the n-th value that gives the required correlation coefficient.

Even that might be difficult due to the nature of the correlation
formula, if you use the same one as that found on the Excel CORREL
help page.

Moreover, there is no guarantee that the n-th value that meets the
correlation requirement will be in the range [-1,1].

Of course, you can always generate n random values in the range [-1,1]
repeatedly until you stumble upon a set that meets your criteria. But
such repetition can take a long time, and there is no guarantee of
"ever" finding a solution in your lifetime.

For additional ideas and broader participation, you might want to
repost your inquiry using the MS Answers Forums at
http://social.answers.microsoft.com/Forums/en-US/category/officeexcel.

It's not that I like that forum. It's just that MS has ceased to
support the Usenet newsgroups; see http://www.microsoft.com/communities/newsgroups/default.mspx.
Hence, participation here is limited to the sites that share a common
newsgroup mirror, which is no longer centralized at MS. Many of the
diehard contributors no longer pay attention to the Usenet newsgroups.
 

hdf

Joined
Apr 3, 2011
Messages
1
Reaction score
0
I have no idea how my question got posted to this forum, since I wasn't a member. I originally posted it in Google Groups. Anyway, thank you to whomever did it.

Joeu2004, thanks for the help.

First, I should correct a glaring mistake in my original question. I meant to ask for random numbers between 0 and 1, NOT -1 and 1.

As for the rest of my question, let me try to clarify.

When I said "I know how to generate two sets of random correlated variables" I probably should have been more precise. What I meant is that I use a VBA user defined function to do so.

There is a series of VBA defined functions called Simtools which include a function that allows you to generate sets of correlated random variables (between 0 and 1) that can then be used as substitutes for the rand() function in generating distribution curves. This particular function allows you to input the correlation you want and then generates pairs of correlated variables between 0 and 1.

However, I already have one set of data that is the output from a series of calculations. For sake of argument, let's say that I have calculated the distribution of students by age who wear blue socks. And that I know that there is a correlation of X between those who wear blue socks and those who wear red hats. I want to generate a series of random variables that are correlated (X) with the distribution of students wearing blue socks.

I hope this makes more sense.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top