# Generate Correlated variable to an existing data set

hdf
Guest
Posts: n/a

 31st Mar 2011
I would like to generate a column of random variables (-1 to 1) that
are correlated to an existing column of values that are not -1 to 1.

I know how to generate two sets of random correlated variables, but I
can't seem to find a way to generate a column of random variables that
is correlated to an existing column of numbers as opposed to one that
was generated at the same time. It is probably something very simple
or impossible, but I'm no stats expert and thus need help in
identifying a function or formula that will allow me to do this.

To be clear, I have a column of values ranging from 100 to 175. I
want to generate a second column of random variables (-1 to 1) which
are correlated with this column, to whatever correlation coefficient I
choose.

I hope this makes sense.

joeu2004
Guest
Posts: n/a

 1st Apr 2011
On Mar 31, 5:29*am, hdf <(E-Mail Removed)> wrote:
> I know how to generate two sets of random correlated
> variables

What do you mean by that? Do you mean that you know how to generate
two random sets of data and compute their correlation? Or do you mean
that you know how to generate one random set of data and generate
another data set over a different range of numbers that is exactly
correlated (non-random)?

An example would help clarify your meaning.

You wrote:
> To be clear, I have a column of values ranging from
> 100 to 175. *I want to generate a second column of random
> variables (-1 to 1) which are correlated with this column,
> to whatever correlation coefficient I choose.

Few things are "impossible" to do with computers. It depends on how
long you are willing to wait for a solution.

Off-hand, if I understand your requirements correctly, I believe this
is a very difficult task to accomplish in a reasonable finite amount
of time. The troubleshome requirement is "whatever correlation
coefficient I choose".

You could generate n-1 random values in the range [-1,1], then solve
for the n-th value that gives the required correlation coefficient.

Even that might be difficult due to the nature of the correlation
formula, if you use the same one as that found on the Excel CORREL
help page.

Moreover, there is no guarantee that the n-th value that meets the
correlation requirement will be in the range [-1,1].

Of course, you can always generate n random values in the range [-1,1]
repeatedly until you stumble upon a set that meets your criteria. But
such repetition can take a long time, and there is no guarantee of
"ever" finding a solution in your lifetime.

For additional ideas and broader participation, you might want to
repost your inquiry using the MS Answers Forums at

It's not that I like that forum. It's just that MS has ceased to
support the Usenet newsgroups; see http://www.microsoft.com/communities...s/default.mspx.
Hence, participation here is limited to the sites that share a common
newsgroup mirror, which is no longer centralized at MS. Many of the
diehard contributors no longer pay attention to the Usenet newsgroups.

New Member
Join Date: Apr 2011
Posts: 1

 3rd Apr 2011
I have no idea how my question got posted to this forum, since I wasn't a member. I originally posted it in Google Groups. Anyway, thank you to whomever did it.

Joeu2004, thanks for the help.

First, I should correct a glaring mistake in my original question. I meant to ask for random numbers between 0 and 1, NOT -1 and 1.

As for the rest of my question, let me try to clarify.

When I said "I know how to generate two sets of random correlated variables" I probably should have been more precise. What I meant is that I use a VBA user defined function to do so.

There is a series of VBA defined functions called Simtools which include a function that allows you to generate sets of correlated random variables (between 0 and 1) that can then be used as substitutes for the rand() function in generating distribution curves. This particular function allows you to input the correlation you want and then generates pairs of correlated variables between 0 and 1.

However, I already have one set of data that is the output from a series of calculations. For sake of argument, let's say that I have calculated the distribution of students by age who wear blue socks. And that I know that there is a correlation of X between those who wear blue socks and those who wear red hats. I want to generate a series of random variables that are correlated (X) with the distribution of students wearing blue socks.

I hope this makes more sense.

 Thread Tools Rate This Thread Rate This Thread: 5 : Excellent 4 : Good 3 : Average 2 : Bad 1 : Terrible

 Posting Rules You may not post new threads You may not post replies You may not post attachments You may not edit your posts BB code is On Smilies are On [IMG] code is On HTML code is OffTrackbacks are On Pingbacks are On Refbacks are Off Forum Rules

 Similar Threads Thread Thread Starter Forum Replies Last Post =?Utf-8?B?RGFu?= Microsoft Excel Worksheet Functions 1 9th Nov 2005 12:21 AM Randy Microsoft Excel Misc 2 16th Jan 2005 10:50 PM =?Utf-8?B?dmlzaHM=?= Microsoft Excel Worksheet Functions 1 20th Nov 2004 01:24 AM Masahiro Ito Microsoft ADO .NET 1 18th Jul 2004 09:15 PM Paul Goldwater Microsoft Excel Programming 2 1st Sep 2003 05:51 PM

Features