Diagonal line in scatter plot?

P

Peter Frank

Hi,

I have two columns of data that I want to plot against each other as a
scatter plot, e.g. column 1 is the x value and column 2 is the y
value. The values next to each are corresponding values. This is easy
enough to do.

But the point of this scatter plot is to show which corresponding
values differ a lot from each other, how many don't and how many do.
If the two columns were identical (x=y), I would get a perfect
diagonal line of 45 ° relative to the origin. However, since the two
columns are not identical, I will not get such a line but - with my
type of data - I will get a cloud of dots scattered around this
hypothetical line.
In order to be able to easily see the deviation from the x=y line, I
would like to have such a line drawn into my scatter plot in addition
to the dots (data points).

How can I do this in Excel?

Peter
 
R

raycyn.wright

Do your y values represent a dependant variable, i.e., you are interested in
the value of y as a function of x? If so, and if I understand you
correctly, you want to see the deviation of any given value from the best
fit line of the x and y.

You imply that the expected best fit line with a slope of 1 (45o angle). If
so, the equation of the best fit straight line is y = a + b*x, where "b" is
the slope (in this case = 1) and "a" represents the value of y when x = 0
(this is often called the slope of the line). If this is the case then the
deviation of any given y value from the expected is calculated as:

Deviation = observed y value - (a+x). Some of these deviations will be
positive numbers and others negative.

If, on the other hand, you do not have an expectation that the "true"
relationship between x and y is a slope of 1 (45o angle), then you can use
XL functions to calculate the slope and intercept of the relationship.

=SLOPE (known y values, known x values)

=INTERCEPT(known y values, known x values).

And the deviation in y values would be: observed y value - (a+(b*x)).

If you wish to visualize the variability, you could make a chart of these
deviations.



It might help if you could describe that your x and y value represent and
what you are trying to determine.
 
P

Peter Frank

raycyn.wright said:
Do your y values represent a dependant variable, i.e., you are interested in
the value of y as a function of x?

No. There is no functional dependency between the two columns of
values.
If so, and if I understand you correctly, you want to see the deviation of
any given value from the best fit line of the x and y.

Yes, I want to see the deviations and I am interested in their size
and their frequency.
You imply that the expected best fit line with a slope of 1 (45o angle).

The line with a slope of 1 is not really the expected result, if it
was the experiment would have been a failure. It is rather meant for
the purpose of orientation.
It might help if you could describe that your x and y value represent and
what you are trying to determine.

OK, I hope this makes it a bit clearer what I am trying to achieve
with this plot. The x and y values are fluorescence intensities of
gene spots on a microarray. x is channel 1 (fluorophore Cy3) and y is
channel 2 (fluorophore Cy5). The competitive hybridization of two
fluorescently labelled cDNA samples gives me these two fluorescence
intensities, and the bigger the difference in the quantity of a
certain cDNA, the bigger the difference in x and y. Only if I used the
same cDNA sample labelled with the two fluorphores above or if there
is absolutely no difference between cDNA sample 1 and cDNA sample 2,
which I don't expect (or which I don't hope), then I would get the x=y
line (i.e. the line with the slope of 1).
I have many thousands of values and this plot is supposed to show me
*visually* how many of them differ from x=y and how much they differ
from x=y.

Peter
 
R

raycyn.wright

Thanks for the additional information, Peter.

In ordinary linear regression, which is what I was describing in my post,
the goal is to minimize the sum of the squared vertical distances between
the y data values and the corresponding y values on the fitted line. In your
case, the appropriate approach is orthogonal regression in which the goal is
to minimize the orthogonal (perpendicular) distances from the data points to
the fitted line. Unfortunately, Excel does not have this functionality
built in. Orthogonal regression (sometimes called reduced major axis
regression) is not as commonly used and the calculation methodology is
rarely included in statistics texts. A reasonably good explanation of
orthogonal regression is given at

http://www.jmp.com/news/jmpercable/06_summer1998/regression.html.

There are statistical packages that calculate orthogonal regressions, but
none are free that I can find. If you have access to SAS or SPSS, you can
run your data there. One package that I have tested is NLREG. You can
download a 30 day free trial at

http://www.nlreg.com/DownloadDemo.htm

If you use this program for your calculations, the syntax you would use for
orthogonal regression is:

Variables X0, Y0;

Parameters m, b;

Double Xi, Yi, distance;

Xi = (X0 + m*Y0 - m*b) / (m^2 + 1);

Yi = m*Xi + b;

distance = sqrt((X0-Xi)^2 + (Y0-Yi)^2);

Function distance;

Data;

1 4

2 6

3 8

2.5 7.5

5 12

4 10

5 6



Where X0 and Y0 are your fluorescence intensities for channel 1 and 2
respectively, and below the line Data you would add your data points in
pairs (as in my example above). I think NLREG has a way to import the data
(read the tutorial or consult help).



Good like - sounds like an intriguing problem.
 
J

Jon Peltier

Peter -

If you just want to see the difference visually, use column C to store the
difference between columns B and A. Plot column C against A, and any deviation from
A=B shows up as a positive or negative position of the point away from the Y=0 axis.
You could even run descriptive statistics on the values in column C.

- Jon
-------
Jon Peltier, Microsoft Excel MVP
Peltier Technical Services
Tutorials and Custom Solutions
http://PeltierTech.com/
_______
 
P

Peter Frank

I said:
I have two columns of data that I want to plot against each other as a
scatter plot, e.g. column 1 is the x value and column 2 is the y
value. The values next to each are corresponding values. This is easy
enough to do.

But the point of this scatter plot is to show which corresponding
values differ a lot from each other, how many don't and how many do.
If the two columns were identical (x=y), I would get a perfect
diagonal line of 45 ° relative to the origin. However, since the two
columns are not identical, I will not get such a line but - with my
type of data - I will get a cloud of dots scattered around this
hypothetical line.
In order to be able to easily see the deviation from the x=y line, I
would like to have such a line drawn into my scatter plot in addition
to the dots (data points).

How can I do this in Excel?

Thanks, raycyn and Jon, for your help. I am definitely going to do
further statistical analyses like the ones you described (side note: I
have access to SPSS) but since I still wanted that type of plot, I
finally came up with a pretty simple although not perfectly elegant
solution.

I will just create two additional columns with some artificial (i.e.
not experimental) x and y values ranging from the minimum to the
maximum of the experimental x and y values. These values will always
fullfill the condition x = y so that they form a perfect x = y line
(with a slope of 1). I will format this XY plot to be a line without
data points showing, whereas the actual experimental data will be
formatted as an actual scatter plot in the same diagram. That way I
will get my diagonal line "surrounded" by the scatter plot.

Peter
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top