finding the location of a group of cells

J

jimbo

I have 50,000 rows of 12 values in each column. I am looking for low
outliers, so I use MIN across each row and get the lowest value in each
group of 12 values. From there I calculate the mean, then the SD, and
get a Z score for each row.

My problem is that if cell d4 is the MIN of the 12 values in that row,
I need to know what the values were in C4 and E4, because if C4 is
approximately equal to E4, and the Z score is high, then D4 is an
outlier. In a clinical setting though, it's important that the
previous and subsequent values are approximately equal.

Right now I do this manually. I find the MIN in the group of 12 data
points and copy the three values into seperate rows so That I can apply
formulas to the data. But I know that anytime I have to do that much
manual minipulation of data, I'm doing something wrong.

Any assistance would be appreciated.
 
N

Niek Otten

Use

=MATCH(MIN(A1:A50000),A1:A50000,0) to locate the row where the MIN is. You can then use INDEX to retrieve cells from that row.
The MATCH() may be time-consuming, so do it only once, store the result in a cell and use that cell in several INDEX() formulas;
INDEX() is very fast.
--
Kind regards,

Niek Otten
Microsoft MVP - Excel


|I have 50,000 rows of 12 values in each column. I am looking for low
| outliers, so I use MIN across each row and get the lowest value in each
| group of 12 values. From there I calculate the mean, then the SD, and
| get a Z score for each row.
|
| My problem is that if cell d4 is the MIN of the 12 values in that row,
| I need to know what the values were in C4 and E4, because if C4 is
| approximately equal to E4, and the Z score is high, then D4 is an
| outlier. In a clinical setting though, it's important that the
| previous and subsequent values are approximately equal.
|
| Right now I do this manually. I find the MIN in the group of 12 data
| points and copy the three values into seperate rows so That I can apply
| formulas to the data. But I know that anytime I have to do that much
| manual minipulation of data, I'm doing something wrong.
|
| Any assistance would be appreciated.
|
 
B

Bob Phillips

Assuming that the MIN value is in column M, use

=IF(MATCH($M2,$A2:$L2,0)=1,"None
left",INDEX($A2:$L2,MATCH($M2,$A2:$L2,0)-1))

and

=IF(MATCH($M2,$A2:$L2,0)=12,"None
right",INDEX($A2:$L2,MATCH($M2,$A2:$L2,0)+1))

--
---
HTH

Bob

(change the xxxx to gmail if mailing direct)
 
J

jimbo

As an example, here are seven patients. On different months their
phosphorus dropped, some significantly, some less so. With some
patients, e.g. p1 and p2, the drop was significant, and the subsequent
month, the phosphorus value went back to its previous level. In one
patient, p3, the drop wqas sustained.

jul sep oct nov dec
p1 5.9 6.0 6.1 1.5 6.2
p2 6.1 0.9 6.9 6.8 6.9
p3 4.1 3.9 4.1 2.8 3.2
p4 4.9 4.9 4.0 3.7 4.3
p5 3.9 3.9 2.7 3.9 3.8
p7 5.6 3.6 5.6 5.6 5.0
p8 5.3 6.2 3.5 5.6 5.9

So, I try and run some manner of statistics on the data looking for 1)
the mean of the five months, MIN, SD, and calculate the Z score
((mean-MIN)/SD), and the difference between the previous and subsequent
value.

mean min sd z delta delta%
p1 5.1 1.5 2.0 1.8 0.10 2%
p2 5.5 0.9 2.6 1.8 0.80 13%
p3 3.6 2.8 0.6 1.4 -0.90 -28%
p4 4.4 3.7 0.5 1.2 0.30 8%
p5 3.6 2.7 0.5 1.8 0.00 0%
p7 5.1 3.6 0.9 1.7 0.00 0%
p8 5.3 3.5 1.1 1.7 -0.60 -10%

The problem is that I have 12 months of cumulative data on some 50,000
patients, and each patient's MIN occurs anywhere in that 12 month range
(I could only show five months because of the limitation of the email).
What I currently try and do is to sort each month and look for the
lowest values for each month. But this is extremly time consuming and
anytime something takes that long in Excel, I know that there is a
differrent way to approach the data.

So, I am looking for some formula that I could add to the above,
indicating the MIN and the previous and subsequent values to the MIN
value.

Thank yiou for your response. I could not get your suggestion to work,
but assume that it is because of my limited understanding of Excel, or
it was inadequate explanation of my problem.

Thanks for your time.
 
J

jimbo

As an example, here are seven patients. On different months their
phosphorus dropped, some significantly, some less so. With some
patients, e.g. p1 and p2, the drop was significant, and the subsequent
month, the phosphorus value went back to its previous level. In one
patient, p3, the drop wqas sustained.

jul sep oct nov dec
p1 5.9 6.0 6.1 1.5 6.2
p2 6.1 0.9 6.9 6.8 6.9
p3 4.1 3.9 4.1 2.8 3.2
p4 4.9 4.9 4.0 3.7 4.3
p5 3.9 3.9 2.7 3.9 3.8
p7 5.6 3.6 5.6 5.6 5.0
p8 5.3 6.2 3.5 5.6 5.9

So, I try and run some manner of statistics on the data looking for 1)
the mean of the five months, MIN, SD, and calculate the Z score
((mean-MIN)/SD), and the difference between the previous and subsequent
value.

mean min sd z delta delta%
p1 5.1 1.5 2.0 1.8 0.10 2%
p2 5.5 0.9 2.6 1.8 0.80 13%
p3 3.6 2.8 0.6 1.4 -0.90 -28%
p4 4.4 3.7 0.5 1.2 0.30 8%
p5 3.6 2.7 0.5 1.8 0.00 0%
p7 5.1 3.6 0.9 1.7 0.00 0%
p8 5.3 3.5 1.1 1.7 -0.60 -10%

The problem is that I have 12 months of cumulative data on some 50,000
patients, and each patient's MIN occurs anywhere in that 12 month range
(I could only show five months because of the limitation of the email).
What I currently try and do is to sort each month and look for the
lowest values for each month. But this is extremly time consuming and
anytime something takes that long in Excel, I know that there is a
differrent way to approach the data.

So, I am looking for some formula that I could add to the above,
indicating the MIN and the previous and subsequent values to the MIN
value.

Thank yiou for your response. I could not get your suggestion to work,
but assume that it is because of my limited understanding of Excel, or
it was inadequate explanation of my problem.

Thanks for your time.
 
J

jimbo

As an example, here are seven patients. On different months their
phosphorus dropped, some significantly, some less so. With some
patients, e.g. p1 and p2, the drop was significant, and the subsequent
month, the phosphorus value went back to its previous level. In one
patient, p3, the drop wqas sustained.

jul sep oct nov dec
p1 5.9 6.0 6.1 1.5 6.2
p2 6.1 0.9 6.9 6.8 6.9
p3 4.1 3.9 4.1 2.8 3.2
p4 4.9 4.9 4.0 3.7 4.3
p5 3.9 3.9 2.7 3.9 3.8
p7 5.6 3.6 5.6 5.6 5.0
p8 5.3 6.2 3.5 5.6 5.9

So, I try and run some manner of statistics on the data looking for 1)
the mean of the five months, MIN, SD, and calculate the Z score
((mean-MIN)/SD), and the difference between the previous and subsequent
value.

mean min sd z delta delta%
p1 5.1 1.5 2.0 1.8 0.10 2%
p2 5.5 0.9 2.6 1.8 0.80 13%
p3 3.6 2.8 0.6 1.4 -0.90 -28%
p4 4.4 3.7 0.5 1.2 0.30 8%
p5 3.6 2.7 0.5 1.8 0.00 0%
p7 5.1 3.6 0.9 1.7 0.00 0%
p8 5.3 3.5 1.1 1.7 -0.60 -10%

The problem is that I have 12 months of cumulative data on some 50,000
patients, and each patient's MIN occurs anywhere in that 12 month range
(I could only show five months because of the limitation of the email).
What I currently try and do is to sort each month and look for the
lowest values for each month. But this is extremly time consuming and
anytime something takes that long in Excel, I know that there is a
differrent way to approach the data.

So, I am looking for some formula that I could add to the above,
indicating the MIN and the previous and subsequent values to the MIN
value.

Thank yiou for your response. I could not get your suggestion to work,
but assume that it is because of my limited understanding of Excel, or
it was inadequate explanation of my problem.

Thanks for your time.
 
J

jimbo

That worked perfectly. Thanks. If I could, one more question: I have
12 monthly results in a row, and I now can identrify the lowest result
and the results before and subsequent to that lowest result. What I'd
like to do is to calculate the SD of the 11 results, not including the
lowest value. The notion is that if the lowest value is an artifact,
then the SD would be falsly elevated if it was included in the SD
calculation. Alternatively, if the lowest value is not an artifact,
then exclusion of the value will not change the SD in a significant
manner. An accurate SD will enable me to use the Z score to determine
if the lowest value is an outlier or not.

So, what I would like to do is to calculate the Standard Deviation of
an array to results, excluding the lowest value, regardless where it
resides in the group of tests.

Thanks for your help
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top