KURT and SKEW functions

G

Guest

I have 5 independent datasets that I am analyzing with descriptive statistics
and Gaussian (normal) distributions to search for outliers. During the
process, I decided to see if there is a way to “test†a sample for
“normality†since I’m getting some graphs that do not represent normal
distributions. I want to know why. So, I found the functions KURT and SKEW,
but I need some advice about their significance. I do not understand the
values I am getting from these functions and the help files in Excel 2007 are
not very specific. (I wish they could expand more the description listed
under “Excel Home > Function reference > Statisticalâ€). This is what they say:

// KURT(number1,number2,...) Returns the kurtosis of a data set. Kurtosis
characterizes the relative peakedness or flatness of a distribution compared
with the normal distribution. Positive kurtosis indicates a relatively peaked
distribution. Negative kurtosis indicates a relatively flat distribution.
Number1, number2, ... are 1 to 255 arguments for which you want to
calculate kurtosis. You can also use a single array or a reference to an
array instead of arguments separated by commas. //

// SKEW(number1,number2,...) Returns the skewness of a distribution.
Skewness characterizes the degree of asymmetry of a distribution around its
mean. Positive skewness indicates a distribution with an asymmetric tail
extending toward more positive values. Negative skewness indicates a
distribution with an asymmetric tail extending toward more negative values.
Number1, number2 ... are 1 to 255 arguments for which you want to calculate
skewness. You can also use a single array or a reference to an array instead
of arguments separated by commas. //

For instance:

My datasets range from 75,00 entries to 13,845 entries. I am getting a
Kurtosis = 0.71808 and Skewness = -0.07325 for one dataset, and Kurtosis =
0.59906 and Skewness = 0.91872 for another one, but I don’t know what they
mean.
Can anybody help me to discuss/explain/expand these values?
 
G

Guest

Wikipedia has a pretty good discussion of kurtosis. See, especially, the
section "Kurtosis of well known distributions":
http://en.wikipedia.org/wiki/Kurtosis (standard disclaimers about relying on
Wikipedia apply). A normal distribution should have a kurtosis of 0.

A normal distribution should also have a skew of 0.

So, based on the results you give, none of your distributions are normal.
But they don't seem to be either extremely peaked or extremely skewed.
Positive numbers for SKEW indicate that the distribution is skewed to the
right and negative skew indicates that the distribution is skewed to the left.

Does this help give context to the numbers you're getting?

Dave
 
G

Guest

Thank you for your super-fast response Dave!

Now, before I posted this today, I did some research on the internet and I
did found the same document in Wikipedia. I wish it could be easier to
determine if there is a limit for this values, let's say -1 to +1, where 0 is
the ideal value for the normal distribution.

The problem is, what can I conclude when I see things like Kurtosis =
2.94948 and Skewness = 1.01299, Kurtosis = 3.72673 and Skewness = 1.81848,
etc... the purpose of this project is to test for normality as we move
forward identifying and removing outliers (the given values do have outliers
present). In theory, we should get a nearly close normal distribution as we
remove outliers after careful examination.

One more thing I am concerned of, is the Excel help file when it is limiting
the values to n=255. If I have samples much greater than that, are the
functions just taking the first 255 values??? that's a little bit confusing.

Finally, is there any other rigorous test (from the ones listed in
Wikipedia) built-in in Excel?
 
G

Guest

All good questions. If normal distribution should have a kurtosis and
skewness of 0, and your kurtosis and skewness numbers are not 0, or close to
it, then you don't have a normal distribution. The greater the absolute
value of your kurtosis and skewness numbers are, the more peaked/skewed your
distributions are.

As to Excel's limits for these two functions: I don't know the answer to
that. Maybe someone else on these discussion groups does. One thing to
consider is that there are software packages out there designed specifically
for statistical analysis, which may be more accurate/powerful than Excel.

Dave
 
G

Guest

Biometrika Tables for Statisticians, vol 1, Tables 34B and 34C give critical
values for testing normality with the sample skewness and sample kurtosis.
The largest sample size in the table is 2000, where |Kurtosis|>0.3 represents
a statistically significant departure from normality.

However, with sample sizes in your range, you have tremendous power to
detect minor departures of little or no practical meaning. Unless there is a
useful alternative model for the analysis, I would tend to look a probability
plot
http://www.usfca.edu/~middleton/demand.pdf
instead of a hypothesis test to assess normality.

Jerry
 
G

Guest

Thank you all for your help and references. I am starting to think if I am
using the correct approach, but before I explain this further, I have to say
that a professor from the University of Hawai at Manoa has an in-depth
explanation about Excel's Skewness and Kurtosis funcions in this website:
http://jalt.org/test/bro_1.htm
His article brings more significanse to the values we get from Excel.

Now, what am I trying to do in this project is: I have two different sets of
elevations represented by a series of points (13814 total). One set came from
ground measurements, and another set came from aerial measurements.

The datasets I am using to analyse the data are simply the absolute value of
the differences between this two sets of elevations.

I then subdivide them in four different groups to get descriptive statistics
plus the Skewness and Kurtosis statistics. The groups are: points falling on
hard surfaces (3026 total), points falling on short grass (7645 total),
points falling on tall grass (1583 total) and points falling on forested
areas and canopies (1560 total). I would expect smaller differences in the
first two groups, and greater differences in the last two groups, but that's
not always the case because we have some outliers. I am trying to compare the
distributions for each group with a normal distribution using the KURT and
SKEW functions to come out with a firm and solid explanation. But now I am
not sure if I am using the correct approach to do this.
 
G

Guest

I would tend to expect the differences (instead of the absolute differences)
to be normally distributed. If the differences are normally distributed,
then the absolute differences whould tend to have a skewness around 0.995,
and a kurtosis around 0.869.

Jerry
 
G

Guest

If X follows the Normal(0,1) distribution, then the population parameters are
Skewness(X)=0
Excess Kurtosis(X)=0
Skewness(|X|)=0.995...
Excess Kurtosis(|X|)=0.869...

Jerry
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top