Re: a novice asks about ANOVA



I'm sure that there will be people who HATE these explanations, but
I'll try to remain true to my statistical roots.

> What role does P play?
Provided that the assumptions of the ANOVA model are met, the p-value
is the probability of seeing group differences at least this large
purely by chance, i.e. when there is no real difference between the
groups. A smaller p-value means the data are less likely to be a chance
result. You usually pick a cutoff in advance (the significance level,
e.g. 5%); below it, the "by chance" explanation is too unlikely and you
conclude that there is an underlying model to the data.
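
One way to see what "by chance" means is to simulate it. The following
is only a minimal sketch, not part of the original question: it assumes
3 groups of 5 normally distributed observations that all share the same
mean and spread (so there is no real effect), and counts how often the
test still reports p < 0.05. The function names and the numbers 200, 10
and 4000 are my own choices for illustration. The p-value shortcut used
here is valid only when there are exactly 3 groups.

```python
import random

def f_stat(groups):
    """Return (F, df_within) for a one-way ANOVA on a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation explained by group membership
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation left over inside the groups
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_within

def p_value_three_groups(f, df_within):
    # Closed form for P(F(2, df_within) >= f).
    # Assumption: exactly 3 groups, i.e. 2 numerator degrees of freedom.
    return (1 + 2 * f / df_within) ** (-df_within / 2)

rng = random.Random(0)   # fixed seed so the run is repeatable
n_sims = 4000
hits = 0
for _ in range(n_sims):
    # All three groups come from the same distribution: no real effect
    groups = [[rng.gauss(200, 10) for _ in range(5)] for _ in range(3)]
    f, dfw = f_stat(groups)
    if p_value_three_groups(f, dfw) < 0.05:
        hits += 1

false_alarm_rate = hits / n_sims
print(false_alarm_rate)
```

The printed fraction should land near 0.05: with a 5% cutoff, roughly
one no-effect dataset in twenty still looks "significant".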

The F-statistic, crudely put, is the ratio of (the amount of variation
explained by the model) to (the amount of variation not explained by
the model). The ratio also involves some adjustment for model
complexity. Values larger than 1 imply that you are "better off" with
the model than without. Simply "better off" is not good enough. You
need to be significantly "better off", so we look for F values that are
significantly larger than 1.
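
In symbols, for the one-way case, this can be sketched as follows. The
notation is mine, not the original poster's: k groups, n_i observations
in group i, N observations in total, \bar{y}_i the mean of group i, and
\bar{y} the grand mean.

```latex
F = \frac{\mathrm{SS}_{\text{between}} / (k - 1)}
         {\mathrm{SS}_{\text{within}} / (N - k)},
\qquad
\mathrm{SS}_{\text{between}} = \sum_{i=1}^{k} n_i \, (\bar{y}_i - \bar{y})^2,
\qquad
\mathrm{SS}_{\text{within}} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2
```

The divisors k - 1 and N - k are the degrees of freedom, which is where
the adjustment for model complexity enters.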

> ...Does it indicate some kinds of weakness in ANOVA analysis?
The one-way ANOVA is a somewhat crude model. Like most crude models,
it makes some basic assumptions {in order of importance}:
(1) the error terms within or between groups are not correlated - e.g.
if each observation within a group is the height of the same tree at 5
different timepoints, then you can expect some correlation between the
measurement at time 1 and time 2, which violates this assumption.
(2) the error terms have the same variance from group to group - For
example, if I have 3 groups that have the following 5 observations
Group 1 {199,200,201,200,200}
Group 2 {190,191,192,191,191}
Group 3 {100,200,300,200,200}
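There probably is a difference between Group 1 and Group 2. The
problem is that the F test will not detect it (F(2,12)=0.081, p=0.9227),
because one of the key assumptions of the test is that all of the
groups have roughly the same variance. The test pools the within-group
variation, so the large spread in Group 3 swamps the small but
consistent gap between Groups 1 and 2.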
(3) those error terms are, in fact, normally distributed - The test
assumes that the differences from the group mean are normally
distributed, which implies that you don't have a lot of extreme values.
Provided that you get sufficient amounts of data and that there isn't a
plethora of VERY extreme values, the test still works reasonably well.
There are some strange situations where ANOVA will not work at all.
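
The numbers quoted under assumption (2) can be checked by hand. The
following is a minimal sketch in plain Python. The function name
one_way_anova is my own, and the p-value shortcut is valid only for the
3-group case (2 numerator degrees of freedom).

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Pooled within-group variation: every group contributes to it
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

group1 = [199, 200, 201, 200, 200]
group2 = [190, 191, 192, 191, 191]
group3 = [100, 200, 300, 200, 200]

# All three groups together, as in the example
f_all, dfb, dfw = one_way_anova([group1, group2, group3])
# Closed form for P(F(2, dfw) >= f); assumes exactly 3 groups
p_all = (1 + 2 * f_all / dfw) ** (-dfw / 2)
print(f"F({dfb},{dfw}) = {f_all:.3f}, p = {p_all:.4f}")  # F(2,12) = 0.081, p = 0.9227

# Groups 1 and 2 on their own, without Group 3's spread in the pool
f_12, dfb_12, dfw_12 = one_way_anova([group1, group2])
print(f"F({dfb_12},{dfw_12}) = {f_12:.1f}")  # F(1,8) = 405.0
```

With Group 3 included, the pooled within-group mean square is about
1667, against a between-group mean square of 135. With Group 3 left out,
the within-group mean square drops to 0.5 and the same 9-unit gap
between Groups 1 and 2 gives an F of 405.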

Of course, as you have apparently found by playing with the data that
you plug into the work***, the test can be sensitive to some modest
changes in the data. The key to properly applying any statistical
model is to understand where it performs as it should and where it
doesn't. "The right tool for the right job."

Jason Clark
Senior Biometrician, Merck Research Labs
