Re: Reef Fish Statistics for Dummies: Testing Correlations
- From: Russell.Martin@xxxxxxx
- Date: 5 Oct 2006 06:37:54 -0700
On Oct 5, 4:02 am, "m00es" <m...@xxxxxxxxx> wrote:
Reef Fish wrote:
I hope this finally settled the m00es Lecture topic, which
dragged on for weeks and billions of wasted electrons
dozens of posts and NOISE, about how to test a hypothesis
about a correlation coefficient.
No, it does not, because what you wrote is still not correct.
Reef Fish wrote:
It is HERE that m00es should have realized that his repeated
claim that "DATA is irrelevant" to the distribution of the
TEST STATISTICS is drastically wrong, from the APPLIED
Statistics and Data Analytic point of view.
Data IS irrelevant for deriving the distribution of the test statistic.
This has nothing to do with a viewpoint. It's a fact. I don't need to
observe any data to derive that distribution.
I'll explain this again. Why don't you explicitly point out in this
proof the source of my error.
1) The model: Y = beta0 + beta1 x + e, where e ~ iid N(0, sigma^2)
2) beta1 = rho(X,Y) * SD(Y) / SD(X)
3) Therefore, beta1 = 0 iff rho(X,Y) = 0
(since SD(Y) and SD(X) can safely be assumed to be > 0).
Now we want to test H0: beta1 = 0. As you have said yourself, we can
use:
t = b1/s(b1)
s = r * sqrt(n-2) / sqrt( 1 - r^2 )
to test H0: beta1 = 0. Why?
4) under H0: beta1 = 0, t follows a t-distribution with n - 2 degrees
of freedom
5) s = t, so both MUST have the same distribution
6) we can also use the result from Hogg & Craig to see that s has a
t-distribution with n - 2 degrees of freedom under H0. Let's use my
quote:
m00> Hogg, R. V., & Craig, A. T. (1995). Introduction to mathematical
m00> statistics (5th ed.).
m00> On pages 478-480, the authors derive the distribution of r
m00> under the bivariate normal assumption and show that
m00> under rho = 0, r * sqrt(n-2)/ sqrt(1-r^2 ) is distributed t(n-2).
m00> Now, on page 480, the authors mention EXPLICITLY that
m00> a careful review of their proof reveals that nowhere was it
m00> necessary to assume that the two variables are bivariate
m00> normal. Only one of the variables must be normal.
Under H0: beta1 = 0, then Y = beta0 + 0 X + e. Therefore Y ~ N(beta0,
sigma^2). We see that under H0, Y is normal and not a mixture
distribution. Therefore, s follows a t-distribution with n - 2 degrees
of freedom.
An important point: t (as well as s) only follows a t-distribution with
n - 2 degrees of freedom when beta1 = 0 holds!
7) When we reject H0: beta1 = 0, we automatically reject H0: rho = 0
and vice-versa.
q.e.d.
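Step 5 asserts s = t without the algebra. A short sketch of why the two statistics coincide for any sample, writing S_xx, S_yy and S_xy for the centered sums of squares and cross-products (notation assumed here, not taken from the thread) and SSE for the residual sum of squares:

```latex
\begin{align*}
b_1 &= \frac{S_{xy}}{S_{xx}}, \qquad r = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}
  \quad\Longrightarrow\quad b_1 = r \sqrt{\frac{S_{yy}}{S_{xx}}}, \\
\mathrm{SSE} &= S_{yy} - \frac{S_{xy}^2}{S_{xx}} = (1 - r^2)\, S_{yy}, \qquad
s(b_1) = \sqrt{\frac{\mathrm{SSE}}{(n-2)\, S_{xx}}}, \\
t = \frac{b_1}{s(b_1)} &= r \sqrt{\frac{S_{yy}}{S_{xx}}} \cdot
  \sqrt{\frac{(n-2)\, S_{xx}}{(1 - r^2)\, S_{yy}}}
  = \frac{r \sqrt{n-2}}{\sqrt{1 - r^2}} = s .
\end{align*}
```

The identity is purely algebraic: it holds whether or not H0 is true, which is why t and s share the same distribution in every case.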
So, why don't you actually point out where the error is. And don't say:
Y follows a mixture distribution. Under H0, it does not.
But as I read RF's point, you don't *know* that the *data* support
H0 a priori, so you can't validly run the test to show that it does
*until* you check for normality. Kind of a Catch 22. :-) Your point
about deriving the distribution is moot in the case of the actual
process of doing the data analysis. IOW what you show is true
in theory under a set of assumptions may not be valid if those
assumptions are violated, and you need to test the validity of the
assumptions *before* proceeding. At least that is my reading of
the situation. I'll admit I've only skimmed much of the voluminous
exchange on this topic, in part because so much of it is repetitious
because neither of you is trying (it seems to me) to *understand*
what the other is saying, so I may have missed everyone's point
entirely.
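One way to picture the "check the assumptions first" step is a crude look at the residuals of the fitted line. The following is only a sketch: the function names, the simulated data, and the choice of sample skewness and excess kurtosis as diagnostics are all assumptions made for illustration, not something proposed in the thread, and a formal normality test or a normal probability plot would be the more usual tool.

```python
import math
import random

def residuals(x, y):
    """Residuals from the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]

def skew_and_excess_kurtosis(e):
    """Moment-based sample skewness and excess kurtosis (both near 0 for normal data)."""
    n = len(e)
    m = sum(e) / n
    m2 = sum((v - m) ** 2 for v in e) / n
    m3 = sum((v - m) ** 3 for v in e) / n
    m4 = sum((v - m) ** 4 for v in e) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

# Hypothetical data: same line, once with normal errors, once with skewed errors.
rng = random.Random(1)
x = [rng.uniform(0.0, 10.0) for _ in range(2000)]
y_normal = [2.0 + 0.5 * a + rng.gauss(0.0, 1.0) for a in x]
y_skewed = [2.0 + 0.5 * a + rng.expovariate(1.0) for a in x]

print(skew_and_excess_kurtosis(residuals(x, y_normal)))
print(skew_and_excess_kurtosis(residuals(x, y_skewed)))
```

Residuals from the normal-error fit should show skewness and excess kurtosis close to zero, while the exponential-error fit should show clearly positive skewness, which is the kind of warning sign the check is meant to raise.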
If Y did follow a mixture distribution under H0,
then NEITHER t NOR s would have a t-distribution with n - 2 degrees of
freedom. But since s = t, they both have the SAME distribution -- it
just won't be a central t-distribution when beta1 != 0 (i.e., when H0 does not hold).
In fact, that's the whole idea of hypothesis testing:
(a) Assume H0 holds.
(b) Derive the distribution of the test statistic UNDER THE ASSUMPTION
THAT H0 holds (and Y is normal under that assumption).
(c) Then obtain data,
And here is where it seems to me you need:
(c-1) Test that the data satisfies the requirements of the
hypothesis test.
calculate the test statistic in the sample, and
see where the observed test statistic falls with respect to the
critical bounds according to the distribution under H0.
(d) When H0 holds, then using the critical bounds according to the
distribution under H0 guarantees that we will only reject H0 in alpha *
100% of the cases. But if H0 does not hold, then the absolute value of
the test statistic (i.e., |t|, which is the same as |s|) will be
stochastically greater than it is under H0. Therefore, the probability of
rejecting H0 increases, which is exactly what we would want.
But one more time: Under H0: beta1 = 0, both t and s have a central
t-distribution. Moreover, beta1 = 0 iff rho = 0. Therefore, rejecting
beta1 = 0 implies that we can reject rho = 0 and vice-versa.
So, please, enlighten me where the error is.
m00es
Enlighten me, please, if you think I'm wrong.
Cheers,
Russell