Re: Reef Fish Statistics for Dummies: Testing Correlations



On Oct 5, 4:02 am, "m00es" <m...@xxxxxxxxx> wrote:
Reef Fish wrote:
I hope this finally settled the m00es Lecture topic, which
dragged on for weeks and billions of wasted electrons,
dozens of posts and NOISE, about how to test a hypothesis
about a correlation coefficient.

No, it does not, because what you wrote is still not correct.

Reef Fish wrote:
It is HERE that m00es should have realized that his repeated
claim that "DATA is irrelevant" to the distribution of the
TEST STATISTICS is drastically wrong, from the APPLIED
Statistics and Data Analytic point of view.

Data IS irrelevant for deriving the distribution of the test statistic.
This has nothing to do with a viewpoint. It's a fact. I don't need to
observe any data to derive that distribution.

I'll explain this again. Why don't you explicitly point out in this
proof the source of my error.

1) The model: Y = beta0 + beta1 x + e, where e ~ iid N(0, sigma^2)

2) beta1 = rho(X,Y) * SD(Y) / SD(X)

3) Therefore, beta1 = 0 iff rho(X,Y) = 0

(since SD(Y) and SD(X) can safely be assumed to be > 0).

Now we want to test H0: beta1 = 0. As you have said yourself, we can
use:

t = b1/s(b1)
s = r * sqrt(n-2) / sqrt( 1 - r^2 )

to test H0: beta1 = 0. Why?

4) under H0: beta1 = 0, t follows a t-distribution with n - 2 degrees
of freedom
5) s = t, so both MUST have the same distribution
6) we can also use the result from Hogg & Craig to see that s has a
t-distribution with n - 2 degrees of freedom under H0. Let's use my
quote:

m00> Hogg, R. V., & Craig, A. T. (1995). Introduction to mathematical
m00> statistics (5th ed.).

m00> On pages 478-480, the authors derive the distribution of r
m00> under the bivariate normal assumption and show that
m00> under rho = 0, r * sqrt(n-2)/ sqrt(1-r^2 ) is distributed t(n-2).
m00> Now, on page 480, the authors mention EXPLICITLY that
m00> a careful review of their proof reveals that nowhere was it
m00> necessary to assume that the two variables are bivariate
m00> normal. Only one of the variables must be normal.

Under H0: beta1 = 0, the model reduces to Y = beta0 + 0 * x + e.
Therefore Y ~ N(beta0, sigma^2). We see that under H0, Y is normal and
not a mixture distribution. Therefore, s follows a t-distribution with
n - 2 degrees of freedom.

An important point: t (as well as s) only follows a t-distribution with
n - 2 degrees of freedom when beta1 = 0 holds!

7) When we reject H0: beta1 = 0, we automatically reject H0: rho = 0
and vice-versa.

q.e.d.
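
Step 5 (s = t) is an algebraic identity and can be checked numerically on any data set. The following is only a minimal sketch: the function name, the seed, the sample size and the simulated data are illustrative assumptions, not part of the original argument.

```python
import math
import random


def slope_t_and_r_stat(x, y):
    """Return (t, s): t = b1 / s(b1) from least squares,
    s = r * sqrt(n-2) / sqrt(1 - r^2) from the sample correlation."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))

    b1 = sxy / sxx                           # least-squares slope estimate
    sse = syy - b1 * sxy                     # residual sum of squares
    se_b1 = math.sqrt(sse / (n - 2) / sxx)   # estimated standard error of b1
    t = b1 / se_b1

    r = sxy / math.sqrt(sxx * syy)           # sample correlation coefficient
    s = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return t, s


rng = random.Random(0)  # arbitrary seed (assumption, for reproducibility only)
x = [rng.expovariate(1.0) for _ in range(25)]           # x is deliberately non-normal
y = [1.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]  # illustrative coefficients
t, s = slope_t_and_r_stat(x, y)
print(t, s)  # the two values agree up to floating-point rounding
```

The check says nothing about which distribution t and s follow; it only confirms that the two formulas always produce the same number.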

So, why don't you actually point out where the error is. And don't say:
Y follows a mixture distribution. Under H0, it does not.

But as I read RF's point, you don't *know* that the *data* support
H0 a priori, so you can't validly run the test to show that it does
*until* you check for normality. Kind of a Catch 22. :-) Your point
about deriving the distribution is moot in the case of the actual
process of doing the data analysis. IOW what you show is true
in theory under a set of assumptions may not be valid if those
assumptions are violated, and you need to test the validity of the
assumptions *before* proceeding. At least that is my reading of
the situation. I'll admit I've only skimmed much of the voluminous
exchange on this topic, in part because so much of it is repetitious
because neither of you is trying (it seems to me) to *understand*
what the other is saying, so I may have missed everyone's point
entirely.

If it did,
then NEITHER t NOR s would have a t-distribution with n - 2 degrees of
freedom. But since s = t, they both have the SAME distribution -- it
just won't be a central t-distribution when beta1 != 0.

In fact, that's the whole idea of hypothesis testing:

(a) Assume H0 holds.

(b) Derive the distribution of the test statistic UNDER THE ASSUMPTION
THAT H0 holds (and Y is normal under that assumption).

(c) Then obtain data,

And here is where it seems to me you need:
(c-1) Test that the data satisfies the requirements of the
hypothesis test.

calculate the test statistic in the sample, and
see where the observed test statistic falls with respect to the
critical bounds according to the distribution under H0.

(d) When H0 holds, then using the critical bounds according to the
distribution under H0 guarantees that we will only reject H0 in alpha *
100% of the cases. But if H0 does not hold, then the distribution of
the test statistic (i.e., the distribution of t, which is the same as
the distribution of s) will be stochastically greater than the
distribution of t (= s) under H0. Therefore, the probability of
rejecting H0 increases, which is exactly what we would want.
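
The claim in (d) can be illustrated with a small simulation. This is a hedged sketch under the model in step 1 only (fixed x, iid normal errors); it does not address what happens when those assumptions fail. The sample size n = 20, the seed, the coefficients and the number of replications are arbitrary assumptions, and the critical value 2.101 is the tabled two-sided 5% point of t with 18 degrees of freedom.

```python
import math
import random

T_CRIT = 2.101  # two-sided 5% critical value of t with n - 2 = 18 df (from tables)


def r_stat(x, y):
    """s = r * sqrt(n-2) / sqrt(1 - r^2), which equals t = b1 / s(b1)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def rejection_rate(beta1, x, reps, rng):
    """Fraction of simulated samples in which |s| exceeds the critical value."""
    rejections = 0
    for _ in range(reps):
        # Model from step 1: Y = beta0 + beta1 * x + e, e ~ iid N(0, 1)
        y = [2.0 + beta1 * xi + rng.gauss(0.0, 1.0) for xi in x]
        if abs(r_stat(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps


rng = random.Random(1)                          # arbitrary seed (assumption)
x = [rng.expovariate(1.0) for _ in range(20)]   # fixed, deliberately non-normal x
rate_null = rejection_rate(0.0, x, 4000, rng)   # H0 true: should be near 0.05
rate_alt = rejection_rate(1.0, x, 4000, rng)    # H0 false: should be clearly higher
print(rate_null, rate_alt)
```

With the errors generated as normal, the rejection rate under beta1 = 0 should sit near alpha even though x is skewed, and it should rise once beta1 moves away from zero.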

But one more time: Under H0: beta1 = 0, both t and s have a central
t-distribution. Moreover, beta1 = 0 iff rho = 0. Therefore, rejecting
beta1 = 0 implies that we can reject rho = 0 and vice-versa.

So, please, enlighten me where the error is.

m00es

Enlighten me, please, if you think I'm wrong.

Cheers,
Russell
