Re: significance of correlation affected by sample size?




Herman Rubin wrote:
> In article <1132843929.037229.164490@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>,
> Thom <t.s.baguley@xxxxxxxxxxx> wrote:
> >Conceptually:
>
> >Statistical significance = effect size x sample size
>
> A far more correct statement is that the "statistical
> significance" is approximately a function of the effect
> size multiplied by the square root of the sample size.

Both statements are imprecise, because "statistical
significance" is left undefined. (See below.)

This question was asked and answered in a thread in June:

"Do the critical values of linear correlation depend on sample size?"

http://groups.google.com/group/sci.stat.math/msg/601db8302f0f2b2a?hl=en

In that article, my stated result was that
r is significant (two-tailed) if

RF> |R|* sqrt((n-2)/(1 - R*R)) > t(1-alpha/2;(n-2)).

RF> or equivalently, if |R| > t /(sqrt((n-2) + t*t))

where t is the critical value at alpha/2 for t with (n-2) df.
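
For anyone who wants to check the two forms numerically, here is a
minimal Python sketch. The function names are my own, and when no t
value is supplied the default plugs in the normal quantile in place of
the t critical value, which is an assumption that is only adequate for
large n. Pass the exact t from tables if you need it.

```python
import math
from statistics import NormalDist


def t_statistic(r, n):
    """First form: t = |r| * sqrt((n - 2) / (1 - r^2)), on n - 2 df."""
    return abs(r) * math.sqrt((n - 2) / (1.0 - r * r))


def critical_r(n, t_crit=None, alpha=0.05):
    """Second form: smallest significant |r| = t / sqrt((n - 2) + t^2).

    t_crit should be the two-tailed t critical value on n - 2 df.
    If omitted, the normal quantile is used instead (an assumption
    that is only adequate for large n).
    """
    if t_crit is None:
        t_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return t_crit / math.sqrt((n - 2) + t_crit * t_crit)


# t for 98 df at alpha = 0.05 two-tailed is about 1.984 (from tables)
r_star = critical_r(100, t_crit=1.984)
# Plugging r_star back into the first form recovers t_crit
print(r_star, t_statistic(r_star, 100))
```

The round trip through both functions is just the algebra behind
"or equivalently" written out.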

Since sqrt((n-2) + t*t) is approximately sqrt(n) for large n,

an easy mnemonic device (using the asymptotic approximation)
is to think of the standard error of r as 1/sqrt(n).

Thus, taking t as roughly 2, r is statistically significant
at the 5% level (two-tailed) if

|r| > 2/sqrt(100) = 0.2 if n = 100
and |r| > 2/sqrt(10000) = 0.02 if n = 10,000

and so on.
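
A rough Python sketch comparing the mnemonic with the exact expression
it approximates. The function names are mine, and t is held fixed at 2
throughout, which is itself an assumption: for small n the true t
critical value is noticeably larger than 2.

```python
import math


def rule_of_thumb(n, t=2.0):
    """Approximate cutoff for |r|: t / sqrt(n)."""
    return t / math.sqrt(n)


def exact_form(n, t=2.0):
    """Cutoff from the exact expression: t / sqrt((n - 2) + t^2)."""
    return t / math.sqrt((n - 2) + t * t)


# Print n, the mnemonic cutoff, and the exact-form cutoff side by side
for n in (10, 100, 10000):
    print(n, round(rule_of_thumb(n), 4), round(exact_form(n), 4))
```

The two cutoffs move closer together as n grows, which is all the
mnemonic claims.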


> Even this is not a good guide as to what action should
> be taken. To get that, one should look upon the
> problem as a decision problem.

It is always a bad idea to make any decision based on the
value OR significance of ANY correlation coefficient!

I have shown elsewhere that a highly significant r, with a
p value smaller than 0.0001, say, could be a completely
USELESS result in a regression problem. (The actual
example was the SPSS Manual data re-analyzed).
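
To see how this can happen, here is a small Python sketch with made-up
numbers (r = 0.05 and n = 10,000 are hypothetical, NOT the SPSS Manual
data). The p value uses a normal approximation to the t distribution,
which is an assumption that is reasonable only because the df is so
large.

```python
import math
from statistics import NormalDist

r, n = 0.05, 10000  # hypothetical values, not the SPSS Manual data

# Test statistic for r on n - 2 df
t = abs(r) * math.sqrt((n - 2) / (1.0 - r * r))
# Normal approximation to the two-tailed p value (df is near 10,000)
p = 2.0 * (1.0 - NormalDist().cdf(t))

r_squared = r * r                     # share of variance explained
resid_ratio = math.sqrt(1.0 - r * r)  # residual SD / SD of y

print(t, p, r_squared, resid_ratio)
```

Here t exceeds 5 and p is far below 0.0001, yet the regression explains
only 0.25% of the variance and leaves the residual SD almost unchanged.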

RF> In the case of a regression, testing for the significance of R is
RF> ALWAYS the WRONG QUESTION. John Tukey said something
RF> to the effect that using R is to sweep the data under the rug
RF> with a vengeance.

-- Bob.
