Re: distribution of sample correlation coefficient



Greetings,

I have a nasty problem which I hope you can help me
with. I am wanting to use the measured correlation
coefficient r between two sets of samples (not from a
normal population) as a feature that will allow me to
estimate some other quantity d. I plan to do this
using a Bayesian methodology, (i.e. find p(d|r) using
p(r|d), where p(r|d) is learned from samples for
which d is known). Essentially, the form of the
relationship between d and the "true" correlation
coefficient rho is known and deterministic.
Therefore, I am looking for a simple expression for
p(r|rho), which would be equivalent to knowing
p(r|d).

Is there a nice, simple parametric form for the
distribution of sample correlation coefficients, or
is this a lost cause? I am thinking a Beta
distribution might be a reasonable model as it is
generally used to model proportions... Just how
wrong/heretical would that be?

Kendall & Stuart's classic textbook points to a very
old paper by Fisher, who derives such a distribution
for the case where the samples are drawn from a
Gaussian distribution. Unfortunately, the math in
Fisher's paper is beyond my comprehension, and I know
for a fact that my samples will not be Gaussian (I do
have a rough idea of their distribution, though).

Any thoughts?

Many thanks in advance,
Cathy


A large-sample approximation for the sampling distribution of r can be obtained from Fisher's z. This variance-stabilizing transformation goes back to Fisher's work of 1915 and 1921.

Let r be the sample correlation coefficient from a bivariate normal. Let

z = (1/2)*ln[(1+r)/(1-r)]
and zeta = (1/2)*ln[(1+rho)/(1-rho)]

Then

sqrt(n-3)*(z-zeta) is approximately N(0,1) for n large.
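
For anyone who wants to try this numerically, here is a minimal Python sketch of how the approximation could be turned into an approximate p(r|rho). It assumes the usual textbook variance of 1/(n-3) for z, and it changes variables from z back to r using dz/dr = 1/(1-r^2). The function names (fisher_z, approx_pdf_r, posterior_on_grid) and the flat prior on rho are illustrative assumptions, not part of the original question or of Fisher's derivation.

```python
import math


def fisher_z(r):
    """Fisher's z transform, z = (1/2)*ln[(1+r)/(1-r)] (identical to atanh)."""
    return math.atanh(r)


def approx_pdf_r(r, rho, n):
    """Approximate density p(r | rho) for sample size n > 3.

    Assumption: z ~ N(zeta, 1/(n-3)), which is derived for bivariate
    normal data. The factor 1/(1 - r^2) is the Jacobian dz/dr.
    """
    sd = 1.0 / math.sqrt(n - 3)
    u = (fisher_z(r) - fisher_z(rho)) / sd
    normal_pdf = math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))
    return normal_pdf / (1.0 - r * r)


def posterior_on_grid(r, n, rho_grid, prior=None):
    """Grid approximation to p(rho | r); flat prior unless one is given."""
    if prior is None:
        prior = [1.0] * len(rho_grid)  # hypothetical default: flat prior
    weights = [approx_pdf_r(r, rho, n) * p for rho, p in zip(rho_grid, prior)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    grid = [i / 100 for i in range(-99, 100)]
    post = posterior_on_grid(0.5, 30, grid)
    best = max(range(len(grid)), key=lambda i: post[i])
    print("posterior mode for rho:", grid[best])
```

Since rho is a known function of d in the original question, the same grid idea could be applied to d directly. Because the samples are known not to be Gaussian, the 1/(n-3) variance may be off, so it would be sensible to check the approximation by simulating from the rough known distribution before relying on it.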

Jack