Re: Correlations



On Apr 16, 6:18 pm, omri-piano <omrit1...@xxxxxxxxx> wrote:
On Apr 17, 1:12 am, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 15, 7:24 am, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 14, 5:53 am, omri-piano <omrit1...@xxxxxxxxx> wrote:
On Apr 13, 8:56 pm, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 13, 7:48 am, omri-piano <omrit1...@xxxxxxxxx> wrote:
Could anyone help out on this one?
Say X and Y are independent and each normally distributed ~N(0,1)
and Z=X+Y. Now, say we simulate a (very large) random sample of X
and Y and compute Z for each pair, so we have triplets {Zi,Xi,Yi}.
Then, we take all the triplets for which Zi=2 (or a narrow range
around this constant). It is easy to see that for this subset
of the sample, X will be inversely correlated with Y, i.e.,
Corr(X,Y) < 0.
[1] Can we analytically calculate what the correlation will
converge to (remember, we know the distributions of X and Y)?
[2] If, instead of Zi=2 we take the subset for which Zi>2.
Will X and Y be also correlated here? Can we analytically
calculate it as in [1]?
Thanks!
Omri.

For [2], rotate the axes:
let U = (X+Y)/sqrt[2], V = (Y-X)/sqrt[2].
U and V are independent standard normal.
X+Y > 2 implies U > sqrt[2].
The mean and variance of U|U>c are m = f[c]/Q[c] and v = 1-m(m-c),
where f and Q are the standard normal pdf and complementary cdf.
V|U is still standard normal.

The inverse transformations are X=(U-V)/sqrt[2], Y=(U+V)/sqrt[2].
var[X|U>c] = var[Y|U>c] = (v + 1)/2.
cov[(X,Y)|U>c] = (v - 1)/2.
corr[(X,Y)|U>c] = (v - 1)/(v + 1) = -m(m-c)/(2 - m(m-c)).
If c = sqrt[2] the correlation is -.728769
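
The formulas above can be evaluated numerically as a check. This is only a minimal sketch using the Python standard library; the helper names (pdf, Q, corr_given_sum_above) are made up for illustration and do not come from the thread.

```python
import math

def pdf(z):
    """Standard normal pdf, f[z] in the notation above."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):
    """Standard normal complementary cdf, computed from erfc."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def corr_given_sum_above(t):
    """corr[(X,Y) | X+Y > t] for independent standard normal X and Y."""
    c = t / math.sqrt(2.0)      # X+Y > t is the same event as U > t/sqrt[2]
    m = pdf(c) / Q(c)           # mean of U | U > c
    v = 1.0 - m * (m - c)       # variance of U | U > c
    return (v - 1.0) / (v + 1.0)

rho = corr_given_sum_above(2.0)
print(round(rho, 6))            # compare with the -.728769 quoted above
```

Raising the threshold t shrinks v toward 0, so the conditional correlation moves toward -1.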

Thanks to all, especially Ray and Scott for the detailed answers.
Two follow-ups, if I may:
[A] What happens if we take 2<Zi<3, can you also quantify the
correlation of X and Y?

Same logic as before, except you need the variance of a standard
normal variable that has been truncated on both the left and right,
instead of only on the left. You should be able to work out the
rest from what has already been posted.

[B] If the distributions of X and Y are not standard normal,
but are instead say Gamma or Poisson distributed with known
parameters, can we still reach an analytical answer?

There may be some special cases that work out nicely,
but in general it is either difficult or impossible.

Hi Ray, a few Qs on your detailed development above:
- You write "m = f[c]/Q[c]" but I think you mean E[U|U>c]/Q[c],
since f[c] is just the PDF at c...?
- How did you get the variance of U|U>c, v = 1 - m(m-c)?

Here are some useful factoids about left-truncated
and doubly-truncated standard normal distributions.

I follow the Mathematica notation convention that uses square
bracket for function arguments and parentheses for grouping.

f[z] = standard normal pdf

For Z > c:

I0[c] = int_c^oo f[z] dz

= Q[c]

I1[c] = int_c^oo z f[z] dz

= f[c]

I2[c] = int_c^oo z^2 f[z] dz

= Q[c] + c*f[c]

m[c] = E[Z|Z>c] = I1[c]/I0[c]

= f[c] / Q[c]

w[c] = E[Z^2|Z>c] = I2[c]/I0[c]

= (Q[c] + c*f[c]) / Q[c]

= 1 + c*m[c]

v[c] = var[Z|Z>c] = w[c] - m[c]^2

= 1 - m[c]*(m[c] - c)

For a < Z < b:

I0[a,b] = Q[a] - Q[b]

= p[a,b]

m[a,b] = (I1[a] - I1[b]) / p[a,b]

= (f[a] - f[b]) / p[a,b]

v[a,b] = (I2[a] - I2[b])/p[a,b] - m[a,b]^2

= 1 + (a*f[a] - b*f[b])/p[a,b] - m[a,b]^2
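
As a rough illustration of how the doubly-truncated variance feeds into the 2 < X+Y < 3 question, here is a minimal Python sketch. It assumes X and Y are independent standard normal, so that 2 < X+Y < 3 corresponds to a < U < b with a = 2/sqrt[2] and b = 3/sqrt[2], and V stays standard normal. The function names are hypothetical, chosen only for this example.

```python
import math

def pdf(z):
    """Standard normal pdf, f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):
    """Standard normal complementary cdf."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def var_doubly_truncated(a, b):
    """v[a,b]: variance of a standard normal Z given a < Z < b."""
    p = Q(a) - Q(b)                         # p[a,b]
    m = (pdf(a) - pdf(b)) / p               # m[a,b]
    return 1.0 + (a * pdf(a) - b * pdf(b)) / p - m * m

def corr_given_sum_between(lo, hi):
    """corr[(X,Y) | lo < X+Y < hi] for independent standard normal X, Y."""
    s = math.sqrt(2.0)
    v = var_doubly_truncated(lo / s, hi / s)
    return (v - 1.0) / (v + 1.0)            # same form as the U > c case

rho_band = corr_given_sum_between(2.0, 3.0)
print(rho_band)
```

Pushing the upper limit far out (for example hi = 40) should recover the left-truncated value, which gives a quick sanity check on the formulas.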

- Also, a follow-up if I may: what happens if we start off from
X and Y being correlated, say corr[X,Y] = r, how does that affect
corr[(X,Y)|U>c] ?
Omri.

This is easier to understand if you draw the picture.
If X and Y are bivariate standard normal with correlation r then
U = (X+Y)/sqrt[2] and V = (Y-X)/sqrt[2] are independent zero-mean
normals with different variances: var[U] = 1+r, var[V] = 1-r.

Truncating U at c gives var[U|U>c] = (1+r)*v[c/sqrt[1+r]],
and var[V|U>c] = var[V] = 1-r. Then
X = (U-V)/sqrt[2], Y = (U+V)/sqrt[2],
var[X|U>c] = var[Y|U>c] = ((1+r)*v[c/sqrt[1+r]] + (1-r))/2,
cov[(X,Y)|U>c] = ((1+r)*v[c/sqrt[1+r]] - (1-r))/2,

and corr[(X,Y)|U>c] = ((1+r)*v[c/sqrt[1+r]] - (1-r))
                      / ((1+r)*v[c/sqrt[1+r]] + (1-r)).
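
A small Python sketch of the correlated case, assuming X and Y are bivariate standard normal with -1 < r < 1 and that c is the cutoff on U = (X+Y)/sqrt[2]. The function names are invented for this example.

```python
import math

def pdf(z):
    """Standard normal pdf, f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Q(z):
    """Standard normal complementary cdf."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def var_left_truncated(c):
    """v[c]: variance of a standard normal Z given Z > c."""
    m = pdf(c) / Q(c)
    return 1.0 - m * (m - c)

def corr_given_u_above(c, r):
    """corr[(X,Y) | U > c] when corr[X,Y] = r, with -1 < r < 1."""
    var_u = (1.0 + r) * var_left_truncated(c / math.sqrt(1.0 + r))
    var_v = 1.0 - r             # V is unaffected by truncating U
    return (var_u - var_v) / (var_u + var_v)

# r = 0 should reduce to the independent case discussed earlier.
print(corr_given_u_above(math.sqrt(2.0), 0.0))
print(corr_given_u_above(math.sqrt(2.0), 0.5))
```

With a very low cutoff almost nothing is truncated, so the result should come back to r itself.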

Thanks, I understand.
[a] And if X and Y are not bivariate standard normal, but just
separately standard normal and correlated, is it the case that
we cannot progress analytically?

Right.

[b] For instance, if X and Y are biological properties that a
large sample shows to be each normal and correlated with
correlation r, most likely they are not jointly normal, right?

Nothing in nature is exactly univariate or bivariate normal. The
question is whether the actual distribution is close enough to
bivariate normal that analytic results derived assuming bivariate
normality will not be too wrong to be useful.