Re: Correlations
- From: Ray Koopman <koopman@xxxxxx>
- Date: Thu, 16 Apr 2009 18:45:24 -0700 (PDT)
On Apr 16, 6:18 pm, omri-piano <omrit1...@xxxxxxxxx> wrote:
On Apr 17, 1:12 am, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 15, 7:24 am, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 14, 5:53 am, omri-piano <omrit1...@xxxxxxxxx> wrote:
On Apr 13, 8:56 pm, Ray Koopman <koop...@xxxxxx> wrote:
On Apr 13, 7:48 am, omri-piano <omrit1...@xxxxxxxxx> wrote:
Could anyone help out on this one?
Say X and Y are independent and each normally distributed ~N(0,1)
and Z=X+Y. Now, say we simulate a (very large) random sample of X
and Y and compute Z for each pair, so we have triplets {Zi,Xi,Yi}.
Then, we take all the triplets for which Zi=2 (or a narrow range
around this constant). It is easy to see that for this subset
of the sample, X will be inversely correlated with Y, i.e.,
Corr(X,Y) < 0.
[1] Can we analytically calculate what the correlation will
converge to (remember, we know the distributions of X and Y)?
[2] If, instead of Zi=2, we take the subset for which Zi>2,
will X and Y also be correlated? Can we calculate that correlation
analytically, as in [1]?
Thanks!
Omri.
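The setup in the question can be tried out directly. The following is only a rough sketch using the Python standard library; the helper names (pearson, simulate), the sample size, the seed and the band half-width of 0.05 are all choices made for illustration, not part of the original question. Note that with an exact constraint Zi=2 we have Y = 2 - X, so the band correlation should sit very close to -1.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def simulate(keep, n=200_000, seed=0):
    """Draw n independent N(0,1) pairs and return corr(X, Y) among
    the pairs whose sum Z = X + Y satisfies keep(Z)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        if keep(x + y):
            xs.append(x)
            ys.append(y)
    return pearson(xs, ys)


# Case [1]: a narrow band around Z = 2 (half-width 0.05 is an assumption).
corr_band = simulate(lambda z: abs(z - 2.0) < 0.05)
# Case [2]: the upper tail Z > 2.
corr_tail = simulate(lambda z: z > 2.0)
print(corr_band, corr_tail)
```

Both estimates are subject to sampling noise, so they are a sanity check on an analytic answer rather than a substitute for one.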
For [2], rotate the axes:
let U = (X+Y)/sqrt[2], V = (Y-X)/sqrt[2].
U and V are independent standard normal.
X+Y > 2 implies U > sqrt[2].
The mean and variance of U|U>c are m = f[c]/Q[c] and v = 1-m(m-c),
where f and Q are the standard normal pdf and complementary cdf.
V|U is still standard normal.
The inverse transformations are X=(U-V)/sqrt[2], Y=(U+V)/sqrt[2].
var[X|U>c] = var[Y|U>c] = (v + 1)/2.
cov[(X,Y)|U>c] = (v - 1)/2.
corr[(X,Y)|U>c] = (v - 1)/(v + 1) = -m(m-c)/(2 - m(m-c)).
If c = sqrt[2] the correlation is -.728769
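The formulas above are easy to evaluate numerically. A minimal sketch, assuming only the Python standard library; the function names pdf, Q and corr_left_truncated are invented here for illustration, and Q is computed from math.erfc.

```python
import math


def pdf(z):
    """Standard normal density f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Q(z):
    """Standard normal complementary cdf Q[z] = P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def corr_left_truncated(c):
    """corr[(X,Y)|U>c] for independent standard normal X and Y,
    where U = (X+Y)/sqrt[2]."""
    m = pdf(c) / Q(c)          # mean of U given U > c
    v = 1.0 - m * (m - c)      # variance of U given U > c
    return (v - 1.0) / (v + 1.0)


rho = corr_left_truncated(math.sqrt(2.0))
print(round(rho, 6))
```

With c = sqrt[2] this should agree with the -.728769 quoted above, up to rounding.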
Thanks to all, especially Ray and Scott for the detailed answers.
Two follow-ups, if I may:
[A] What happens if we take 2<Zi<3, can you also quantify the
correlation of X and Y?
Same logic as before, except you need the variance of a standard
normal variable that has been truncated on both the left and right,
instead of only on the left. You should be able to work out the
rest from what has already been posted.
[B] If the distributions of X and Y are not standard normal,
but are instead say Gamma or Poisson distributed with known
parameters, can we still reach an analytical answer?
There may be some special cases that work out nicely,
but in general it is either difficult or impossible.
Hi Ray, a few questions on your detailed development above:
- You write "m = f[c]/Q[c]" but I think you mean E[U|U>c]/Q[c],
since f[c] is just the PDF at c...?
- How did you get the variance of U|U>c, v = 1 - m(m-c)?
Here are some useful factoids about left-truncated
and doubly-truncated standard normal distributions.
I follow the Mathematica notation convention that uses square
brackets for function arguments and parentheses for grouping.
f[z] = standard normal pdf
For Z > c:
I0[c] = int_c^oo f[z] dz
= Q[c]
I1[c] = int_c^oo z f[z] dz
= f[c]
I2[c] = int_c^oo z^2 f[z] dz
= Q[c] + c*f[c]
m[c] = E[Z|Z>c] = I1[c]/I0[c]
= f[c] / Q[c]
w[c] = E[Z^2|Z>c] = I2[c]/I0[c]
= (Q[c] + c*f[c]) / Q[c]
= 1 + c*m[c]
v[c] = var[Z|Z>c] = w[c] - m[c]^2
= 1 - m[c]*(m[c] - c)
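The closed forms for I1 and I2 follow from f'[z] = -z f[z] (I1 directly, I2 after one integration by parts). They can also be checked by brute-force quadrature. The sketch below is one way to do that with the standard library only; the simpson helper, the cutoff UPPER = 12 standing in for infinity, and the test point c = 1 are all assumptions made for the check.

```python
import math


def pdf(z):
    """Standard normal density f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Q(z):
    """Standard normal complementary cdf Q[z]."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def simpson(g, a, b, n=20_000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return total * h / 3.0


c = 1.0        # arbitrary truncation point for the check
UPPER = 12.0   # stand-in for infinity; the tail beyond it is negligible

I0 = simpson(pdf, c, UPPER)
I1 = simpson(lambda z: z * pdf(z), c, UPPER)
I2 = simpson(lambda z: z * z * pdf(z), c, UPPER)

m = I1 / I0              # E[Z|Z>c]
v = I2 / I0 - m * m      # var[Z|Z>c]

print(I0 - Q(c))                    # should be near zero
print(I1 - pdf(c))                  # should be near zero
print(I2 - (Q(c) + c * pdf(c)))     # should be near zero
print(v - (1.0 - m * (m - c)))      # should be near zero
```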
For a < Z < b:
I0[a,b] = Q[a] - Q[b]
= p[a,b]
m[a,b] = (I1[a] - I1[b]) / p[a,b]
= (f[a] - f[b]) / p[a,b]
v[a,b] = (I2[a] - I2[b])/p[a,b] - m[a,b]^2
= 1 + (a*f[a] - b*f[b])/p[a,b] - m[a,b]^2
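For follow-up [A], the doubly-truncated variance can be plugged into the same (v - 1)/(v + 1) expression used for the one-sided case. A sketch under that assumption, for independent standard normal X and Y; the function names are made up for illustration, and the bounds on X+Y are divided by sqrt[2] to get bounds on U.

```python
import math


def pdf(z):
    """Standard normal density f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Q(z):
    """Standard normal complementary cdf Q[z]."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def var_doubly_truncated(a, b):
    """v[a,b] = var[Z | a < Z < b] for standard normal Z."""
    p = Q(a) - Q(b)
    m = (pdf(a) - pdf(b)) / p
    return 1.0 + (a * pdf(a) - b * pdf(b)) / p - m * m


def corr_sum_in_band(lo, hi):
    """corr[(X,Y) | lo < X+Y < hi] for independent N(0,1) X and Y."""
    s = math.sqrt(2.0)
    v = var_doubly_truncated(lo / s, hi / s)   # variance of U in the band
    return (v - 1.0) / (v + 1.0)


rho_band = corr_sum_in_band(2.0, 3.0)
print(rho_band)
```

Pushing the upper bound far out (say b = 10) should recover the left-truncated variance v[c], which is a convenient consistency check.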
- Also, a follow-up if I may: what happens if we start off from
X and Y being correlated, say corr[X,Y] = r, how does that affect
corr[(X,Y)|U>c] ?
Omri.
This is easier to understand if you draw the picture.
If X and Y are bivariate standard normal with correlation r then
U = (X+Y)/sqrt[2] and V = (Y-X)/sqrt[2] are independent zero-mean
normals with different variances: var[U] = 1+r, var[V] = 1-r.
Truncating U at c gives var[U|U>c] = (1+r)*v[c/sqrt[1+r]],
and var[V|U>c] = var[V] = 1-r. Then
X = (U-V)/sqrt[2], Y = (U+V)/sqrt[2],
var[X|U>c] = var[Y|U>c] = ((1+r)*v[c/sqrt[1+r]] + (1-r))/2,
cov[(X,Y)|U>c] = ((1+r)*v[c/sqrt[1+r]] - (1-r))/2,
and corr[(X,Y)|U>c]
  = ((1+r)*v[c/sqrt[1+r]] - (1-r)) / ((1+r)*v[c/sqrt[1+r]] + (1-r)).
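The correlated case can be evaluated the same way. This is a sketch only, assuming bivariate standard normal X and Y with -1 < r < 1; the function names are hypothetical, and c is the cutoff on U = (X+Y)/sqrt[2], so X+Y > 2 corresponds to c = sqrt[2].

```python
import math


def pdf(z):
    """Standard normal density f[z]."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def Q(z):
    """Standard normal complementary cdf Q[z]."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def var_left_truncated(c):
    """v[c] = var[Z | Z > c] for standard normal Z."""
    m = pdf(c) / Q(c)
    return 1.0 - m * (m - c)


def corr_given_u_exceeds(c, r):
    """corr[(X,Y)|U>c] for bivariate standard normal X, Y with
    correlation r, where U = (X+Y)/sqrt[2]. Requires -1 < r < 1."""
    var_u = (1.0 + r) * var_left_truncated(c / math.sqrt(1.0 + r))
    var_v = 1.0 - r            # V is independent of U, so unaffected
    return (var_u - var_v) / (var_u + var_v)


c = math.sqrt(2.0)
for r in (0.0, 0.25, 0.5):
    print(r, corr_given_u_exceeds(c, r))
```

Setting r = 0 reduces this to the independent case, and a very negative c (essentially no truncation) should return r itself.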
Thanks, I understand.
[a] And if X and Y are not bivariate standard normal, but just
separately standard normal and correlated, is it the case that
we cannot progress analytically?
Right.
[b] For instance, if X and Y are biological properties that we
know from a large sample to be normal with correlation r, most
likely they are not jointly normal, right?
Nothing in nature is exactly univariate or bivariate normal. The
question is whether the actual distribution is close enough to
bivariate normal that analytic results derived assuming bivariate
normality will not be too wrong to be useful.