Re: Approximation of correlations
From: Richard Ulrich (Rich.Ulrich_at_comcast.net)
Date: 08/29/04
- Next message: Woodhouser: "Re: Confidence Interval from the covariance"
- Previous message: Aleks Jakulin: "Re: Explanation of Maximum Entropy"
- In reply to: Lou Pirog: "Re: Approximation of correlations"
- Next in thread: Glen: "Re: Approximation of correlations"
- Messages sorted by: [ date ] [ thread ]
Date: Sun, 29 Aug 2004 15:07:04 -0400
On Sun, 29 Aug 2004 14:34:55 GMT, Lou Pirog <lpirog@comcast.net>
wrote:
> Rich and Ray, thanks to both of you for your responses.
>
> Ray, I will follow up on the eigenvalues and on establishing a criterion to
> maximize (like the determinant. Are there other examples of criteria?). It
> sounds like this approach could establish some boundaries on the missing
> values.
>
> Rich, the question is most definitely still open. You're exactly right
> in your determination of what information is currently available
> (i.e., B sets of VxV matrices and V sets of BxB matrices). What's
> ultimately needed is a (BxV) x (BxV) matrix (as Ray had placed at the end
> of his note) where each row/column intersection is for a particular pair
> of B/V combinations, say B(1)/V(5) and B(2)/V(3).
>
> I've added
> comments/questions to portions of your reply below:
Okay, I'm better oriented, but I'm not sure yet what
Ray was writing about, or if you and I are on the same page.
Here, I will expand on what I was saying about correlations
in discriminant function - in case that fits.
Last night, I wondered what meaning there could be in
certain cross-'correlations' but I see that I had already
given one specific context for understanding that, from
the D.F. example. I will show that, below.
I'm snipping the rest of the post, which raised some questions
one at a time. Also, I am borrowing Ray's layout for
correlations, since you are using it, too.
Here is Ray's layout, though I have changed some entries,
and I offer a different exposition. (Use Fixed font to view.)
Businesses X, Y.
Variables a, b, c.
Dots "." represent symmetrical entries.
              X              Y
           a  b  c        a  b  c

       a   1  r  r        1  r  r
   X   b   .  1  r        .  1  r
       c   .  .  1        .  .  1

       a   .  .  .        1  r  r
   Y   b   .  .  .        .  1  r
       c   .  .  .        .  .  1
I'm considering the model of two-group discriminant
function (DF), and then a bit more.
Consider in the table the intersection of X-X and Y-Y.
It is simple to say that these can represent the
correlations within the subsamples X and Y.
The "within-groups" correlation matrix that SPSS gives
is then *one* thing that could be denoted as the
'intersection' of X and Y, representing the pooled, separate
correlations. DF pools the sums of squares and cross-products,
if I recall correctly. Those sums are taken about each group's
own mean, and they assume a common variance. If the
correlations are zero in each group, then the pooled
correlation will also be zero, despite an overall
non-zero correlation that could be induced by differences
in means.
(There are different notions of pooling that could be
used for unequal Ns and unequal variances. Just average
the correlations? I will skip that complexity here.)
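Here is a minimal sketch of the simple pooling described above: add up each group's sums of squares and cross-products, each taken about that group's own means, and only then convert to r. The function names and the data are made up for illustration, and this is not a claim about what SPSS does internally.

```python
from math import sqrt

def sscp(pairs):
    """Sums of squares and cross-products of (x, y) pairs about their own means."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxx, syy, sxy

def pooled_within_r(groups):
    """Sum the per-group SSCP terms, then turn the sums into a correlation."""
    parts = [sscp(g) for g in groups]
    sxx = sum(p[0] for p in parts)
    syy = sum(p[1] for p in parts)
    sxy = sum(p[2] for p in parts)
    return sxy / sqrt(sxx * syy)

# Invented data: r is zero inside each group, but the group means differ.
group_x = [(0, 0), (0, 2), (2, 0), (2, 2)]
group_y = [(10, 10), (10, 12), (12, 10), (12, 12)]

within_r = pooled_within_r([group_x, group_y])
print(within_r)
```

Because each group is centered on its own means before the sums are added, the large gap between the group means never enters the pooled value.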
The "total-groups" correlation matrix is another thing
that could be denoted by the 'union' of X and Y, achieved
by concatenating the two groups and computing r's. Group
differences can induce correlations: For instance,
height and vocabulary may have small correlations among
students 8 years old or those 16 years old. However, the
total pooled set shows that the taller students know more
words -- thus, a large r.
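The height and vocabulary example can be sketched with a few invented numbers (these are not real measurements, and the helper name is my own). Each age group is built to have r = 0 on its own, and the concatenated set is then run through the same formula.

```python
from math import sqrt

def pearson_r(pairs):
    """Ordinary Pearson correlation for a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

# Invented (height in cm, vocabulary score) pairs for two age groups.
age_8 = [(125, 51), (128, 49), (131, 49), (134, 51)]
age_16 = [(165, 91), (168, 89), (171, 89), (174, 91)]

r_8 = pearson_r(age_8)               # within the 8-year-olds
r_16 = pearson_r(age_16)             # within the 16-year-olds
r_total = pearson_r(age_8 + age_16)  # "total-groups": concatenate, then compute r
print(r_8, r_16, r_total)
```

With these numbers the two within-group r's are zero while the total r is close to 1, and all of that comes from the difference in group means.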
Now, the differences in means are also computable as r's,
of a sort, on the t-tests between groups. Those t's or r's
are what account for the differences between the two versions
of correlations that I just described. The DF analysis
is satisfied to present the univariate tests between the
groups, without ever bothering to generate what might be
called - by analogy with ANOVA - the "Between-groups"
correlation matrix. That would be defined, I imagine,
in some fashion by subtraction of the other two matrices,
probably using sums of squares rather than r's.
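A sketch of that subtraction, assuming the usual ANOVA-style split of the sums of squares and cross-products (total = within + between). The scores for businesses X and Y on variables a, b, c are invented, and the function names are my own.

```python
from math import sqrt

def sscp_matrix(rows):
    """SSCP matrix of the columns of `rows`, taken about the column means."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
             for j in range(k)] for i in range(k)]

def to_corr(m):
    """Rescale an SSCP matrix to correlations (diagonal entries must be > 0)."""
    k = len(m)
    return [[m[i][j] / sqrt(m[i][i] * m[j][j]) for j in range(k)]
            for i in range(k)]

# Invented scores on variables (a, b, c) for businesses X and Y.
biz_x = [(1, 2, 9), (2, 1, 8), (3, 4, 8), (4, 3, 7)]
biz_y = [(5, 6, 4), (6, 8, 5), (7, 7, 3), (8, 9, 4)]

k = 3
total = sscp_matrix(biz_x + biz_y)             # about the grand means
wx, wy = sscp_matrix(biz_x), sscp_matrix(biz_y)
within = [[wx[i][j] + wy[i][j] for j in range(k)] for i in range(k)]
between = [[total[i][j] - within[i][j] for j in range(k)] for i in range(k)]

for name, m in (("within", within), ("total", total), ("between", between)):
    print(name, [[round(v, 3) for v in row] for row in to_corr(m)])
```

With only two groups, the between matrix is built from a single vector of mean differences, so its "correlations" in this toy case all come out as +1 or -1. That may be one reason the between matrix adds little beyond the univariate tests in the two-group case.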
It probably could be computed alternatively by using the
two r's or R-squareds from the simple tests on means, along
with one matrix of the two, using formulas for partial
or multiple correlation. But it seems more compact and
more intelligible - for most purposes - to show the within
matrix and the total matrix, and then to use the t-tests
instead of generating the between matrix, which encodes
the amount of confounding that exists in the pooled r's.
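The "r from the simple test on means" can be made concrete. For a two-group t-test with pooled variance, r = t / sqrt(t^2 + df) is the same number as the ordinary correlation between the variable and a 0/1 group code. A small sketch with invented scores (the function names are my own):

```python
from math import sqrt

def pooled_t(a, b):
    """Two-sample t (pooled variance) for mean(b) - mean(a), with its df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss = sum((v - ma) ** 2 for v in a) + sum((v - mb) ** 2 for v in b)
    df = na + nb - 2
    se = sqrt(ss / df * (1 / na + 1 / nb))
    return (mb - ma) / se, df

def r_from_t(t, df):
    """Convert a two-group t statistic to a correlation."""
    return t / sqrt(t * t + df)

def pearson_r(xs, ys):
    """Ordinary Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented scores on one variable for groups X and Y.
x_scores = [3, 5, 4, 6]
y_scores = [6, 8, 7, 9, 10]

t, df = pooled_t(x_scores, y_scores)
r_t = r_from_t(t, df)

# Same thing computed directly: correlate the scores with a 0/1 group code.
codes = [0] * len(x_scores) + [1] * len(y_scores)
r_direct = pearson_r(codes, x_scores + y_scores)
print(t, df, r_t, r_direct)
```

The two r's agree, which is the sense in which the t-tests carry the between-groups information for each variable.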
How does this extend to multiple variables? DF uses a
simple within-pooling of multiple groups, and shows the
univariate tests. I think SPSS does not show the original
r's, but that could be an option of any package's presentation.
The original post could be asking for (as I see it) a
set of computations similar to those I have just described,
done for each and every pair of groups. I wonder
what it could be needed for.
I'm curious as to how much of this is helpful -
-- Rich Ulrich, wpilib@pitt.ed http://www.pitt.edu/~wpilib/index.html