Re: How to visualize multiple variables correlation

From: Reef Fish (Large_Nassau_Grouper_at_Yahoo.com)
Date: 03/28/05


Date: 28 Mar 2005 04:36:24 -0800


demiourgos@gmail.com wrote:
> Well, the subj. says it all.
>
> I have too much variables to fit into single scatterplot (about 1000),
> yet I have to somehow plot the correlation between them.
>
> Any ideas ? Thanks.

Two well-known expressions come to mind:

"Less is more; more is less"
"Garbage in; garbage out"

and they seem to be an appropriate characterization of your "problem",
as described.

Nearly all of the follow-up ideas suggested computer programs or
multivariate methods that would simply produce more garbage from
garbage.

These two ideas should have been considered FIRST:

1. A scatterplot will only enable you to visualize whether the relation
    between a PAIR of variables is LINEAR, to decide whether a correlation
    measure is appropriate to capture the linear relation.

2. The MULTIVARIATE relations among a set of variables are very POORLY
    indicated by the pairwise relations among the variables.
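
A quick numerical sketch of point (1), assuming Python with NumPy (the
data are simulated and the variable names are purely illustrative): a
pair of variables can be exactly related and still show a correlation
near zero, which is why the scatterplot has to be looked at before the
correlation is trusted.

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=10_000)  # symmetric about zero
y = x ** 2                               # exact but NONLINEAR relation

# Pearson correlation only measures the LINEAR part of the relation.
r = np.corrcoef(x, y)[0, 1]
print(f"Pearson r between x and x**2: {r:.3f}")  # close to zero
```

A scatterplot of y against x would show the parabola at once; the
correlation coefficient alone says almost nothing about it.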

Be thankful that the suggested computer programs will not physically
work for your 1,000 variables. Conceptually, they would NEVER work even
if you had only 10 variables: every one of those programs will deliver
garbage, just as you requested.

Here is a very simple idea/example to illustrate the point (2) above.

The dreaded "multicollinearity" problem in a multiple linear regression
with 10 variables, say, can arise from a set of variables in which EVERY
pair of X's has a low correlation, and yet the X'X matrix is singular or
near-singular ("multicollinear"). That matrix then cannot be inverted
(or cannot be inverted stably), and the resulting regression is
"garbage", as in "garbage in, garbage out".
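
One way to see this numerically, as a minimal sketch assuming Python
with NumPy (the construction below is just one illustrative case, with
simulated data): take nine independent variables and let the tenth be
their sum. No pair is strongly correlated, yet the ten columns are
exactly linearly dependent.

```python
import numpy as np

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 5_000
X9 = rng.standard_normal((n, 9))   # nine independent variables
x10 = X9.sum(axis=1)               # tenth variable = sum of the other nine
X = np.column_stack([X9, x10])     # n-by-10 data matrix

# Largest absolute pairwise correlation among the 10 variables.
# In theory it is 1/sqrt(9) = 1/3 for (x_i, x10) and 0 for the other pairs.
R = np.corrcoef(X, rowvar=False)
max_abs_r = np.abs(R[~np.eye(10, dtype=bool)]).max()

# X'X has the same rank as X, so a rank below 10 means X'X is singular.
rank = np.linalg.matrix_rank(X)
cond = np.linalg.cond(X)           # enormous when columns are dependent

print(f"largest |pairwise r|: {max_abs_r:.2f}")
print(f"rank of X (10 columns): {rank}")
print(f"condition number of X: {cond:.2e}")
```

Nothing in the 10-by-10 correlation matrix, or in the 45 pairwise
scatterplots, hints at the exact dependence; it shows up only when the
variables are examined jointly.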

Your 1,000 variables only magnify the 10-variable "garbage" problem
10,000 fold.

The only thing I can say FOR SURE is that it is INAPPROPRIATE to start
your problem with a scattermatrix and correlations.

You need data-reduction and the use of APPROPRIATE multivariate
methods.

-- Bob.