Re: Sorting a correlation matrix with R-statistics
- From: Marc Schwartz <marc_schwartz@xxxxxxxxxxx>
- Date: Sat, 03 Feb 2007 14:44:17 -0600
Erkki.Komulainen@xxxxxxxxxxxxxxxxxxx wrote:
maureeze@xxxxxxxxx wrote:
:I am looking for a way to sort my matrix and find an easy way to
:locate the highest 25 correlations in a matrix of 1000 by 1000
:variables. I want to produce a list of the correlations, from the
:strongest correlation to the weakest. E.g. if the variables are called
:x1, x2, x3, .., xn, then there might be a list:
I would save the correlation matrix as a file. Its columns would then be
stacked into three columns: col 1 = the code of x-variable, col 2 = the
code of y-variable and col 3 = the correlation. I would read this
file into a suitable programme and do the sorting. This can be done with
SPSS and I think that should be possible with similar products.
HTH
Erkki
First, to the OP: please post R-specific queries to the r-help e-mail list. Information on that is on the web page you cited in your post.
Second, to the respondent, are you suggesting that R is not an appropriate application for this simple task? Please.
To provide an example, using the 'swiss' dataset, which is available in R:
> cor(swiss)
                  Fertility Agriculture Examination   Education
Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886
Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252
Examination      -0.6458827 -0.68654221   1.0000000  0.69841530
Education        -0.6637889 -0.63952252   0.6984153  1.00000000
Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892
Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185
                   Catholic Infant.Mortality
Fertility         0.4636847       0.41655603
Agriculture       0.4010951      -0.06085861
Examination      -0.5727418      -0.11402160
Education        -0.1538589      -0.09932185
Catholic          1.0000000       0.17549591
Infant.Mortality  0.1754959       1.00000000
# Create a 3 column data frame from the results
> DF <- as.data.frame.table(cor(swiss))
> DF
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
2 Agriculture Fertility 0.35307918
3 Examination Fertility -0.64588271
4 Education Fertility -0.66378886
5 Catholic Fertility 0.46368470
6 Infant.Mortality Fertility 0.41655603
7 Fertility Agriculture 0.35307918
8 Agriculture Agriculture 1.00000000
9 Examination Agriculture -0.68654221
10 Education Agriculture -0.63952252
11 Catholic Agriculture 0.40109505
12 Infant.Mortality Agriculture -0.06085861
13 Fertility Examination -0.64588271
14 Agriculture Examination -0.68654221
15 Examination Examination 1.00000000
16 Education Examination 0.69841530
17 Catholic Examination -0.57274181
18 Infant.Mortality Examination -0.11402160
19 Fertility Education -0.66378886
20 Agriculture Education -0.63952252
21 Examination Education 0.69841530
22 Education Education 1.00000000
23 Catholic Education -0.15385892
24 Infant.Mortality Education -0.09932185
25 Fertility Catholic 0.46368470
26 Agriculture Catholic 0.40109505
27 Examination Catholic -0.57274181
28 Education Catholic -0.15385892
29 Catholic Catholic 1.00000000
30 Infant.Mortality Catholic 0.17549591
31 Fertility Infant.Mortality 0.41655603
32 Agriculture Infant.Mortality -0.06085861
33 Examination Infant.Mortality -0.11402160
34 Education Infant.Mortality -0.09932185
35 Catholic Infant.Mortality 0.17549591
36 Infant.Mortality Infant.Mortality 1.00000000
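A side note on the column names: as.data.frame.table() was written for contingency tables, which is why the value column is labelled 'Freq' even though it holds correlations here. A minimal sketch of giving it a clearer name through the function's responseName argument; the names 'DF2' and 'Cor' are just illustrative choices, not part of the example above:

```r
# Same conversion as above, but label the value column 'Cor' instead of 'Freq'
DF2 <- as.data.frame.table(cor(swiss), responseName = "Cor")

# Inspect the resulting column names and the first few rows
names(DF2)
head(DF2)
```

The rest of the example keeps the default 'Freq' name, so nothing below depends on this.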
# Now sort the above in decreasing order
# of the correlation coefficient
> with(DF, DF[order(Freq, decreasing = TRUE), ])
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
8 Agriculture Agriculture 1.00000000
15 Examination Examination 1.00000000
22 Education Education 1.00000000
29 Catholic Catholic 1.00000000
36 Infant.Mortality Infant.Mortality 1.00000000
16 Education Examination 0.69841530
21 Examination Education 0.69841530
5 Catholic Fertility 0.46368470
25 Fertility Catholic 0.46368470
6 Infant.Mortality Fertility 0.41655603
31 Fertility Infant.Mortality 0.41655603
11 Catholic Agriculture 0.40109505
26 Agriculture Catholic 0.40109505
2 Agriculture Fertility 0.35307918
7 Fertility Agriculture 0.35307918
30 Infant.Mortality Catholic 0.17549591
35 Catholic Infant.Mortality 0.17549591
12 Infant.Mortality Agriculture -0.06085861
32 Agriculture Infant.Mortality -0.06085861
24 Infant.Mortality Education -0.09932185
34 Education Infant.Mortality -0.09932185
18 Infant.Mortality Examination -0.11402160
33 Examination Infant.Mortality -0.11402160
23 Catholic Education -0.15385892
28 Education Catholic -0.15385892
17 Catholic Examination -0.57274181
27 Examination Catholic -0.57274181
10 Education Agriculture -0.63952252
20 Agriculture Education -0.63952252
3 Examination Fertility -0.64588271
13 Fertility Examination -0.64588271
4 Education Fertility -0.66378886
19 Fertility Education -0.66378886
9 Examination Agriculture -0.68654221
14 Agriculture Examination -0.68654221
# Now, just take the first 25 rows
> with(DF, DF[order(Freq, decreasing = TRUE)[1:25], ])
Var1 Var2 Freq
1 Fertility Fertility 1.00000000
8 Agriculture Agriculture 1.00000000
15 Examination Examination 1.00000000
22 Education Education 1.00000000
29 Catholic Catholic 1.00000000
36 Infant.Mortality Infant.Mortality 1.00000000
16 Education Examination 0.69841530
21 Examination Education 0.69841530
5 Catholic Fertility 0.46368470
25 Fertility Catholic 0.46368470
6 Infant.Mortality Fertility 0.41655603
31 Fertility Infant.Mortality 0.41655603
11 Catholic Agriculture 0.40109505
26 Agriculture Catholic 0.40109505
2 Agriculture Fertility 0.35307918
7 Fertility Agriculture 0.35307918
30 Infant.Mortality Catholic 0.17549591
35 Catholic Infant.Mortality 0.17549591
12 Infant.Mortality Agriculture -0.06085861
32 Agriculture Infant.Mortality -0.06085861
24 Infant.Mortality Education -0.09932185
34 Education Infant.Mortality -0.09932185
18 Infant.Mortality Examination -0.11402160
33 Examination Infant.Mortality -0.11402160
23 Catholic Education -0.15385892
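Note that the listing above still contains the diagonal (each variable with itself, always 1) and every pair twice, and that it sorts on the signed value, so strong negative correlations end up at the bottom. If "strongest" is meant in terms of absolute value, one possible variation is sketched below. It assumes that each pair should appear once and that the diagonal should be dropped; the names 'cm', 'idx' and 'pairs' are only illustrative:

```r
# Correlation matrix for the built-in 'swiss' dataset
cm <- cor(swiss)

# Positions strictly above the diagonal: each pair once, no self-correlations
idx <- which(upper.tri(cm), arr.ind = TRUE)

# Build a 3 column data frame: the two variable names and their correlation
pairs <- data.frame(
  Var1 = rownames(cm)[idx[, "row"]],
  Var2 = colnames(cm)[idx[, "col"]],
  Cor  = cm[idx],
  stringsAsFactors = FALSE
)

# Sort by absolute value so that strong negative correlations rank high too
pairs <- pairs[order(abs(pairs$Cor), decreasing = TRUE), ]

# First 25 rows (or fewer: 'swiss' has only 15 distinct pairs)
head(pairs, 25)
```

For a 1000 by 1000 matrix the same approach yields 499,500 rows rather than 1,000,000, which also halves the work for order().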
See ?as.data.frame.table, ?cor, ?order and ?with
Spend some time reading "An Introduction to R", which is available with your installation or from the main R web site.
HTH,
Marc Schwartz