Re: PCA C code contradictions
- From: jg.campbell.ng@xxxxxxxxx
- Date: 4 Jun 2006 09:34:58 -0700
shay wrote:
I have three separate codes that run PCA. When I run my data through
them (or even a small synthetic data set of orthogonal variable
vectors) I get three different results.
Unfortunately PCA leaves many implementation choices open, and the
combinations multiply, so the chance of any two implementations giving
the same answer is slim.
Those that this non-statistician can think of:
- diagonalise the covariance matrix C = E[ (x - mu)(x - mu)' ], where
' denotes transpose;
- use the correlation matrix R = E[ x x' ], i.e. second moments taken
without subtracting the mean (the signal-processing usage of the term);
- looking at Murtagh's PCAcorr.java (which I assume is a direct port of
his C code), I see that in addition to subtracting the means he
'standardises', i.e. scales each component so that it has unit
standard deviation; in other words he diagonalises the covariance
matrix of the /scaled/ data, which is what statisticians call the
correlation matrix.
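To illustrate how much these choices matter, here is a minimal Python
sketch. It is not taken from any of the three packages; the two-variable
data and all function names are made up for illustration. It finds the
leading principal axis from the covariance matrix, from the uncentred
matrix E[ x x' ], and from the covariance matrix of standardised data,
using the closed-form principal-axis angle of a symmetric 2x2 matrix.

```python
import math

def second_moments(x, y, centre):
    """Entries (a, b, d) of the symmetric matrix [[a, b], [b, d]].

    With centre=True this is the covariance matrix (divisor n, an
    assumption); with centre=False it is the uncentred E[x x'].
    """
    n = len(x)
    mx = sum(x) / n if centre else 0.0
    my = sum(y) / n if centre else 0.0
    a = sum((xi - mx) ** 2 for xi in x) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
    d = sum((yi - my) ** 2 for yi in y) / n
    return a, b, d

def to_correlation(a, b, d):
    """Covariance matrix of the standardised variables."""
    return 1.0, b / math.sqrt(a * d), 1.0

def leading_axis(a, b, d):
    """A unit eigenvector for the largest eigenvalue of [[a, b], [b, d]]."""
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    return math.cos(theta), math.sin(theta)

# Hypothetical data: y is on a much larger scale than x and falls as x rises.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [90.0, 80.0, 80.0, 60.0, 50.0]

cov_axis = leading_axis(*second_moments(x, y, centre=True))
raw_axis = leading_axis(*second_moments(x, y, centre=False))
corr_axis = leading_axis(*to_correlation(*second_moments(x, y, centre=True)))

print("covariance      :", cov_axis)
print("uncentred E[xx']:", raw_axis)
print("standardised    :", corr_axis)
```

On this data the three axes point in clearly different directions: the
covariance axis is dominated by the large-scale variable, the uncentred
axis points towards the mean of the data, and the standardised axis
weights both variables equally.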
No doubt there are other variations. For instance, the PCA example in
Venables and Ripley, Modern Applied Statistics with S, 4th ed.,
Springer, uses the iris data set but takes logs first. I attempted to
replicate Murtagh's results in R (a free clone of S), but I haven't
worked out how to 'standardise' the data (I'm an infrequent user of R).
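On the 'standardise' step itself, here is a minimal sketch in Python
rather than R; the function name and the sample values are
hypothetical. Each column has its mean subtracted and is then divided
by its standard deviation. Which divisor (n - 1 or n) Murtagh's code
uses is not stated here, so the sketch exposes it as a flag.

```python
import statistics

def standardise(column, sample=True):
    """Centre a column and scale it to unit standard deviation."""
    mu = statistics.mean(column)
    # Assumption: divisor n - 1 by default; sample=False uses divisor n.
    sd = statistics.stdev(column) if sample else statistics.pstdev(column)
    return [(v - mu) / sd for v in column]

column = [150.0, 160.0, 170.0, 180.0, 190.0]  # illustrative values only
z = standardise(column)
print(z)
print(statistics.mean(z), statistics.stdev(z))
```

As long as every column has the same number of observations, either
divisor rescales all columns by the same constant, so it changes the
size of the scores but not the directions of the components.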
The different codes are:
1. Accelrys Cerius2 version 4.11
2. http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
3. http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/cluster.pdf
Can anyone explain why differences exist and more importantly which is
the "correct" one?
Best regards,
Jon C.