Re: Question about using PCA to select major features from dataset
- From: Jonathan Campbell <jg.campbell.ng@xxxxxxxxx>
- Date: Mon, 14 Apr 2008 17:23:49 GMT
Jake wrote:
Hello all,
I need to use principal components analysis (PCA) to help select the most
important features from a dataset.
The features of the dataset are as follows: <f1, f2, f3, f4, f5>
The total number of records is 100, so the dataset matrix has
dimension 100x5.
After running PCA, I use the top 3 eigenvectors,
'SelectedFeatureVectors' (corresponding to the 3 largest eigenvalues), to
generate a new dataset as follows:
NewDataSet = RawDataAdjustWithMean x SelectedFeatureVectors,
i.e. (100x5) x (5x3), so NewDataSet has dimension 100 x 3.
So my question is: how do I know which features have been selected as the
top three most important features?
In other words, which two features from <f1, f2, f3, f4, f5> have been
filtered out of NewDataSet?
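If it's PCA as I know it, none of the original features has been filtered out. Instead you now have three new features <n1, n2, n3> (n = new feature), each a linear combination of all five original features: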
n1 = a11*f1 + a12*f2 + ... + a15*f5
n2 = a21*f1 + a22*f2 + ... + a25*f5
n3 = a31*f1 + a32*f2 + ... + a35*f5
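The projection can be sketched in NumPy as follows. This is a minimal illustration, not the original poster's code: `X` is hypothetical random stand-in data (100 records x 5 features), the f's are mean-adjusted, and the names `X`, `W` and `new_data` are chosen for illustration only. The top eigenvectors of the covariance matrix form the columns of `W`, so the 100x3 result is `X_centered @ W`, and its columns are n1, n2, n3.

```python
import numpy as np

# Hypothetical stand-in data: 100 records x 5 features (f1..f5).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))

# Mean-adjust each feature (column).
X_centered = X - X.mean(axis=0)

# 5x5 covariance matrix of the features and its eigen-decomposition.
cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order

# Sort so the largest eigenvalues come first, then keep the top 3 eigenvectors.
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
W = eigvecs[:, :3]  # 5x3: W[:, 0] holds (a11..a15), W[:, 1] holds (a21..a25), ...

# (100x5) @ (5x3) -> (100x3): one row per record, one column per new feature.
new_data = X_centered @ W
print(new_data.shape)  # (100, 3)

# n1 for the first record, written out term by term: a11*f1 + ... + a15*f5.
n1_first = sum(W[j, 0] * X_centered[0, j] for j in range(5))
print(bool(np.isclose(n1_first, new_data[0, 0])))  # True
```

No column of `X` is dropped here: every column of `new_data` mixes all five original features.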
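The vector of coefficients (a11, a12, ..., a15) is chosen so as to maximise the variance of n1; (a21, ..., a25) is chosen to maximise the variance of n2, subject to the constraint that (a21, ..., a25) is perpendicular to (a11, ..., a15); and (a31, ..., a35) maximises the variance of n3 while being perpendicular to both. Each coefficient vector is also constrained to unit length, since otherwise the variance could be made arbitrarily large. These coefficient vectors are exactly the top three eigenvectors of the covariance matrix.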
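These properties can be checked numerically. The sketch below again uses hypothetical random stand-in data. It checks that the coefficient vectors come out unit-length and mutually perpendicular, that the variance of each n_i equals the corresponding eigenvalue, and that no randomly chosen unit vector gives a projection with more variance than n1.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 5))  # hypothetical stand-in data
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)  # uses N-1 in the denominator by default
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
W = eigvecs[:, :3]
new_data = X_centered @ W

# Coefficient vectors are unit length and mutually perpendicular: W^T W = I.
print(np.allclose(W.T @ W, np.eye(3)))  # True

# Sample variance of each new feature (same N-1 convention) equals its eigenvalue.
variances = new_data.var(axis=0, ddof=1)
print(np.allclose(variances, eigvals[:3]))  # True

# No random unit vector yields a projection with more variance than n1.
trials = rng.normal(size=(1000, 5))
trials /= np.linalg.norm(trials, axis=1, keepdims=True)
trial_vars = (X_centered @ trials.T).var(axis=0, ddof=1)
print(trial_vars.max() <= eigvals[0] + 1e-9)  # True
```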
You might be able to infer some measure of the 'importance' of the original features from the size of the coefficients (a11, a12, ..., a15), but it's not trivial.
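One possible heuristic is to score each original feature by summing its squared coefficients over the retained components, weighting each by that component's eigenvalue. The lowest-scoring features are then treated as the least important. This is an assumption on my part, not something PCA itself defines, and other rankings can disagree. The score is the part of each feature's variance captured by the top components. The data below are hypothetical, with f4 and f5 deliberately built as low-variance noise.

```python
import numpy as np

rng = np.random.default_rng(2)
names = ["f1", "f2", "f3", "f4", "f5"]

# Hypothetical stand-in data: f1..f3 carry most of the variance, f4 and f5 are small noise.
X = np.column_stack([
    rng.normal(scale=10.0, size=100),
    rng.normal(scale=8.0, size=100),
    rng.normal(scale=6.0, size=100),
    rng.normal(scale=0.1, size=100),
    rng.normal(scale=0.1, size=100),
])
X_centered = X - X.mean(axis=0)

cov = np.cov(X_centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 3
W = eigvecs[:, :k]  # W[j, i] is the coefficient of feature f(j+1) in n(i+1)

# Heuristic score: sum over retained components of eigenvalue * coefficient^2,
# i.e. the share of each feature's variance captured by the top-k components.
scores = (W ** 2) @ eigvals[:k]

for name, s in zip(names, scores):
    print(f"{name}: {s:.4f}")

least_important = sorted(names[j] for j in np.argsort(scores)[:2])
print(least_important)  # ['f4', 'f5']
```

The score depends on each feature's scale. Features measured in large units will dominate unless the data are standardized first.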
Best regards,
Jon C.