connection between preselecting parts of information with high variance and PCA



I posted this to sci.stat.consult and sci.stat.math as well. i am not
sure if the same people read all three groups.


hi all,

i am from the image processing / pattern recognition field and i have
stumbled upon an interesting mathematical problem / question. maybe it
will be trivial to you guys, but all the better :))

so, the problem is related to eigenvector decomposition, i.e. Karhunen-
Loeve's transform or as we call it - PCA (Principal Component
Analysis). Application area - Face Recognition. Images are rearranged
into vectors that represent points in n-dimensional (n being the
number of pixels in each image) space.

The idea is that eigenvectors of the covariance matrix of the set of
images (the set of points in n-D space) will decorrelate the data. The
first "most important" eigenvector (the one associated with the largest
eigenvalue) captures the direction with the largest variance.... well, you
know the rest. You can then keep only a few of those eigenvectors with
the largest eigenvalues and project all the images onto that new few-D
space (let's call it the k-D space, with k << n). By keeping the
vectors with largest eigenvalues you are keeping a large portion of
the energy of your data, or consequently you are keeping the most
information. eigenvectors are linear combinations of the original
dimensions, thus making PCA a simple rotation/stretch procedure.
Similarity of two images (this is the basic face recognition idea) can
then be determined by measuring the distance (e.g. Euclidean) between
the two projections in the lower dimensional space instead of doing it
in the high n-D space.
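
For concreteness, here is roughly that pipeline as a numpy sketch. The
function names (fit_pca, project, nearest_neighbor) are made up for
illustration, and going through the SVD of the centered data is just one
common way to get the eigenvectors without forming the n x n covariance
matrix. It is a sketch under those assumptions, not a reference
implementation.

```python
import numpy as np

def fit_pca(X, k):
    """X has shape (m, n): one flattened image per row. Keep k components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are eigenvectors of the covariance matrix of X, ordered by
    # decreasing singular value; s**2 / (m - 1) gives the eigenvalues.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]
    eigenvalues = s[:k] ** 2 / (X.shape[0] - 1)
    return mean, components, eigenvalues

def project(X, mean, components):
    """Project rows of X into the k-D space spanned by the components."""
    return (X - mean) @ components.T

def nearest_neighbor(gallery_proj, probe_proj):
    """For each probe, index of the closest gallery projection (Euclidean)."""
    diff = probe_proj[:, None, :] - gallery_proj[None, :, :]
    return np.linalg.norm(diff, axis=2).argmin(axis=1)
```

A probe image counts as recognized when the index returned by
nearest_neighbor points at a gallery image of the same person.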

Now for the problem :) :

In my little experiment I used the preprocessed images (so the input
to calculating the covariance matrix and the rest of the PCA are not
pixels anymore, but some other coefficients - but this is irrelevant).
My "images", or to be precise, my matrices of coefficients are the
size of 128 x 128, rearranged to 1x16384 vectors (so n = 16384,
original space is n-D). I performed PCA on those "images" and then
performed simple face recognition in the resulting k-D space (k << n). I
got a recognition rate of e.g. 40% (so i recognized about 40% of images
correctly).

In the next experiment, instead of using all of the 16384 coefficients
per "image", i selected only a subset of those, the ones that for my
set of "images" have the largest variance. So i measured the variance
of each coefficient at the same spatial coordinates across all
"images", following the line of thought that the ones that change the
most for different persons are more important for discriminating them.
I kept 512 of them (i remembered their spatial locations and then
selected them from each image), thus going from 1x16384 to 1x512 per
"image". Now i did PCA on those 1x512 vectors and repeated the same
experiment as before (i kept k-D, where k was the same as in the
previous experiment) and got a recognition rate of 60%.
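
The preselection step, sketched in numpy. The helper name
select_high_variance is hypothetical, and using the sample variance
(ddof=1) is an assumption; the ranking of coefficients does not depend
on that choice.

```python
import numpy as np

def select_high_variance(X, n_keep):
    """Indices of the n_keep columns of X with the largest variance.

    X has shape (m, n): one flattened "image" per row.
    """
    variances = X.var(axis=0, ddof=1)
    # argsort is ascending, so take the last n_keep and reverse them
    return np.argsort(variances)[-n_keep:][::-1]
```

With X of shape (m, 16384), idx = select_high_variance(X, 512) gives the
remembered locations, and X[:, idx] is the (m, 512) matrix that PCA is
then run on. The same idx has to be applied to every image that is
compared later.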

Some questions:

how come the results are so different? - i know the comparison is not
really fair because k << 16384 but only slightly smaller than 512 (k <
512), but nevertheless...

what is the connection between the two procedures, given that PCA
finds the directions of most variance and I preselected the
coefficients with largest variances? shouldn't PCA do the same thing?
is the correct conclusion that the rest of the coefficients (the
remaining ~15000) are simply redundancy in terms of face recognition?
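
One relation between the two that can at least be checked numerically:
the per-coefficient variances i ranked are the diagonal of the same
covariance matrix whose eigenvalues PCA uses, so both sets of numbers
sum to the same total (the trace). The toy data below is random and only
there for illustration; this does not settle the question, because the
preselection ignores the off-diagonal covariances that PCA takes into
account.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: 30 samples, 8 coefficients with different spreads (made up)
scales = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 1.0, 1.0, 1.0])
X = rng.normal(size=(30, 8)) * scales

C = np.cov(X, rowvar=False)      # 8 x 8 covariance matrix
per_coeff_var = np.diag(C)       # what the preselection ranks
eigvals = np.linalg.eigvalsh(C)  # what PCA ranks (ascending order)

# Same total variance, distributed differently over the two bases
print(per_coeff_var.sum(), eigvals.sum())
# The largest eigenvalue is never below the largest single-coefficient variance
print(eigvals.max(), per_coeff_var.max())
```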

the main question: how does preselecting the coefficients with highest
variance across all images relate to what standard PCA does? can
this be compared at all since one original space is 512-D and the
other 16384-D?

thanks for your help.
K.
