Re: Fix of previously sent question on PCA

From: Paige Miller (paige.miller_at_kodak.com)
Date: 09/20/04


Date: Mon, 20 Sep 2004 08:42:00 -0400

Ross Clement wrote:
> Hi. Thanks for the reply.
>
> Paige Miller <paige.miller@kodak.com> wrote in message news:<cielso$9r2$1@news.kodak.com>...
>
>>Why not just use the first dimension? How do you know dimension 2 is
>>"real" as opposed to it being "noise"?
>
> Erm, this isn't my method, but a (possibly distorted) version of a
> standard method. The first two dimensions are used in the
> stereotypical version of this method because the texts are plotted
> onto a 2d graph.

But, if only one dimension is important, there's no value in forcing
a 2nd dimension just so you can plot things in 2D. Plot them in 1D!

>>Hmmm, Euclidean distances with PCA? I think there's a problem.
>>Mahalanobis distances are better suited to be used with PCA because
>>Mahalanobis specifically takes into account covariance ... in other
>>words, a 1 unit change in a certain direction may be more or less
>>meaningful than a 1 unit change in a perpendicular dimension.
>
> Actually, I should be using Mahalanobis distance or something similar
> anyway, since there are strong correlations in the raw frequency
> data. The relative frequencies of one and two letter words for
> different texts show little correlation, but for longer lengths (e.g.
> six versus seven letter words), the correlation is very high, e.g. 0.8
> or even over 0.9.
>
> So, I guess that I'm finally going to have to bite the bullet and
> write code for Mahalanobis distance.

If you are using matrix multiplication, this should be very, very
easy. So easy, in fact, that you can allocate a whole hour for it,
and then tell your boss you spent an hour on creating a program that
computes Mahalanobis distance (he will be very impressed), when in
fact you spent 20 seconds on it.
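
To make the matrix form concrete, here is a minimal sketch in plain Python. It assumes each row is one text and each column one word-length frequency; the function names (covariance_matrix, solve, mahalanobis) are illustrative choices, not anything from this thread. The distance is sqrt(d' S^-1 d), where d is the difference from the mean vector and S is the covariance matrix. Rather than inverting S, the sketch solves S z = d.

```python
# Sketch only: names and data layout are assumptions made for illustration.

def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the observation rows."""
    n = len(rows)
    mu = mean_vector(rows)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    p = len(mu)
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (aug[i][n] - tail) / aug[i][i]
    return z


def mahalanobis(x, mu, cov):
    """Mahalanobis distance of x from mu under covariance matrix cov."""
    d = [xi - mi for xi, mi in zip(x, mu)]
    z = solve(cov, d)  # z = cov^-1 d, without forming the inverse
    return sum(di * zi for di, zi in zip(d, z)) ** 0.5
```

With a correlation of 0.9 between two unit-variance variables, the points (1, 1) and (1, -1) are equally far from the origin in Euclidean terms, yet mahalanobis gives roughly 1.03 for the first and 4.47 for the second. That is the "1 unit change in a certain direction" effect described above.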

>>But it seems to me that PCA is not the technique you want to use for
>>this particular process. Discriminant analysis seems better suited
>>to this activity, because it solves the problem you have: given a
>>set of measures, what category does this new observation fall
>>into?
> <...snip...>

But more realistically, forget Mahalanobis. Go straight to
Discriminant Analysis. PCA isn't the right tool for the job.
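
As a rough illustration of what discriminant analysis does with "a set of measures", here is a minimal sketch of the linear classification rule, under some loudly stated assumptions: only two measures per text (so the 2x2 inverse can be written out by hand), equal prior probabilities, and a covariance matrix pooled across groups. Under those assumptions the rule reduces to assigning the new observation to the group whose mean is nearest in Mahalanobis distance. The group labels and numbers are made up for the example.

```python
# Sketch only: two features, equal priors, pooled covariance.
# Labels and data below are hypothetical, not taken from the thread.

def mean(rows):
    """Column means of a list of (x, y) observations."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def pooled_cov_2d(groups):
    """Pooled within-group covariance as (var_x, cov_xy, var_y)."""
    sxx = sxy = syy = 0.0
    dof = 0
    for rows in groups.values():
        mx, my = mean(rows)
        for x, y in rows:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        dof += len(rows) - 1
    return sxx / dof, sxy / dof, syy / dof


def classify(point, groups):
    """Return the label whose group mean is nearest in Mahalanobis distance."""
    sxx, sxy, syy = pooled_cov_2d(groups)
    det = sxx * syy - sxy ** 2
    if abs(det) < 1e-12:
        raise ValueError("pooled covariance matrix is singular")
    best_label, best_d2 = None, float("inf")
    for label, rows in groups.items():
        mx, my = mean(rows)
        dx, dy = point[0] - mx, point[1] - my
        # d' S^-1 d for a symmetric 2x2 S, written out explicitly
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label


# Hypothetical training data: two measures per text, two known authors.
groups = {
    "author_a": [(1, 2), (2, 1), (3, 4), (4, 3)],
    "author_b": [(6, 7), (7, 6), (8, 9), (9, 8)],
}
print(classify((3, 3), groups))
```

A real analysis would use all the word-length frequencies and a statistics package rather than a hand-rolled 2x2 inverse; the point here is only that the method answers "which category?" directly, using the group labels that PCA ignores.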

-- 
Paige Miller
Eastman Kodak Company
paige dot miller at kodak dot com
http://www.kodak.com
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance" 
-- Lee Ann Womack

