Re: Fix of previously sent question on PCA
From: Paige Miller (paige.miller_at_kodak.com)
Date: 09/20/04
- Next message: Paige Miller: "Re: Mahalanobis distance & covariance matrices"
- Previous message: George Kahrimanis: "Re: Confidence interval on mean for a set of numbers"
- In reply to: Ross Clement: "Re: Fix of previously sent question on PCA"
Date: Mon, 20 Sep 2004 08:42:00 -0400
Ross Clement wrote:
> Hi. Thanks for the reply.
>
> Paige Miller <paige.miller@kodak.com> wrote in message news:<cielso$9r2$1@news.kodak.com>...
>
>>Why not just use the first dimension? How do you know dimension 2 is
>>"real" as opposed to it being "noise"?
>
> Erm, this isn't my method, but a (possibly distorted) version of a
> standard method. The first two dimensions are used in the
> stereotypical version of this method because the texts are plotted
> onto a 2d graph.
But, if only one dimension is important, there's no value in forcing
a 2nd dimension just so you can plot things in 2D. Plot them in 1D!
>>Hmmm, Euclidean distances with PCA? I think there's a problem.
>>Mahalanobis distances are better suited to be used with PCA because
>>Mahalanobis specifically takes into account covariance ... in other
>>words, a 1 unit change in a certain direction may be more or less
>>meaningful than a 1 unit change in a perpendicular dimension.
>
> Actually, I should be using Mahalanobis distance or something similar
> anyway, since there are strong correlations in the raw frequency
> data. The relative frequencies of one and two letter words for
> different texts show little correlation, but for longer lengths (e.g.
> six versus seven letter words), the correlation is very high, e.g. 0.8
> or even over 0.9.
>
> So, I guess that I'm finally going to have to bite the bullet and
> write code for Mahalanobis distance.
If you are using matrix multiplication, this should be very easy.
So easy, in fact, that you can allocate a whole hour for it, and
then tell your boss you spent an hour creating a program that
computes Mahalanobis distance (he will be very impressed), when in
fact you spent 20 seconds on it.
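
For readers who want to see the matrix arithmetic spelled out, here is a
minimal sketch in plain Python. It is not code from this thread. It
assumes the usual definition d(x, y) = sqrt((x - y)' S^-1 (x - y)), with
S the sample covariance matrix of the data, and every function name
below is made up for illustration. With a matrix library the last
function collapses to a single line.

```python
# Illustrative sketch only: helper names are hypothetical, not from the thread.

def mean_vector(rows):
    """Column means of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def covariance_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of the observations."""
    n = len(rows)
    mu = mean_vector(rows)
    p = len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def mahalanobis(x, y, cov):
    """sqrt((x - y)' * inverse(cov) * (x - y)), without forming the inverse."""
    d = [a - b for a, b in zip(x, y)]
    z = solve(cov, d)  # z = inverse(cov) * d
    return sum(di * zi for di, zi in zip(d, z)) ** 0.5
```

With an identity covariance matrix this reduces to the Euclidean
distance. Highly correlated columns, such as the six- and seven-letter
word frequencies mentioned above, can make the covariance matrix close
to singular, which is why the sketch raises an error instead of
dividing by a tiny pivot.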
>>But it seems to me that PCA is not the technique you want to use for
>>this particular process. Discriminant analysis seems better
>>suited for this activity, because it solves the problem you
>>have: given a set of measures, what category does this new
>>observation fall into?
> <...snip...>
But more realistically, forget Mahalanobis. Go straight to
Discriminant Analysis. PCA isn't the right tool for the job.
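
To make the suggestion concrete, here is a rough sketch in plain Python
of the simplest form of linear discriminant analysis. It is not code
from this thread, and it rests on stated assumptions: only two features
per text, equal prior probabilities for every category, and one
covariance matrix pooled across categories. Under those assumptions the
rule is to assign a new observation to the category whose mean is
nearest in Mahalanobis distance. The function names and the sample data
are hypothetical.

```python
# Illustrative sketch only: two features, equal priors, pooled covariance.

def class_means(samples):
    """samples maps a label to a list of (x1, x2) feature pairs."""
    return {label: (sum(p[0] for p in pts) / len(pts),
                    sum(p[1] for p in pts) / len(pts))
            for label, pts in samples.items()}

def pooled_covariance(samples, means):
    """Within-class covariance (sxx, sxy, syy) pooled over all classes."""
    sxx = sxy = syy = 0.0
    n = 0
    for label, pts in samples.items():
        mx, my = means[label]
        for x, y in pts:
            sxx += (x - mx) ** 2
            sxy += (x - mx) * (y - my)
            syy += (y - my) ** 2
        n += len(pts)
    dof = n - len(samples)  # observations minus number of classes
    return sxx / dof, sxy / dof, syy / dof

def classify(point, means, cov):
    """Label of the class mean with the smallest squared Mahalanobis distance."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("pooled covariance matrix is singular")
    best_label, best_d2 = None, float("inf")
    for label, (mx, my) in means.items():
        dx, dy = point[0] - mx, point[1] - my
        # Closed-form d' * inverse(cov) * d for a 2x2 symmetric matrix.
        d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
        if d2 < best_d2:
            best_label, best_d2 = label, d2
    return best_label

# Made-up training data: two hypothetical authors, two features per text.
training = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)],
    "B": [(4.0, 4.0), (5.0, 4.0), (4.0, 5.0), (5.0, 5.0)],
}
means = class_means(training)
cov = pooled_covariance(training, means)
```

Calling classify((1.0, 1.0), means, cov) then picks the category for a
new text. Real word-length data has more than two features, so the
closed-form 2x2 inverse would need to be replaced by a general linear
solve or a statistics package.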
-- 
Paige Miller
Eastman Kodak Company
paige dot miller at kodak dot com
http://www.kodak.com
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance"
-- Lee Ann Womack