Re: Cluster analysis for beginners
- From: Jerry Dallal <gdallal@xxxxxxxxxxxxxxxxxxxx>
- Date: Fri, 30 Mar 2007 12:57:12 -0400
illywhacker wrote, replying to David Winsemius <doe_s...@xxxxxxxxxxx>
(Mar 29, 4:38 pm) and Sidney <milan_y...@xxxxxx>
(news:24466740.1175159875339.JavaMail.jakarta@xxxxxxxxxxxxxxxxxxxxxx):
1) Classical hypothesis testing is fatally flawed. No well-defined
alternative is specified, and the probability of the data is not
calculated. Rather, the probability of a set of unobserved data points
is calculated. As Jeffreys famously put it: "A hypothesis that may be
true may be rejected because it has not predicted observable results
that have not occurred". There is a mass of literature on this.

On Mar 30, 1:36 am, Jerry Dallal <gdal...@xxxxxxxxxxxxxxxxxxxx> wrote:
This is a "joke", of course, that results from thinking of P values as
posterior probabilities. If P values are thought of in terms of fixed
level tests, Jeffreys' comment makes no sense.

illywhacker wrote:
As I believe someone has replied to you before now: calling it a
'joke' may save you the trouble of bothering to think too hard about
its implications for your practice, but it does not, alas, remove the
force of the remark.

On Mar 30, 1:52 pm, Jerry Dallal <gdal...@xxxxxxxxxxxxxxxxxxxx> wrote:
In fixed level testing, one picks a level at which to perform a test.
This implies a test statistic and a basis for choosing a critical
region, a set of outcomes that have a probability under the null equal
to the level of the test. The selection of a critical region implies
alternative hypotheses.
Then, the data are collected and the analyst merely looks to see whether
the outcome falls into the critical region. It is in this sense that
Jeffreys' comment has no meaning. Of course the frequentist is
concerned about outcomes that haven't happened. That's what the
critical region is about. Even the Bayesian understands it, despite
disagreeing with the approach.
A P value can be defined as the smallest level of significance for which
the result of the data collection will fall into a "similarly
constructed" critical region (for example, X>k for some k determined by
the level of the test). Stick with this definition, and Jeffreys'
comment is blunted.
If, on the other hand, one defines a P value as "the probability of
events as or more extreme", one plays straight man to Jeffreys'
jokester: "Why should I care about the probability of events I haven't
seen?!"
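
The two definitions of a P value in the post above can be compared numerically. What follows is only a minimal Python sketch, assuming a one-sided binomial test with critical regions of the form X >= k; the numbers (20 trials, null success probability 0.5, 15 observed successes, level 0.05) are made up for illustration and do not come from the thread. Because the binomial is discrete, the region is chosen so that its null probability is at most the level rather than exactly equal to it.

```python
from math import comb

def upper_tail(k, n, p0):
    """P(X >= k) for X ~ Binomial(n, p0), i.e. the level of the region {X >= k}."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

def critical_value(alpha, n, p0):
    """Smallest k whose region {X >= k} has null probability <= alpha."""
    for k in range(n + 2):  # k = n + 1 gives the empty region (probability 0)
        if upper_tail(k, n, p0) <= alpha:
            return k

def p_value_as_smallest_level(x_obs, n, p0):
    """Smallest level among the regions {X >= k} that contain x_obs."""
    return min(upper_tail(k, n, p0) for k in range(n + 2) if x_obs >= k)

# Hypothetical numbers, chosen only for illustration.
n, p0, x_obs, alpha = 20, 0.5, 15, 0.05

# Fixed level test: pick the region first, then just check the outcome.
k = critical_value(alpha, n, p0)
reject = x_obs >= k

# "Probability of events as or more extreme" definition.
p_tail = upper_tail(x_obs, n, p0)

# "Smallest level at which the outcome lands in the region" definition.
p_level = p_value_as_smallest_level(x_obs, n, p0)

print(k, reject, p_tail, p_level)
```

In this setup the critical value is 15, the test rejects, and both P value computations give the same number (about 0.0207). The arithmetic is identical; only the interpretation attached to it differs.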

illywhacker wrote:
Lots of words, but beside the point. Jeffreys' remark is a catchy
phrase designed to draw attention to the very real flaws in classical
confidence intervals and hypothesis testing. I gave references in
another post, including an immediately accessible paper. Perhaps you
would like to tell me which of the examples in Jaynes' paper is wrong?
And then tell me where Cox's theorem goes wrong?

There are lots of problems with frequentist statistical methods (CIs that aren't betworthy, the inability to speak directly to whether a hypothesis is true or false, ...). There are problems with just about any system one can name when it comes time to put theory into practice. My comments were directed specifically at the Jeffreys quotation, to wit, there is an *excellent* reason why frequentists concern themselves with the probability of events they haven't seen.
The OP is self-described as a beginner. Hence, it is useful to point out *why*, within the theory, it's fine to be concerned about these events.
I'm happy to let you and others slog it out over which approach is best.

illywhacker wrote:
Hypothesis testing without an alternative will always be flawed,
because there is always at least one model that predicts the data (or
the sets of unobserved data that classical hypothesis testing likes to
calculate with) with certainty, and which will therefore always be
better than any other hypothesis. Why should we discard this model?
Prior knowledge of course. And if prior knowledge about this model,
why not others? And now the whole thing is up in the air.
I notice you did not address this.
illywhacker;
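
The "model that predicts the data with certainty" argument can be put in numbers. This is only a sketch in Python under stated assumptions: the data are ten hypothetical coin flips, the competing model is a fair coin, and the prior (half the mass on the fair coin, the other half spread evenly over every possible "sure-thing" sequence) is one illustrative choice, not something proposed in the thread.

```python
from fractions import Fraction

# Hypothetical observed sequence of 10 coin flips (illustration only).
data = "HTTHHTHHHT"
n = len(data)

def likelihood_fair(seq):
    """P(seq | fair coin): every sequence of this length is equally likely."""
    return Fraction(1, 2) ** len(seq)

def likelihood_sure_thing(seq, predicted):
    """P(seq | model that predicts `predicted` with certainty)."""
    return Fraction(1) if seq == predicted else Fraction(0)

l_fair = likelihood_fair(data)
l_sure = likelihood_sure_thing(data, data)  # the model tailored to the data

# On likelihood alone, the tailored model wins by a factor of 2**n.
likelihood_ratio = l_sure / l_fair

# Assumed prior: 1/2 on the fair coin, 1/2 shared equally among
# all 2**n possible sure-thing models.
prior_fair = Fraction(1, 2)
prior_sure = Fraction(1, 2) / 2**n

posterior_odds_sure_vs_fair = (prior_sure * l_sure) / (prior_fair * l_fair)

print(likelihood_ratio, posterior_odds_sure_vs_fair)
```

With this particular prior the likelihood advantage of 1024 to 1 is cancelled exactly and the posterior odds come out even, which is one way to read "Prior knowledge of course": the tailored model is set aside by the prior, not by the data.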
Hypothesis testing without an alternative seems impossible. There may be exceptions escaping me at the moment, but how is one to construct a critical region without specifying alternatives explicitly or implicitly?
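
One way to see how a critical region carries an implicit alternative is to compare two regions with the same level under the null. The following is a minimal Python sketch, assuming a binomial setting with 20 trials and a null success probability of 0.5; the regions {X >= 15} and {X <= 5} and the alternative value 0.8 are hypothetical choices made only for illustration.

```python
from math import comb

def pmf(i, n, p):
    """Binomial probability P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p)**(n - i)

def region_probability(region, n, p):
    """P(X in region) when the success probability is p."""
    return sum(pmf(i, n, p) for i in region)

n, p0 = 20, 0.5
upper_region = range(15, n + 1)  # reject when X >= 15
lower_region = range(0, 6)       # reject when X <= 5

# Under the null the two regions have the same level by symmetry.
level_upper = region_probability(upper_region, n, p0)
level_lower = region_probability(lower_region, n, p0)

# Under a hypothetical alternative p = 0.8 they behave very differently.
p_alt = 0.8
power_upper = region_probability(upper_region, n, p_alt)
power_lower = region_probability(lower_region, n, p_alt)

print(level_upper, level_lower)
print(power_upper, power_lower)
```

Both regions have a level of about 0.0207, yet against p = 0.8 the upper region rejects roughly 80% of the time while the lower region almost never does. Choosing one region over the other therefore amounts to saying which departures from the null one cares about.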