Re: distribution of an outlier?
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 18 Apr 2005 08:29:33 -0700
dave@xxxxxxxxxxx wrote:
> Mat,
> >
> Furthermore note that as usual the errors from the regression must be
> Gaussian .
And the usual WRONG method of tagging possible "outliers" is to put
some number of asterisks behind those residuals that are two, three,
or more standard deviations from zero (SAS does that).
The idea behind such tagging is that if you generate ONE observation
from a Normal(0, sigma) population, then there is a rather small
probability that the random deviate is more than 3 sigma away from 0.
But in the analysis of residuals in a regression, the "outliers" are
always the LARGEST observed residuals, or the maximum order statistics!
Thus, if you do a regression analysis with 10,000 observations, say,
you will find pages full of "***" in SAS because there's nothing
unusual about MANY observed residuals more than 3 std. dev. away from
zero (about 27 of them are expected under Normal errors). It would take
a MUCH larger observed residual to be considered a candidate for an
"outlier".
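To put rough numbers on this, here is a minimal Python sketch. It ASSUMES the residuals behave like independent Normal(0, sigma) draws, which is only an approximation for regression residuals (they are correlated and their variances depend on leverage). The function names are mine, for illustration only.

```python
import math

def two_sided_tail(k):
    """P(|Z| > k) for a standard Normal Z."""
    return math.erfc(k / math.sqrt(2.0))

def expected_exceedances(n, k=3.0):
    """Expected number of n independent Normal residuals beyond k sigma."""
    return n * two_sided_tail(k)

def prob_at_least_one(n, k=3.0):
    """Chance that the LARGEST of n independent residuals exceeds k sigma."""
    return 1.0 - (1.0 - two_sided_tail(k)) ** n

if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, round(expected_exceedances(n), 2),
              round(prob_at_least_one(n), 4))
```

With n = 10,000 the expected count beyond 3 sigma is about 27, and it is a near certainty that at least one residual lands out there, so "3 sigma" alone flags nothing unusual at that sample size.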
>
> For more http://www.autobox.com/outlier.html
I found MANY questionable points in the exposition given in that link.
It did identify the problem I stated above:
*> Outlier points ( points above or below 3 standard deviations )
*> are immediately identified and thus may be deleted from the next
*> stage of the analysis. The flaw in the above logic is obvious.
The flaw is "obvious" to me for different reasons. But the use of a
fixed "3-sigma" as the detection rule (based on alternative estimates
of sigma) remains throughout the link, with complete disregard of the
sample size n and of the fact that the residual being judged is the
maximum order statistic.
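As a sketch of what a rule that respects n might look like (my own illustration, not anything taken from the link): pick the cutoff c so that the LARGEST of n absolute residuals exceeds c with probability alpha. This again ASSUMES independent Normal(0, 1) standardized residuals, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def max_residual_cutoff(n, alpha=0.05):
    """Cutoff c with P(max of n |Z_i| > c) = alpha for independent N(0,1) Z_i."""
    # Per-observation two-sided tail p solving 1 - (1 - p)**n = alpha,
    # computed with expm1/log1p to stay accurate for large n.
    p = -math.expm1(math.log1p(-alpha) / n)
    # Two-sided cutoff: upper-tail area p/2.
    return -NormalDist().inv_cdf(p / 2.0)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        print(n, round(max_residual_cutoff(n), 2))
```

For a single observation this gives the familiar 1.96; for n = 10,000 the cutoff is well above 4 sigma, which is the sense in which a fixed 3-sigma rule ignores the sample size.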
Furthermore, the DETECTION of outliers is an entirely different
matter from the DELETION of outliers.
Any DELETION of an outlier is a CRIME unless you can fully justify the
deletion. One can always do separate analyses with and without
the rogue observation, or use some procedure(s) that are
robust to the presence of a small number of outliers (RARE observations
DO naturally occur, rarely of course).
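A minimal sketch of the "with and without" comparison for a straight-line fit, in plain Python. The data are MADE UP for illustration (five points on y = 1 + 2x plus one rogue value), and the function names are hypothetical.

```python
def ols_line(xs, ys):
    """Least-squares (intercept, slope) for a simple linear regression."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def fit_with_and_without(xs, ys, i):
    """Fit once on all the data and once with observation i set aside."""
    full = ols_line(xs, ys)
    reduced = ols_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
    return full, reduced

if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    ys = [3.0, 5.0, 7.0, 9.0, 11.0, 30.0]  # last value is the rogue one
    full, reduced = fit_with_and_without(xs, ys, 5)
    print("with   :", full)
    print("without:", reduced)
```

Reporting both fits side by side lets the reader see how much one observation drives the conclusions, without anything being silently thrown away.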
*> Some would argue that the outliers can be identified via an
*> "influential observation approach" or "cook's distance approach".
*> Essentially this detection scheme focuses on the effect of its
*> deletion on the residual sum of squares. But this approach usually
*> fails because the outlier is an "unusual value" to its prediction
*> and that prediction requires a model.
This is a very poor characterization (so much so that it could be
considered "wrong") of the role of Cook's distance and of the notion of
"influential observations" vs "outliers".
> Dave Reilly
> AUTOMATIC FORECASTING SYSTEMS
> http://www.autobox.com
> 215-675-0652
-- Bob.