Re: Residual Plot Question
- From: shiling99@xxxxxxxxx
- Date: 26 Jan 2006 10:03:31 -0800
> > So it makes more sense to
> > PLOT residual against all X,
>
> Doesn't make more sense; There is no specific assumption of the
> independence of the errors on the observed Xs in a multiple regression.
> In a simple regression, the order of the X is always the same as the
> order of the fitted Y.
>
Suppose the data-generating process is
y(i) = x(i)*beta + e(i), where x(i) > 0;
e(i) = x(i)^(alpha) * z(i), with z(i) i.i.d. normal with mean 0 (*)
This violates assumption 4. Plotting the residuals against x gives
some idea of how 'much' the assumption is violated: say, whether |e(i)|
stays roughly the same or gets bigger as x gets bigger or smaller.
Variance behaving as in (*) is not uncommon in econometric analysis. For
example, when the X measure is 100, e_std = 5; when it is 1000, e_std = 25.
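A minimal sketch of (*) in Python with NumPy, to make the point concrete.
The slope beta = 2, the sample size, the seed and the uniform range for x
are assumptions chosen only for illustration; alpha = log10(5) (about 0.7)
with a scale factor of 0.2 is just one choice that reproduces e_std = 5 at
x = 100 and e_std = 25 at x = 1000. Instead of drawing the plot, it compares
the residual spread in the lower and upper halves of x, which is roughly what
the eye would pick up from a residual-against-x plot.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, only for repeatability

n = 2000                         # hypothetical sample size
beta = 2.0                       # hypothetical true slope
alpha = np.log10(5.0)            # ~0.7: error sd grows 5x when x grows 10x
scale = 0.2                      # gives sd = 5 at x = 100, sd = 25 at x = 1000

# Data-generating process (*): y = x*beta + x^alpha * normal, with x > 0
x = rng.uniform(100.0, 1000.0, size=n)
e = scale * x**alpha * rng.standard_normal(n)
y = x * beta + e

# Least-squares fit of y = x*beta (no intercept, as in the model above)
beta_hat = (x @ y) / (x @ x)
resid = y - x * beta_hat

# Compare the residual spread for small x against large x
order = np.argsort(x)
sd_low = resid[order[: n // 2]].std(ddof=1)
sd_high = resid[order[n // 2:]].std(ddof=1)

print("beta_hat:", beta_hat)
print("residual sd, lower half of x:", sd_low)
print("residual sd, upper half of x:", sd_high)
```

If the spread in the upper half is clearly larger than in the lower half,
that is the fan shape one would see in the plot, and it points at
assumption 4 rather than at the slope estimate itself.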
Of course, a pattern in the residual-against-x plot does not necessarily
mean that assumption 4 is violated. It may instead point to a violation of
assumption 1 or 3, or both: the regression model may have a specification
error.
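The specification-error case can be sketched the same way. Here the true
mean is assumed to be quadratic in x (made-up coefficients, seed and sample
size) while a straight line with an intercept is fitted, so assumption 1
fails even though the errors have constant variance. This is only an
illustration under those assumptions, not a general test.

```python
import numpy as np

rng = np.random.default_rng(1)   # fixed seed, only for repeatability

n = 500                          # hypothetical sample size
x = rng.uniform(0.0, 10.0, size=n)

# Hypothetical true model: quadratic in x, homoscedastic normal errors
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.standard_normal(n)

# Misspecified fit: straight line with intercept
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

# Least-squares residuals are uncorrelated with x by construction,
# but the omitted curvature shows up against the centered square of x.
corr_linear = np.corrcoef(resid, x)[0, 1]
corr_curve = np.corrcoef(resid, (x - x.mean()) ** 2)[0, 1]

print("corr(resid, x):", corr_linear)
print("corr(resid, (x - mean)^2):", corr_curve)
```

A residual-against-x plot of this fit would show a U shape: the pattern is
there, but it comes from the wrong functional form, not from unequal
variance.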
A residual plot is an attempt to verify (roughly) how well one's model
satisfies the assumptions. If it cannot pass one's eyeballs, I doubt it can
pass statistical tests.
I really don't understand plotting yhat against the residuals. What is the
purpose of doing it? That is my question to the OP.
BTW I enjoy your comments and all your other postings.
Reef Fish wrote:
> shiling99@xxxxxxxxx wrote:
>
> > regression assumptions
> >
> > 1) y=x*beta + e
> > 2) x is n*k with rank k
> > 3) E(e|x)=0
> > 4) E(e*e'|x)=sigma^2*I
> >
> > 5) X is a nonstochastic matrix
> > 6) e|x -- N(0,sigma^2*I)
> >
> > Regression residual is an estimate of e above.
>
> Almost perfect, so far. 6) is the i.i.d. N(0, sigmasq) I spoke about
> in my "Linear Independence vs Stochastic Independence" post.
>
> To check the stochastic independence of the residuals, as well as
> the homoscedasticity of the residuals, one must examine the
> sequence of residuals in some "meaningful ordering" of the FITTED
> model.
>
> The only unequivocally meaningful order of the fitted model is Y-hat,
> because it lies on the fitted hyperplane regardless of the goodness
> or badness of fit.
>
> Your initial question was why it's ok to plot the residuals against
> Y-hat, but not Y. The "meaningful order" is the key to your answer,
> since the observed Y can be in any order of its magnitude and the
> regression fit will be the SAME, leading to the same unique ordering
> of the Y-hat; that's why the residual plots should NOT be against the
> observed Ys.
>
> A poorly fitted regression tends to have the effect of the large
> residuals associated with large Ys and vice versa, whereas no such
> systemic relation should occur when the residuals are plotted against
> Y-hat.
>
> > So it makes more sense to
> > PLOT residual against all X,
>
> Doesn't make more sense; There is no specific assumption of the
> independence of the errors on the observed Xs in a multiple regression.
> In a simple regression, the order of the X is always the same as the
> order of the fitted Y.
>
> > residual histogram,
>
> That's an inferior method of checking for the Normality assumption in
> your assumption (6). A histogram check (or PP plot) has nothing to
> do with the other two components (independence and homoscedasticity)
> of the usual regression assumptions.
>
> > residual time plot if
> > it is under time series analysis, etc.
>
> That's correct because the time sequence is a "meaningful order" in
> the independence assumption. But even then, the Y-hat may supersede
> the natural time order in such problems. Example: the 1975 SPSS
> Manual example, in which multiple time series were analyzed as a
> multiple regression of the values from one series to the values of the
> other three series when the OBSERVED DATA were in a time sequence
> but the sequence of TIME (actual year of measurement) was NOT
> considered as one of the variables, though it turned out to be relevant
> in the eventual consideration of the fitted regression model.
>
> > All these plots may give a visual
> > idea about how 'BAD' your model is.
> >
> > HTH
>
> Yes, but how good or how bad must be related to WHAT assumption
> or requirement you are seeking in the plots.
>
> That's WHY the discussion that it was an exercise in futility for
> sehwail and Richard Ulrich to be blindly looking at the NORMALITY
> of the independent variables X's, in view of your stated
> regression assumption (5), which is standard; and all distributional
> assumptions are about the stochastic errors, conditioned on those
> X's, so that the distribution of the data matrix X is completely and
> totally IRRELEVANT!
>
> Graphical methods in statistics are worth a thousand words or a
> thousand analytic tests -- BUT you have to know WHAT you are
> looking for in those graphical methods that are relevant to the
> statistical procedure at hand!
>
> -- Reef Fish Bob.