Re: r-Squared Question
- From: "Reef Fish" <Large_Nassau_Grouper@xxxxxxxxx>
- Date: 13 Jul 2005 21:19:37 -0700
Jerry Dallal wrote:
> Reef Fish wrote:
> >
> > Jerry Dallal wrote:
> >
> >>I wrote:
> >>
> >>
> >>>It depends how you defined R2. If you define it as the square of the
> >>>correlation between observed and predicted, then it's a weakness.
> >
> >
> > What do you mean "a weakness"?
> >
> > For OLS fitted regression, R^2 is ALWAYS the correlation between
> > the observed Y and the fitted Y.
>
> You have to read the thread.
Didn't think it necessary, even now. Reading what YOU wrote sufficed.
>
> The values "Y-Yhat" below are my own calculation. The rest is from the
> person posing the question, to wit: The correlation between Y and Yhat
> is 1. If one defines R^2 for any model (not necessarily linear LS) as
> the square of the correlation between observed and predicted, then R^2
> for this example is 1. Was that an indication of a weakness in R^2 as a
> summary measure?
The perception of "weakness" was your OWN, as seen below:
>
> My point was that while, as you say, "For OLS fitted regression, R^2 is
> ALWAYS the [square of the] correlation between the observed Y and the
> fitted Y.", R^2 is not defined that way.
That's correct. It is defined as RegSS/TotSS (see Neter, Kutner
et al.; any regression textbook will do).
That's where the "proportion of variation fitted by the regression"
interpretation comes from. R-square ranges from 0 (random scatter
fitted by a horizontal line) to 1 for a "perfect" linear fit.
> Rather it is usually defined
> as 1-ResSS/TSS (or RegSS/TSS),
No. But it's equivalent to the usual RegSS/TotSS because
RegSS + SSE (your ResSS) = TotSS.
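
A minimal sketch of that identity, assuming an OLS line with an intercept. The data values and the helper names (ols_fit, sums_of_squares) are made up for illustration and are not from the thread:

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a simple regression of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


def sums_of_squares(y, yhat):
    """Return (TotSS, RegSS, SSE) for observed y and fitted yhat."""
    ybar = sum(y) / len(y)
    tot = sum((yi - ybar) ** 2 for yi in y)
    reg = sum((fi - ybar) ** 2 for fi in yhat)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))
    return tot, reg, sse


# Illustrative data only (an assumption, not the thread's example).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = ols_fit(x, y)
yhat = [a + b * xi for xi in x]
tot_ss, reg_ss, sse = sums_of_squares(y, yhat)

r2_from_reg = reg_ss / tot_ss      # RegSS/TotSS
r2_from_res = 1 - sse / tot_ss     # 1 - SSE/TotSS

print(tot_ss, reg_ss + sse)        # equal up to rounding for an OLS fit
print(r2_from_reg, r2_from_res)    # the two forms agree for an OLS fit
```

The decomposition relies on the fit being least squares with an intercept; for fitted values obtained some other way, the two forms need not agree.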
> which, for OLS, *happens* to be the
> square of the correlation between Y and Yhat.
That's correct.
That's also WHY the correlation coefficient has DIFFERENT
interpretations in a simple regression context, depending on
the UNIT in which it is expressed. The correlation r is a SIGNED
measure of linear association. The Multiple R, which is the
absolute value of r, is the correlation between the observed
Y and the fitted Y, and R-squared has yet a THIRD
interpretation, as defined by RegSS/TotSS.
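
The three quantities can be compared on a small decreasing data set. This is only a sketch: the data and the helper name corr are assumptions made for illustration, and the fit is a simple OLS regression with an intercept:

```python
from math import sqrt


def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    return suv / sqrt(suu * svv)


# Illustrative data with a negative slope (made up, not from the thread).
x = [1, 2, 3, 4, 5, 6]
y = [9.8, 8.1, 6.2, 3.9, 2.1, 0.2]

# Simple OLS fit of y on x.
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum(
    (xi - xbar) ** 2 for xi in x
)
intercept = ybar - slope * xbar
yhat = [intercept + slope * xi for xi in x]

r = corr(x, y)                  # signed correlation between X and Y
multiple_r = corr(y, yhat)      # correlation between observed and fitted Y
r_squared = sum((f - ybar) ** 2 for f in yhat) / sum(
    (yi - ybar) ** 2 for yi in y
)                               # RegSS/TotSS

print(r, multiple_r, r_squared)
```

With a negative slope, r is negative while the correlation between Y and the fitted Y is |r|, and RegSS/TotSS equals r squared.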
I wrote about this on June 15:
RF> On the topic related to Correlation and Causation, Harry wrote
RF> (page 17-21) on explaining the interpretation of R-square in a
RF> regression (where R is exactly the same as the correlation
RF> coefficient |r| between X and Y in a simple regression):
RF>
RF> 1. The word "explained" is sometimes erroneously thought to
RF> connote causation whereas it refers only to deviations
RF> of fitted values from the overall mean, without any
RF> implication that the regression model that produced
RF> these fitted values has captured any causal scheme
RF> underlying the data.
> [I realize there are
> *many* ways to approach R^2.]
I included ALL of the correct ways above, on R and R^2.
> If one uses the formal definition of R^2
> to calculate it for this example, R^2 turns out to be -0.03, which says
> the problem is with the model, not R^2.
This is your ERROR, Jerry.
The definition of Multiple R^2 CANNOT lead to a negative value!
It is what some economists messed with and called Adjusted R^2 that
can take on negative values, to minus infinity, I think.
That is NOT statistics. That's Quackery of the Social Sciences.
A NEGATIVE R^2 should have WARNED you that it's Quackery.
Furthermore, there is NO adjustment of R^2 necessary nor does it
gain anything in the adjustment!
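
For reference, a sketch of the commonly used adjusted R^2 formula, 1 - (1 - R^2)(n - 1)/(n - p - 1), assuming n observations and p predictors not counting the intercept. The function name adjusted_r2 is hypothetical. For fixed n and p the value is bounded below by -p/(n - p - 1), attained at R^2 = 0, and that bound grows without limit as p approaches n - 1:

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# R^2 = 0 gives the most negative value for a given n and p.
print(adjusted_r2(0.0, 10, 1))   # -p/(n-p-1) = -1/8
print(adjusted_r2(0.0, 10, 8))   # -p/(n-p-1) = -8/1
print(adjusted_r2(1.0, 10, 3))   # a perfect fit stays at 1
```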
Jerry, I think you've been OVER-EXPOSED to social scientists to
have picked up the Quackery of a negative R^2.
Learn your regression from STATISTICS books!
-- Bob.
> >
> >>>However if you define it as 1 - ResSS/TSS, then, for an arbitrary model
> >>>fitting procedure, R2 isn't even constrained to the interval [0,1],
> >>>since ResSS might exceed TSS.
> >>>
> >>>Here
> >>> >  X    Y  Yhat  Y-Yhat
> >>> >  1  101    97       4
> >>> >  2  102    99       3
> >>> >  3  103   101       2
> >>> >  4  104   103       1
> >>> >  5  105   105       0
> >>> >  6  106   107      -1
> >>> >  7  107   109      -2
> >>> >  8  108   111      -3
> >>> >  9  109   113      -4
> >>> > 10  110   115      -5
> >>>
> >>>Here, TSS=82.5 and ResSS=85, so R^2 = 1-85/82.5 = -0.03, and the fitted
> >>>line predicts worse than always using the sample mean.
> >>
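
The arithmetic in the quoted example can be checked directly. The sketch below uses the Y and Yhat columns from the table as given; the variable names are assumptions, and the Yhat values are taken as supplied rather than refitted by least squares:

```python
from math import sqrt

# Y and Yhat columns from the quoted table.
y = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
yhat = [97, 99, 101, 103, 105, 107, 109, 111, 113, 115]

n = len(y)
ybar = sum(y) / n
fbar = sum(yhat) / n

tss = sum((yi - ybar) ** 2 for yi in y)                   # total SS
res_ss = sum((yi - fi) ** 2 for yi, fi in zip(y, yhat))   # residual SS
reg_ss = sum((fi - ybar) ** 2 for fi in yhat)             # "regression" SS

# Correlation between observed and supplied fitted values.
s_yf = sum((yi - ybar) * (fi - fbar) for yi, fi in zip(y, yhat))
s_ff = sum((fi - fbar) ** 2 for fi in yhat)
corr_y_yhat = s_yf / sqrt(tss * s_ff)

print(tss, res_ss)                # 82.5 85.0
print(1 - res_ss / tss)           # about -0.03
print(corr_y_yhat ** 2)           # squared correlation is 1
print(reg_ss + res_ss, tss)       # 417.5 vs 82.5: no SS decomposition here
```

Because these Yhat values are not the least-squares fit to Y, RegSS + ResSS does not equal TSS, which is why 1 - ResSS/TSS and the squared correlation give different answers for this table.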