Re: r-Squared Question
- From: Jerry Dallal <gdallal@xxxxxxxxxxxxxxxxxxxxxxxx>
- Date: Tue, 12 Jul 2005 21:16:37 -0300
Predictor wrote:
Let's assume some observed data, which I hope makes my question clearer:
    X     Y
    1   101
    2   102
    3   103
    4   104
    5   105
    6   106
    7   107
    8   108
    9   109
   10   110
The relationship here is obvious, but bear with me. Assume that some regression procedure (obviously not least squares) produces a linear model, YHat:
    X     Y  YHat
    1   101    97
    2   102    99
    3   103   101
    4   104   103
    5   105   105
    6   106   107
    7   107   109
    8   108   111
    9   109   113
   10   110   115
YHat has a correlation (r) with Y of 1.0, so r-squared is also 1.0. What I'm getting at is this: r-squared is at its best possible value, yet the model is obviously suboptimal. Have I gone wrong somewhere, or is this a fundamental weakness of r-squared?
It depends on how you define R^2. If you define it as the square of the correlation between observed and predicted values, then it's a weakness: correlation is unchanged by any shift or positive rescaling of the predictions, so it cannot detect this kind of systematic error. However, if you define it as 1 - ResSS/TSS, then, for an arbitrary model-fitting procedure, R^2 isn't even constrained to the interval [0,1], since ResSS might exceed TSS.
    X     Y  YHat  Y-YHat
    1   101    97       4
    2   102    99       3
    3   103   101       2
    4   104   103       1
    5   105   105       0
    6   106   107      -1
    7   107   109      -2
    8   108   111      -3
    9   109   113      -4
   10   110   115      -5
Here, TSS=82.5 and ResSS=85, so R^2 = 1-85/82.5 = -0.03, and the fitted line predicts worse than always using the sample mean.
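For anyone who wants to check the arithmetic, here is a minimal sketch in plain Python that computes R^2 both ways for the data above. The function names are my own, chosen for illustration; nothing here comes from a particular statistics package.

```python
# Data from the thread: observed Y and the hypothetical model's predictions.
y = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
y_hat = [97, 99, 101, 103, 105, 107, 109, 111, 113, 115]


def mean(values):
    return sum(values) / len(values)


def r2_from_correlation(obs, pred):
    """Squared Pearson correlation between observed and predicted values."""
    mo, mp = mean(obs), mean(pred)
    sxy = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    sxx = sum((o - mo) ** 2 for o in obs)
    syy = sum((p - mp) ** 2 for p in pred)
    return sxy ** 2 / (sxx * syy)


def r2_from_sums_of_squares(obs, pred):
    """1 - ResSS/TSS; negative whenever ResSS exceeds TSS."""
    mo = mean(obs)
    tss = sum((o - mo) ** 2 for o in obs)                   # 82.5 for this data
    res_ss = sum((o - p) ** 2 for o, p in zip(obs, pred))   # 85 for this data
    return 1 - res_ss / tss


print(round(r2_from_correlation(y, y_hat), 2))      # 1.0
print(round(r2_from_sums_of_squares(y, y_hat), 2))  # -0.03
```

The two definitions agree for a least-squares fit with an intercept, which is why the distinction rarely comes up; they part ways only when the predictions come from some other procedure, as here.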