Re: r-Squared Question

Jerry Dallal wrote:
> Reef Fish wrote:
> >
> > Jerry Dallal wrote:
> >
> >>I wrote:
> >>
> >>
> >>>It depends how you defined R2. If you define it as the square of the
> >>>correlation between observed and predicted, then it's a weakness.
> >
> >
> > What do you mean "a weakness"?
> >
> > For OLS fitted regression, R^2 is ALWAYS the correlation between
> > the observed Y and the fitted Y.
>
> You have to read the thread.

Didn't think it necessary, even now. Reading what YOU wrote sufficed.
>
> The values "Y-Yhat" below are my own calculation. The rest is from the
> person posing the question, to wit: The correlation between Y and Yhat
> is 1. If one defines R^2 for any model (not necessarily linear LS) as
> the square of the correlation between observed and predicted, then R^2
> for this example is 1. Was that an indication of a weakness in R^2 as a
> summary measure?

The perception of "weakness" was your OWN, as seen below:
>
> My point was that while, as you say, "For OLS fitted regression, R^2 is
> ALWAYS the [square of the] correlation between the observed Y and the
> fitted Y.", R^2 is not defined that way.

That's correct. It is defined as RegSS/TotSS (see Neter, Kutner
et al., or any regression textbook).

That's where the "proportion of variation fitted by the regression"
interpretation comes from. R-square ranges from 0 (random scatter
fitted by a horizontal line) to 1 for a "perfect" linear fit.
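
A minimal Python sketch of RegSS/TotSS at its two endpoints, assuming an ordinary least-squares line fitted with an intercept. Both data sets are hypothetical and were picked for illustration. In the first, the OLS slope is exactly zero, so the fitted line is horizontal. In the second, every point lies exactly on a line.

```python
# Sketch: R^2 = RegSS / TotSS for a simple OLS line with intercept.
# The data sets are hypothetical, chosen to hit the 0 and 1 endpoints.

def mean(v):
    return sum(v) / len(v)

def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    return ybar - b * xbar, b

def r_squared(x, y):
    """RegSS / TotSS, with both sums of squares taken around the mean of y."""
    a, b = ols_fit(x, y)
    ybar = mean(y)
    yhat = [a + b * xi for xi in x]
    reg_ss = sum((yh - ybar) ** 2 for yh in yhat)
    tot_ss = sum((yi - ybar) ** 2 for yi in y)
    return reg_ss / tot_ss

x = [1, 2, 3, 4, 5]
flat = [1, 3, 2, 3, 1]     # OLS slope is exactly 0: horizontal fitted line
line = [3, 5, 7, 9, 11]    # exactly y = 1 + 2x: perfect linear fit

print(r_squared(x, flat))  # 0.0
print(r_squared(x, line))  # 1.0
```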

> Rather it is usually defined
> as 1-ResSS/TSS (or RegSS/TSS),

No. But it's equivalent to the usual RegSS/TotSS because
RegSS + SSE (your ResSS) = TotSS.
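
The identity RegSS + SSE = TotSS is easy to check numerically. This sketch assumes an OLS line fitted with an intercept, which is the setting where the identity holds. The small data set is hypothetical. Because the identity holds, RegSS/TotSS and 1 - SSE/TotSS come out equal.

```python
# Sketch (hypothetical data): for an OLS line fitted WITH an intercept,
# RegSS + SSE = TotSS, so RegSS/TotSS equals 1 - SSE/TotSS.

def mean(v):
    return sum(v) / len(v)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares slope and intercept.
xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

tot_ss = sum((yi - ybar) ** 2 for yi in y)              # total SS around the mean
reg_ss = sum((yh - ybar) ** 2 for yh in yhat)           # regression SS
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # residual SS

print(round(tot_ss, 10), round(reg_ss + sse, 10))                # 6.0 6.0
print(round(reg_ss / tot_ss, 10), round(1 - sse / tot_ss, 10))   # 0.6 0.6
```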

> which, for OLS, *happens* to be the
> square of the correlation between Y and Yhat.

That's correct.

That's also WHY the correlation coefficient has DIFFERENT
interpretations in a simple regression context, depending on
the UNIT in which it is expressed. The correlation r is a SIGNED
measure of linear association. The Multiple R, which is the
absolute value of r, is the correlation between the observed
Y and the fitted Y, and R-squared has yet a THIRD
interpretation, as defined by RegSS/TotSS.
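
A sketch of the three quantities side by side for one simple OLS regression with intercept. The data are hypothetical and were given a downward trend so that the sign of r shows up. Under those assumptions, r is negative, corr(Y, Yhat) equals |r|, and RegSS/TotSS equals r^2.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    ub, vb = mean(u), mean(v)
    suv = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
    suu = sum((ui - ub) ** 2 for ui in u)
    svv = sum((vi - vb) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

# Hypothetical data with a downward trend, so the sign of r matters.
x = [1, 2, 3, 4, 5]
y = [5, 4, 5, 4, 2]

# Simple OLS fit with intercept.
xbar, ybar = mean(x), mean(y)
b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sum((xi - xbar) ** 2 for xi in x)
a = ybar - b * xbar
yhat = [a + b * xi for xi in x]

r = corr(x, y)              # 1) signed linear association between X and Y
multiple_r = corr(y, yhat)  # 2) Multiple R: corr(observed Y, fitted Y), equals |r|
r2 = (sum((yh - ybar) ** 2 for yh in yhat)
      / sum((yi - ybar) ** 2 for yi in y))  # 3) RegSS/TotSS, equals r**2

print(round(r, 3), round(multiple_r, 3), round(r2, 3))  # -0.775 0.775 0.6
```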

I wrote about this on June 15:

RF> On the topic related to Correlation and Causation, Harry wrote
RF> (pages 17-21), explaining the interpretation of R-square in a
RF> regression (where R is exactly the same as the correlation
RF> coefficient |r| between X and Y in a simple regression):
RF>
RF> 1. The word "explained" is sometimes erroneously thought to
RF> connote causation whereas it refers only to deviations
RF> of fitted values from the overall mean, without any
RF> implication that the regression model that produced
RF> these fitted values has captured any causal scheme
RF> underlying the data.


> [I realize there are
> *many* ways to approach R^2.]

I included ALL of the correct ways above, on R and R^2.


> If one uses the formal definition of R^2
> to calculate it for this example, R^2 turns out to be -0.03, which says
> the problem is with the model, not R^2.

This is your ERROR, Jerry.

The definition of Multiple R^2 CANNOT lead to a negative value!

It is what some economists messed with and called Adjusted R^2 that
can take on negative values, to minus infinity, I think.

That is NOT statistics. That's Quackery of the Social Sciences.

A NEGATIVE R^2 should have WARNED you that it's Quackery.
Furthermore, there is NO adjustment of R^2 necessary, nor is anything
gained by the adjustment!
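
The post does not give the Adjusted R^2 formula. This sketch assumes the usual textbook form, 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p is the number of predictors, not counting the intercept. It shows how the adjusted value can drop below zero even though R^2 itself cannot. The inputs are hypothetical.

```python
def adjusted_r2(r2, n, p):
    """Textbook adjusted R^2 (assumed form); p excludes the intercept."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# A small but non-negative R^2 can become a negative adjusted value.
print(round(adjusted_r2(0.05, n=10, p=1), 3))  # -0.069
# With R^2 fixed, the value falls further as p approaches n - 1.
print(round(adjusted_r2(0.05, n=10, p=8), 3))  # -7.55
```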

Jerry, I think you must have been OVER-EXPOSED to social scientists to
have picked up the Quackery of a negative R^2.

Learn your regression from STATISTICS books!

-- Bob.


> >
> >>>However if you define it as 1 - ResSS/TSS, then, for an arbitrary model
> >>>fitting procedure, R2 isn't even constrained to the interval [0,1],
> >>>since ResSS might exceed TSS.
> >>>
> >>>Here is the data:
> >>> >  X    Y  YHat  Y-Yhat
> >>> >  1  101    97       4
> >>> >  2  102    99       3
> >>> >  3  103   101       2
> >>> >  4  104   103       1
> >>> >  5  105   105       0
> >>> >  6  106   107      -1
> >>> >  7  107   109      -2
> >>> >  8  108   111      -3
> >>> >  9  109   113      -4
> >>> > 10  110   115      -5
> >>>
> >>>Here, TSS=82.5 and ResSS=85, so R^2 = 1-85/82.5 = -0.03, and the fitted
> >>>line predicts worse than always using the sample mean.
> >>
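
The arithmetic in the quoted example can be reproduced from the posted table. This sketch computes three things for that fit, which is not a least-squares fit: the squared correlation between Y and Yhat, 1 - ResSS/TSS, and RegSS/TSS. The thread does not compute RegSS for this example, so taking it as the sum of (Yhat - Ybar)^2 is an assumption here.

```python
import math

def mean(v):
    return sum(v) / len(v)

def corr(u, v):
    """Pearson correlation between two equal-length sequences."""
    ub, vb = mean(u), mean(v)
    suv = sum((ui - ub) * (vi - vb) for ui, vi in zip(u, v))
    suu = sum((ui - ub) ** 2 for ui in u)
    svv = sum((vi - vb) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)

# Values copied from the posted table (X = 1..10; YHat = 95 + 2X).
y = [101, 102, 103, 104, 105, 106, 107, 108, 109, 110]
yhat = [97, 99, 101, 103, 105, 107, 109, 111, 113, 115]

ybar = mean(y)
tot_ss = sum((yi - ybar) ** 2 for yi in y)             # TSS
res_ss = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # ResSS
reg_ss = sum((yh - ybar) ** 2 for yh in yhat)          # assumed RegSS, around Ybar

print(tot_ss, res_ss)                      # 82.5 85
print(round(corr(y, yhat) ** 2, 4))        # 1.0
print(round(1 - res_ss / tot_ss, 4))       # -0.0303
print(round(reg_ss / tot_ss, 4))           # 4.0303
print(reg_ss + res_ss)                     # 417.5, not equal to TSS for this fit
```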
