Re: Pearson R squared, a percentage?
- From: "The Qurqirish Dragon" <qurqirishd@xxxxxxx>
- Date: 7 Nov 2006 07:35:59 -0800
JeeBee wrote:
Dear Math readers,
Recently, I read some theory on correlation, and came across
Pearson's R. There was something I cannot figure out and started
to browse the web.
I understand how to determine correlation, and what a value of R
between -1 and 1 means (we don't need to compute it, only guess the value of R).
I can also follow this page, which was useful to me:
http://cnx.org/content/m10952/latest/
up to the point where they divide by sqrt(sum(x^2) * sum(y^2)).
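The formula on that page can be sketched in a few lines of Python. This assumes, as is usual for this form of the formula, that x and y there stand for deviations from their means (x_i - mean(x) and y_i - mean(y)). The function name `pearson_r` is only illustrative:

```python
import math


def pearson_r(xs, ys):
    """Pearson's r: the sum of products of deviations, divided by
    sqrt(sum of squared x-deviations * sum of squared y-deviations)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    dx = [x - mx for x in xs]  # deviations from the mean of x
    dy = [y - my for y in ys]  # deviations from the mean of y
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)  # undefined if either variable is constant


print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0  (perfectly linear, rising)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfectly linear, falling)
```

Dividing by the square root of the two sums of squares is what rescales the covariance-like numerator so that r always lands between -1 and 1.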
But the real mystery to me is the following:
At college I've been told that R^2 (R squared, coefficient of determination?)
denotes a percentage. At some web page I've found this:
"percent of variance explained. For instance, if r^2 is .25,
then the independent variable is said to explain 25% of the
variance in the dependent variable."
Can somebody further explain this? I have no clue where this
comes from.
Thanks in advance,
JeeBee.
When doing regression (fitting a curve to data), if the curve hits
every data point, then the R^2 value for that curve fit would be 1, and
so the relationship between the dependent and independent variables is
completely explained by the curve.
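To make "percent of variance explained" precise, write \hat y_i for the fitted curve's value at x_i and \bar y for the mean of the data. For a least-squares fit that includes a constant term (a straight line, or a polynomial), the total variation about the mean splits into an explained part and a residual part, and R^2 is the explained share:

```latex
\underbrace{\sum_i (y_i - \bar y)^2}_{SS_{\text{tot}}}
  = \underbrace{\sum_i (\hat y_i - \bar y)^2}_{SS_{\text{reg}}}
  + \underbrace{\sum_i (y_i - \hat y_i)^2}_{SS_{\text{res}}},
\qquad
R^2 = \frac{SS_{\text{reg}}}{SS_{\text{tot}}} = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
```

Since the sums of squares, divided by n, are variances, R^2 is literally the fraction of the variance of y accounted for by the fit. If the curve hits every point, SS_res = 0 and R^2 = 1. For a straight-line fit this R^2 equals the square of Pearson's r, so r = 0.5 gives r^2 = 0.25, meaning the line accounts for 25% of SS_tot.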
Now, generally the curve will not hit all the data points, particularly
since regression is normally done with a simple curve (linear,
quadratic, exponential, etc.), which will not pass through the data exactly
(in general). For example, if you drive 10 miles to work, and it takes you
15 minutes to do so, you could approximate your distance along the trip
as:
distance (in miles) = 2/3 * time (in minutes).
However, if you plot where you are at many points in time, you will
most likely not be on this curve (traffic lights, turns,
acceleration/deceleration time, changing speed limits, etc.).
Let's say you want to know how much your average speed (2/3 miles per
minute = 40 miles per hour) explains your position, as opposed to the
effect of any of the other things I listed.
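The speed figures in the example are consistent with each other:

```latex
\frac{10\ \text{mi}}{15\ \text{min}}
  = \frac{2}{3}\ \frac{\text{mi}}{\text{min}}
  = \frac{2}{3} \cdot 60\ \frac{\text{mi}}{\text{h}}
  = 40\ \text{mph}
```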
Let's say you do a linear regression and get an R^2 value of .99. This
would imply that 99% of the variation in your distance over time can be
explained by the average speed (for example, you drive along a
straight road with no lights and a 40 mph speed limit).
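A minimal sketch of this straight-road case, under made-up assumptions: you note your position once a minute, and small speed changes are imitated by a hypothetical wobble of +/- 0.05 mile around the 2/3-mile-per-minute line. It fits a least-squares line, computes R^2 as 1 - SS_res/SS_tot, and checks that this equals the square of Pearson's r:

```python
def fit_line(ts, ds):
    """Ordinary least-squares straight line d = a + b*t; returns (a, b)."""
    n = len(ts)
    mt, md = sum(ts) / n, sum(ds) / n
    stt = sum((t - mt) ** 2 for t in ts)
    std = sum((t - mt) * (d - md) for t, d in zip(ts, ds))
    b = std / stt
    return md - b * mt, b


def r_squared(ts, ds):
    """Coefficient of determination of the fitted line: 1 - SS_res / SS_tot."""
    a, b = fit_line(ts, ds)
    md = sum(ds) / len(ds)
    ss_res = sum((d - (a + b * t)) ** 2 for t, d in zip(ts, ds))
    ss_tot = sum((d - md) ** 2 for d in ds)
    return 1 - ss_res / ss_tot


def pearson_r_squared(ts, ds):
    """Square of Pearson's r, computed directly from the deviation sums."""
    n = len(ts)
    mt, md = sum(ts) / n, sum(ds) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sdd = sum((d - md) ** 2 for d in ds)
    std = sum((t - mt) * (d - md) for t, d in zip(ts, ds))
    return std * std / (stt * sdd)


# Hypothetical trip: 40 mph (2/3 mile per minute) on a straight road,
# plus a made-up wobble of +/- 0.05 mile standing in for small speed changes.
times = list(range(16))  # minutes 0..15
dists = [2 / 3 * t + 0.05 * (-1) ** t for t in times]

print(r_squared(times, dists))          # well above 0.99
print(pearson_r_squared(times, dists))  # same value, up to rounding
```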
Now, what if you get an R^2 value of .5? This would imply that only 50% of
the variation can be explained this way. Perhaps you travel some of
the time on a 65 mph highway and some on 25 mph city streets with
traffic lights, so your numbers do not line up well.
What if you have a teleporter, but it takes 15 minutes to charge up?
Then you still cover the 10 miles in 15 minutes, but all of the motion
happens at the very end. You'll get an R^2 value close to 0 here, as there
is essentially no linear relationship between time and distance.
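How close to 0 depends on how often you look. Assume you record your position at n evenly spaced moments from 0 to 15 minutes and the teleporter fires exactly at the last one, so every reading is 0 except the final 10 miles. Working through the sums then gives a straight-line R^2 of exactly 3/(n+1). That is about 0.18 with one reading per minute (n = 16), and it shrinks toward 0 as you sample more often. The sampling scheme and the `teleport_trip` helper below are hypothetical:

```python
def line_r_squared(ts, ds):
    """R^2 of a least-squares straight line, computed as the square of Pearson's r."""
    n = len(ts)
    mt, md = sum(ts) / n, sum(ds) / n
    stt = sum((t - mt) ** 2 for t in ts)
    sdd = sum((d - md) ** 2 for d in ds)
    std = sum((t - mt) * (d - md) for t, d in zip(ts, ds))
    return std * std / (stt * sdd)


def teleport_trip(n_samples, total_minutes=15, miles=10):
    """Hypothetical teleporter: position stays 0 at every sample
    until the last one, where it jumps to the full distance."""
    ts = [total_minutes * k / (n_samples - 1) for k in range(n_samples)]
    ds = [0.0] * (n_samples - 1) + [float(miles)]
    return ts, ds


for n in (16, 61, 901):  # one reading per minute, per 15 s, per second
    ts, ds = teleport_trip(n)
    print(n, line_r_squared(ts, ds), 3 / (n + 1))
```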
This is particularly helpful in multiple regression, where you want to
analyze the relative impact of several factors, but such regressions are
a bit more complicated to calculate.
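For the multiple-regression case, R^2 is still 1 - SS_res/SS_tot. The only difference is that the fitted values now come from several predictors at once. The sketch below solves the least-squares normal equations directly, which is fine for a toy example but numerically less robust than the QR-based solvers in statistics packages. The commute data, the predictors `highway_miles` and `red_lights`, and their coefficients are all made up for illustration:

```python
def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination with
    partial pivoting). Assumes the system is non-singular."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared_multi(predictors, y):
    """R^2 of the least-squares fit y ~ 1 + predictors[0] + predictors[1] + ..."""
    n = len(y)
    rows = [[1.0] + [p[i] for p in predictors] for i in range(n)]  # intercept column first
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X^T X) beta = X^T y
    fitted = [sum(bj * v for bj, v in zip(beta, row)) for row in rows]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Made-up commutes: trip minutes driven by highway miles and red lights hit.
highway_miles = [3, 5, 2, 8, 6, 4, 7, 1]
red_lights = [4, 1, 5, 2, 3, 0, 1, 6]
noise = [0.3, -0.2, 0.1, -0.3, 0.2, -0.1, 0.0, 0.1]
minutes = [2 + 0.9 * h + 1.5 * r + e for h, r, e in zip(highway_miles, red_lights, noise)]

r2_one = r_squared_multi([highway_miles], minutes)
r2_both = r_squared_multi([highway_miles, red_lights], minutes)
print(r2_one, r2_both)  # adding the second factor raises R^2
```

Comparing R^2 with and without a factor is one rough way to gauge how much that factor adds. With an intercept in the model, adding a predictor can never lower R^2, so a small increase is not by itself evidence that the factor matters.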