Re: Problem related to a linear regression
- From: Ray Koopman <koopman@xxxxxx>
- Date: Fri, 02 Nov 2007 09:20:23 -0700
On Nov 2, 5:56 am, MET <Marcel.E.Tschu...@xxxxxxxxx> wrote:
Thank you Matt for having had a closer look at my problem.
On 2 Nov., 05:06, matt271829-n...@xxxxxxxxxxx wrote:
I'm kind of confused. Initially you found the values of A and B that
give the best fit of the line y = A + B*x to the data points (Xf, Y)?
Yes. Without having looked at it in detail I did expect that this
would provide the 'best' estimation between Xf and Y.
Then you applied the transformation Xf' = A + B*Xf, where Xf' is the
transformed X-value?
Yes, finding out that A and B describe a regression line for the data
points (Xf',Y) corresponds to Y=Xf', i.e. the data points are arranged
'best' around the line corresponding to 100 percent correlation. In a
'normal' case I would have expected that this line would also be the
'best' estimate between y and x. My confusion comes from the situation
that by looking at the Y/Xf'-graph the line Y=Xf' doesn't (yet) seem
to be the 'best' estimate between Y and Xf'.
(This is what I assume you mean by "the X-values
were scaled with A and B": you're transforming x so as to take the
I used the word 'scaled' since A and B don't affect the correlation
found for Xf; they just move the data points in the xy plane. It is
apparently in this context not the correct term.
line y = A + B*x to the line y = x.) And now you're wondering if this
choice of A and B actually gives the best fit of the transformed data
points (Xf', Y) to the line y = x?
Yes. As explained above it actually is the 'best' fit related to the
correlation (data points nearest to Y=Xf') but probably not for the
'best' estimate of Y from Xf'.
Doesn't it amount to exactly the same thing? In the first case you're
choosing A and B so as to minimise the sum of (Y - (A + B*Xf))^2, and
in the second case you're choosing A and B so as to minimise the sum
of (Y - Xf')^2, where Xf' = A + B*Xf?
Maybe I misunderstood it...
In the first case (for finding A and B) I minimise the sum of
(Y-Xf)^2. Oooooops, while writing this I just realise that it is for this
reason that A and B give me Y=Xf'. (It really helps sometimes to
explain what one is doing to recognise the problem.) The reason for
choosing (Y-Xf)^2 is the fact that Y and Xf refer to the same physical
parameter, Y the observed values and Xf the values estimated from
other parameters. (It's actually not for a calibration, but it is this
type of problem.)
For finding the 'best' (linear) relation between Xf' and Y (Y=C+D*Xf')
the sum of (Xf'-(Y-C)/D)^2 was minimised. In the xy graph this
regression line then also 'looks' more like the 'best'
estimate of Y from the Xf' values.
One final question remains: which procedure is preferable from a
numerical point of view, a direct one (Xf <-> Y) or - as done now
- the indirect one (Xf <-> Xf' <-> Y) with Xf being a non-linear
function?
Thank you for your help!
Regards Marcel
Minimizing sum (Y - (A + B*X))^2 requires B = r*Sy/Sx, where
r is the correlation and Sy & Sx are the standard deviations.
X' = A + B*X has standard deviation Sx' = B*Sx = r*Sy.
Then minimizing sum (X' - (Y-C)/D)^2 requires 1/D = r*Sx'/Sy = r^2,
so D = 1/r^2 will always be > 1 (unless the fit is perfect, |r| = 1).
In the first regression, you minimized error along the Y axis, but
in the second you switched to minimizing along the (rescaled) X axis.
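The two relations above can be checked numerically. The following is only a sketch on made-up data: the helper names `ols` and `corr`, the sample size, and the noise level are assumptions for illustration, not anything from the original problem.

```python
import math
import random
import statistics as st

def ols(x, y):
    """Fit y = a + b*x by least squares, minimising errors along the y axis."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def corr(x, y):
    """Pearson correlation coefficient r."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: a noisy linear relation between X and Y.
random.seed(0)
x = [random.uniform(0.0, 10.0) for _ in range(200)]
y = [1.5 + 0.8 * xi + random.gauss(0.0, 2.0) for xi in x]

# First regression: Y on X, errors measured along Y.
a, b = ols(x, y)
r = corr(x, y)
x_prime = [a + b * xi for xi in x]   # X' = A + B*X

# Regressing Y on X' again just returns the line Y = X'.
a2, b2 = ols(x_prime, y)

# Second regression: X' on Y, errors measured along X'.
# The fitted slope k plays the role of 1/D.
c0, k = ols(y, x_prime)
d = 1.0 / k

print("B        =", b, " r*Sy/Sx =", r * st.stdev(y) / st.stdev(x))
print("Y on X'  : intercept", a2, "slope", b2)
print("1/D      =", k, " r^2     =", r * r)
print("D        =", d)
```

With any data that are not perfectly correlated, the slope D of the second line comes out steeper than 1, which is why it looks different from Y = X' in the plot.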
Perhaps you should look into "orthogonal distance" regression,
which accommodates error on both axes simultaneously.
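For a straight line, the simplest form of orthogonal regression has a closed-form solution. The sketch below assumes the errors on both axes have the same variance and the same units (plausible here, since Y and Xf are said to refer to the same physical parameter, but still an assumption). The function name `orthogonal_fit` and the test data are hypothetical.

```python
import math
import random

def orthogonal_fit(x, y):
    """Fit y = a + b*x by minimising the sum of squared perpendicular distances.

    Assumes equal error variance on both axes.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxy == 0:
        raise ValueError("x and y are uncorrelated; the slope is undefined")
    b = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    return my - b * mx, b

# Points lying exactly on y = 2 + 3x are recovered exactly.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 + 3.0 * xi for xi in xs]
a_exact, b_exact = orthogonal_fit(xs, ys)

# Hypothetical noisy data with error on both axes.
random.seed(1)
t = [random.uniform(0.0, 10.0) for _ in range(200)]
xn = [ti + random.gauss(0.0, 1.0) for ti in t]
yn = [ti + random.gauss(0.0, 1.0) for ti in t]

a_xy, b_xy = orthogonal_fit(xn, yn)   # y as a function of x
a_yx, b_yx = orthogonal_fit(yn, xn)   # x as a function of y

print("exact line :", a_exact, b_exact)
print("slope y|x  :", b_xy, " slope x|y:", b_yx, " product:", b_xy * b_yx)
```

Unlike ordinary least squares, this fit does not depend on which variable is called x: swapping the axes gives the reciprocal slope, so both directions describe the same line.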