Re: Problem related to a linear regression



First of all: Thank you for taking so much time for this!

On 3 Nov., 16:51, matt271829-n...@xxxxxxxxxxx wrote:

OK, I'll assume you're doing the following (perhaps omitting some
detail that is not relevant to the question).

1. You have a set of Y-values, which are your observations.
yes
2. You have a function F which takes a bunch of parameters and
produces one value corresponding to each Y, which we'll call Xf. You
iteratively refine this function F until you get the best correlation
between Xf and Y.
yes
However, Xf is *not* a direct estimate of Y, since
the correlation line need not be anywhere near Y = Xf.
I'm not sure what you mean by a "direct estimate".
Regarding the correlation line: I'm not sure, since we have the same
physical parameter, once observed and once estimated.
3. You regress Xf and Y, yielding A and B, such that y = A + B*x is
the best fit line to the (Xf, Y) data.
yes
4. You now define Xf' = A + B*Xf (your "shifting/scaling" of the x-
axis) and plot Xf' against Y. You note that the best fit line to this
plot does not appear to be the line y = x as one would expect.
yes

In your Test_2.gif (Y versus Xf') I assume that the pale blue line is
the same as the pale blue line in Test_1.gif (Y versus Xf), except
shifted and scaled along with the data points. By definition this line
is y = x. I assume that the white line is the result of the second
regression.
yes
Now, if you did another least squares regression on (Xf',
Y) in the same way as you did the first on (Xf, Y) then, as far as I
can see, you would inevitably end up with the line y = x. The fact
that you get a different line (the white line) is, we think, because
in the first regression you minimised squared deviations in the y-
direction, and in the second you minimised squared deviations in the x-
direction. I think if you "want the white line" then you might just as
well do the first regression in the same way as you're currently doing
the second, and not bother doing the second.
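The claim above can be checked numerically. The following is only a sketch on made-up data: `xf` stands in for the output of F, `y` for the observations, and the helper `ols` is a hypothetical name, not something from the thread. It repeats the regression on (Xf', Y) in the y-direction, which gives y = x, and then fits in the x-direction, which gives a steeper line.

```python
import random

def ols(x, y):
    """Least-squares fit of y = a + b*x, minimising squared deviations in y."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b

random.seed(0)
# Hypothetical data: xf plays the role of F's output, y the observations.
xf = [random.uniform(0.0, 10.0) for _ in range(200)]
y = [3.0 + 0.5 * v + random.gauss(0.0, 1.0) for v in xf]

a1, b1 = ols(xf, y)                    # first regression: Y on Xf
xf_prime = [a1 + b1 * v for v in xf]   # shifted/scaled values Xf'

a2, b2 = ols(xf_prime, y)              # second regression, same direction: Y on Xf'

# x-direction fit: regress Xf' on Y, then rewrite it as a line in the (Xf', Y) plot.
c, d = ols(y, xf_prime)                # Xf' = c + d*Y
a3, b3 = -c / d, 1.0 / d               # the same line written as Y = a3 + b3*Xf'

print("y-direction fit on (Xf', Y):", a2, b2)   # intercept near 0, slope near 1
print("x-direction fit on (Xf', Y):", a3, b3)   # slope above 1 for noisy data
```

With noisy data the second pair of numbers describes a steeper line than y = x, which matches the description of the white line.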

To me, the pale blue line at Test_2.gif looks no better or worse than
the pale blue line at Test_1.gif. In other words, I don't think the
shifting/scaling to get from Test_1.gif to Test_2.gif has anything to
do with the problem.
I agree, this was only done to bring the results from F into the value
range of Y since the idea is to provide a function for estimating Y.
As much for my own interest as anything else, I
did a test plot at http://img461.imageshack.us/img461/3523/regressionvj2.gif.
A very neat way to show the problem in a simplified form!
The blue regression line is found by minimising squared deviations in
the y-direction, and the green regression line by minimising squared
deviations in the x-direction. You can see they are markedly
different, which I think duplicates the effect you're seeing.
Indeed very nicely.
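How far apart the two lines lie can be written down directly. The notation here is an assumption added for illustration: S_xx, S_yy and S_xy are the centred sums of squares and cross-products, r is the ordinary correlation coefficient, and both lines are drawn in a plot with x horizontal and y vertical.

```latex
% Centred sums over the data points (x_i, y_i):
%   S_xx = \sum (x_i - \bar{x})^2,  S_yy = \sum (y_i - \bar{y})^2,
%   S_xy = \sum (x_i - \bar{x})(y_i - \bar{y})
\begin{align*}
\text{minimising in } y:\quad y &= \bar{y} + b_y\,(x - \bar{x}), \qquad b_y = \frac{S_{xy}}{S_{xx}},\\
\text{minimising in } x:\quad y &= \bar{y} + b_x\,(x - \bar{x}), \qquad b_x = \frac{S_{yy}}{S_{xy}},\\
\frac{b_y}{b_x} &= \frac{S_{xy}^2}{S_{xx}\,S_{yy}} = r^2 \le 1 .
\end{align*}
```

Both lines pass through the mean point, and the x-direction line is always the steeper one in magnitude. They coincide only when r = ±1, so the weaker the correlation, the more the two lines diverge.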
So, which method should you use? Well, since you want to estimate Y
from Xf', it would seem that minimising squared y-distances (or
possibly minimising absolute y-differences) makes more sense than
minimising squared x-differences or ODR, even though it may "look
wrong".
Well, that's how I started: by minimising (ignoring u and v for now)
SUM((Y-F)^2)
Given the F-parameters that you have, you want the generated
Xf' to be as near as possible to the corresponding real Y;
yes
you are not
interested in the extent to which you need to vary the parameters
supplied to F in order to get Xf' to exactly equal the real Y.
If I understand you right here: yes; the Y values have a 'natural'
standard deviation, i.e. it would be wrong to reduce the standard
deviation obtained by an increased number of observations below their
'natural' limit.
Finally, it strikes me that this is a kind of roundabout way of doing
things (probably you have very good reasons!).
That's why I mentioned at the very beginning that I am probably making
a detour.
As mentioned above, I started by finding the parameters of a correlating
function by minimising SUM((Y-F)^2). Then I tried to 'match' the results
to the size range of the observed values and, looking at the graph,
gained the impression that the estimates could be improved further.
I don't really see why
you don't fix up your optimisation of F so that it gives you the final
estimate for Y (using whatever best fit criteria you choose), thus
obviating the need for the separate regression step.
But how? The problem remains: how can the values of F that give the
'best' correlation be 'matched' to the observed values Y?
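One possible reading of the suggestion, offered as a sketch and not as the method actually used in this thread: treat A and B as two extra parameters and minimise SUM((Y - (A + B*F))^2) over everything at once. For any fixed parameters of F, the best A and B come from the ordinary y-direction regression, and the remaining sum of squares equals S_yy*(1 - r^2), so the parameters that maximise the correlation also minimise this sum. The function `F`, its single parameter `p`, the inputs `t`, the grid and the data below are all hypothetical.

```python
import math
import random

def fit(x, y):
    """Return (a, b, sse, r2) for the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = sxy * sxy / (sxx * syy)
    return a, b, sse, r2

def F(p, t):
    # Hypothetical stand-in for the real function F, with one parameter p.
    return [math.exp(-p * ti) for ti in t]

random.seed(1)
# Made-up inputs and observations, for illustration only.
t = [random.uniform(0.0, 5.0) for _ in range(100)]
y = [2.0 + 3.0 * math.exp(-0.7 * ti) + random.gauss(0.0, 0.05) for ti in t]

# Crude grid search over p; a real optimiser would replace this loop.
grid = [0.05 * k for k in range(1, 41)]
results = [(p,) + fit(F(p, t), y) for p in grid]   # (p, a, b, sse, r2)

best_by_sse = min(results, key=lambda r: r[3])     # joint least squares
best_by_r2 = max(results, key=lambda r: r[4])      # best correlation

print("p with smallest SUM((Y-(A+B*F))^2):", best_by_sse[0])
print("p with largest r^2:                ", best_by_r2[0])
print("A, B at that p:", best_by_sse[1], best_by_sse[2])
```

Under these assumptions both criteria pick the same p, and A + B*F(p) is then already the least-squares estimate of Y in the y-direction, with no further regression step needed.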
