Re: theta = (X'X)^-1*X'y




David A. Heiser wrote:
"Reef Fish" <Large_Nassau_Gr0uper@xxxxxxxxx> wrote in message
news:1158589090.568217.92360@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Reef Fish wrote:
pekkajarvela@xxxxxxxxx wrote:
y = X*theta

=> theta = (X'X)^-1*X'*y

1. Is there a simple proof why this theta will minimize the sum of
squares, in matrix notation (y-X*theta)'(y-X*theta)?

Yes.
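
A brief sketch of the standard argument, written in LaTeX. It assumes X has full column rank, so that X'X is invertible and positive definite; the symbols S and \hat\theta are introduced here for illustration and are not in the original post.

```latex
\begin{align*}
S(\theta) &= (y - X\theta)'(y - X\theta)
           = y'y - 2\theta' X'y + \theta' X'X\,\theta, \\
\frac{\partial S}{\partial \theta} &= -2X'y + 2X'X\theta = 0
  \;\Longrightarrow\; X'X\hat\theta = X'y
  \;\Longrightarrow\; \hat\theta = (X'X)^{-1}X'y .
\end{align*}
% Because X'(y - X\hat\theta) = 0, the cross term vanishes for any other \theta:
\[
S(\theta) = S(\hat\theta) + (\theta - \hat\theta)'\,X'X\,(\theta - \hat\theta)
          \;\ge\; S(\hat\theta),
\]
% with equality only at \theta = \hat\theta when X'X is positive definite.
```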

2. Sometimes you see that a so-called Marquardt term is added to the
solution, theta = (X'X+lambda*I)^-1*X'*y. Are there any rules for how
this lambda should be chosen? This method is sometimes called Tikhonov
regularisation. What does this regularisation mean?

This is better-known as the shrinkage estimator.
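
To make "shrinkage" concrete, here is a minimal NumPy sketch of the formula in the question. The function name ridge_estimate, the simulated data, and the value lambda = 10 are all assumptions chosen for illustration; nothing here says how lambda should be picked.

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Solve (X'X + lam*I) theta = X'y; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Made-up data: 50 observations, 3 predictors, small noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

theta_ols = ridge_estimate(X, y, 0.0)     # plain (X'X)^-1 X'y
theta_ridge = ridge_estimate(X, y, 10.0)  # shrunk toward zero

# The Euclidean length of the estimate decreases as lambda grows.
print(np.linalg.norm(theta_ols), np.linalg.norm(theta_ridge))
```

Adding lambda to the diagonal of X'X pulls the whole coefficient vector toward zero, which is why the estimator is described as a shrinkage estimator.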

I should have mentioned Ridge Regression also, which uses the shrinkage
method to change the sign of some estimates (based on the FAULTY theory
about the signs of multiple regression coefficients). In short, it is a
regression method motivated by erroneous theory, and it produced
unwarranted and undesirable results.

It was quite popular in the 1970s, but has since gone out of favor,
just like most fads. One of my Ph.D. students, Amit Mitra, did a Monte
Carlo study comparing a dozen or so of the Ridge estimators and
basically found none of them to have any merit.

-- Reef Fish Bob.
++++++++++++++++++++++++++++
Interesting comment here. I have used it at times, and others in engineering
and the physical sciences have used it. It was a method to deal with data
values arising from "contaminations", data recording errors, or other sources
"outside the population" that unduly influenced the set of regression
coefficient values.

The kind of Ridge Regression I was talking about was popularized by
Hoerl and Kennard in the 1960-70 period in Technometrics, where the
prime motivation was that of getting the "wrong sign" for some of the
regression coefficients, as if those coefficients were attached to the
independent variable X alone, rather than to the effect of X IN THE
PRESENCE of all other variables (hence the information of the partial
correlations rather than simple correlations).
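
A small simulated illustration of that point, as a sketch only: the variables x1, x2 and the coefficients 2 and -1 are made up. The slope of y on x2 alone is positive, while the coefficient of x2 in the presence of x1 is negative, and both are correct answers to different questions.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)             # x2 is strongly correlated with x1
y = 2.0 * x1 - 1.0 * x2 + 0.1 * rng.normal(size=n)

# Simple regression of y on x2 alone (no intercept; the data are centred near 0).
simple_slope = (x2 @ y) / (x2 @ x2)

# Multiple regression of y on x1 and x2 together: theta = (X'X)^-1 X'y.
X = np.column_stack([x1, x2])
multi = np.linalg.solve(X.T @ X, X.T @ y)

print("slope of y on x2 alone:          ", simple_slope)
print("coefficient of x2 given x1 too:  ", multi[1])
```

The negative multiple-regression coefficient is not a "wrong sign"; it describes the effect of x2 with x1 held fixed.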

The field of applications was dominated by the "wrong sign" excuse, and
the practice of the "ridge trace" was to increase lambda and watch the
changes in the estimated Ridge regression coefficients until the "wrong
sign" became the "right sign" in the eyes of the misinterpreters.
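
For readers who have not seen one, a ridge trace can be sketched as below. The nearly collinear data and the grid of lambda values are assumptions for illustration, not taken from Hoerl and Kennard or from any study mentioned in this thread.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1 (made-up data)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

# Hypothetical grid of lambda values; lambda = 0 is ordinary least squares.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]

# One row of coefficients per lambda: theta = (X'X + lambda*I)^-1 X'y.
trace = np.array([
    np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y) for lam in lambdas
])

for lam, coef in zip(lambdas, trace):
    print(f"lambda={lam:7.1f}  theta={coef}")
```

Plotting each column of trace against lambda gives the ridge trace; the criticism above is aimed at choosing lambda by looking for the sign one expected.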

Perhaps we are talking about different kinds of Ridge Regression that
arose for different reasons and in different contexts. But the ones
I've seen are plenty, and NONE of them had anything to do with
"contaminated and missing data" issues. So, there may well be an
entirely different class of regression called "Ridge Regression".

I am very sure of the techniques of those dozens of methods I was
talking about because I directed an entire Ph.D. dissertation on "Ridge
Regression", and the references cited by the student were plentiful and
of very wide coverage in the statistics-related journals.

-- Reef Fish Bob.


I can see that if you do Monte Carlo studies from an identified population,
it would not be of much use. But there is a lot of "contaminated and missing
data" that shows up in manufacturing processes, where you have a fairly good
idea of what the "range of valid coefficient values" should be. It came out
of the chemical industry back in the 1950s. There are still articles in
"Technometrics" on ways to deal with this problem. With all the current
methods for "inventing values for missing data" and more powerful
techniques for finding "outliers", the method is no longer in vogue.

David Heiser
