Re: theta = (X'X)^-1*X'y
- From: "Reef Fish" <Large_Nassau_Gr0uper@xxxxxxxxx>
- Date: 18 Sep 2006 16:19:20 -0700
David A. Heiser wrote:
"Reef Fish" <Large_Nassau_Gr0uper@xxxxxxxxx> wrote in message
news:1158589090.568217.92360@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
++++++++++++++++++++++++++++
Reef Fish wrote:
pekkajarvela@xxxxxxxxx wrote:
y = X*theta
=> theta = (X'X)^-1*X'*y
1. Is there a simple proof why this theta will minimize the sum of
squares, in matrix notation (y-X*theta)'(y-X*theta)?
Yes.
2. Sometimes a so-called Marquardt term is added to the solution,
theta = (X'X+lambda*I)^-1*X'*y. Are there any rules for how this lambda
should be chosen? This method is sometimes called Tikhonov
regularisation. What does this regularisation mean?
This is better known as the shrinkage estimator.
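For reference, here is a short sketch of the argument behind both formulas. It assumes X has full column rank (so X'X is invertible) and, for the second part, lambda > 0; the symbols S and \hat\theta are introduced here only for illustration and do not appear in the original posts.

```latex
% Least squares: S(\theta) = (y - X\theta)'(y - X\theta),
% with \hat\theta = (X'X)^{-1} X'y.
\begin{align*}
y - X\theta &= (y - X\hat\theta) + X(\hat\theta - \theta), \\
X'(y - X\hat\theta) &= X'y - X'X(X'X)^{-1}X'y = 0, \\
S(\theta) &= S(\hat\theta)
  + (\hat\theta - \theta)' X'X (\hat\theta - \theta)
  \;\ge\; S(\hat\theta).
\end{align*}
% The cross term vanishes by the second line, and X'X is positive
% definite, so equality holds only at \theta = \hat\theta.

% Ridge / Tikhonov: minimize the penalized sum of squares
\begin{align*}
S_\lambda(\theta) &= (y - X\theta)'(y - X\theta) + \lambda\,\theta'\theta, \\
\frac{\partial S_\lambda}{\partial \theta}
  &= -2X'(y - X\theta) + 2\lambda\theta = 0
  \;\Longrightarrow\;
  \theta_\lambda = (X'X + \lambda I)^{-1} X'y .
\end{align*}
```

Read this way, the "regularisation" is a penalty on the squared length of theta, which is why the resulting estimate is pulled (shrunk) toward zero as lambda grows.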
I should have mentioned Ridge Regression also, which uses the shrinkage
method to change the sign of some estimates (based on the FAULTY theory
about the signs of multiple regression coefficients). In short, it is a
regression motivated by erroneous theory, and it produced unwarranted
and undesirable results.
It was quite popular in the 1970s, but has since gone out of favor,
just like most fads. One of my Ph.D. students, Amit Mitra, did a Monte
Carlo study comparing a dozen or so of the Ridge estimators and
basically found none of them to have any merit.
-- Reef Fish Bob.
Interesting comment here. I have used it at times, and others in engineering
and the physical sciences have used it. It was a method to deal with data
values from "contaminations", data recording errors, or other values "outside
the population" that unduly influenced the set of regression coefficient values.
The kind of Ridge Regression I was talking about was popularized by
Hoerl and Kennard in 1960-70 in Technometrics, where the prime
motivation was that of getting the "wrong sign" for some of the
regression coefficients, as if those coefficients were attached to the
independent variable X alone, rather than to the effect of X IN THE
PRESENCE of all other variables (hence the information of the partial
correlations rather than simple correlations).
The field of applications was dominated by the "wrong sign" excuse, and
the practice of the "ridge trace" was to increase lambda and watch the
changes in the estimated Ridge regression coefficients until the "wrong
sign" became the "right sign" in the eyes of the misinterpreters.
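The ridge trace described above can be sketched in a few lines. This is only an illustration under stated assumptions: the data are synthetic (two nearly collinear predictors, no intercept), the lambda grid is arbitrary, and the helper names ridge_coefficients and ridge_trace are hypothetical, not taken from any of the papers discussed.

```python
import numpy as np


def ridge_coefficients(X, y, lam):
    """Solve (X'X + lam*I) theta = X'y; lam = 0 gives ordinary least squares."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_trace(X, y, lambdas):
    """Return one row of coefficients per lambda value."""
    return np.array([ridge_coefficients(X, y, lam) for lam in lambdas])


# Synthetic data (an assumption for illustration): x2 is almost a copy of
# x1, so the individual least squares coefficients are poorly determined.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)

lambdas = np.array([0.0, 0.1, 1.0, 10.0, 100.0])
trace = ridge_trace(X, y, lambdas)
norms = np.linalg.norm(trace, axis=1)

# Print the trace: coefficients and their overall length for each lambda.
for lam, theta, norm in zip(lambdas, trace, norms):
    print(f"lambda={lam:7.1f}  theta={theta}  |theta|={norm:.4f}")
```

Whether any coefficient changes sign along the trace depends on the particular data; what holds in general is that the length of the coefficient vector does not increase as lambda grows.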
Perhaps we are talking about different kinds of Ridge Regressions that
arose from different reasons and contexts. But the ones I've seen are
plenty, and NONE of them had anything to do with "contaminated and
missing data" issues. So, there may well be an entirely different
class of regression called "Ridge Regression".
I am very sure of the techniques of those dozens of methods I was
talking about because I directed an entire Ph.D. dissertation on "Ridge
Regression", and the references cited by the student were plenty and of
very wide coverage in the statistics-related journals.
-- Reef Fish Bob.
I can see that if you do Monte Carlo studies from an identified population, it
would not be of much use. But there is a lot of "contaminated and missing data"
that shows up in manufacturing processes, where you have a fairly good idea of
what the "range of valid coefficient values should be". It came out of the
chemical industry back in the 1950s. There are still articles in
"Technometrics" on ways to deal with this problem. With all the current
methods for "inventing values for missing data" and more powerful
techniques for finding "outliers", the method is no longer in vogue.
David Heiser