Re: theta = (X'X)^-1*X'y




pekkajarvela@xxxxxxxxx wrote:
y = X*theta

=> theta = (X'X)^-1*X'*y

1. Is there a simple proof that this theta minimizes the sum of squares,
in matrix notation (y-X*theta)'(y-X*theta)?

Yes.
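
One way to see it, sketched under the assumption that X has full column rank (so that X'X is invertible). The symbol \hat\theta for the closed-form solution and the norm notation are introduced for this sketch only:

```latex
\begin{align*}
\hat\theta &= (X'X)^{-1}X'y
  \quad\Longrightarrow\quad
  X'(y - X\hat\theta) = X'y - X'X(X'X)^{-1}X'y = 0, \\[4pt]
y - X\theta &= (y - X\hat\theta) + X(\hat\theta - \theta)
  \quad\text{for any } \theta, \\[4pt]
(y - X\theta)'(y - X\theta)
  &= (y - X\hat\theta)'(y - X\hat\theta)
   + 2(\hat\theta - \theta)'X'(y - X\hat\theta)
   + (\hat\theta - \theta)'X'X(\hat\theta - \theta) \\
  &= (y - X\hat\theta)'(y - X\hat\theta)
   + \lVert X(\hat\theta - \theta)\rVert^{2} \\
  &\ge (y - X\hat\theta)'(y - X\hat\theta).
\end{align*}
```

The cross term vanishes because of the first line, and equality holds only when X(\hat\theta - \theta) = 0, which under full column rank means \theta = \hat\theta.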

2. Sometimes you see a so-called Marquardt term added to the solution,
theta = (X'X+lambda*I)^-1*X'*y. Are there any rules for how this lambda
should be chosen? This method is sometimes called Tikhonov
regularisation. What does this regularisation mean?

This is better known as the shrinkage estimator. Stein's result
guarantees that there is some lambda>0 that minimizes the MSE, but
that's an existence theorem that doesn't tell you how to find it. In
practice, there have been suggestions on how to estimate lambda-hat
from the data. But however lambda is chosen, there is no guarantee of
anything.
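
To make the "shrinkage" concrete, here is a minimal Python sketch of the formula from the question, theta = (X'X+lambda*I)^-1*X'*y. The data are synthetic and the helper name ridge_estimate is made up for this illustration. It only shows what lambda does to the estimate, not how lambda should be chosen:

```python
import numpy as np

def ridge_estimate(X, y, lam):
    """Return (X'X + lam*I)^-1 X'y, solved without forming the inverse."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + rng.normal(scale=0.5, size=40)

# The length of the estimate shrinks toward zero as lambda grows.
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
norms = [float(np.linalg.norm(ridge_estimate(X, y, lam))) for lam in lambdas]
for lam, nrm in zip(lambdas, norms):
    print(f"lambda={lam:7.1f}  ||theta||={nrm:.4f}")
```

With lambda = 0 this reduces to the ordinary least-squares solution; larger lambda pulls the estimate toward zero, trading bias for variance.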


3. I could find theta = (X'X)^-1*X'*y by gradient methods such as
steepest descent, but it is said that these methods sometimes end in
a local minimum instead of the global one.

Why do you need to use any gradient method? It's a straightforward
matrix inversion that yields the minimum.
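
As a rough illustration in Python (synthetic data, variable names chosen for this sketch), the estimate can be computed directly. In practice one usually solves the normal equations or calls a least-squares routine instead of forming the inverse explicitly:

```python
import numpy as np

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

# Normal equations: solve (X'X) theta = X'y rather than inverting X'X.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Library least-squares routine; generally safer when X is ill-conditioned.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient of the sum of squares at the solution: -2 X'(y - X theta).
grad = -2.0 * X.T @ (y - X @ theta_normal)

print(theta_normal)
print(theta_lstsq)
print(np.max(np.abs(grad)))  # numerically zero at the minimum
```

Both routes give the same theta up to rounding, and the gradient vanishes there.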

Is there a way to use MCMC or some
other Bayesian method to find the global minimum? I understand that in
a gradient method you always evaluate the gradient of the sum of squares
(SS) surface and try to find the direction in which SS diminishes most
rapidly. Then you update the parameter vector with a little step in that
direction, dtheta, so that the new parameter vector is theta_new =
theta_old + dtheta. But what is the update scheme in stochastic
methods? How do you change theta in stochastic methods in order to end
at the minimum? Is it according to some distribution?

I think your ideas about computing the OLS estimate
of a linear model are ill-founded.
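
To illustrate why: for a linear model SS is a convex quadratic in theta, so when X has full column rank there is a single minimum and nothing for a gradient method to get trapped in. The Python sketch below uses synthetic data, a hypothetical helper gradient_descent_ols, and an assumed step size of 1/(largest eigenvalue of X'X). It applies the update theta_new = theta_old + dtheta from two different starting points and compares both with the closed form:

```python
import numpy as np

def gradient_descent_ols(X, y, theta0, n_iter=2000):
    """Plain gradient descent on 0.5 * SS, where SS = (y - X theta)'(y - X theta)."""
    A = X.T @ X
    b = X.T @ y
    # Assumed step size: 1 / (largest eigenvalue of X'X), which guarantees descent.
    step = 1.0 / np.linalg.eigvalsh(A)[-1]
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iter):
        dtheta = -step * (A @ theta - b)  # step against the gradient of 0.5 * SS
        theta = theta + dtheta            # theta_new = theta_old + dtheta
    return theta

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

theta_closed = np.linalg.solve(X.T @ X, X.T @ y)
theta_a = gradient_descent_ols(X, y, theta0=[0.0, 0.0, 0.0])
theta_b = gradient_descent_ols(X, y, theta0=[100.0, -50.0, 25.0])

print(theta_closed)
print(theta_a)
print(theta_b)
```

Both runs arrive at the same point as the direct solution; the iteration only costs more work.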

-- Reef Fish Bob.

Cheers,
-PJ
