Re: Hessian and covariance matrix



Tim.DeMeyer@xxxxxxxx wrote on Mon Jul 4 14:05:48 MDT 2005:
>(quasi-)Newton algorithms for parameter estimation use the
>Hessian matrix. I read that the inverse of the Hessian is an asymptotic
>estimation of the covariance matrix of those "parameters". I'm
>interested in this stuff so I have a few questions:
>
>1/ Is this completely true (cause I recall having read somewhere that
>it is twice the inverse of the Hessian...)

First, it's important to distinguish between quasi-Newton methods,
the Gauss-Newton (and Levenberg-Marquardt) method, and Newton's method.

In quasi-Newton methods for nonlinear optimization, the matrix B_k used
in place of the Hessian at iteration k is a very crude approximation,
and it is not even guaranteed to converge to the true Hessian as the
iterates approach the optimum.
Quasi-Newton methods are typically not used to solve nonlinear least
squares problems, although there are times when it's appropriate to use
one, particularly when the residuals are large and the Gauss-Newton
approximation is poor. If you do use a quasi-Newton method to solve a
least squares problem, it would definitely not be appropriate to use
B_k to estimate the covariance of the fitted parameters; you really
need some other way to get at that covariance. Do not use the inverse
of B_k to obtain confidence intervals for the fitted parameters!

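More commonly, Gauss-Newton (and Levenberg-Marquardt) methods are used
to solve least squares problems, min f'*f, where f is the vector of
residuals. The exact Hessian of f'*f is 2*J'*J + 2*sum_i f_i*H_i, where
J is the Jacobian of f and H_i is the Hessian of the ith residual.
Gauss-Newton approximates the Hessian by 2*J'*J, ignoring the second
derivative terms. Because those terms are weighted by the residuals f_i,
this approximation, used in place of the Hessian within Newton's method,
tends to work well in practice on problems with small residuals.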
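A small numeric illustration may help here. The sketch below uses only the Python standard library; the one-parameter model exp(a*t), the data, and all function names are hypothetical choices made for illustration. It compares the exact second derivative of the sum of squares, which adds a residual-weighted second-derivative term to 2*J'*J, with the Gauss-Newton approximation 2*J'*J at the same parameter value.

```python
import math

# Hypothetical one-parameter model (illustration only):
#   residuals f_i(a) = exp(a * t_i) - y_i,  objective S(a) = sum_i f_i(a)^2


def residuals(a, t, y):
    return [math.exp(a * ti) - yi for ti, yi in zip(t, y)]


def jacobian(a, t):
    # df_i/da = t_i * exp(a * t_i)
    return [ti * math.exp(a * ti) for ti in t]


def second_derivs(a, t):
    # d^2 f_i/da^2 = t_i^2 * exp(a * t_i)
    return [ti * ti * math.exp(a * ti) for ti in t]


def exact_hessian(a, t, y):
    # S''(a) = 2 * sum J_i^2 + 2 * sum f_i * f_i''
    f = residuals(a, t, y)
    gn_part = 2 * sum(j * j for j in jacobian(a, t))
    residual_part = 2 * sum(fi * hi for fi, hi in zip(f, second_derivs(a, t)))
    return gn_part + residual_part


def gauss_newton_hessian(a, t):
    # 2 * J'J: the residual-weighted second-derivative term is dropped
    return 2 * sum(j * j for j in jacobian(a, t))


t = [0.0, 0.5, 1.0, 1.5, 2.0]
a = 0.7
y_exact = [math.exp(a * ti) for ti in t]  # residuals are exactly zero at a
y_noisy = [yi + d for yi, d in zip(y_exact, [0.3, -0.4, 0.5, -0.2, 0.4])]

for label, y in (("zero residuals", y_exact), ("nonzero residuals", y_noisy)):
    print(label, exact_hessian(a, t, y), gauss_newton_hessian(a, t))
```

With zero residuals the two values coincide; with the perturbed data they differ noticeably, which is the situation in which the Gauss-Newton approximation (and anything derived from it) becomes less trustworthy.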

Note that there is a factor of 2 in this approximation to the Hessian. There's
also a factor of 2 in the approximation to the gradient of the objective
function that appears on the right hand side of the Gauss-Newton equations.
These are often simplified from

(2*J'*J)*deltax=-2*J'*f

to

(J'*J)*deltax=-J'*f.

This simplification is perfectly correct: dividing both sides by 2
leaves the step deltax unchanged.
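Here is a minimal sketch of an undamped Gauss-Newton iteration built on the simplified equations (J'*J)*deltax=-J'*f, using only the Python standard library. The two-parameter model p0*exp(p1*t), the noise-free data, the starting point, and all function names are hypothetical. It also computes the first step with both factors of 2 kept, to show that they cancel.

```python
import math

# Hypothetical two-parameter model (illustration only): y ~ p0 * exp(p1 * t)


def residuals(p, t, y):
    return [p[0] * math.exp(p[1] * ti) - yi for ti, yi in zip(t, y)]


def jacobian(p, t):
    # Row i is [df_i/dp0, df_i/dp1].
    return [[math.exp(p[1] * ti), p[0] * ti * math.exp(p[1] * ti)] for ti in t]


def normal_equations(p, t, y, scale=1.0):
    # Returns (scale*J'J, -scale*J'f); scale=2.0 gives the unsimplified form.
    f = residuals(p, t, y)
    J = jacobian(p, t)
    M = [[scale * sum(row[i] * row[k] for row in J) for k in range(2)] for i in range(2)]
    r = [-scale * sum(row[i] * fi for row, fi in zip(J, f)) for i in range(2)]
    return M, r


def solve2(M, r):
    # Cramer's rule for the 2x2 system M * x = r.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


def gauss_newton(p, t, y, iters=20):
    # Undamped Gauss-Newton: no line search or Levenberg-Marquardt safeguard.
    for _ in range(iters):
        dx = solve2(*normal_equations(p, t, y))
        p = [p[0] + dx[0], p[1] + dx[1]]
    return p


t = [0.0, 0.5, 1.0, 1.5, 2.0]
y = [2.0 * math.exp(0.5 * ti) for ti in t]  # noise-free data generated with p = [2.0, 0.5]
p0 = [1.8, 0.45]

step_simplified = solve2(*normal_equations(p0, t, y, scale=1.0))
step_with_twos = solve2(*normal_equations(p0, t, y, scale=2.0))
print(step_simplified, step_with_twos)  # same step, up to rounding
print(gauss_newton(p0, t, y))
```

A production solver would add Levenberg-Marquardt damping or a line search, and would usually solve the linearized least squares problem with a QR factorization of J rather than forming J'*J explicitly.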

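In computing the covariance matrix of the fitted parameters, we treat the
problem as if it were a linear least squares problem, linearizing around
the parameter values that minimize Chi^2 (the sum of the squared
residuals, after each residual has been divided by its measurement
standard deviation).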

For a linear least squares problem Ax=b, in which each row of A and b
has been divided by the standard deviation of the corresponding
measurement, you get the covariance matrix (A'*A)^(-1), even though the
Hessian of the least squares problem is 2*A'*A. Similarly, for the
nonlinear least squares problem, the covariance matrix that you get by
linearization is (J'*J)^(-1), without the factor of 2!
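The standard argument for the linear case shows where the factor of 2 goes. This assumes the rows of A and b have already been scaled by 1/sigma_i, so that the errors in b have identity covariance:

```latex
\begin{aligned}
\hat{x} &= (A^T A)^{-1} A^T b, \qquad \operatorname{Cov}(b) = I,\\
\operatorname{Cov}(\hat{x}) &= (A^T A)^{-1} A^T \operatorname{Cov}(b)\, A\, (A^T A)^{-1} = (A^T A)^{-1},\\
H &= \nabla^2_x \|Ax - b\|_2^2 = 2A^T A
  \quad\Longrightarrow\quad \operatorname{Cov}(\hat{x}) = 2H^{-1}.
\end{aligned}
```

So the covariance is twice the inverse Hessian of the sum of squares, or equivalently the inverse Hessian of the negative log-likelihood, which for Gaussian errors is Chi^2/2 plus a constant. That bookkeeping may be the source of the "twice the inverse of the Hessian" remark in the question.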

Note that depending on how strongly nonlinear the problem is, and how
imprecise the data are, this linearization may be a very poor way to
get at confidence intervals for the fitted parameters. It's not hard
to construct examples where it works poorly.

OK, so what do we know about the solution of a nonlinear regression
problem?

If we assume that measurement errors are independent and normally
distributed with known standard deviations, then the least squares
solution is a maximum likelihood estimate (MLE). This MLE is
asymptotically unbiased and minimum variance. You can also compute an
approximate covariance matrix for the fitted parameters from
(J'*J)^(-1), with J computed from the residuals scaled by 1/sigma_i,
which is asymptotically correct. You can't say much of anything for
the real situation in which you have a finite data set: the estimator
isn't unbiased in general, isn't necessarily minimum variance, and the
covariance matrix is only approximate because the problem isn't really
represented correctly by linearizing around the optimal parameters.
However, most people who perform nonlinear regressions go ahead and
compute approximate confidence intervals and regions for the fitted
parameters.
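A minimal sketch of the linearized covariance (J'*J)^(-1) for known measurement standard deviations, using only the Python standard library. It assumes p_hat is already the least squares solution; the model p0*exp(p1*t), the sigma values, and all function names are hypothetical.

```python
import math

# Hypothetical two-parameter model (illustration only): y ~ p0 * exp(p1 * t).
# Assumes p_hat already minimizes Chi^2 = sum_i ((model_i - y_i) / sigma_i)^2.


def weighted_jacobian(p, t, sigma):
    # Row i of J is divided by sigma_i, matching the scaled residuals in Chi^2.
    return [[math.exp(p[1] * ti) / si, p[0] * ti * math.exp(p[1] * ti) / si]
            for ti, si in zip(t, sigma)]


def inv2(M):
    # Explicit inverse of a 2x2 matrix (adequate for this small sketch).
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]


def covariance_known_sigma(p_hat, t, sigma):
    # Linearized covariance (J'J)^(-1), with no factor of 2.
    J = weighted_jacobian(p_hat, t, sigma)
    JtJ = [[sum(row[i] * row[k] for row in J) for k in range(2)] for i in range(2)]
    return inv2(JtJ)


t = [0.0, 0.5, 1.0, 1.5, 2.0]
sigma = [0.1] * len(t)      # assumed known measurement standard deviations
p_hat = [2.0, 0.5]          # assumed result of a previous least squares fit
cov = covariance_known_sigma(p_hat, t, sigma)
std_errors = [math.sqrt(cov[i][i]) for i in range(2)]
print(cov, std_errors)
```

Doubling every sigma_i multiplies this covariance by 4, so the standard errors scale in proportion to the measurement errors, as expected.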

In the case where you don't know the measurement standard deviations
but assume that they are equal, you can still compute an estimate
and an approximate covariance matrix. In this case, you have to estimate
the measurement variance from the unweighted sum of squared residuals,
s^2 = (sum of squared residuals)/(m-n), where m is the number of data
points and n is the number of parameters, and the approximate
covariance matrix becomes s^2*(J'*J)^(-1). These estimates are also
asymptotically unbiased and minimum variance. This time, the t
distribution with m-n degrees of freedom is used in computing
confidence intervals for the fitted parameters.
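A minimal sketch of the equal-but-unknown variance case, using only the Python standard library. It uses a straight line, for which the linearization is exact, so the numbers can be checked by hand; the data and function names are made up for illustration.

```python
import math

# Straight-line model y ~ p0 + p1 * t (illustration only; the data are made up).


def fit_line(t, y):
    # Closed-form least squares for a straight line.
    m = len(t)
    tbar, ybar = sum(t) / m, sum(y) / m
    sxx = sum((ti - tbar) ** 2 for ti in t)
    sxy = sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
    slope = sxy / sxx
    return [ybar - slope * tbar, slope]


def covariance_unknown_sigma(J, f):
    # s^2 = (sum of squared residuals) / (m - n); covariance = s^2 * (J'J)^(-1).
    m, n = len(J), len(J[0])
    if n != 2:
        raise ValueError("this sketch only inverts 2x2 matrices")
    s2 = sum(fi * fi for fi in f) / (m - n)
    JtJ = [[sum(row[i] * row[k] for row in J) for k in range(n)] for i in range(n)]
    det = JtJ[0][0] * JtJ[1][1] - JtJ[0][1] * JtJ[1][0]
    inv = [[JtJ[1][1] / det, -JtJ[0][1] / det],
           [-JtJ[1][0] / det, JtJ[0][0] / det]]
    return s2, [[s2 * v for v in row] for row in inv]


t = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
p = fit_line(t, y)
J = [[1.0, ti] for ti in t]                          # unweighted Jacobian
f = [p[0] + p[1] * ti - yi for ti, yi in zip(t, y)]  # residuals at the fit
s2, cov = covariance_unknown_sigma(J, f)
std_errors = [math.sqrt(cov[i][i]) for i in range(2)]
print(p, s2, std_errors)
```

A 95% confidence interval for parameter j would then be p[j] plus or minus t_{0.975, m-n} times std_errors[j]. The Python standard library has no t quantile function, so that value would come from a table or from something like scipy.stats.t.ppf(0.975, m - n).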

>2/ Is there an internet resource which gives a proof for this, or an
>understandable explanation?

I'd recommend textbooks rather than internet resources for something
like this. See for example Seber and Wild, Nonlinear Regression.

>3/ Is this always true, or only true when the parameter estimation is
>based on likelihood (or other...)

This question is somewhat vague. The logic of "covariance of the
fitted parameters" applies to the classical nonlinear regression model
in which you assume that measurement errors are independent, normally
distributed, with mean 0 and known variances (or variances assumed to
be equal), and the estimate is computed by solving the nonlinear least
squares problem. This happens to be a particular kind of maximum
likelihood estimation. If you change any of these things, then the
theory might not apply any longer. For example, all of this falls
apart if you regularize your problem and perform "penalized maximum
likelihood estimation."

If you would be more specific about the parameter estimation problem
that you are trying to solve, then we might be able to provide more
help. It might also help if you directed your questions to a more
appropriate news group, such as sci.stat.math.



--
Brian Borchers borchers@xxxxxxx
Department of Mathematics http://www.nmt.edu/~borchers/
New Mexico Tech Phone: 505-835-5813
Socorro, NM 87801 FAX: 505-835-5366