Re: derivative of the matrix log



In article <1162850241_105@xxxxxxxxxxxxxxxx>,
Pouya D. Tafti <p.d.tafti@xxxxxxxx> wrote:

Hello everyone,

In one of the exercises for a noncredit course that I am
attending it has been asked to show the following:

d logA = A^-1 dA. (1)

Here A is a nonsingular matrix and d supposedly denotes
differentiation -- I am interpreting it as being with
respect to a scalar parameter, but I may be wrong.

It makes no difference whether the parameter is a scalar
or not. Either way, df(A) is characterized as the limit,
as e -> 0, of (f(A+e*dA)-f(A))/e.

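However, the assumed result cannot be correct in general
unless A and dA commute. If it were, apply (1) to the
transpose A', giving d log A' = A'^-1 dA', and transpose
back. Since log(A') = (log A)', this yields
d log A = dA A^-1, with the A^-1 and dA interchanged,
so (1) would force A^-1 dA = dA A^-1.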
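
A quick numerical check of this point, as a minimal sketch
only. It assumes 2x2 matrices stored as nested lists and an A
close enough to the identity that the series
log(I+X) = X - X^2/2 + X^3/3 - ... converges. The helper names
(matmul, axpy, inv, matlog, dlog) are made up for the
illustration and are not from any library.

```python
def matmul(P, Q):
    """2x2 matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(P, Q, s):
    """Return P + s*Q entrywise."""
    return [[P[i][j] + s * Q[i][j] for j in range(2)] for i in range(2)]

def inv(P):
    """Inverse of a nonsingular 2x2 matrix."""
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def matlog(A, terms=60):
    """log(A) via the series in X = A - I (assumes A is near I)."""
    I = [[1.0, 0.0], [0.0, 1.0]]
    X = axpy(A, I, -1.0)
    power = I
    total = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(1, terms + 1):
        power = matmul(power, X)                      # X^k
        total = axpy(total, power, (-1.0) ** (k + 1) / k)
    return total

def dlog(A, dA, e=1e-5):
    """Central-difference estimate of the derivative of log at A along dA."""
    plus = matlog(axpy(A, dA, e))
    minus = matlog(axpy(A, dA, -e))
    return [[(plus[i][j] - minus[i][j]) / (2 * e) for j in range(2)]
            for i in range(2)]

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))

A = [[1.2, 0.1], [0.1, 0.9]]
dA_commuting = matmul(A, A)             # a polynomial in A commutes with A
dA_generic = [[0.0, 1.0], [0.0, 0.0]]   # does not commute with A

# Gap between the finite-difference derivative and the claimed A^-1 dA.
err_commuting = max_abs_diff(dlog(A, dA_commuting),
                             matmul(inv(A), dA_commuting))
err_generic = max_abs_diff(dlog(A, dA_generic),
                           matmul(inv(A), dA_generic))
print("commuting dA: gap =", err_commuting)
print("generic dA:   gap =", err_generic)
```

The gap should be at rounding level for the commuting
direction and clearly nonzero for the generic one.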

There is a theorem like (1) which is very useful
in statistics, and easily proved from the old version
of how to compute a determinant; it is that

d log |A| = tr(A^-1 dA) = sum_ij (A'^-1)_ij dA_ij.

This is true because the multilinearity of the determinant
shows that the derivative of the determinant with respect
to an element of a matrix is the corresponding cofactor
(the corresponding element of the adjoint, taken here as
the matrix of cofactors), and the inverse is the transpose
of that matrix divided by the determinant. Since the
determinant is a real scalar, the ordinary chain rule for
log gives the rest.
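
The determinant identity is easy to check numerically. A
minimal sketch, assuming a 2x2 matrix with positive
determinant stored as nested lists. The helper names (det2,
inv2, shifted) are invented for the example. Note that dA is
not chosen to commute with A here.

```python
import math

def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def inv2(P):
    d = det2(P)
    return [[P[1][1] / d, -P[0][1] / d],
            [-P[1][0] / d, P[0][0] / d]]

def shifted(P, Q, s):
    """Return P + s*Q entrywise."""
    return [[P[i][j] + s * Q[i][j] for j in range(2)] for i in range(2)]

A = [[2.0, 1.0], [0.5, 3.0]]       # det = 5.5 > 0, so log|A| is defined
dA = [[0.3, -0.7], [1.1, 0.4]]     # arbitrary direction
e = 1e-6

# Central-difference estimate of d log|A| along dA.
fd = (math.log(det2(shifted(A, dA, e)))
      - math.log(det2(shifted(A, dA, -e)))) / (2 * e)

Ainv = inv2(A)

# tr(A^-1 dA): sum over i, k of (A^-1)_ik dA_ki.
trace_form = sum(Ainv[i][k] * dA[k][i] for i in range(2) for k in range(2))

# Entrywise form: sum over i, j of (A'^-1)_ij dA_ij, with (A'^-1)_ij = (A^-1)_ji.
entrywise_form = sum(Ainv[j][i] * dA[i][j] for i in range(2) for j in range(2))

print(fd, trace_form, entrywise_form)
```

All three printed values should agree up to finite-difference
error.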

Below is a description of my incomplete attempt at deriving
(1). As you will see, I could certainly benefit from some
advice.

First of all, I remember (somewhat vaguely) from undergrad
school that matrix exponentials may be defined by the
absolutely convergent series

I + sum_{i>0} A^i / i!.

I don't remember having seen matrix logarithms before, but
if A is symmetric positive-definite with eigen-decomposition
USU', then

logA := U logS U'

poses itself as an agreeable definition, as it satisfies

exp logA = log expA = A.

For arbitrary A this definition may not work; but then
again, regarding log as the inverse of exp one may generally
write

A = I + sum_{i>0} (logA)^i / i!. (2)

Now if for some matrix X, X and dX commute, one can show
that

d(X^n) = n X^(n-1) dX.

Using this and assuming that (2) can be differentiated
term-by-term (by some extension of the corresponding scalar
result), I can derive (1), but only if logA and d(logA)
commute. Is this extra condition really necessary for (1)
to hold? If yes, how restrictive is it? If no, how can one
prove (1) without it?
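
The role of the commuting assumption in d(X^n) = n X^(n-1) dX
can be seen already for n = 2, where the product rule gives
d(X^2) = X dX + dX X. A minimal sketch, assuming 2x2 matrices
as nested lists. The names below are made up for the
illustration.

```python
def matmul(P, Q):
    """2x2 matrix product."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def axpy(P, Q, s):
    """Return P + s*Q entrywise."""
    return [[P[i][j] + s * Q[i][j] for j in range(2)] for i in range(2)]

def d_square(X, dX, e=1e-4):
    """Central-difference derivative of X -> X^2 along dX."""
    Xp = axpy(X, dX, e)
    Xm = axpy(X, dX, -e)
    plus, minus = matmul(Xp, Xp), matmul(Xm, Xm)
    return [[(plus[i][j] - minus[i][j]) / (2 * e) for j in range(2)]
            for i in range(2)]

def max_abs_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))

X = [[1.0, 2.0], [0.0, 3.0]]
dX_generic = [[0.0, 0.0], [1.0, 0.0]]   # does not commute with X
dX_commuting = X                         # X trivially commutes with itself

def gaps(dX):
    """Return (gap to X dX + dX X, gap to 2 X dX)."""
    fd = d_square(X, dX)
    product_rule = axpy(matmul(X, dX), matmul(dX, X), 1.0)
    naive = axpy(matmul(X, dX), matmul(X, dX), 1.0)   # 2 X dX
    return max_abs_diff(fd, product_rule), max_abs_diff(fd, naive)

gap_rule_generic, gap_naive_generic = gaps(dX_generic)
gap_rule_commuting, gap_naive_commuting = gaps(dX_commuting)
print("generic dX:  ", gap_rule_generic, gap_naive_generic)
print("commuting dX:", gap_rule_commuting, gap_naive_commuting)
```

The product-rule form matches in both cases. The form 2 X dX
matches only when X and dX commute.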

Thanks very much,
--
Pouya D. Tafti
p dot d dot tafti at ieee dot org


--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hrubin@xxxxxxxxxxxxxxx Phone: (765)494-6054 FAX: (765)494-0558