Pseudoinverse
In mathematics, and in particular linear algebra, the pseudoinverse <math>A^+</math> of an <math>m \times n</math> matrix <math>A</math> is a generalization of the inverse matrix [IG2003]. More precisely, this article describes the Moore-Penrose pseudoinverse, which was independently described by Moore [Moore1920] and Penrose [Penrose1955]. A common use of the pseudoinverse is to compute a 'best fit' (least squares) solution to a system of linear equations (see Applications below). The pseudoinverse is defined and unique for all matrices whose entries are real or complex numbers. It is usually computed using the singular value decomposition.
Properties of the pseudoinverse
<math>A^+</math> is the unique matrix which satisfies the following criteria:
- <math>(AA^+)^* = AA^+</math> (that is, <math>AA^+</math> is Hermitian).
- <math>(A^+A)^* = A^+A</math> (that is, <math>A^+A</math> is Hermitian as well).
- <math>A A^+ A = A</math>
- <math>A^+ A A^+ = A^+</math>
Here <math>M^*</math> is the conjugate transpose of a matrix <math>M</math>.
An alternative way to define the pseudoinverse is via a limiting process:
- <math>A^+ = \lim_{\delta \to 0} (A^T A + \delta^2 I)^{-1} A^T = \lim_{\delta \to 0} A^T (A A^T + \delta^2 I)^{-1}</math>
(see Tikhonov regularization). These limits exist even if <math>(A A^T)^{-1}</math> and <math>(A^T A)^{-1}</math> do not.
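This limit can be checked numerically. The following is a minimal NumPy sketch (the example matrix is arbitrary); it shows the regularized expression approaching <math>A^+</math> as <math>\delta</math> shrinks:
<pre>
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # arbitrary 3 x 2 example

# (A^T A + delta^2 I)^{-1} A^T should approach A^+ as delta -> 0
for delta in (1.0, 1e-2, 1e-4, 1e-8):
    approx = np.linalg.inv(A.T @ A + delta**2 * np.eye(2)) @ A.T
    print(delta, np.max(np.abs(approx - np.linalg.pinv(A))))
</pre>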
Useful rules
- <math>(A^+)^+ = A</math>
- <math>0^+ = 0^T</math>
- <math>(A^T)^+ = (A^+)^T</math>
- <math>(\alpha A)^+ = \alpha^{-1} A^+</math>, for <math>\alpha \ne 0</math>.
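These rules are easy to verify numerically, for instance with NumPy's standard pinv routine (the random test matrix is an arbitrary choice):
<pre>
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))      # arbitrary rectangular matrix
pinv = np.linalg.pinv

assert np.allclose(pinv(pinv(A)), A)                          # (A^+)^+ = A
assert np.allclose(pinv(np.zeros((4, 3))), np.zeros((3, 4)))  # 0^+ = 0^T
assert np.allclose(pinv(A.T), pinv(A).T)                      # (A^T)^+ = (A^+)^T
alpha = 2.5
assert np.allclose(pinv(alpha * A), pinv(A) / alpha)          # (alpha A)^+ = alpha^{-1} A^+
</pre>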
Special cases
If the columns of <math>A</math> are linearly independent, <math>(A^T A)^{-1}</math> does exist. In this case the <math>\delta</math> summand vanishes in the first limit expression above and <math>A^+</math> is a left inverse:
- <math>A^+ = (A^T A)^{-1} A^T</math>
- <math>A^+ A = 1.</math>
If the rows of <math>A</math> are linearly independent, <math>(A A^T)^{-1}</math> does exist. In this case the <math>\delta</math> summand vanishes in the second limit expression above and <math>A^+</math> is a right inverse:
- <math>A^+ = A^T (A A^T)^{-1}</math>
- <math>A A^+ = 1.</math>
If both columns and rows are linearly independent (that is, for square, non-singular matrices), the pseudoinverse is identical with the inverse and has all the properties of the inverse:
- <math>A A^+ = 1</math>
- <math>A^+ A = 1</math>
- <math>A^+ = A^{-1}.</math>
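As an illustration of the left-inverse case above, here is a minimal NumPy sketch (the matrix with linearly independent columns is an arbitrary example):
<pre>
import numpy as np

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # linearly independent columns

left = np.linalg.inv(A.T @ A) @ A.T   # A^+ = (A^T A)^{-1} A^T
assert np.allclose(left, np.linalg.pinv(A))
assert np.allclose(left @ A, np.eye(2))   # A^+ A = 1 (left inverse)
# Note: A @ left is only a projection here, not the identity, since A is not square.
</pre>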
If <math>A</math> and <math>B</math> are matrices such that the product <math>AB</math> is defined and one of them is unitary, then <math>(AB)^+ = B^+A^+</math> holds.
It is possible to define a pseudoinverse for scalars and vectors, too. This amounts to treating these as matrices. The pseudoinverse of a scalar <math>x</math> is zero if <math>x</math> is zero and the reciprocal of <math>x</math> otherwise:
- <math>x^+ = \left\{\begin{matrix} 0 & \mbox{if }x=0 \\ x^{-1} & \mbox{otherwise} \end{matrix}\right.</math>
The pseudoinverse of the null vector is the transposed null vector. The pseudoinverse of other vectors is the transposed vector divided by its squared magnitude:
- <math>x^+ = \left\{\begin{matrix} 0^T & \mbox{if }x = 0 \\ {x^T \over x^T x} & \mbox{otherwise} \end{matrix}\right.</math>
For proof, simply check that these definitions meet the defining criteria for the pseudoinverse.
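For the vector case, such a check is a few lines of NumPy (the example vector is arbitrary):
<pre>
import numpy as np

x = np.array([[1.0], [2.0], [2.0]])   # a vector, treated as a 3 x 1 matrix
x_plus = x.T / (x.T @ x)              # x^+ = x^T / (x^T x)

assert np.allclose(x @ x_plus @ x, x)            # A A^+ A = A
assert np.allclose(x_plus @ x @ x_plus, x_plus)  # A^+ A A^+ = A^+
assert np.allclose((x @ x_plus).T, x @ x_plus)   # A A^+ is Hermitian (symmetric here)
assert np.allclose((x_plus @ x).T, x_plus @ x)   # A^+ A is Hermitian (symmetric here)
</pre>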
Finding the pseudoinverse of a matrix
Let <math>k</math> be the rank of an <math>m \times n</math> matrix <math>A</math>. Then <math>A</math> can be decomposed as <math>A = BC</math>, where <math>B</math> is an <math>m \times k</math> matrix and <math>C</math> is a <math>k \times n</math> matrix, and
- <math>A^+ = C^*(CC^*)^{-1}(B^*B)^{-1}B^*.</math>
If <math>k</math> is equal to <math>m</math> or <math>n</math>, then <math>B</math> or <math>C</math> respectively can be chosen to be the identity matrix, and the formula reduces to <math>A^+ = A^*(AA^*)^{-1}</math> or <math>A^+ = (A^*A)^{-1}A^*.</math>
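A minimal NumPy sketch of this rank-decomposition formula, using an explicitly constructed factorization (the factors are arbitrary examples; transposes stand in for conjugate transposes since the data is real):
<pre>
import numpy as np

# Build a singular 3 x 3 matrix of rank 2 from full-rank factors B (3 x 2) and C (2 x 3).
B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
C = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]])
A = B @ C

# A^+ = C^* (C C^*)^{-1} (B^* B)^{-1} B^*
A_plus = C.T @ np.linalg.inv(C @ C.T) @ np.linalg.inv(B.T @ B) @ B.T
assert np.allclose(A_plus, np.linalg.pinv(A))
</pre>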
A computationally simpler way to get the pseudoinverse is to use the singular value decomposition (SVD).
If <math>A = U\Sigma V^*</math> is the singular value decomposition of <math>A</math>, then <math>A^+ = V\Sigma^+ U^*.</math> For a rectangular diagonal matrix such as <math>\Sigma</math>, the pseudoinverse <math>\Sigma^+</math> is obtained by taking the reciprocal of each non-zero element on the diagonal and transposing the resulting matrix.
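A minimal NumPy sketch of the SVD route (the tolerance used to decide which singular values count as non-zero is an assumption, chosen relative to machine precision):
<pre>
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
U, s, Vh = np.linalg.svd(A)          # full SVD: A = U Sigma V^*

# Sigma^+ : reciprocals of the non-zero singular values, in an n x m matrix
tol = max(A.shape) * np.finfo(float).eps * s.max()
Sigma_plus = np.zeros((A.shape[1], A.shape[0]))
for i, sv in enumerate(s):
    if sv > tol:
        Sigma_plus[i, i] = 1.0 / sv

A_plus = Vh.T @ Sigma_plus @ U.T     # A^+ = V Sigma^+ U^* (real case)
assert np.allclose(A_plus, np.linalg.pinv(A))
</pre>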
If a pseudoinverse is already known for a given matrix, and the pseudoinverse is desired for a related matrix, it can be computed using specialized algorithms that may need less work. In particular, if the related matrix differs from the original one by only a changed, added or deleted row or column, incremental algorithms exist that exploit the relationship.
Applications
The pseudoinverse provides a least squares solution to a system of linear equations (SLE) [Penrose1956].
Given an SLE <math>A x = b</math>, we look for an <math>x</math> that minimizes <math>\|A x - b\|^2</math>, where <math>\|\cdot\|</math> denotes the Euclidean norm.
The general solution to an inhomogeneous SLE <math>A x = b</math> is the sum of a single solution of the inhomogeneous system and the general solution of the corresponding homogeneous system <math>A x = 0</math>.
Lemma: If <math>(A A^T)^{-1}</math> exists, the solution <math>x</math> can always be written as the sum of the pseudoinverse solution of the inhomogeneous system and a solution of the homogeneous system:
- <math>x = A^T(A A^T)^{-1}b + (1 - A^T(A A^T)^{-1} A)y.</math>
Proof:
- <math>\begin{matrix}A x &=& A A^T(A A^T)^{-1}b + A y - A A^T(A A^T)^{-1} A y \\ \ &=& b + A y - A y \\ \ &=& b \end{matrix}.</math>
Here, the vector <math>y</math> is arbitrary (apart from its dimensionality). In both summands, the pseudoinverse <math>A^T(A A^T)^{-1}</math> appears. If we write it as <math>A^+</math>, the equation looks like this:
- <math>x = A^+ b + (1 - A^+ A)y.</math>
The first summand is the pseudoinverse solution. In the sense of the least squares error, it is the best linear approximation to the actual solution; this means that the correction summand has minimal Euclidean norm. The second summand represents a solution of the homogeneous system <math>A y = 0</math>, because <math>(1 - A^+ A)</math> is the projection onto the kernel (null space) of <math>A</math>, while <math>A^+A = A^T (A A^T)^{-1} A</math> is the projection onto the row space of <math>A</math> (the range of <math>A^T</math>, which is the orthogonal complement of the kernel).
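A minimal NumPy sketch of both situations, the overdetermined least squares fit and the parametrization of all solutions of an underdetermined system (the example data is arbitrary):
<pre>
import numpy as np

# Overdetermined system: no exact solution, so A^+ b gives the least squares fit.
A = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])
x = np.linalg.pinv(A) @ b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)   # reference least squares solver
assert np.allclose(x, x_ref)

# Underdetermined system: x = A^+ b + (1 - A^+ A) y solves A x = b for any y.
A2 = np.array([[1.0, 2.0, 3.0]])
b2 = np.array([6.0])
y = np.array([1.0, -1.0, 0.5])                  # arbitrary vector
P = np.eye(3) - np.linalg.pinv(A2) @ A2         # projection onto the kernel of A2
x2 = np.linalg.pinv(A2) @ b2 + P @ y
assert np.allclose(A2 @ x2, b2)
</pre>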
References
- [IG2003] Adi Ben-Israel, Thomas N. E. Greville: Generalized Inverses. Springer-Verlag (2003), ISBN 0387002936
- [Moore1920] E. H. Moore: On the reciprocal of the general algebraic matrix. Bull. Amer. Math. Soc. 26, 394–395 (1920)
- [Penrose1955] Roger Penrose: A generalized inverse for matrices. Proc. Cambridge Philos. Soc. 51, 406–413 (1955)
- [Penrose1956] Roger Penrose: On best approximate solution of linear matrix equations. Proc. Cambridge Philos. Soc. 52, 17–19 (1956)