Covariance matrix
In statistics and probability theory, the covariance matrix is the matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the variance of a scalar-valued random variable.
If <math>X</math> is a column vector with <math>n</math> scalar random variable components, and <math>\mu_k</math> is the expected value of the <math>k</math>th element of <math>X</math>, i.e. <math>\mu_k = \mathrm{E}(X_k)</math>, then the covariance matrix is defined as:
- <math>
\Sigma = \mathrm{E} \left[
\left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top
\right]
= \begin{bmatrix}
\mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\
\mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)]
\end{bmatrix}
</math>
The <math>(i,j)</math> element is the covariance between <math>X_i</math> and <math>X_j</math>.
This generalizes to higher dimensions the variance of a scalar-valued random variable <math>X</math>, defined as
- <math>
\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2], \,
</math>
where <math>\mu = \mathrm{E}(X)</math>.
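In practice the expectations above are replaced by sample averages. The following sketch is only an illustration with NumPy (the array names and data are assumptions, not part of the article); it estimates the covariance matrix from draws of a random vector and checks the result against numpy.cov.

```python
# Minimal sketch (assumed example): sample covariance matrix of draws of X.
import numpy as np

rng = np.random.default_rng(0)
samples = rng.normal(size=(1000, 3))      # 1000 draws of a 3-dimensional X

mu = samples.mean(axis=0)                 # estimate of E[X]
centered = samples - mu                   # X - E[X] for each draw
sigma = centered.T @ centered / (len(samples) - 1)  # sample covariance matrix

# np.cov treats rows as variables by default, hence rowvar=False here.
assert np.allclose(sigma, np.cov(samples, rowvar=False))
print(sigma)                              # entry (i, j) estimates cov(X_i, X_j)
```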
Conflicting nomenclatures and notations
Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector <math>X</math>, because it is the natural generalization to higher dimensions of the one-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector <math>X</math>. Unfortunately, the various conventions conflict with one another to some degree:
Standard notation:
- <math>
\operatorname{var}(\textbf{X}) = \mathrm{E} \left[
(\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{X} - \mathrm{E}[\textbf{X}])^\top
\right]
</math>
Also standard notation (unfortunately conflicting with the above):
- <math>
\operatorname{cov}(\textbf{X}) = \mathrm{E} \left[
(\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{X} - \mathrm{E}[\textbf{X}])^\top
\right]
</math>
Also standard notation:
- <math>
\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[
(\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top
\right]
</math> (the "cross-covariance" between two random vectors)
The first two of these usages conflict with each other, while the first and third are mutually consistent. The first notation is found in William Feller's widely admired two-volume book on probability.
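For concreteness, the cross-covariance in the third notation can be estimated from paired samples. The sketch below is an assumed NumPy example (the particular variables are arbitrary, not from the article):

```python
# Hedged sketch (assumed example): cross-covariance matrix cov(X, Y)
# between two random vectors, estimated from paired samples.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 3))                                  # 500 draws of a 3-dimensional X
y = x @ rng.normal(size=(3, 2)) + rng.normal(size=(500, 2))    # a correlated 2-dimensional Y

xc = x - x.mean(axis=0)                   # X - E[X]
yc = y - y.mean(axis=0)                   # Y - E[Y]
cross_cov = xc.T @ yc / (len(x) - 1)      # 3 x 2 matrix of cov(X_i, Y_j)
print(cross_cov)
```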
Properties
With scalar-valued random variables <math>X</math>, we have the identity
- <math>\mathrm{var}(a X) = a^2 \mathrm{var}(X)</math>
if <math>a</math> is constant, i.e., not random. If <math>X</math> is an <math>n \times 1</math> column vector-valued random variable and <math>A</math> is an <math>m \times n</math> constant (i.e., non-random) matrix, then <math>A\textbf{X}</math> is an <math>m \times 1</math> column vector-valued random variable, whose variance must therefore be an <math>m \times m</math> matrix. It is
- <math>
\operatorname{var}(A \textbf{X}) = A \Sigma A^\top.
</math>
For cross-covariance matrices we have
- <math>\operatorname{cov}(AX,BY)=A(\operatorname{cov}(X,Y))B^\top. \,</math>
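The affine-transformation rule can be checked numerically. The sketch below is an assumed Monte Carlo illustration (the particular <math>\Sigma</math> and <math>A</math> are arbitrary choices, not from the article): it draws samples of <math>X</math>, forms <math>A\textbf{X}</math>, and compares the sample covariance of <math>A\textbf{X}</math> with <math>A \Sigma A^\top</math>.

```python
# Minimal sketch (assumed example): numerical check of var(A X) = A Sigma A^T.
import numpy as np

rng = np.random.default_rng(2)
sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])        # an arbitrary 3 x 3 covariance matrix
A = rng.normal(size=(2, 3))                # a constant (non-random) 2 x 3 matrix

x = rng.multivariate_normal(np.zeros(3), sigma, size=200_000)  # draws of X
ax = x @ A.T                               # corresponding draws of A X

empirical = np.cov(ax, rowvar=False)       # 2 x 2 sample covariance of A X
theoretical = A @ sigma @ A.T              # A Sigma A^T
print(np.round(empirical, 3))
print(np.round(theoretical, 3))            # the two should be close
```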
The covariance matrix, though simple, is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, finds an optimal basis for representing the data compactly. This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL-transform) in image processing.
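A hedged sketch of the decorrelation idea (an assumed example, not the article's own derivation): the eigenvectors of the covariance matrix give a basis, the principal components, in which the transformed data have an approximately diagonal covariance matrix.

```python
# Assumed example: decorrelating data via the eigenbasis of its covariance matrix.
import numpy as np

rng = np.random.default_rng(3)
x = rng.multivariate_normal([0, 0], [[3.0, 1.2], [1.2, 1.0]], size=5000)

sigma = np.cov(x, rowvar=False)             # estimated covariance matrix
eigvals, eigvecs = np.linalg.eigh(sigma)    # spectral decomposition (symmetric matrix)

z = (x - x.mean(axis=0)) @ eigvecs          # project onto the eigenbasis
print(np.round(np.cov(z, rowvar=False), 3)) # ≈ diagonal: the components are decorrelated
```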
Complex random vectors
The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:
- <math>
\operatorname{var}(z) = \operatorname{E} \left[
(z-\mu)(z-\mu)^{*}
\right]
</math>
where the complex conjugate of a complex number <math>z</math> is denoted <math>z^{*}</math>.
If <math>Z</math> is a column vector of complex-valued random variables, then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:
- <math>
\operatorname{E} \left[
(Z-\mu)(Z-\mu)^{*}
\right]
</math>
where <math>Z^{*}</math> denotes the conjugate transpose, which is applicable to the scalar case since the transpose of a scalar is still a scalar.
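As an assumed illustration (not from the article), the sketch below forms the covariance matrix of complex draws using the conjugate transpose; the result is Hermitian with real, non-negative diagonal entries.

```python
# Minimal sketch (assumed example): covariance matrix of a complex random vector.
import numpy as np

rng = np.random.default_rng(4)
z = rng.normal(size=(2000, 3)) + 1j * rng.normal(size=(2000, 3))  # complex draws of Z

zc = z - z.mean(axis=0)                      # Z - E[Z] for each draw
sigma = zc.T @ zc.conj() / (len(z) - 1)      # average of (Z - mu)(Z - mu)* outer products

assert np.allclose(sigma, sigma.conj().T)    # the matrix is Hermitian
print(np.round(np.real(np.diag(sigma)), 3))  # real, non-negative variances on the diagonal
```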
Estimation
The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a <math>1 \times 1</math> matrix than as a mere scalar. See estimation of covariance matrices.
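The derivation itself is in the linked article; for illustration only, an assumed sketch of the resulting estimator (which divides by <math>n</math>, unlike the common unbiased estimator that divides by <math>n-1</math>):

```python
# Hedged sketch (assumed example): maximum-likelihood vs. unbiased covariance estimates.
import numpy as np

rng = np.random.default_rng(5)
x = rng.multivariate_normal([0, 0, 0], np.eye(3), size=50)

centered = x - x.mean(axis=0)
mle = centered.T @ centered / len(x)             # divide by n (maximum likelihood)
unbiased = centered.T @ centered / (len(x) - 1)  # divide by n - 1 (unbiased)

print(np.round(mle, 3))
print(np.round(unbiased, 3))                     # differ by the factor n / (n - 1)
```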
External references
- Covariance Matrix (http://mathworld.wolfram.com/CovarianceMatrix.html) at MathWorld