Mercer's theorem
In mathematics, specifically functional analysis, Mercer's theorem represents a symmetric positive-definite function on a square as a convergent series of product functions. It is one of the most notable results of the work of James Mercer. It is an important theoretical tool in the theory of integral equations; it is also used in the Hilbert space theory of stochastic processes, for example in the Karhunen-Loève theorem (cf. Karhunen-Loève transform).
Introduction
To explain Mercer's theorem, we first consider an important special case; see below for a more general formulation. A kernel, in this context, is a continuous function
- <math> K: [a,b] \times [a,b] \rightarrow \mathbb{R}</math>
such that K(x, s) = K(s, x) for all x and s in [a, b].
K is said to be non-negative definite if and only if
- <math> \sum_{i=1}^n\sum_{j=1}^n K(x_i, x_j) c_i c_j \geq 0</math>
for all finite sequences of points x1,...,xn of [a, b] and all choices of real numbers c1,...,cn. (Cf. positive-definite function.)
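The condition can be checked numerically for a given choice of points: the double sum is the quadratic form of the matrix with entries K(xi, xj), so it is non-negative for all real c1,...,cn exactly when that symmetric matrix has no negative eigenvalue. The following is a minimal sketch, assuming the example kernel K(x, s) = min(x, s) on [0, 1]; the kernel choice and the helper names `gram_matrix` and `min_kernel` are illustrative and not part of the theorem.

```python
import numpy as np

def gram_matrix(kernel, points):
    """Return the matrix [K(x_i, x_j)] for a vectorized kernel (hypothetical helper)."""
    x = np.asarray(points, dtype=float)
    return kernel(x[:, None], x[None, :])

def min_kernel(x, s):
    # Assumed example kernel: K(x, s) = min(x, s), continuous and symmetric on [0, 1].
    return np.minimum(x, s)

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=8)   # a finite sequence x_1, ..., x_n in [0, 1]
G = gram_matrix(min_kernel, points)

# sum_ij K(x_i, x_j) c_i c_j equals c^T G c. It is >= 0 for every real c
# exactly when the symmetric matrix G has no negative eigenvalue.
smallest_eigenvalue = np.linalg.eigvalsh(G).min()

c = rng.normal(size=8)                   # one arbitrary choice of c_1, ..., c_n
quadratic_form = c @ G @ c
```

A check of this kind only tests one finite set of points, so it can refute non-negative definiteness but cannot prove it.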
Associated to K is a linear operator on functions defined by the integral
- <math> [T_K \phi](x) =\int_a^b K(x,s) \phi(s) \, ds </math>
For technical considerations we assume φ can range through the space L2[a,b] of square-integrable real-valued functions. Since TK is a linear operator, we can talk about eigenvalues and eigenfunctions of TK.
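To make the operator concrete, the integral can be replaced by a quadrature rule, which turns TK into a symmetric matrix whose eigenvalues approximate those of TK. The sketch below assumes the midpoint rule and the example kernel K(x, s) = min(x, s) on [0, 1]; the function name `discretize_operator` is hypothetical, and the discretization is an illustration rather than part of the theory.

```python
import numpy as np

def discretize_operator(kernel, a, b, n):
    """Midpoint-rule matrix approximating [T_K phi](x) = int_a^b K(x, s) phi(s) ds."""
    h = (b - a) / n
    nodes = a + h * (np.arange(n) + 0.5)
    return nodes, h * kernel(nodes[:, None], nodes[None, :])

def min_kernel(x, s):
    # Assumed example kernel on [0, 1].
    return np.minimum(x, s)

nodes, T = discretize_operator(min_kernel, 0.0, 1.0, 400)

# Apply the discretized operator to phi(s) = 1. The exact image is
# int_0^1 min(x, s) ds = x - x**2 / 2.
phi = np.ones_like(nodes)
approx_image = T @ phi
exact_image = nodes - nodes**2 / 2

# Eigenvalues of the symmetric matrix T, largest first, approximate those of T_K.
approx_eigenvalues = np.sort(np.linalg.eigvalsh(T))[::-1]
```

For this kernel the largest eigenvalue of TK is 4/π², which the leading entry of `approx_eigenvalues` should approximate.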
Theorem. Suppose K is a continuous symmetric non-negative definite kernel. Then there is an orthonormal basis {ei}i of L2[a,b] consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b] and K has the representation
- <math> K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t) </math>
where the convergence is absolute and uniform.
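As an illustration, consider the kernel K(s, t) = min(s, t) on [0, 1], for which the eigenvalues are λj = 1/((j − 1/2)²π²) and the eigenfunctions are ej(t) = √2 sin((j − 1/2)πt). The sketch below evaluates partial sums of the series on a grid and records the largest deviation from K; the choice of kernel, the grid, and the name `mercer_partial_sum` are assumptions made for the example.

```python
import numpy as np

def mercer_partial_sum(s, t, n_terms):
    """Partial sum of sum_j lambda_j e_j(s) e_j(t) for K(s, t) = min(s, t) on [0, 1]."""
    j = np.arange(1, n_terms + 1)
    freq = (j - 0.5) * np.pi
    lam = 1.0 / freq**2                                       # eigenvalues lambda_j
    e_s = np.sqrt(2.0) * np.sin(np.multiply.outer(s, freq))   # e_j(s), shape (len(s), n_terms)
    e_t = np.sqrt(2.0) * np.sin(np.multiply.outer(t, freq))   # e_j(t), shape (len(t), n_terms)
    return (e_s * lam) @ e_t.T

grid = np.linspace(0.0, 1.0, 101)
K = np.minimum.outer(grid, grid)

# Largest absolute error over the grid for an increasing number of terms.
max_errors = [
    np.abs(K - mercer_partial_sum(grid, grid, n)).max()
    for n in (10, 100, 1000)
]
```

The entries of `max_errors` should shrink as more terms are included, consistent with uniform convergence.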
Details
We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates to spectral theory for compact linear operators.
- The map K → TK is injective.
- TK is a non-negative symmetric compact operator on L2[a,b]; moreover K(x, x) ≥ 0.
To show compactness, show that the image of the unit ball of L2[a,b] under TK is equicontinuous and apply Ascoli's theorem to show that the image of the unit ball is relatively compact in C([a,b]) with the uniform norm and, a fortiori, in L2[a,b].
Now apply the spectral theorem for compact operators on Hilbert spaces to TK to show the existence of the orthonormal basis {ei}i of L2[a,b] satisfying
- <math> \lambda_i e_i(t)= [T_K e_i](t) = \int_a^b K(t,s) e_i(s) \, ds </math>
If λi ≠ 0, the eigenvector ei is seen to be continuous on [a,b], since TK maps L2[a,b] into C([a,b]). Now
- <math> \sum_{i=1}^\infty \lambda_i |e_i(t) e_i(s)| \leq \sup_{x \in [a,b]} |K(x,x)| </math>
which shows that the series
- <math> \sum_{i=1}^\infty \lambda_i e_i(t) e_i(s) </math>
converges absolutely and uniformly to a kernel K0 which is easily seen to define the same operator as the kernel K. Hence K = K0, since the map K → TK is injective, from which Mercer's theorem follows.
Trace
The following is immediate:
Theorem. Suppose K is a continuous symmetric non-negative definite kernel, and let {λi}i be the sequence of nonnegative eigenvalues of TK. Then
- <math> \int_a^b K(t,t) \, dt = \sum_i \lambda_i </math>
This shows that the operator TK is a trace class operator and
- <math> \operatorname{trace}(T_K) = \int_a^b K(t,t) \, dt </math>
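The trace formula can be checked on the example kernel K(s, t) = min(s, t) on [0, 1], whose eigenvalues are λj = 1/((j − 1/2)²π²): the integral of K(t, t) = t over [0, 1] is 1/2, and the eigenvalues also sum to 1/2. The following sketch compares a long partial sum of the eigenvalues with a midpoint-rule value of the integral; the kernel and the truncation lengths are assumptions chosen for the example.

```python
import numpy as np

# Partial sum of the eigenvalues lambda_j = 1 / ((j - 1/2)^2 pi^2)
# of the assumed example kernel K(s, t) = min(s, t) on [0, 1].
j = np.arange(1, 200001)
eigenvalues = 1.0 / ((j - 0.5) * np.pi) ** 2
partial_trace = eigenvalues.sum()

# Midpoint-rule approximation of int_0^1 K(t, t) dt, where K(t, t) = t.
n = 1000
t = (np.arange(n) + 0.5) / n
diagonal_integral = np.mean(np.minimum(t, t))
```

The partial sum falls slightly short of 1/2 because the remaining tail of the series is positive.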
Generalizations
The first generalization replaces the interval [a,b] with any compact Hausdorff space X, and Lebesgue measure on [a,b] with a finite countably additive measure μ on the Borel algebra of X whose support is X. This means that μ(U) > 0 for any non-empty open subset U of X. Then essentially the same result holds:
Theorem. Suppose K is a continuous symmetric non-negative definite kernel on X. Then there is an orthonormal basis {ei}i of L2μ(X) consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation
- <math> K(s,t) = \sum_{j=1}^\infty \lambda_j \, e_j(s) \, e_j(t) </math>
where the convergence is absolute and uniform on X.
The next generalization deals with representations of measurable kernels.
Let (X, M, μ) be a σ-finite measure space. An L2 (or square integrable) kernel on X is a function
- <math> K \in L^2_{\mu \otimes \mu}(X \times X) </math>
An L2 kernel defines a bounded operator TK by the formula
- <math> \langle T_K \phi, \psi \rangle = \int_{X \times X} K(y,x) \phi(y) \psi(x) \, d [\mu \otimes \mu](y,x) </math>
TK is a compact operator (actually it is even a Hilbert-Schmidt operator). If the kernel K is symmetric, by the compact spectral theorem, TK has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {ei}i (regardless of separability).
Theorem. If K is a symmetric non-negative definite kernel on (X, M, μ), then
- <math> K(y,x) = \sum_{i \in \mathbb{N}} \lambda_i e_i(y) e_i(x) </math>
where the convergence is in the L2 norm.
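Convergence in the L2 norm can be quantified. Since the products ei(y)ej(x) are orthonormal in the L2 space of the product measure, the squared L2 distance between K and the partial sum of the first n terms equals the sum of λi² over i > n. The sketch below evaluates this tail for the example kernel K(s, t) = min(s, t) on [0, 1] with Lebesgue measure, where λj = 1/((j − 1/2)²π²) and the squared L2 norm of K is 1/6; the kernel, the cutoff `n_total`, and the function name are assumptions made for the example.

```python
import numpy as np

def l2_truncation_error_squared(n_terms, n_total=100000):
    """Approximate sum_{j > n_terms} lambda_j**2, cutting the tail off at n_total terms."""
    j = np.arange(n_terms + 1, n_total + 1)
    return np.sum(1.0 / ((j - 0.5) * np.pi) ** 4)

# With no terms removed, the tail is the squared L2 norm of K itself,
# which for min(s, t) on [0, 1]^2 equals 1/6.
kernel_norm_squared = l2_truncation_error_squared(0)

# Squared L2 error of the partial sums with 1, 5 and 25 terms.
squared_errors = [l2_truncation_error_squared(n) for n in (1, 5, 25)]
```

The squared errors decrease to zero as n grows, which is the L2 convergence asserted by the theorem for this particular kernel.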
References
- Adriaan Zaanen, Linear Analysis, North Holland Publishing Co., 1960
- Richard Courant and David Hilbert, Methods of Mathematical Physics, vol. 1, Interscience, 1953
- Robert Ash, Information Theory, Dover Publications, 1990