Completeness (statistics)
Suppose a random variable X (which may be a sequence (X1, ..., Xn) of scalar-valued random variables) has a probability distribution belonging to a known family of probability distributions, parametrized by θ, which may be either vector- or scalar-valued. A function g(X) is an unbiased estimator of zero if the expectation E(g(X)) remains zero regardless of the value of the parameter θ. Then X is a complete statistic precisely if it admits no such unbiased estimator of zero other than one that is itself zero with probability 1 for every value of θ.
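Stated symbolically, this definition requires that for every function g,
- <math>E(g(X))=0{\rm\ for\ all\ }\theta\quad{\rm implies}\quad\Pr(g(X)=0)=1{\rm\ for\ all\ }\theta.</math>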
For example, suppose X1, X2 are independent, identically distributed random variables, normally distributed with expectation θ and variance 1. Then X1 − X2 is an unbiased estimator of zero, since E(X1 − X2) = θ − θ = 0 for every value of θ. Therefore the pair (X1, X2) is not a complete statistic. On the other hand, the sum X1 + X2 can be shown to be a complete statistic. That means that there is no non-zero function g such that
- <math>E(g(X_1+X_2))</math>
remains zero regardless of changes in the value of θ. That fact may be seen as follows. The probability distribution of X1 + X2 is normal with expectation 2θ and variance 2. Its probability density function is therefore
- <math>{\rm constant}\cdot\exp\left(-(x-2\theta)^2/4\right).</math>
The expectation above would therefore be a constant times
- <math>\int_{-\infty}^\infty g(x)\exp\left(-(x-2\theta)^2/4\right)\,dx.</math>
A bit of algebra reduces this to
- <math>[{\rm a\ nowhere\ zero\ function\ of\ }\theta]\times\int_{-\infty}^\infty h(x)\,e^{x\theta}\,dx,{\rm\ where\ }h(x)=g(x)\,e^{-x^2/4}.</math>
As a function of θ this is a two-sided Laplace transform of h(x), and it cannot be identically zero unless h(x), and therefore g(x), is zero almost everywhere.
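In detail, the algebra referred to above is the expansion of the square in the exponent:
- <math>\exp\left(-(x-2\theta)^2/4\right)=e^{-\theta^2}\,e^{x\theta}\,e^{-x^2/4},</math>
so that the nowhere zero function of θ is exp(−θ²), multiplied by the constant from the density.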
One reason for the importance of the concept is the Lehmann–Scheffé theorem, which states that a statistic that is unbiased, complete, and sufficient for some parameter θ is the best unbiased estimator of θ, i.e., the one that has no larger mean squared error than any other unbiased estimator or, more generally, no larger expected loss for any convex loss function.
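As an illustration, in the normal example above the sample mean
- <math>\bar X=(X_1+X_2)/2</math>
is unbiased for θ and is a function of the complete sufficient statistic X1 + X2; by the Lehmann–Scheffé theorem it is therefore the best unbiased estimator of θ.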