Cochran's theorem
In statistics, Cochran's theorem is used in the analysis of variance.
Suppose U1, ..., Un are independent standard normally distributed random variables, and an identity of the form
- <math>
\sum_{i=1}^n U_i^2=Q_1+\cdots + Q_k
</math>
can be written, where each Qi is a sum of squares of linear combinations of the Us. Then if
- <math>
r_1+\cdots +r_k=n
</math>
where ri is the rank of Qi, Cochran's theorem states that the Qi are independent, and each Qi has a chi-squared distribution with ri degrees of freedom.
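Equivalently (a standard restatement of the hypothesis, sketched here for orientation), each Qi can be written as a quadratic form in the vector U = (U1, ..., Un)T, with symmetric matrices Bi that sum to the identity matrix:
- <math>
Q_i=U^\mathsf{T}B_iU,\qquad B_1+\cdots+B_k=I_n,\qquad r_i=\operatorname{rank}(B_i).
</math>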
Cochran's theorem is the converse of Fisher's theorem.
Example
If X1, ..., Xn are independent normally distributed random variables with mean μ and standard deviation σ, then
- <math>U_i=(X_i-\mu)/\sigma</math>
is standard normal for each i.
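To check this, note that a linear function of a normal random variable is again normal, and here
- <math>
\operatorname{E}[U_i]=\frac{\operatorname{E}[X_i]-\mu}{\sigma}=0,\qquad \operatorname{Var}(U_i)=\frac{\operatorname{Var}(X_i)}{\sigma^2}=1.
</math>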
It is possible to write
- <math>
\sum U_i^2=\sum\left(\frac{X_i-\overline{X}}{\sigma}\right)^2 + n\left(\frac{\overline{X}-\mu}{\sigma}\right)^2
</math>
(here, summation is from 1 to n, that is, over the observations). To see this identity, multiply throughout by <math>\sigma^2</math> and note that
- <math>
\sum(X_i-\mu)^2= \sum(X_i-\overline{X}+\overline{X}-\mu)^2
</math>
and expand to give
- <math>
\sum(X_i-\overline{X})^2+\sum(\overline{X}-\mu)^2+ 2\sum(X_i-\overline{X})(\overline{X}-\mu).
</math>
The third term is zero because it is equal to a constant times
- <math>\sum(\overline{X}-X_i),</math>
and the second term is just n identical terms added together.
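Explicitly, since <math>\overline{X}=\tfrac{1}{n}\sum X_i</math>,
- <math>
\sum(X_i-\overline{X})=\sum X_i-n\overline{X}=0 \qquad\text{and}\qquad \sum(\overline{X}-\mu)^2=n(\overline{X}-\mu)^2.
</math>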
Combining the above results (and dividing by σ²), we have:
- <math>
\sum\left(\frac{X_i-\mu}{\sigma}\right)^2= \sum\left(\frac{X_i-\overline{X}}{\sigma}\right)^2 +n\left(\frac{\overline{X}-\mu}{\sigma}\right)^2 =Q_1+Q_2.
</math>
Now the rank of Q2 is 1 (it is the square of a single linear combination of the standard normal variables). The rank of Q1 can be shown to be n − 1, and thus the conditions for Cochran's theorem are met.
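One way to see the rank of Q1 (a standard argument): with U = (U1, ..., Un)T and Jn the n × n all-ones matrix, Q1 is the quadratic form of the centering matrix, which is idempotent, so its rank equals its trace:
- <math>
Q_1=U^\mathsf{T}\left(I_n-\tfrac{1}{n}J_n\right)U,\qquad \operatorname{rank}\left(I_n-\tfrac{1}{n}J_n\right)=\operatorname{tr}\left(I_n-\tfrac{1}{n}J_n\right)=n-1.
</math>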
Cochran's theorem then states that Q1 and Q2 are independent, with Q1 having a chi-squared distribution with n − 1 degrees of freedom and Q2 a chi-squared distribution with 1 degree of freedom.
This shows that the sample mean and sample variance are independent; also
- <math>
(\overline{X}-\mu)^2\sim \frac{\sigma^2}{n}\chi^2_1.
</math>
To estimate the variance σ², one estimator that is often used is
- <math>
\hat{\sigma}^2= \frac{1}{n}\sum\left( X_i-\overline{X}\right)^2.
</math>
Cochran's theorem shows that
- <math>
\hat{\sigma}^2\sim \frac{\sigma^2}{n}\chi^2_{n-1}
</math>
which shows that the expected value of <math>\hat{\sigma}^2</math> is σ²(n − 1)/n, so <math>\hat{\sigma}^2</math> is a biased estimator of σ².
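Indeed, since a chi-squared random variable with n − 1 degrees of freedom has expected value n − 1,
- <math>
\operatorname{E}\left[\hat{\sigma}^2\right]=\frac{\sigma^2}{n}\operatorname{E}\left[\chi^2_{n-1}\right]=\frac{n-1}{n}\,\sigma^2.
</math>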
Both of these distributions are proportional to the true but unknown variance σ²; thus their ratio does not depend on σ², and because they are independent we have
- <math>
\frac{n\left(\overline{X}-\mu\right)^2}{\frac{1}{n-1}\sum\left(X_i-\overline{X}\right)^2}\sim F_{1,n-1}
</math>
where F1,n−1 is the F-distribution with 1 and n − 1 degrees of freedom (see also Student's t-distribution).
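This follows from the definition of the F-distribution as the ratio of two independent chi-squared random variables, each divided by its degrees of freedom:
- <math>
\frac{Q_2/1}{Q_1/(n-1)}=\frac{n(\overline{X}-\mu)^2/\sigma^2}{\frac{1}{n-1}\sum\left(X_i-\overline{X}\right)^2/\sigma^2}\sim F_{1,n-1}.
</math>
The square root of this statistic is, up to sign, the one-sample Student's t-statistic with n − 1 degrees of freedom.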