Conditional expectation
In probability theory, a conditional expectation is the expected value of a real random variable with respect to a conditional probability distribution.
Special cases
In the simplest case, if A is an event whose probability is not 0, then
- <math> \operatorname{P}_A(S) = \frac{\operatorname{P}( A \cap S)}{\operatorname{P}(A)} <math>
is a probability measure on A and E(X | A) is the expectation of X with respect to this probability PA. If X is a discrete random variable (that is, a random variable which with probability 1 takes on only countably many values) with finite first moment, the expectation is given explicitly by the infinite sum
- <math> \operatorname{E}(X | A) = \sum_r r \cdot \operatorname{P}_A\{X = r\} = \sum_r r \cdot \frac{\operatorname{P}(A \cap \{X = r\})}{\operatorname{P}(A)} <math>
where {X = r} is the event that X takes on the value r. Since X has finite first moment, it can be shown that this sum converges absolutely. Note that the sum has only countably many non-zero terms, since {X = r} has non-zero probability for only countably many values of r.
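For example, if X is the outcome of a fair six-sided die and A is the event that the outcome is even, then the formula above gives
- <math> \operatorname{E}(X | A) = 2 \cdot \frac{1/6}{1/2} + 4 \cdot \frac{1/6}{1/2} + 6 \cdot \frac{1/6}{1/2} = \frac{2+4+6}{3} = 4. <math>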
Note that if X is the indicator function of an event S then E(X | A) is just the conditional probability PA(S).
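Indeed, the indicator function of S takes only the values 0 and 1, and it equals 1 exactly on S, so the sum above reduces to
- <math> \operatorname{E}(\mathbf{1}_S | A) = 0 \cdot \operatorname{P}_A\{\mathbf{1}_S = 0\} + 1 \cdot \operatorname{P}_A\{\mathbf{1}_S = 1\} = \operatorname{P}_A(S). <math>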
If Y is another real random variable, then for each value y we consider the event {Y = y}. (Note that capital Y and lower-case y refer to different things here: Y is a random variable, while y is a particular real number.) The conditional expectation E(X | Y = y) is shorthand for E(X | {Y = y}). In general this may not be defined, since {Y = y} may have probability zero.
The way out of this limitation is as follows. Note that if both X and Y are discrete random variables, then for any set B of values of Y
- <math> \operatorname{E}(X \ \mathbf{1}_{Y^{-1}(B)}) = \sum_{r \in B} \operatorname{E}(X| Y = r) \operatorname{P}\{Y=r\},<math>
where 1 is the indicator function.
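As a simple illustration, let Y be the result of the first of two independent fair coin tosses, recorded as 0 or 1, and let X be the total number of heads in the two tosses. Then E(X | Y = 0) = 1/2 and E(X | Y = 1) = 3/2, and taking B = {0, 1} the right-hand side equals (1/2)(1/2) + (3/2)(1/2) = 1, which is indeed E(X).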
For general random variables Y, however, P{Y = r} may be zero for every value r, so this sum cannot be used directly. As a first step in dealing with this problem, consider the case in which Y has a density: there is a non-negative integrable function φY on R, the density of Y, such that
- <math> \operatorname{P}\{Y \leq a\} = \int_{-\infty}^a \phi_Y(s) \ ds <math>
for any a in R. We can then show the following: for any integrable random variable X, there is a function g on R such that
- <math> \operatorname{E}(X \, \mathbf{1}_{Y \leq a}) = \int_{-\infty}^a g(t) \phi_Y(t) \ dt. <math>
This function g is a suitable candidate for the conditional expectation.
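For instance, if X and Y have a joint density φX,Y, then one version of g can be written down explicitly: wherever φY(y) > 0 one may take
- <math> g(y) = \frac{1}{\phi_Y(y)} \int_{-\infty}^{\infty} x \, \phi_{X,Y}(x,y) \ dx, <math>
which satisfies the displayed identity above by Fubini's theorem and is the expectation of X under the conditional density φX,Y(x, y)/φY(y).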
In order to handle the general case, we need more powerful mathematical machinery.
Mathematical formalism
Let X, Y be real random variables on some probability space (Ω, M, P) where M is the σ-algebra of measurable sets on which P is defined. We consider two measures on R:
- Q, defined by Q(B) = P(Y−1(B)) for every Borel subset B of R; this is a probability measure on the real line R.
- PX given by
- <math> \operatorname{P}_X(B) = \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega). <math>
If X is an integrable random variable, then PX is absolutely continuous with respect to Q. In this case, it can be shown that the Radon–Nikodym derivative of PX with respect to Q exists; moreover, it is uniquely determined almost everywhere with respect to Q. This random variable is the conditional expectation of X given Y, or more accurately a version of the conditional expectation of X given Y.
It follows that the conditional expectation satisfies
- <math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} \operatorname{E}(X|Y)(\theta) \ d \operatorname{Q}(\theta) <math>
for any Borel subset B of R.
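As an informal numerical illustration of this defining property (a sketch added here, not part of the classical exposition), the following Python snippet approximates a version of E(X|Y) by averaging X over narrow bins of Y and then checks the identity above on one Borel set B. The particular model X = Y² + noise, the bin width, and all variable names are assumptions made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a pair (X, Y): Y is standard normal and X = Y**2 + noise,
# so E(X | Y) should be (a version of) the function y -> y**2.
n = 200_000
y = rng.standard_normal(n)
x = y**2 + rng.standard_normal(n)

# Approximate E(X | Y) by the average of X within narrow bins of Y.
edges = np.linspace(-3.0, 3.0, 61)           # bin edges, width 0.1
idx = np.digitize(y, edges)                  # bin index of each sample
bin_mean = np.array([x[idx == k].mean() if np.any(idx == k) else 0.0
                     for k in range(len(edges) + 1)])
g_of_y = bin_mean[idx]                       # the composition E(X|Y) o Y, sample by sample

# Defining property on the Borel set B = [0, 1]:
#   the integral of X over {Y in B} should equal the integral of g(Y) over {Y in B}.
in_B = (y >= 0.0) & (y <= 1.0)
print(np.mean(x * in_B))        # Monte Carlo estimate of the left-hand side
print(np.mean(g_of_y * in_B))   # Monte Carlo estimate of the right-hand side
```

Because g is defined here as a per-bin sample mean and [0, 1] is a union of whole bins, the two printed numbers agree up to floating-point rounding.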
Conditioning as factorization
In the definition of conditional expectation given above, the fact that Y is a real-valued random variable is irrelevant. Let U be a measurable space, that is, a set equipped with a σ-algebra of subsets. A U-valued random variable is a function Y: Ω → U such that Y−1(B) is an element of M for any measurable subset B of U.
We consider the measure Q on U given as above: Q(B) = P(Y−1(B)) for every measurable subset B of U. Q is a probability measure on the measurable space U defined on its σ-algebra of measurable sets.
Theorem. If X is an integrable real random variable on Ω, then there is one and, up to equivalence a.e. relative to Q, only one integrable function g on U such that for any measurable subset B of U:
- <math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(u) \ d \operatorname{Q} (u). <math>
There are a number of ways of proving this; one, as suggested above, is to note that the expression on the left-hand side defines, as a function of the set B, a countably additive finite (signed) measure on the measurable subsets of U. Moreover, this measure is absolutely continuous relative to Q. Indeed Q(B) = 0 means exactly that Y−1(B) has probability 0. The integral of an integrable function over a set of probability 0 is itself 0. This proves absolute continuity.
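Written out, the absolute-continuity argument is the chain of implications
- <math> \operatorname{Q}(B) = 0 \ \Rightarrow \ \operatorname{P}(Y^{-1}(B)) = 0 \ \Rightarrow \ \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = 0, <math>
after which the Radon–Nikodym theorem supplies the density g of this measure with respect to Q, unique up to Q-null sets.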
The defining condition of conditional expectation then is the equation
- <math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} \operatorname{E}(X|Y)(u) \ d \operatorname{Q} (u). <math>
We can further interpret this equality by considering the abstract change of variables formula to transport the integral on the right hand side to an integral over Ω:
- <math> \int_{Y^{-1}(B)} X(\omega) \ d \operatorname{P}(\omega) = \int_{Y^{-1}(B)} [\operatorname{E}(X|Y) \circ Y](\omega) \ d \operatorname{P} (\omega). <math>
This equation can be interpreted to say that the following diagram is commutative in an average sense.
[Image: Conditional_expectation_commutative_diagram.png (a diagram, commutative in an average sense)]
The equation means that the integrals of X and of the composition E(X|Y) ∘ Y over sets of the form Y−1(B), for B measurable, are identical.
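For example, when Y takes only countably many values u, each with P{Y = u} > 0, the factorization can be written out explicitly: one version of E(X|Y) is
- <math> g(u) = \frac{\operatorname{E}(X \, \mathbf{1}_{\{Y = u\}})}{\operatorname{P}\{Y=u\}}, \qquad [\operatorname{E}(X|Y) \circ Y](\omega) = \sum_u \operatorname{E}(X | Y = u) \, \mathbf{1}_{\{Y = u\}}(\omega), <math>
which recovers the elementary conditional expectations of the special cases above.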
Conditioning relative to a subalgebra
There is another viewpoint for conditioning involving σ-subalgebras N of the σ-algebra M. This version is a trivial specialization of the preceding: we simply take U to be the space Ω with the σ-algebra N and Y the identity map. We state the result:
Theorem. If X is an integrable real random variable on Ω, then there is one and, up to equivalence a.e. relative to P, only one integrable N-measurable function g such that for any set B belonging to the subalgebra N
- <math> \int_{B} X(\omega) \ d \operatorname{P}(\omega) = \int_{B} g(\omega) \ d \operatorname{P} (\omega). <math>
This form of conditional expectation is usually written: E(X|N). This version is preferred by probabilists. One reason is that on the space of square-integrable real random variables (in other words, real random variables with finite second moment) the mapping X → E(X|N) is the self-adjoint orthogonal projection
- <math> L^2_{\operatorname{P}}(\Omega;M) \rightarrow L^2_{\operatorname{P}}(\Omega;N). <math>
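As a simple illustration, if N is generated by a finite or countable partition {A1, A2, ...} of Ω into events of positive probability, then
- <math> \operatorname{E}(X|N)(\omega) = \sum_i \operatorname{E}(X|A_i) \, \mathbf{1}_{A_i}(\omega), <math>
so E(X|N) is constant on each cell of the partition, where it equals the elementary conditional expectation of the first section; in particular, for the trivial subalgebra N = {∅, Ω} one recovers E(X|N) = E(X).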
Basic properties
Let (Ω,M,P) be a probability space.
- Conditioning with respect to a σ-subalgebra N is linear on the space of integrable real random variables.
- E(1|N) = 1
- Jensen's inequality holds (a worked instance appears after this list): if f is a convex function, then
- <math> f(\operatorname{E}(X|N) ) \leq \operatorname{E}(f \circ X |N).<math>
- Conditioning is a contractive projection
- <math> L^s_P(\Omega;M) \rightarrow L^s_P(\Omega;N) <math>
for any s ≥ 1.
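For instance, taking f(x) = x² in Jensen's inequality gives
- <math> (\operatorname{E}(X|N))^2 \leq \operatorname{E}(X^2|N), <math>
so the conditional variance E(X²|N) − (E(X|N))² is non-negative almost surely, just as the ordinary variance is non-negative.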
See also
- Law of total probability
- Law of total expectation
- Law of total variance
- Law of total cumulance (this fourth item generalizes the other three)