Conditional distribution
Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.
For discrete random variables, the conditional probability mass function can be written as P(Y = y | X = x). From the definition of conditional probability, this is
- <math>P(Y=y|X=x) = \frac{P(X=x\ \mathrm{and}\ Y=y)}{P(X=x)}= \frac{P(X=x|Y=y) P(Y=y)}{P(X=x)}.</math>
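As a concrete illustration of this formula, the following Python sketch computes the conditional probability mass function from a small joint table; the table values and the helper name conditional_pmf_Y_given_X are made up for the example.

```python
# Illustrative sketch: P(Y = y | X = x) from a made-up joint probability table.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,   # cell (x, y) -> P(X = x and Y = y)
    (1, 0): 0.25, (1, 1): 0.35,
}

def conditional_pmf_Y_given_X(joint, x):
    """Return {y: P(Y = y | X = x)} by dividing each joint probability by the marginal P(X = x)."""
    p_x = sum(p for (xi, _), p in joint.items() if xi == x)          # marginal P(X = x)
    return {y: p / p_x for (xi, y), p in joint.items() if xi == x}

print(conditional_pmf_Y_given_X(joint, 0))   # {0: 0.25, 1: 0.75}
```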
Similarly for continuous random variables, the conditional probability density function can be written as pY|X(y | x) and this is
- <math>p_{Y|X}(y|x) = \frac{p_{X,Y}(x,y)}{p_X(x)}= \frac{p_{X|Y}(x|y)p_Y(y)}{p_X(x)}</math>
where pX,Y(x, y) gives the joint density of X and Y, while pX(x) gives the marginal density for X.
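The same division of a joint density by a marginal density can be checked numerically. The sketch below assumes a standard bivariate normal joint density with an illustrative correlation of 0.5 (chosen for the example, not taken from the text) and confirms that the resulting conditional density integrates to 1 over y.

```python
# Illustrative sketch: p_{Y|X}(y|x) = p_{X,Y}(x,y) / p_X(x) for an assumed bivariate normal.
import math

rho = 0.5   # assumed correlation, for illustration only

def p_joint(x, y):
    """Standard bivariate normal density (zero means, unit variances, correlation rho)."""
    norm = 1.0 / (2 * math.pi * math.sqrt(1 - rho**2))
    quad = (x**2 - 2 * rho * x * y + y**2) / (2 * (1 - rho**2))
    return norm * math.exp(-quad)

def p_x(x):
    """Standard normal marginal density of X."""
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def p_y_given_x(y, x):
    return p_joint(x, y) / p_x(x)

# Numerically confirm the conditional density integrates to 1 over y for a fixed x.
x0, dy = 1.0, 0.001
total = sum(p_y_given_x(-8 + i * dy, x0) * dy for i in range(int(16 / dy)))
print(round(total, 4))   # ≈ 1.0
```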
The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.
If for discrete random variables P(Y = y | X = x) = P(Y = y) for all x and y, or for continuous random variables pY|X(y | x) = pY(y) for all x and y, then Y is said to be independent of X (and this implies that X is also independent of Y).
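Equivalently, independence means the joint distribution factors into the product of the marginals, which is easy to test on a discrete table. The values below are made up so that the factorization holds exactly.

```python
# Illustrative sketch: test P(X=x, Y=y) == P(X=x) * P(Y=y) for every cell,
# which is equivalent to P(Y=y | X=x) = P(Y=y) for all x and y.
joint = {
    (0, 0): 0.12, (0, 1): 0.28,   # X = 0 row
    (1, 0): 0.18, (1, 1): 0.42,   # X = 1 row
}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p   # marginal of X
    p_y[y] = p_y.get(y, 0.0) + p   # marginal of Y

independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for (x, y) in joint)
print(independent)   # True: this particular table factorizes
```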
Seen as a function of y for given x, P(Y = y | X = x) is a probability and so the sum over all y (or integral if it is a density) is 1. Seen as a function of x for given y, it is a likelihood function, so that the sum over all x need not be 1.
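This asymmetry can be seen on the same kind of small table: summing the conditional probabilities over y gives exactly 1 for each x, while summing over x for a fixed y (the likelihood reading) generally gives something else. The table below is illustrative only.

```python
# Illustrative sketch: P(Y=y | X=x) sums to 1 over y, but not in general over x.
joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.25, (1, 1): 0.35,
}
p_x = {0: 0.40, 1: 0.60}   # marginals of X implied by the table above

cond = {(x, y): p / p_x[x] for (x, y), p in joint.items()}   # P(Y=y | X=x)

print(sum(cond[(0, y)] for y in (0, 1)))   # 1.0  (probability distribution in y for fixed x)
print(sum(cond[(x, 1)] for x in (0, 1)))   # ≈ 1.33 (likelihood in x for fixed y, need not be 1)
```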