Gibbs sampling
In mathematics and physics, Gibbs sampling is an algorithm for generating a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution (as with a histogram) or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm. The algorithm is named after the physicist Josiah Willard Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. It was devised by Geman and Geman (see references), some eight decades after Gibbs's death, and is also called the Gibbs sampler.
Gibbs sampling is applicable when the joint distribution is not known explicitly, but the conditional distribution of each variable is known. The algorithm generates an instance from the distribution of each variable in turn, conditional on the current values of the other variables. It can be shown (see, for example, Gelman et al.) that the sequence of samples constitutes a Markov chain, and that the stationary distribution of that Markov chain is precisely the sought-after joint distribution.
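To make the procedure concrete, the following sketch (in Python; the example and all names are illustrative choices, not from this article) samples a standard bivariate normal with correlation rho, for which both full conditionals are known in closed form: <math>X \mid Y = y \sim N(\rho y,\, 1 - \rho^2)</math>, and symmetrically for Y.
<pre>
# A minimal sketch of Gibbs sampling for a bivariate normal with
# correlation rho (an illustrative example, not from the article).
# Each full conditional of a standard bivariate normal is itself
# normal: X | Y = y ~ N(rho * y, 1 - rho**2), and symmetrically for Y.
import numpy as np

def gibbs_bivariate_normal(rho=0.8, n_samples=10_000, rng=None):
    rng = np.random.default_rng(rng)
    x, y = 0.0, 0.0                      # arbitrary starting point
    samples = np.empty((n_samples, 2))
    sd = np.sqrt(1.0 - rho ** 2)         # conditional standard deviation
    for t in range(n_samples):
        # Resample each variable from its conditional given the other.
        x = rng.normal(rho * y, sd)
        y = rng.normal(rho * x, sd)
        samples[t] = x, y
    return samples

samples = gibbs_bivariate_normal()
print(np.corrcoef(samples[2000:].T))     # approximately rho after burn-in
</pre>
After an initial burn-in period, the empirical correlation of the retained samples approximates rho, illustrating convergence to the joint distribution.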
Gibbs sampling is particularly well-adapted to sampling the posterior distribution of a Bayesian network, since Bayesian networks are typically specified as a collection of conditional distributions. BUGS (link below) is a program for carrying out Gibbs sampling on Bayesian networks.
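As a small illustration of posterior sampling of this kind (a minimal two-parameter conjugate model chosen for this sketch, not taken from the article or from BUGS), consider normally distributed data with a normal prior on the mean and an inverse-gamma prior on the variance; both full conditionals are then available in closed form.
<pre>
# A minimal sketch of Gibbs sampling for the posterior of a conjugate
# normal model (assumed setup: y_i ~ N(mu, sigma2), mu ~ N(mu0, tau0sq),
# sigma2 ~ InvGamma(a, b); all parameter names are illustrative).
import numpy as np

def gibbs_normal_model(y, mu0=0.0, tau0sq=100.0, a=2.0, b=2.0,
                       n_iter=5000, rng=None):
    rng = np.random.default_rng(rng)
    n, ybar = len(y), np.mean(y)
    mu, sigma2 = ybar, np.var(y)          # reasonable starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # mu | sigma2, y is normal (precision-weighted combination).
        prec = 1.0 / tau0sq + n / sigma2
        mean = (mu0 / tau0sq + n * ybar / sigma2) / prec
        mu = rng.normal(mean, np.sqrt(1.0 / prec))
        # sigma2 | mu, y is inverse-gamma; draw via a gamma variate,
        # since 1/G ~ InvGamma(a', b') when G ~ Gamma(a', scale=1/b').
        a_post = a + 0.5 * n
        b_post = b + 0.5 * np.sum((y - mu) ** 2)
        sigma2 = 1.0 / rng.gamma(a_post, 1.0 / b_post)
        draws[t] = mu, sigma2
    return draws

rng = np.random.default_rng(0)
y = rng.normal(3.0, 2.0, size=200)        # synthetic data
draws = gibbs_normal_model(y)
print(draws[1000:].mean(axis=0))          # posterior means of (mu, sigma2)
</pre>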
Implementation
Suppose that a sample X is taken from a distribution depending on a parameter vector <math>\theta = (\theta_1, \ldots, \theta_d) \in \Theta</math>, with prior distribution <math>g(\theta_1, \ldots, \theta_d)</math>, where the dimension <math>d < \infty</math>. For concreteness, assume each <math>\theta_i</math> ranges over a finite set, so that the normalizing sums below are well defined. It may be that d is very large and numerical integration to find the marginal densities of the <math>\theta_i</math> would be too computationally expensive. Instead a Markov chain is created on the space <math>\Theta</math> as follows:
- Pick a random index <math>1 \leq j \leq d</math>
- Pick a new value for <math>\theta_j</math> according to the conditional distribution proportional to <math>g(\theta_1, \ldots, \theta_{j-1}, \, \cdot \,, \theta_{j+1}, \ldots, \theta_d)</math>
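On a finite space, this update can be implemented directly by normalizing g along the chosen coordinate. The following Python sketch is illustrative (the names gibbs_step, theta, g, and levels are assumptions for this sketch, not from the article; indices run from 0 rather than 1):
<pre>
# One random-scan Gibbs update on a finite space: theta is a tuple of
# d coordinates, each taking values in `levels`, and g is an
# unnormalized joint density (both hypothetical).
import random

def gibbs_step(theta, g, levels):
    d = len(theta)
    j = random.randrange(d)                      # pick a random index j
    # Conditional weights: g evaluated along the j-th coordinate,
    # holding the other coordinates fixed.
    weights = [g(theta[:j] + (v,) + theta[j + 1:]) for v in levels]
    new_v = random.choices(levels, weights=weights)[0]
    return theta[:j] + (new_v,) + theta[j + 1:]

# Example usage with a hypothetical density on two binary variables.
g = lambda t: 1.0 + t[0] + 2.0 * t[1]
theta = (0, 0)
for _ in range(1000):
    theta = gibbs_step(theta, g, levels=(0, 1))
</pre>
Iterating gibbs_step produces exactly the Markov chain analyzed below.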
It can then be shown that this is a reversible Markov chain with invariant distribution g, as follows. Define <math>x \sim_j y</math> if <math>x_i = y_i</math> for all <math>i \neq j</math>, and let <math>p_{xy}</math> denote the probability of a jump from <math>x \in \Theta</math> to <math>y \in \Theta</math>. Then, for <math>y \sim_j x</math>,
- <math>p_{xy} = \frac{1}{d}\frac{g(y)}{\sum_{z \in \Theta : z \sim_j y} g(z)}</math>
so
- <math>\begin{matrix} g(x)\, p_{xy} & = & \frac{1}{d}\frac{g(x)\, g(y)}{\sum_{z \in \Theta : z \sim_j y} g(z)} \\ & = & \frac{1}{d}\frac{g(y)\, g(x)}{\sum_{z \in \Theta : z \sim_j x} g(z)} \\ & = & g(y)\, p_{yx} \end{matrix}</math>
where the second equality holds because <math>x \sim_j y</math> implies that the sets <math>\{z : z \sim_j x\}</math> and <math>\{z : z \sim_j y\}</math> coincide. The detailed balance equations are therefore satisfied, showing that the chain is reversible and that it has invariant distribution g.
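This identity can also be checked numerically on a small example. The sketch below (illustrative; the density g is an arbitrary positive function on two binary coordinates) builds the full transition probability by summing the (1/d)-weighted term over every index j with <math>x \sim_j y</math>, so that the case x = y is covered as well, and then verifies <math>g(x)\, p_{xy} = g(y)\, p_{yx}</math> for every pair of states:
<pre>
# A small numerical check of detailed balance for the random-scan
# Gibbs chain (illustrative example, not from the article).
import itertools
import numpy as np

levels = (0, 1)
d = 2
states = list(itertools.product(levels, repeat=d))

def g(theta):                       # an arbitrary positive joint density
    return 1.0 + theta[0] + 2.0 * theta[1] + 3.0 * theta[0] * theta[1]

def p(x, y):
    """Transition probability p_xy of the random-scan Gibbs chain."""
    total = 0.0
    for j in range(d):
        # j contributes only if x and y agree off coordinate j.
        if all(x[i] == y[i] for i in range(d) if i != j):
            norm = sum(g(y[:j] + (v,) + y[j + 1:]) for v in levels)
            total += (1.0 / d) * g(y) / norm
    return total

for x, y in itertools.product(states, repeat=2):
    assert np.isclose(g(x) * p(x, y), g(y) * p(y, x))
print("detailed balance holds")
</pre>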
In practice, the index j is often not chosen at random; instead, the chain cycles through the indices in order (a systematic scan). The resulting chain is then no longer reversible, but it still has g as an invariant distribution, since each coordinate update individually preserves g.
References
- George Casella and Edward I. George. "Explaining the Gibbs Sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
- A. E. Gelfand and A. F. M. Smith. "Sampling-Based Approaches to Calculating Marginal Densities". Journal of the American Statistical Association, 85:398-409, 1990.
- Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. London: Chapman and Hall. First edition, 1995. (See Chapter 11.)
- S. Geman and D. Geman. "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
- C. P. Robert and G. Casella. Monte Carlo Statistical Methods, second edition. New York: Springer-Verlag, 2004.
External links
- The BUGS Project - Bayesian inference Using Gibbs Sampling (http://www.mrc-bsu.cam.ac.uk/bugs)