Markov chain Monte Carlo
|
Markov chain Monte Carlo (MCMC) methods, sometimes called random walk Monte Carlo methods, are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution. The quality of the sample improves as a function of the number of steps.
Usually it is not hard to construct a Markov Chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing---the stationary distribution is reached quickly starting from an arbitrary position. Tools for proving rapid mixing include arguments based on conductance and the coupling method.
Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite on average) running time.
The most common application of these algorithms is numerically calculating multi-dimensional integrals. In these methods, an ensemble of "walkers" moves around randomly. At each point where the walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with reasonably high contribution to the integral to move into next. Random walk methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are correlated. A Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution. Surprisingly, this is often easy to do.
Overview
These Markov chain Monte Carlo methods are ones where the direction the walker is likely to move depends only on where the walker is, and what the function value is in the area. These methods are easy to implement and analyse, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. This problem is called "slow mixing".
More sophisticated algorithms use some method of preventing the walker from doubling back. For example, in "self avoiding walk" or SAW routines, the walker remembers where it has been before (at least for a few steps), and avoids stepping on those locations again. These algorithms are harder to implement, but may exhibit faster convergence (i.e. fewer steps for an accurate result). Various statistical problems can occur — for example, what happens when a walker paints itself into a corner?
Multi-dimensional integrals often arise in Bayesian statistics and computational physics, so random walk Monte Carlo methods are widely used in those fields.
Random walk algorithms
- Adaptive rejection sampling: A variant of rejection sampling that modifies the proposal density on the fly.
- Metropolis-Hastings algorithm: Generates a random walk using a proposal density and a method for rejecting proposed moves.
- Gibbs sampling: Requires all the conditional distributions of the target distribution to be known in closed form. Gibbs sampling has the advantage that it does not display random walk behaviour. However, it can run into problems when variables are strongly correlated. When this happens, a technique called simultaneous over-relaxation can be used.
- Hybrid Markov chain Monte Carlo: Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics where the potential function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid MCMC is that proposals move across the sample space in larger steps and are therefore less correlated and converge to the target distribution more rapidly.
- Slice sampling: Depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. This method alternates uniform sampling in the vertical direction with uniform sampling from the horizontal `slice' defined by the current vertical position.
- Reversible Jump
References
- George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
- A.E. Gelfand and A.F.M. Smith. "Sampling-Based Approaches to Calculating Marginal Densities". J. American Statistical Association, 85:398-409, 1990.
- Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. London: Chapman and Hall. First edition, 1995. (See Chapter 11.)
- S. Geman and D. Geman. "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
- C.P. Robert and G. Casella. "Monte Carlo Statistical Methods" (second edition). New York: Springer-Verlag, 2004.