Markov chain
In mathematics, a (discrete-time) Markov chain, named after Andrei Markov, is a discrete-time stochastic process with the Markov property. In such a process, the past is irrelevant for predicting the future given knowledge of the present.
There are also continuous-time Markov chains.
A Markov chain is a sequence X1, X2, X3, ... of random variables. The range of these variables, i.e., the set of their possible values, is called the state space, the value of Xn being the state of the process at time n. If the conditional probability distribution of Xn+1 on past states is a function of Xn alone, then:
- <math> P(X_{n+1}=x|X_0, X_1, X_2, \ldots, X_n) = P(X_{n+1}=x|X_n). \, </math>
where x is some state of the process. The identity above expresses the Markov property.
Andrei Markov produced the first results (1906) for these processes. A generalization to countably infinite state spaces was given by Kolmogorov (1936). Markov chains are related to Brownian motion and the ergodic hypothesis, two topics in physics which were important in the early years of the twentieth century, but Markov appears to have pursued this out of a mathematical motivation, namely the extension of the law of large numbers to dependent events.
Properties of Markov chains
A Markov chain is characterized by the conditional distribution
- <math> P(X_{n+1}| X_n)\, </math>
which is called the transition probability of the process. This is sometimes called the "one-step" transition probability. The probability of a transition in two, three, or more steps is derived from the one-step transition probability and the Markov property:
- <math> P(X_{n+2}|X_n) = \int P(X_{n+2},X_{n+1}|X_n)\,dX_{n+1} = \int P(X_{n+2}|X_{n+1}) \, P(X_{n+1}|X_n) \, dX_{n+1} </math>
Likewise,
- <math> P(X_{n+3}|X_n) = \int P(X_{n+3}|X_{n+2}) \int P(X_{n+2}|X_{n+1}) \, P(X_{n+1}|X_n) \, dX_{n+1} \, dX_{n+2} </math>
These formulas generalize to an arbitrary future time n + k by multiplying k one-step transition probabilities and integrating over the k − 1 intermediate variables.
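For example (with illustrative numbers only), consider a chain on two states 1 and 2 with one-step probabilities P(Xn+1 = 1 | Xn = 1) = 0.9, P(Xn+1 = 2 | Xn = 1) = 0.1, P(Xn+1 = 1 | Xn = 2) = 0.5 and P(Xn+1 = 2 | Xn = 2) = 0.5. For such a discrete state space the integral reduces to a sum over the intermediate state, so the two-step probability of remaining in state 1 is
- <math> P(X_{n+2}=1|X_n=1) = 0.9 \cdot 0.9 + 0.5 \cdot 0.1 = 0.86. \, </math>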
The marginal distribution P(Xn) is the distribution over states at time n. The initial distribution is P(X0). The evolution of the process through one time step is described by
- <math> P(X_{n+1}) = \int P(X_{n+1}|X_n)\,P(X_n)\,dX_n </math>
This is a version of the Frobenius-Perron equation. There may exist one or more state distributions π such that
- <math> \pi(X) = \int P(X|Y)\,\pi(Y)\,dY </math>
where Y is just a convenient name for the variable of integration. Such a distribution π is called a stationary distribution or steady-state distribution. A stationary distribution is an eigenfunction of the conditional distribution function, associated with the eigenvalue 1.
Whether or not there is a stationary distribution, and whether or not it is unique if it does exist, are determined by certain properties of the process. Irreducible means that every state is accessible from every other state. A process is periodic if there is at least one state to which the process can return only at multiples of some fixed period greater than one; aperiodic means that there is no such state. Positive recurrent means that the expected return time is finite for every state. Sometimes the terms indecomposable, acyclic, and persistent are used as synonyms for "irreducible", "aperiodic", and "recurrent", respectively. When a Markov chain is not irreducible, its state space may be partitioned into a set of (irreducible) communicating classes, each of which may be classified as above. The problem of classification is an important one in the mathematical study of Markov chains and related stochastic processes.
If the Markov chain is positive recurrent, there exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique stationary distribution, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic. Then the average of a function f over samples of the Markov chain is equal to the average with respect to the stationary distribution,
- <math> \lim_{n\rightarrow\infty}\; \frac{1}{n} \; \sum_{k=0}^{n-1} f(X_k) = \int f(X)\,\pi(X)\,dX </math>
In particular, this holds for f equal to the identity function. Thus the average of sample values over time is equal to the expected value of the stationary distribution.
Furthermore, the equivalence of averages also holds if f is the indicator function on some subset A of the state space.
- <math> \lim_{n\rightarrow\infty}\; \frac{1}{n} \; \sum_{k=0}^{n-1} \chi_A(X_k) = \int_A \pi(X)\,dX = \mu_{\pi}(A) </math>
where μπ is the measure induced by π. This makes it possible to approximate the stationary distribution by a histogram or other density estimate of a sequence of samples.
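A minimal numerical sketch of this idea, in Python with NumPy (the two-state transition probabilities below are illustrative only, not taken from any source cited here), simulates a chain and compares the empirical state frequencies with the stationary distribution:

import numpy as np

# Illustrative two-state chain; element (i, j) is P(X_{n+1} = i | X_n = j),
# so each column sums to one (the convention used later in this article).
P = np.array([[0.9, 0.5],
              [0.1, 0.5]])

# Stationary distribution: eigenvector of P for eigenvalue 1, normalized to sum to one.
eigenvalues, eigenvectors = np.linalg.eig(P)
pi = np.real(eigenvectors[:, np.argmax(np.real(eigenvalues))])
pi = pi / pi.sum()

# Simulate the chain and accumulate a histogram of visited states.
rng = np.random.default_rng(0)
state, counts, n_steps = 0, np.zeros(2), 100_000
for _ in range(n_steps):
    counts[state] += 1
    state = rng.choice(2, p=P[:, state])

print(counts / n_steps)  # empirical frequencies over time
print(pi)                # stationary distribution, approximately (0.833, 0.167)

With these illustrative numbers the stationary distribution is (5/6, 1/6), so the two printed vectors should nearly coincide.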
Markov chains in discrete state spaces
If the state space is finite, the transition probability distribution can be represented as a matrix, called the transition matrix, with the (i, j)'th element equal to
- <math>P(X_{n+1}=i\mid X_n=j) \,</math>
For a discrete state space, the integrations in the k-step transition probability become summations, and the k-step transition probability can be computed as the k'th power of the transition matrix. That is, if P is the one-step transition matrix, then Pk is the transition matrix for the k-step transition.
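Continuing the illustrative two-state example above, with element (i, j) equal to P(Xn+1 = i | Xn = j) (so each column sums to one),
- <math> P = \begin{pmatrix} 0.9 & 0.5 \\ 0.1 & 0.5 \end{pmatrix}, \qquad P^2 = \begin{pmatrix} 0.86 & 0.70 \\ 0.14 & 0.30 \end{pmatrix} </math>
and the (1, 1) entry of P2 reproduces the two-step probability 0.86 computed earlier.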
Writing P for the transition matrix, a stationary distribution is a vector which satisfies the equation
- <math> P\pi^* = \pi^*</math>.
In this case, the stationary distribution <math>\pi^*</math> is an eigenvector of the transition matrix, associated with the eigenvalue 1.
If the transition matrix P is irreducible and aperiodic, then Pk converges elementwise to a matrix in which each column is the unique stationary distribution <math>\pi^*</math>, with
- <math>\lim_{k\rightarrow\infty}P^k\pi=\pi^*</math>,
independent of the initial distribution <math>\pi</math>. This is stated by the Perron-Frobenius theorem.
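For the illustrative two-state matrix above, the unique stationary distribution and the limit of the matrix powers are
- <math> \pi^* = \begin{pmatrix} 5/6 \\ 1/6 \end{pmatrix}, \qquad \lim_{k\rightarrow\infty}P^k = \begin{pmatrix} 5/6 & 5/6 \\ 1/6 & 1/6 \end{pmatrix} </math>
so every column of Pk approaches <math>\pi^*</math>, whatever the initial distribution.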
A transition matrix which is positive (that is, every element of the matrix is positive) is irreducible and aperiodic. A matrix is a stochastic matrix if and only if it is the matrix of transition probabilities of some Markov chain.
Note: In this formulation, element (i, j) is the probability of a transition from j to i. An equivalent formulation is sometimes given with element (i, j) equal to the probability of a transition from i to j. In that case the transition matrix is just the transpose of the one given here. Also, the stationary distribution of the system is then given by the left eigenvector of the transition matrix, instead of the right eigenvector.
Scientific applications
Markov chains are used to model various processes in queueing theory and statistics, and can also be used as a signal model in entropy coding techniques such as arithmetic coding. Markov chains also have many applications in biology, particularly in modelling population processes. Hidden Markov models have been used in bioinformatics as well, e.g. for coding region/gene prediction.
Markov processes can also be used to generate superficially "real-looking" text given a sample document: they are used in various pieces of recreational "parody generator" software (see Jeff Harrison, Mark V Shaney). Markov chains have also been used in music composition.
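A minimal word-level sketch of the idea in Python (not the method of any particular program named above): record which words follow which in a sample text, then perform a random walk on that table.

import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words observed to follow it."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, length=20):
    """Random walk on the word table, starting from a given word."""
    word, output = start, [start]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:  # dead end: the word never had a successor in the sample
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

# Tiny illustrative sample; real parody generators are trained on much larger texts.
sample = "the cat sat on the mat and the cat ran after the dog"
print(generate(build_chain(sample), "the"))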
A recent application of Markov chains is in geostatistics: Markov chains are used in two- and three-dimensional stochastic simulations of discrete variables conditioned on observed data. This application, analogous to kriging, is called "Markov chain geostatistics" and is still under development.
See also
References
- A.A. Markov. "Rasprostranenie zakona bol'shih chisel na velichiny, zavisyaschie drug ot druga". Izvestiya Fiziko-matematicheskogo obschestva pri Kazanskom universitete, 2-ya seriya, tom 15, pp 135-156, 1906.
- A.A. Markov. "Extension of the limit theorems of probability theory to a sum of variables connected in a chain". reprinted in Appendix B of: R. Howard. Dynamic Probabilistic Systems, volume 1: Markov Chains. John Wiley and Sons, 1971.
- Leo Breiman. Probability. Original edition published by Addison-Wesley, 1968; reprinted by Society for Industrial and Applied Mathematics, 1992. ISBN 0-89871-296-3. (See Chapter 7.)
- J.L. Doob. Stochastic Processes. New York: John Wiley and Sons, 1953. ISBN 0-471-52369-0.
External links
- (pdf) Markov Chains chapter in American Mathematical Society's introductory probability book (http://www.dartmouth.edu/~chance/teaching_aids/books_articles/probability_book/Chapter11.pdf)
- Markov Chains (http://crypto.mat.sbg.ac.at/~ste/diss/node6.html)
- Generating Text (http://www.cs.bell-labs.com/cm/cs/pearls/sec153.html) (About generating random text using a Markov chain.)
- The World's Largest Matrix Computation (http://www.mathworks.com/company/newsletters/news_notes/clevescorner/oct02_cleve.html) (Google's PageRank as the stationary distribution of a random walk through the Web.)
- Disassociated Press (http://www.gnu.org/software/emacs/manual/html_node/Dissociated-Press.html) in Emacs approximates a Markov process
- [1] (http://rapt.us/toys/chain.pl) Markov chains used to produce semi-coherent English.