Zipf-Mandelbrot law
In probability theory and statistics, the Zipf-Mandelbrot law is a discrete probability distribution. Also known as the Pareto-Zipf law, it is a power-law distribution on ranked data. It is named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), who proposed a simpler distribution called Zipf's law, and the mathematician Benoit Mandelbrot (1924-2010), who subsequently generalized it.
The probability mass function is given by:
- <math>f_k(N,q,s)=\frac{1/(k+q)^s}{H_{N,q,s}}</math>
where <math>H_{N,q,s}</math> is given by:
- <math>H_{N,q,s}=\sum_{i=1}^N \frac{1}{(i+q)^s}</math>
which may be thought of as a generalization of a harmonic number. In the limit as <math>N</math> approaches infinity (with <math>s>1</math>, so that the sum converges), this becomes the Hurwitz zeta function <math>\zeta(s,q+1)</math>, using the convention <math>\zeta(s,a)=\sum_{n=0}^\infty (n+a)^{-s}</math>. For finite <math>N</math> and <math>q=0</math> the Zipf-Mandelbrot law becomes Zipf's law. For infinite <math>N</math> and <math>q=0</math> it becomes a zeta distribution.
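As an illustration of the definition, the probability mass function can be evaluated directly by computing the normalizing sum. The following is a minimal Python sketch, not a reference implementation; the function name `zipf_mandelbrot_pmf` is hypothetical, and the restriction <math>q>-1</math> is an assumption made here so that every term <math>k+q</math> is positive.

```python
def zipf_mandelbrot_pmf(N, q, s):
    """Return the list [f_1, ..., f_N] of Zipf-Mandelbrot probabilities.

    N: number of ranks (positive integer)
    q: shift parameter (assumed here to satisfy q > -1)
    s: exponent
    """
    if N < 1:
        raise ValueError("N must be a positive integer")
    if q <= -1:
        raise ValueError("q must be greater than -1 so that k + q > 0")
    # Unnormalized weights 1 / (k + q)^s for ranks k = 1..N.
    weights = [(k + q) ** (-s) for k in range(1, N + 1)]
    # Normalizing constant H_{N,q,s}, the generalized harmonic number.
    H = sum(weights)
    return [w / H for w in weights]


# With q = 0 the result reduces to Zipf's law on N ranks.
pmf = zipf_mandelbrot_pmf(N=5, q=0.0, s=1.0)
```

For <math>N=5</math>, <math>q=0</math> and <math>s=1</math> the normalizing constant is the ordinary harmonic number 137/60, so the probability of the first rank is 60/137, about 0.438.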
Applications
The distribution of words ranked by their frequency in a large corpus of text is generally well approximated by a power law; this observation is known as Zipf's law.
If one plots the frequency rank of the words in a large corpus against their number of occurrences on logarithmic axes, the points fall approximately on a straight line, corresponding to a power law with exponent close to one (but see Gelbukh and Sidorov 2001).
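The rank-frequency relationship can be checked on a tokenized text along the following lines. This is a rough sketch only: the helper names `rank_frequency` and `loglog_slope` are hypothetical, tokenization is left to the caller, and an ordinary least-squares line through the log-log points is a crude diagnostic rather than a rigorous way to estimate a power-law exponent.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return occurrence counts sorted from most to least frequent.

    The count at index r - 1 belongs to the word of rank r.
    """
    counts = Counter(tokens)
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to a power-law exponent close to one.
    """
    if len(freqs) < 2:
        raise ValueError("need at least two ranks to fit a line")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy usage: a real check would use a large corpus, not a short phrase.
freqs = rank_frequency("a b a c a b".split())
```

On idealized frequencies proportional to <math>1/r</math> the fitted slope is -1; for real corpora the value depends on the text, the language and the range of ranks included in the fit.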
External links
- Z. K. Silagadze: Citations and the Zipf-Mandelbrot's law (http://arxiv.org/PS_cache/physics/pdf/9901/9901035.pdf)
- NIST: Zipf's law (http://www.nist.gov/dads/HTML/zipfslaw.html)
- W. Li's References on Zipf's law (http://linkage.rockefeller.edu/wli/zipf/)