Hypergeometric distribution
In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.
A typical example is the following: There is a shipment of N objects, of which D are defective. The hypergeometric distribution describes the probability that a sample of n distinct objects drawn from the shipment contains exactly k defective objects.
In general, if a random variable X follows the hypergeometric distribution with parameters N, D and n, then the probability of getting exactly k successes is given by
- <math> P_k(N,D,n) = {{D \choose k} {{N-D} \choose {n-k}} \over {N \choose n}} </math>
The probability is positive when k is an integer between max{ 0, D + n − N } and min{ n, D } inclusive, and is zero otherwise.
The formula can be understood as follows: There are <math> {N \choose n} </math> possible samples (without replacement). There are <math> {D \choose k} </math> ways to obtain k defective objects and there are <math> {{N-D} \choose {n-k}} </math> ways to fill out the rest of the sample with non-defective objects.
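The counting argument above can be turned into a few lines of Python. This is only a minimal sketch: the function name `hypergeom_pmf` and its argument order are illustrative choices, and it assumes integer inputs with 0 ≤ D ≤ N and 0 ≤ n ≤ N.

```python
from math import comb


def hypergeom_pmf(N: int, D: int, n: int, k: int) -> float:
    """Probability of exactly k defectives in a sample of n drawn
    without replacement from N objects, D of which are defective."""
    # Outside the range max(0, D + n - N) .. min(n, D) the probability is zero.
    if k < max(0, D + n - N) or k > min(n, D):
        return 0.0
    # Favourable samples: choose k defectives and n - k non-defectives.
    favourable = comb(D, k) * comb(N - D, n - k)
    # Divide by the number of equally likely samples of size n.
    return favourable / comb(N, n)


if __name__ == "__main__":
    # Hypothetical shipment: 10 objects, 4 defective, sample of 3.
    for k in range(4):
        print(k, hypergeom_pmf(10, 4, 3, k))
```

For the hypothetical shipment in the usage lines, the case k = 1 works out by hand to 4 · 15 / 120 = 1/2.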
When the population size N is large relative to the sample size n, the hypergeometric distribution is well approximated by a binomial distribution with parameters n (number of trials) and p = D / N (probability of success in a single trial), because removing a few objects barely changes the proportion of defective ones.
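One way to get a feel for the quality of this approximation is to compare the two probability functions numerically. The sketch below is illustrative only: the helper names (`hypergeom_pmf`, `binom_pmf`, `max_abs_diff`) and the sample values are assumptions made for the example, not part of any standard library.

```python
from math import comb


def hypergeom_pmf(N, D, n, k):
    # Zero outside the range of attainable values of k.
    if k < max(0, D + n - N) or k > min(n, D):
        return 0.0
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)


def binom_pmf(n, p, k):
    # Probability of k successes in n independent trials.
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def max_abs_diff(N, D, n):
    """Largest pointwise gap between the hypergeometric probabilities
    and the binomial probabilities with p = D / N."""
    p = D / N
    return max(
        abs(hypergeom_pmf(N, D, n, k) - binom_pmf(n, p, k))
        for k in range(n + 1)
    )


if __name__ == "__main__":
    # Keep D / N = 0.3 and n = 10 fixed while the population grows.
    for N in (100, 1000, 10000):
        print(N, max_abs_diff(N, 3 * N // 10, 10))
```

With the proportion D / N and the sample size held fixed, the printed gap should shrink as N grows.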
The fact that the sum of the probabilities, as k runs through the range of possible values, is equal to 1, is essentially Vandermonde's identity from combinatorics.
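In the notation used here, Vandermonde's identity says that the sum over k of <math> {D \choose k} {{N-D} \choose {n-k}} </math> equals <math> {N \choose n} </math>. A quick exact check in integer arithmetic might look like the following sketch; the function name `vandermonde_holds` is hypothetical, and the inputs are assumed to satisfy 0 ≤ D ≤ N and 0 ≤ n ≤ N.

```python
from math import comb


def vandermonde_holds(N: int, D: int, n: int) -> bool:
    """Check that the hypergeometric numerators sum to the denominator."""
    # math.comb(a, b) returns 0 when b > a, so terms with k > D or
    # n - k > N - D vanish and summing over k = 0..n is enough.
    lhs = sum(comb(D, k) * comb(N - D, n - k) for k in range(n + 1))
    return lhs == comb(N, n)


if __name__ == "__main__":
    # Exhaustive check over all small parameter combinations.
    ok = all(
        vandermonde_holds(N, D, n)
        for N in range(12)
        for D in range(N + 1)
        for n in range(N + 1)
    )
    print(ok)
```

Because the comparison uses integers only, no floating-point tolerance is needed.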
Related distributions
- If <math>X \sim \mathrm{Hypergeometric}(N, D, n)</math> and <math>N \to \infty</math> with n fixed and <math>D/N \to p</math>, then X converges in distribution to <math>Y \sim \mathrm{Binomial}(n, p)</math>.