Hardy-Weinberg principle
The Hardy–Weinberg principle (HWP) (also Hardy–Weinberg equilibrium (HWE), or Hardy–Weinberg law) states that, under certain conditions, after one generation of random mating, the genotype frequencies at a single gene locus will become fixed at a particular equilibrium value. It also specifies that those equilibrium frequencies can be represented as a simple function of the allele frequencies at that locus.
In the simplest case of a single locus with two alleles A and a, with allele frequencies p and q respectively, the HWP predicts that the genotype frequencies will be p^2 for the AA homozygote, 2pq for the Aa heterozygote and q^2 for the aa homozygote. The Hardy–Weinberg principle is an expression of the notion of a population in "genetic equilibrium" and is a basic principle of population genetics.
Assumptions
The original assumptions for Hardy–Weinberg equilibrium (HWE) were that the population under consideration is idealised, that is:
- infinite (or effectively so, so as to eliminate genetic drift)
- sexually reproducing
- randomly mating
- diploid
and experiences:
- no selection
- no mutation
- no migration (gene flow)
In other words, the population must be large, mate randomly, and not undergo evolution.
Causes of deviation
When the Hardy–Weinberg assumptions are not met, genotype frequencies can deviate from expectation, but depending on which assumption is violated, such deviations may or may not be statistically detectable. Deviations can be caused by the Wahlund effect, inbreeding, assortative mating, selection, or genetic drift. Assortative mating changes the genotype frequencies only at those loci that influence the trait on which mates are chosen. Genetic drift is particularly strong in small populations. Deviations caused by selection, however, often require a large selection coefficient in order to be detected, which is why the test for deviations from Hardy–Weinberg proportions is considered a weak test for selection.
Derivation
A more statistical description of the HWP is that the two alleles of any given individual in the next generation are chosen independently. Consider two alleles, A and a, with frequencies p and q, respectively, in the population. The different ways to form new genotypes can then be derived using a Punnett square, where the size of each cell is proportional to the fraction of each genotype in the next generation:
Males \ Females | A (p) | a (q) |
---|---|---|
A (p) | AA (p^2) | Aa (pq) |
a (q) | aA (qp) | aa (q^2) |
So, if the alleles are drawn independently, the three possible genotype frequencies in the offspring become:
- <math>f(\mathbf{AA}) = p^2\,</math>
- <math>f(\mathbf{Aa}) = 2pq\,</math>
- <math>f(\mathbf{aa}) = q^2\,</math>
This is normally achieved in one generation, except if a population is created by bringing together males and females with different allele frequencies, in which case, equilibrium is reached in two generations.
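The one-generation result can be checked numerically. The following is a minimal sketch, assuming equal allele frequencies in both sexes; the function name `next_generation` and the starting frequencies are illustrative choices, not taken from the article.

```python
def next_generation(f_AA, f_Aa, f_aa):
    """Genotype frequencies after one round of random mating.

    Assumes both sexes share the same genotype frequencies.
    """
    p = f_AA + 0.5 * f_Aa  # frequency of allele A among gametes
    q = f_aa + 0.5 * f_Aa  # frequency of allele a among gametes
    return p * p, 2 * p * q, q * q


# Arbitrary starting point that is NOT in Hardy-Weinberg proportions.
start = (0.5, 0.2, 0.3)
gen1 = next_generation(*start)
gen2 = next_generation(*gen1)

print("generation 1:", gen1)
print("generation 2:", gen2)
```

The first generation already has the proportions p^2 : 2pq : q^2 for p = 0.6, and the second generation is unchanged up to floating-point rounding.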
Sex linkage
Where the gene is sex-linked, the heterogametic sex (e.g. males in humans) has only one copy of the gene and is effectively haploid for that gene. The genotype frequencies at equilibrium are therefore p and q for the heterogametic sex but p^2, 2pq and q^2 for the homogametic sex.
For example, in humans red-green colorblindness is caused by an X-linked recessive allele. It affects about 1 in 12 males (0.083) but only about 1 in 250 women (0.004).
If a population is founded by males and females with different allele frequencies, the allele frequency in males in each generation equals that of females in the preceding generation, because each male receives his X chromosome from his mother. The population approaches equilibrium closely within about six generations.
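The approach to equilibrium for a sex-linked locus can be sketched as a simple recurrence. This is an illustrative sketch only: it assumes an XY system in which sons take their X-linked allele frequency from the mothers and daughters take the average of both parents. The function name and starting frequencies are hypothetical.

```python
def x_linked_trajectory(p_female, p_male, generations):
    """Track the frequency of allele A in each sex at an X-linked locus."""
    history = [(p_female, p_male)]
    for _ in range(generations):
        # Daughters get one X from each parent; sons get their X from the mother.
        p_female, p_male = (p_female + p_male) / 2, p_female
        history.append((p_female, p_male))
    return history


# Extreme starting point: all females carry A, no males carry A.
history = x_linked_trajectory(1.0, 0.0, 6)
for generation, (pf, pm) in enumerate(history):
    print(generation, pf, pm)
```

Under these assumptions the difference between the sexes halves and changes sign every generation, so after six generations the two frequencies differ by 1/64 and both lie close to the limiting value of 2/3.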
Generalizations
Generalization for more than two alleles
The Hardy–Weinberg principle may be generalized to more than two alleles. Consider an extra allele with frequency <math>r</math>. The two-allele case is the binomial expansion of <math>(p+q)^2</math>, and thus the three-allele case is the trinomial expansion of <math>(p+q+r)^2</math>:
<math>(p+q+r)^2=p^2 + q^2 + r^2 + 2pq +2pr + 2qr</math>
More generally, consider the alleles <math>A_1, \ldots, A_n</math> with allele frequencies <math>p_1</math> to <math>p_n</math>:
- <math>(p_1 + \cdots + p_n)^2</math>
giving for all homozygotes:
- <math>f(A_i A_i) = p_i^2</math>
and for all heterozygotes:
- <math>f(A_i A_j) = 2p_ip_j</math>
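The multi-allele rule can be written as a short function. This is a sketch under the assumption of a diploid locus; the allele names and frequencies below are made up for illustration.

```python
from itertools import combinations_with_replacement


def genotype_frequencies(allele_freqs):
    """Map each unordered diploid genotype to its Hardy-Weinberg frequency.

    allele_freqs: dict of allele name -> frequency (frequencies sum to 1).
    """
    result = {}
    for a, b in combinations_with_replacement(sorted(allele_freqs), 2):
        pa, pb = allele_freqs[a], allele_freqs[b]
        # Homozygotes: p_i^2; heterozygotes: 2 * p_i * p_j.
        result[(a, b)] = pa * pa if a == b else 2 * pa * pb
    return result


freqs = genotype_frequencies({"A1": 0.5, "A2": 0.3, "A3": 0.2})
for genotype, f in freqs.items():
    print(genotype, round(f, 4))
```

With three alleles there are six genotypes (three homozygotes and three heterozygotes), and their frequencies sum to one.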
Generalization for polyploidy
The Hardy–Weinberg principle may also be generalized to polyploid systems, that is, to populations which have more than two copies of each chromosome. Consider again only two alleles. The diploid case is the binomial expansion of:
- <math>(p + q)^2</math>
and therefore the polyploid case is the binomial expansion of:
- <math>(p + q)^c</math>
where c is the ploidy, for example with tetraploid (c = 4):
Genotype | Frequency |
---|---|
<math> \mathbf A \mathbf A \mathbf A \mathbf A </math> | <math>p^4</math> |
<math> \mathbf A \mathbf A \mathbf A \mathbf a</math> | <math>4p^3 q</math> |
<math> \mathbf A \mathbf A \mathbf a \mathbf a</math> | <math>6p^2q^2</math> |
<math> \mathbf A \mathbf a \mathbf a \mathbf a</math> | <math>4pq^3</math> |
<math> \mathbf a \mathbf a \mathbf a \mathbf a</math> | <math>q^4</math> |
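The tetraploid table follows directly from the binomial coefficients. A minimal sketch, assuming two alleles and Python 3.8 or later (for `math.comb`); the function name is illustrative.

```python
from math import comb


def polyploid_frequencies(p, c):
    """Frequencies of genotypes carrying k = 0..c copies of allele a.

    p is the frequency of allele A, c is the ploidy.
    """
    q = 1 - p
    return [comb(c, k) * p ** (c - k) * q ** k for k in range(c + 1)]


# Tetraploid example with p = q = 0.5: AAAA, AAAa, AAaa, Aaaa, aaaa.
tetraploid = polyploid_frequencies(0.5, 4)
print(tetraploid)
```

With c = 4 the coefficients are 1, 4, 6, 4, 1, matching the table above.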
Complete generalization
The completely generalized formula, for n alleles in a population of ploidy c, is the multinomial expansion of <math>(p_1 + \cdots + p_n)^c</math>:
- <math>(p_1 + \cdots + p_n)^c = \sum_{k_1, \ldots, k_n \,:\, k_1 + \cdots + k_n = c} {c \choose k_1, \ldots, k_n} p_1^{k_1} \cdots p_n^{k_n}</math>
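The multinomial expansion can be enumerated term by term. This sketch treats the number of alleles and the ploidy as separate inputs, which is an assumption of the example; the name `general_hw` and the sample frequencies are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def general_hw(allele_freqs, ploidy):
    """Hardy-Weinberg genotype frequencies for any allele count and ploidy.

    Each genotype is an unordered tuple of `ploidy` alleles; its frequency is
    the matching term of the multinomial expansion.
    """
    result = {}
    for genotype in combinations_with_replacement(sorted(allele_freqs), ploidy):
        counts = Counter(genotype)  # k_i: copies of each allele in the genotype
        coeff = factorial(ploidy)
        for k in counts.values():
            coeff //= factorial(k)  # multinomial coefficient ploidy! / (k_1! ... k_n!)
        result[genotype] = coeff * prod(allele_freqs[a] ** k for a, k in counts.items())
    return result


triploid = general_hw({"A1": 0.5, "A2": 0.3, "A3": 0.2}, 3)
print(len(triploid), sum(triploid.values()))
```

Setting the ploidy to 2 with two alleles recovers the familiar p^2, 2pq and q^2.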
Applications
The Hardy–Weinberg principle may be applied in two ways: either a population is assumed to be in Hardy–Weinberg proportions, in which case the genotype frequencies can be calculated, or, if the frequencies of all three genotypes are known, they can be tested for statistically significant deviations.
Application to cases of complete dominance
Suppose that the phenotypes of AA and Aa are indistinguishable, i.e. that there is complete dominance. Assuming that the Hardy–Weinberg principle applies to the population, q can still be calculated from f(aa):
- <math>q = \sqrt {f(aa)}</math>
and p can be calculated from q as p = 1 - q. Estimates of f(AA) and f(Aa) can then be derived from p^2 and 2pq respectively. Note, however, that such a population cannot be tested for equilibrium using the significance tests below, because equilibrium is assumed a priori.
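As a worked illustration of the dominance case, the sketch below estimates allele and genotype frequencies from the recessive phenotype frequency alone. The input value 0.04 is a made-up example, and the function name is hypothetical; the calculation is only meaningful if Hardy–Weinberg proportions are assumed.

```python
from math import sqrt


def estimate_from_recessives(f_aa):
    """Estimate p, q and genotype frequencies from the frequency of aa.

    Valid only under the assumption of Hardy-Weinberg proportions.
    """
    q = sqrt(f_aa)
    p = 1 - q
    return {"p": p, "q": q, "AA": p * p, "Aa": 2 * p * q, "aa": f_aa}


# Hypothetical population in which 4% show the recessive phenotype.
estimate = estimate_from_recessives(0.04)
for name, value in estimate.items():
    print(name, round(value, 3))
```

In this hypothetical case q is 0.2, so about 32% of the population are estimated to be heterozygous carriers even though only 4% show the recessive phenotype.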
Significance tests for deviation
Testing deviation from the HWP is generally performed using Pearson's chi-squared test, using the observed genotype frequencies obtained from the data and the expected genotype frequencies obtained using the HWP. For systems with large numbers of alleles, many possible genotypes may be unobserved or have low counts, because there are often not enough individuals in the sample to adequately represent all genotype classes. If this is the case, the asymptotic assumption of the chi-square distribution will no longer hold, and it may be necessary to use a form of Fisher's exact test, which requires a computer to solve.
Example χ² test for deviation
These data are from E.B. Ford (1971) on the Scarlet tiger moth, for which the phenotypes of a sample of the population were recorded. Genotype-phenotype distinction is assumed to be negligibly small. The null hypothesis is that the population is in Hardy–Weinberg proportions, and the alternative hypothesis is that the population is not in Hardy–Weinberg proportions.
Genotype | White-spotted (AA) | Intermediate (Aa) | Little spotting (aa) | Total |
---|---|---|---|---|
Number | 1469 | 138 | 5 | 1612 |
From these counts the allele frequencies can be calculated:
<math>p = {2 \times obs(AA) + obs(Aa) \over 2 \times (obs(AA) + obs(Aa) + obs(aa))} = {1469 \times 2 + 138 \over 2 \times (1469+138+5)} = {3076 \over 3224} = 0.954</math>
and
<math>q = 1 - p = 1 - 0.954 = 0.046</math>
So the Hardy–Weinberg expectation is:
<math>Exp(AA) = p^2n = 0.954^2 \times 1612 = 1467.4</math>
<math>Exp(Aa) = 2pqn = 2 \times 0.954 \times 0.046 \times 1612 = 141.2</math>
<math>Exp(aa) = q^2n = 0.046^2 \times 1612 = 3.4</math>
Pearson's chi-square test states:
<math>\chi^2 = \sum {(O - E)^2 \over E} = {(1469 - 1467.4)^2 \over 1467.4} + {(138 - 141.2)^2 \over 141.2} + {(5 - 3.4)^2 \over 3.4} = 0.002 + 0.073 + 0.756 = 0.83</math>
There is 1 degree of freedom (degrees of freedom for χ² tests are normally n - 1, where n is the number of genotype classes; however, an extra degree of freedom is lost because the allele frequency used for the expected values was estimated from the observed values). The 5% significance level for 1 degree of freedom is 3.84, and since the χ² value is less than this, the null hypothesis that the population is in Hardy–Weinberg equilibrium is not rejected.
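The calculation above can be reproduced in a few lines. This sketch uses the genotype counts from Ford's table and the 3.84 critical value quoted in the text; the function name `hw_chi_square` is an illustrative choice, and unrounded allele frequencies are carried through the calculation.

```python
def hw_chi_square(n_AA, n_Aa, n_aa):
    """Pearson chi-square statistic for deviation from Hardy-Weinberg proportions."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # allele frequency estimated from the sample
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p, expected, chi2


p, expected, chi2 = hw_chi_square(1469, 138, 5)
critical_value = 3.84  # 5% level, 1 degree of freedom (as quoted in the text)

print("p =", round(p, 3))
print("expected =", [round(e, 1) for e in expected])
print("chi-square =", round(chi2, 2))
print("reject HWE:", chi2 > critical_value)
```

Because the statistic is well below the critical value, the sketch reaches the same conclusion as the worked example: the null hypothesis is not rejected.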
F-statistics
In F-statistics, the inbreeding coefficient F is one minus the ratio of the observed frequency of heterozygotes to that expected from Hardy–Weinberg equilibrium:
- <math> F = 1 - \frac{\operatorname{O}(f(\mathbf{Aa}))} {\operatorname{E}(f(\mathbf{Aa}))}, \!</math>
where the expected value from Hardy–Weinberg equilibrium is given by
- <math> \operatorname{E}(f(\mathbf{Aa})) = 2\, p\, q. \!</math>
For example, for Ford's data above:
<math>F = 1 - {138 \over 141.2} = 0.023</math>
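The heterozygote comparison for Ford's counts can be sketched as follows. The code computes the observed-to-expected heterozygote ratio and also one minus that ratio, the form in which the inbreeding coefficient is commonly reported; the function and variable names are illustrative.

```python
def heterozygote_ratio(n_AA, n_Aa, n_aa):
    """Observed heterozygote frequency divided by the Hardy-Weinberg expectation."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)
    observed_het = n_Aa / n
    expected_het = 2 * p * (1 - p)
    return observed_het / expected_het


ratio = heterozygote_ratio(1469, 138, 5)
inbreeding_f = 1 - ratio  # positive values indicate a heterozygote deficit

print("observed / expected:", round(ratio, 3))
print("1 - observed / expected:", round(inbreeding_f, 3))
```

For these data the ratio is close to one, so heterozygotes are only slightly less common than expected.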
History
Mendelian genetics was rediscovered in 1900. However, it remained somewhat controversial for several years as it was not then known how it could account for continuous characters. Udny Yule (1902) argued against Mendelism because he thought that dominant alleles would increase in the population. The American William E. Castle (1903) showed that without selection, the genotype frequencies would remain stable. Karl Pearson (1903) found one equilibrium position with values of p = q = 0.5. Reginald Punnett, unable to counter Yule's point, introduced the problem to G. H. Hardy, a British mathematician, with whom he played cricket. Hardy was a pure mathematician and held applied mathematics in some contempt; his view of biologists' use of mathematics comes across in his 1908 paper, where he describes the point as "very simple".
- To the Editor of Science: I am reluctant to intrude in a discussion concerning matters of which I have no expert knowledge, and I should have expected the very simple point which I wish to make to have been familiar to biologists. However, some remarks of Mr. Udny Yule, to which Mr. R. C. Punnett has called my attention, suggest that it may still be worth making...
- Suppose that Aa is a pair of Mendelian characters, A being dominant, and that in any given generation the number of pure dominants (AA), heterozygotes (Aa), and pure recessives (aa) are as p:2q:r. Finally, suppose that the numbers are fairly large, so that mating may be regarded as random, that the sexes are evenly distributed among the three varieties, and that all are equally fertile. A little mathematics of the multiplication-table type is enough to show that in the next generation the numbers will be as (p+q)^2:2(p+q)(q+r):(q+r)^2, or as p1:2q1:r1, say.
- The interesting question is — in what circumstances will this distribution be the same as that in the generation before? It is easy to see that the condition for this is q^2 = pr. And since q_1^2 = p_1r_1, whatever the values of p, q, and r may be, the distribution will in any case continue unchanged after the second generation
The principle was thus known as Hardy's law in the English-speaking world until Stern (1943) pointed out that it had first been formulated independently in 1908 by the German physician Wilhelm Weinberg (see Crow 1999).
References
- Castle, W. E. (1903). The laws of Galton and Mendel and some laws governing race improvement by selection. Proc. Amer. Acad. Arts Sci.. 35: 233–242.
- Crow, J.F. (1999). Hardy, Weinberg and language impediments. Genetics 152: 821-825. link (http://www.genetics.org/cgi/content/full/152/3/821?maxtoshow=&HITS=10&hits=10&RESULTFORMAT=&titleabstract=Punnett&searchid=1109675768213_849&stored_search=&FIRSTINDEX=0&journalcode=genetics)
- Ford, E.B. (1971). Ecological Genetics, London.
- Hardy, G. H. (1908). "Mendelian proportions in a mixed population". Science 28: 49–50. ESP copy (http://www.esp.org/foundations/genetics/classical/hardy.pdf)
- Pearson, K. (1904). Mathematical contributions to the theory of evolution. XI. On the influence of natural selection on the variability and correlation of organs. Philosophical Transactions of the Royal Society of London, Ser. A 200: 1–66.
- Stern, C. (1943). "The Hardy–Weinberg law". Science 97: 137–138. JSTOR stable url (http://links.jstor.org/sici?sici=0036-8075%2819430205%293%3A97%3A2510%3C137%3ATHL%3E2.0.CO%3B2-8)
- Weinberg, W. (1908). "Über den Nachweis der Vererbung beim Menschen". Jahreshefte des Vereins für vaterländische Naturkunde in Württemberg 64: 368–382.
- Yule, G. U. (1902). Mendel's laws and their probable relation to intra-racial heredity. New Phytol. 1: 193–207, 222–238.