Haplotype
A haplotype, a contraction of the phrase "haploid genotype", is the genetic constitution of an individual chromosome. In the case of diploid organisms such as humans, the haplotype will contain one member of the pair of alleles for each site. A haplotype can refer to only one locus or to an entire genome. A genome-wide haplotype would comprise half of a diploid genome, including one allele from each allelic gene pair.
In a second sense, a haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are found to be statistically associated. Given these associations, identifying a few alleles of a haplotype block unambiguously identifies all other polymorphic sites in its region. Such information is most valuable for investigating the genetics behind common diseases and is collected by the International HapMap Project.
A genotype is distinct from a haplotype because an individual's genotype may not uniquely define that individual's haplotype. As an example, consider two loci, each with two possible alleles, the first locus being either A or a, the second locus being B or b. If the genotype of an individual was found to be AaBb, there are two possible sets of haplotypes, corresponding to which pairs happen to occur on the same chromosome:
| | haplotype at chromosome 1 | haplotype at chromosome 2 |
|---|---|---|
| haplotype set 1 | AB | ab |
| haplotype set 2 | Ab | aB |
In this case, more information would be required to determine which particular set of haplotypes occurs in the individual (i.e. which alleles appear on the same chromosome).
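The ambiguity in the AaBb example can be made concrete with a short enumeration. The following Python sketch is illustrative only: the function name `haplotype_sets` and the representation of a genotype as a list of unphased allele pairs (one pair per locus) are assumptions made for this example, not an established convention.

```python
from itertools import product


def haplotype_sets(genotype):
    """List every unordered pair of haplotypes consistent with a genotype.

    `genotype` is a list with one (allele, allele) tuple per locus;
    the order of the two alleles within a tuple carries no meaning.
    """
    results = set()
    # Try both orientations of every locus: each choice decides which
    # allele goes on the first chromosome and which on the second.
    for flips in product([False, True], repeat=len(genotype)):
        hap1 = "".join(b if f else a for (a, b), f in zip(genotype, flips))
        hap2 = "".join(a if f else b for (a, b), f in zip(genotype, flips))
        # Sort the pair so that mirrored assignments count only once.
        results.add(tuple(sorted((hap1, hap2))))
    return sorted(results)


# The AaBb individual from the text: two loci, both heterozygous.
print(haplotype_sets([("A", "a"), ("B", "b")]))
# prints [('AB', 'ab'), ('Ab', 'aB')]
```

An individual heterozygous at k loci has 2^(k-1) possible haplotype sets, so the genotype alone determines the haplotypes only when at most one locus is heterozygous.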
Given the genotypes for a number of individuals, the haplotypes can be inferred by haplotype resolution or haplotype phasing techniques. These methods rely on the observation that certain haplotypes are common in certain genomic regions. Therefore, given a set of possible haplotype resolutions, these methods choose those that use fewer different haplotypes overall. The specifics vary between methods: some are based on parsimony, while others use likelihood functions in combination with algorithms such as EM (Expectation-Maximization) or MCMC (Markov Chain Monte Carlo).
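As a rough illustration of the likelihood-based approach, the sketch below estimates haplotype frequencies with a basic EM iteration and then picks the most probable resolution for an ambiguous individual. It is a minimal sketch, not the algorithm of any particular phasing program. It assumes that haplotypes pair at random within an individual and that genotypes are given as lists of unphased allele pairs; all function names (`compatible_pairs`, `em_haplotype_frequencies`, `most_likely_phase`) and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """All unordered haplotype pairs consistent with an unphased genotype."""
    pairs = set()
    for flips in product([False, True], repeat=len(genotype)):
        h1 = "".join(b if f else a for (a, b), f in zip(genotype, flips))
        h2 = "".join(a if f else b for (a, b), f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate haplotype frequencies from unphased genotypes by EM."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = sorted({h for pairs in candidates for p in pairs for h in p})
    # Start from uniform frequencies over every haplotype that could occur.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}
    for _ in range(iterations):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: weight each resolution by the product of its haplotype
            # frequencies (assumed random pairing). The factor of 2 for
            # non-identical pairs is the same for every resolution of a
            # given genotype, so it cancels in the normalisation.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are expected counts over 2N chromosomes.
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
    return freq


def most_likely_phase(genotype, freq):
    """Pick the resolution with the highest probability under `freq`."""
    return max(compatible_pairs(genotype), key=lambda p: freq[p[0]] * freq[p[1]])


# Toy data: homozygous individuals reveal that AB and ab are common,
# which resolves the ambiguous AaBb individual.
sample = (
    [[("A", "A"), ("B", "B")]] * 3
    + [[("a", "a"), ("b", "b")]] * 3
    + [[("A", "a"), ("B", "b")]]
)
freq = em_haplotype_frequencies(sample)
print(most_likely_phase([("A", "a"), ("B", "b")], freq))
```

In this toy sample the unambiguous homozygotes supply the evidence: because AB and ab are already known to be present, resolving the AaBb individual as AB and ab requires no additional haplotypes, which matches the preference for fewer distinct haplotypes described above.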