The Critical Role of Genotype Phasing in Haplotype Determination

Genotype phasing is the step that turns unordered genotype data into haplotypes. This post explains what phasing is, why it matters for haplotype determination, and which algorithms are used for it, including Expectation Maximization-like statistical methods.

What is Phasing?

Phasing is the process of inferring haplotypes from genotype data, that is, working out which alleles sit together on each of an individual's two parental chromosomes. Genotypes record which genetic variants an individual carries at each site, but not which variants were inherited from which parent. Phasing resolves this by determining which alleles lie on the same chromosome and were therefore inherited together.
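To make the ambiguity concrete, here is a minimal Python sketch that lists every haplotype pair consistent with one unphased genotype. It assumes biallelic sites coded as alternate-allele counts (0, 1, or 2); the function name `compatible_haplotype_pairs` is made up for this illustration and does not come from any phasing tool.

```python
from itertools import product

def compatible_haplotype_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: sequence of alternate-allele counts per site (0, 1, or 2).
    Assumption for this sketch: biallelic sites, no missing data.
    """
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: count 0 -> allele 0, count 2 -> allele 1.
        hap_a = [g // 2 for g in genotype]
        hap_b = list(hap_a)
        # Each heterozygous site puts one allele on each haplotype.
        for site, bit in zip(het_sites, bits):
            hap_a[site] = bit
            hap_b[site] = 1 - bit
        # Sort the pair so (a, b) and (b, a) count as the same phasing.
        pairs.add(tuple(sorted((tuple(hap_a), tuple(hap_b)))))
    return sorted(pairs)

# Two heterozygous sites (first and third) give two possible phasings.
for pair in compatible_haplotype_pairs([1, 0, 1, 2]):
    print(pair)
```

A genotype with k heterozygous sites has 2^(k-1) possible phasings when k is at least 1, which is why phasing needs extra information such as relatives, reads, or population data.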

Importance of Phasing in Haplotype Determination
Haplotypes, combinations of alleles at different polymorphic sites on the same DNA molecule, are useful in genetics for several purposes:

-Imputing alleles at ungenotyped loci.

-Identifying genomic regions shared identical by descent (IBD).

-Detecting and correcting genotype errors.

-Analyzing parent-of-origin effects and compound heterozygosity.

-Improving association testing in genetics research.

Challenges in Phasing

The main challenge in phasing is distinguishing the alleles inherited from each parent. Current sequencing technologies often generate short reads that seldom span multiple heterozygous sites, which makes it difficult to assemble the two homologous chromosomes of a pair separately from the reads alone.
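When reads cannot link heterozygous sites directly, population-level statistics can help. As a rough illustration of the Expectation Maximization-like idea mentioned in the introduction, the sketch below estimates haplotype frequencies from unphased genotypes and then picks the most probable phasing for one individual. It is a textbook-style toy, not the procedure used by any of the tools discussed in this post. It assumes biallelic sites coded 0/1/2, random mating, no missing data, and only a handful of sites, because it enumerates every candidate phasing. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import product

def expand(genotype):
    """All ordered haplotype pairs (h1, h2) consistent with a genotype."""
    het = [i for i, g in enumerate(genotype) if g == 1]
    out = []
    for bits in product((0, 1), repeat=len(het)):
        h1 = [g // 2 for g in genotype]  # homozygous sites are fixed
        h2 = list(h1)
        for site, bit in zip(het, bits):
            h1[site], h2[site] = bit, 1 - bit
        out.append((tuple(h1), tuple(h2)))
    return out

def em_haplotype_frequencies(genotypes, n_iter=50):
    """Estimate haplotype frequencies with a fixed number of EM iterations."""
    candidates = [expand(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior weight of each candidate phasing.
            weights = [freq[h1] * freq[h2] for h1, h2 in pairs]
            total = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected counts become the new frequencies
        # (each individual contributes two haplotypes).
        freq = {h: counts[h] / (2 * len(genotypes)) for h in haplotypes}
    return freq

def most_likely_phase(genotype, freq):
    """Pick the candidate phasing with the highest frequency product."""
    return max(expand(genotype),
               key=lambda p: freq.get(p[0], 0.0) * freq.get(p[1], 0.0))

# Toy cohort: four homozygotes and one double heterozygote.
genotypes = [(0, 0), (0, 0), (2, 2), (2, 2), (1, 1)]
freq = em_haplotype_frequencies(genotypes)
print(most_likely_phase((1, 1), freq))
```

In this toy cohort the double heterozygote is resolved as carrying the 00 and 11 haplotypes, because those are the haplotypes the homozygous individuals show to be common. Real population-based phasers use far more scalable models; the sketch only shows the alternation between weighting candidate phasings and re-estimating frequencies.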

Algorithms for Phasing

1. PULSAR Method
PULSAR (Phasing Using Lineage Specific Alleles/Rare variants) phases genotypes from whole-genome sequence data in pedigrees. The algorithm identifies lineage-specific alleles (LSAs) and uses them to infer haplotype segments that are shared IBD within a pedigree. PULSAR is reported to be highly accurate and can perform genotype error correction and imputation without the reference panels that population-based phasing algorithms rely on.

2. GLIMPSE Method
GLIMPSE (Genotype Likelihoods Imputation and Phasing Method) is designed for large-scale studies and leverages large reference panels. It is particularly effective for low-coverage sequencing datasets and is noted for its computational efficiency and improved imputation accuracy across the full allele frequency range.

3. Hybrid Approaches
Recent studies suggest that hybrid approaches provide a fast and efficient way to produce highly accurate phase-resolved genomes. These approaches combine population-based phasing (using software like SHAPEIT) with genome-wide sequencing read data or parental genotypes and with large reference panels. They can also apply a majority voting scheme across methods to construct a consensus haplotype, which improves both performance and site coverage.
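The majority voting idea can be sketched in a few lines of Python. This is one plausible way to build a consensus, not the scheme from the cited studies. It assumes every method has phased the same heterozygous sites, and it votes on whether the phase switches between adjacent sites instead of voting on the alleles themselves, because each method labels its two haplotypes arbitrarily. The function name `consensus_phase` is hypothetical.

```python
def consensus_phase(calls):
    """Build a consensus phasing from several methods by majority vote.

    calls: one sequence per method, giving the allele (0 or 1) that the
    method places on its "first" haplotype at each heterozygous site.
    Assumption for this sketch: all sequences cover the same sites.
    """
    n_sites = len(calls[0])
    if any(len(c) != n_sites for c in calls):
        raise ValueError("all methods must cover the same sites")
    if n_sites == 0:
        return []
    consensus = [0]  # orientation of the first site is arbitrary
    for i in range(1, n_sites):
        # A method votes "switch" if its allele changes between sites.
        switch_votes = sum(c[i] != c[i - 1] for c in calls)
        switched = 2 * switch_votes > len(calls)  # ties count as no switch
        consensus.append(consensus[-1] ^ int(switched))
    return consensus

calls = [
    [0, 1, 1, 0],  # method A
    [1, 0, 0, 1],  # method B: same phasing as A, haplotype labels swapped
    [0, 1, 0, 1],  # method C: disagrees between the second and third site
]
print(consensus_phase(calls))
```

Voting on adjacent-site switches means methods A and B count as agreeing even though their haplotype labels are swapped. Extending the sketch so that a method abstains at sites it could not phase would be one way to reflect the gain in site coverage.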

Performance Metrics for Phasing

Evaluating the performance of phasing algorithms involves several metrics:

-Percentage of phased single nucleotide variants (SNVs).

-Switch error rate (SER), indicating phasing accuracy.

-Haplotype block length, assessing the completeness and quality of phased haplotypes.
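These metrics are simple to compute once a trusted phasing is available for comparison. The sketch below is a minimal illustration under stated assumptions: haplotypes are given as the allele (0 or 1) on the first haplotype at each heterozygous site, the switch error rate is taken as the number of orientation changes between consecutive heterozygous sites divided by the number of consecutive pairs, and block length is measured as the base-pair span of each phase block. Function and argument names are hypothetical.

```python
def switch_error_rate(truth, inferred):
    """Fraction of consecutive heterozygous-site pairs with a phase switch."""
    if len(truth) != len(inferred):
        raise ValueError("truth and inferred must cover the same sites")
    if len(truth) < 2:
        return 0.0
    # True where the inferred first haplotype agrees with the true one.
    match = [t == p for t, p in zip(truth, inferred)]
    switches = sum(match[i] != match[i - 1] for i in range(1, len(match)))
    return switches / (len(match) - 1)

def percent_phased(n_phased, n_total):
    """Percentage of variants that received a phase."""
    return 100.0 * n_phased / n_total

def block_spans(positions, block_ids):
    """Base-pair span of each phase block; a block id of None means unphased."""
    bounds = {}
    for pos, block in zip(positions, block_ids):
        if block is None:
            continue
        lo, hi = bounds.get(block, (pos, pos))
        bounds[block] = (min(lo, pos), max(hi, pos))
    return {block: hi - lo + 1 for block, (lo, hi) in bounds.items()}

truth = [0, 1, 1, 0, 1]
inferred = [0, 1, 0, 1, 0]  # orientation flips once, after the second site
print(switch_error_rate(truth, inferred))

positions = [100, 250, 900, 1500, 1600]
block_ids = ["A", "A", None, "B", "B"]
print(percent_phased(4, 5))
print(block_spans(positions, block_ids))
```

A fully flipped haplotype (every allele swapped) scores zero switch errors, because only changes in orientation along the chromosome count as errors.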

Conclusion

Genotype phasing plays a fundamental role in understanding the genetic makeup of individuals and populations. With advancements in phasing algorithms and the increasing complexity and size of genomic data sets, efficient and accurate phasing methods like PULSAR, GLIMPSE, and hybrid approaches are becoming more important. These methods not only enhance our understanding of genetic inheritance patterns but also contribute significantly to the fields of disease research, population genetics, and personalized medicine.

References:

Blackburn, A. N., Blondell, L., Kos, M. Z., Blackburn, N. B., Peralta, J. M., Stevens, P. T., ... & Göring, H. H. (2020). Genotype phasing in pedigrees using whole-genome sequence data. European Journal of Human Genetics, 28(6), 790-803.
Choi, Y., Chan, A. P., Kirkness, E., Telenti, A., & Schork, N. J. (2018). Comparison of phasing strategies for whole human genomes. PLoS Genetics, 14(4), e1007308.
Rubinacci, S., Ribeiro, D. M., Hofmeister, R. J., & Delaneau, O. (2021). Efficient phasing and imputation of low-coverage sequencing data using large reference panels. Nature Genetics, 53(1), 120-126.