Understanding pLI and Missense Z Scores in gnomAD
pLI (probability of being loss-of-function intolerant) and Missense Z scores from gnomAD (the Genome Aggregation Database) are widely used measures of how strongly a gene is constrained against variation, and they help in assessing the potential impact of variants on human health. The two metrics gauge a gene's sensitivity to different types of mutation: loss-of-function (LoF) and missense, respectively.
The pLI score evaluates a gene's tolerance to LoF mutations, which are changes in the DNA sequence expected to disrupt the gene's function (for example, nonsense, frameshift, and essential splice-site variants). It uses an expectation-maximization algorithm to estimate, from observed and expected LoF variant counts, the probability that a gene belongs to the LoF-intolerant class. Scores range from 0 to 1; a gene with a high pLI (a cutoff of 0.9 or above is commonly used) is considered intolerant to LoF mutations, suggesting that such mutations are likely deleterious and could lead to disease. The score helps identify genes in which mutation could cause severe phenotypes through haploinsufficiency, where a single functional copy of the gene is not sufficient for normal function.
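The expectation-maximization step can be sketched as a three-class Poisson mixture. This is a minimal illustration, not gnomAD's pipeline: the class ratios below (1.0 for null, 0.463 for recessive, 0.089 for haploinsufficient) are assumed values following the description of the original ExAC analysis, and the function names `poisson_pmf`, `responsibilities`, and `fit_pli` are hypothetical.

```python
import math

# Assumed observed/expected LoF ratios for the three gene classes
# (null, recessive, haploinsufficient). Illustrative values only.
CLASS_RATIOS = (1.0, 0.463, 0.089)


def poisson_pmf(k, mu):
    """Poisson probability of observing k events with mean mu (mu > 0)."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def responsibilities(observed, expected, pi):
    """E-step: posterior probability of each class for each gene."""
    resp = []
    for obs, exp in zip(observed, expected):
        weights = [
            pi[c] * poisson_pmf(obs, CLASS_RATIOS[c] * exp)
            for c in range(len(CLASS_RATIOS))
        ]
        total = sum(weights)
        resp.append([w / total for w in weights])
    return resp


def fit_pli(observed, expected, n_iter=200):
    """Estimate class proportions by EM and return (pLI per gene, proportions).

    observed: LoF variant counts per gene
    expected: expected LoF counts per gene (must be > 0)
    """
    n_classes = len(CLASS_RATIOS)
    pi = [1.0 / n_classes] * n_classes  # start from equal proportions
    for _ in range(n_iter):
        resp = responsibilities(observed, expected, pi)
        # M-step: each proportion is the mean responsibility for that class
        pi = [sum(r[c] for r in resp) / len(resp) for c in range(n_classes)]
    resp = responsibilities(observed, expected, pi)
    # pLI is the posterior probability of the haploinsufficient class
    return [r[2] for r in resp], pi
```

On toy input, a gene with 0 observed LoF variants against 30 expected receives a pLI close to 1, while a gene with 48 observed against 50 expected receives a pLI close to 0.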
In contrast, the Missense Z score quantifies a gene's intolerance to missense mutations, which are changes in the DNA sequence that substitute one amino acid for another in the protein product. The score compares the observed count of missense variants in a gene with the count expected under a statistical model that accounts for the mutation spectrum. A positive Z score means fewer variants were observed than expected, suggesting increased constraint and intolerance to variation; a negative Z score means more variants were observed than expected, indicating tolerance to variation.
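The sign convention can be illustrated with a signed chi-squared-style deviation. This is a simplified sketch under the assumption that the raw score is (expected - observed) / sqrt(expected); published gnomAD Z scores involve additional normalization that is not reproduced here, and `raw_constraint_z` is a hypothetical name.

```python
import math


def raw_constraint_z(observed, expected):
    """Signed deviation of observed from expected variant counts.

    Positive when fewer variants are observed than expected (more
    constrained), negative when more are observed (more tolerant).
    Assumes Poisson-like variance, so the standard deviation is
    approximated by sqrt(expected).
    """
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return (expected - observed) / math.sqrt(expected)
```

For example, 60 observed missense variants against 100 expected gives a raw score of 4.0, while 121 observed against 100 expected gives -2.1.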
In addition, 3D mutational constraint quantification, described by Li, Roden, and Capra (2022) in Nature Communications, extends constraint analysis from whole genes to individual amino acid sites. The approach maps single nucleotide variants (SNVs) from gnomAD to human reference protein sequences and their 3D structures, so that the impact of mutations can be assessed in the context of protein folding and function. Quantifying constraint at the level of amino acid sites helps researchers understand the structural and functional implications of missense variants, including how mutations affect protein stability and activity. The method shows the importance of the 3D spatial context of amino acid residues: long-range interactions in the protein structure can significantly influence the effect of mutations.
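The idea of pooling variation over a residue's spatial neighborhood can be sketched as follows. This is an illustration of the general concept only, not the published method: the 8.0 Å radius, the use of one representative coordinate per residue, and the function name `neighborhood_counts` are all assumptions.

```python
import math


def neighborhood_counts(coords, observed, expected, radius=8.0):
    """Pool observed and expected missense counts over 3D neighbors.

    coords:   one (x, y, z) coordinate per residue, in angstroms
    observed: observed missense variant count per residue
    expected: expected missense variant count per residue
    radius:   distance cutoff defining a spatial neighbor (assumed value)

    Returns one (observed_sum, expected_sum) tuple per residue. Each
    residue counts as its own neighbor because its distance to itself is 0.
    """
    results = []
    for ci in coords:
        obs_sum = 0.0
        exp_sum = 0.0
        for j, cj in enumerate(coords):
            if math.dist(ci, cj) <= radius:
                obs_sum += observed[j]
                exp_sum += expected[j]
        results.append((obs_sum, exp_sum))
    return results
```

Because neighbors are chosen by distance in space rather than position in the sequence, residues that are far apart in the primary sequence but close in the folded structure contribute to each other's pooled counts.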
Together, these metrics provide complementary views of genetic constraint, at the level of whole genes and of individual protein sites, and offer insight into the functional impact of genetic variation. They help identify genes that are essential for normal biological function and sensitive to mutation, which can inform studies of genetic disease, evolutionary biology, and the development of therapeutic strategies.
References:
Fuller, Z. L., Berg, J. J., Mostafavi, H., Sella, G., & Przeworski, M. (2019). Measuring intolerance to mutation in human genetics. Nature Genetics, 51(5), 772-776.
Koch, L. (2020). Exploring human genomic diversity with gnomAD. Nature Reviews Genetics, 21(8), 448.
Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., ... & MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature, 581(7809), 434-443.
Li, B., Roden, D. M., & Capra, J. A. (2022). The 3D mutational constraint on amino acid sites in the human proteome. Nature Communications, 13(1), 3273.