Low-Frequency Coding Variants in Multiple Sclerosis: Quantifying Rare Genetic Risk and Refining Immune Mechanisms
Genome-wide association studies (GWAS) have transformed multiple sclerosis (MS) genetics by identifying hundreds of common-variant risk signals, yet they have also exposed a persistent “missing heritability” problem: common variants explain a meaningful but incomplete fraction of genetic liability. In this Cell study, the International Multiple Sclerosis Genetics Consortium asks directly whether low-frequency and rare coding variants account for an additional, measurable component of MS risk. Common SNP arrays often tag such alleles poorly, so GWAS frequently miss them. The work is framed around two recurrent explanations for unexplained heritability: epistasis among common variants, and contributions from rarer alleles. It builds on prior evidence that strong epistatic effects are limited in MS and that familial linkage signals do not point to a small number of high-penetrance loci driving typical disease risk.
Study Design and Variant Spectrum Interrogated
The investigators assembled a large multi-cohort case-control dataset of 68,379 individuals (32,367 MS cases and 36,012 controls) recruited in Australia, several European countries, and the United States. Rather than sequencing, they used exome-focused genotyping to assay low-frequency coding variation economically at scale. This combined Illumina HumanExome BeadChip content with a custom “MS chip” that also carries exome content. After stringent quality control, they meta-analyzed 120,991 low-frequency coding variants across autosomal exons, including a large set of non-synonymous variants and a smaller set of nonsense variants. Association testing used mixed models to mitigate confounding from population structure, and exome-wide inference relied on a conservative multiple-testing threshold.
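The pooling and thresholding steps can be illustrated with a short Python sketch. It assumes an inverse-variance weighted fixed-effect meta-analysis of per-cohort log odds ratios. That is a common choice, but this summary does not confirm it as the authors' exact method, and the sketch skips the mixed-model association step entirely. The Bonferroni cutoff over the 120,991 tested variants is likewise only one simple convention for a "conservative" exome-wide threshold, not necessarily the one the study used. The per-cohort estimates are invented for illustration.

```python
import math

# Invented per-cohort estimates for ONE variant: (log odds ratio, standard error).
# These are illustrative numbers, not results from the study.
cohort_estimates = [
    (0.25, 0.08),
    (0.18, 0.10),
    (0.30, 0.12),
]

def fixed_effect_meta(estimates):
    """Inverse-variance weighted fixed-effect meta-analysis.

    Returns the pooled log odds ratio, its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    pooled_beta = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # erfc(|z| / sqrt(2)) is the two-sided normal p-value and stays accurate when p is tiny
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return pooled_beta, pooled_se, p_value

def bonferroni_threshold(alpha, n_tests):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

threshold = bonferroni_threshold(0.05, 120_991)  # one simple convention, assumed here
beta, se, p = fixed_effect_meta(cohort_estimates)
print(f"pooled OR = {math.exp(beta):.2f}, p = {p:.2e}, threshold = {threshold:.2e}")
print("passes threshold" if p < threshold else "does not pass threshold")
```

With these made-up inputs the pooled effect is nominally strong yet still misses the roughly 4 × 10⁻⁷ cutoff. That is the practical cost of testing more than a hundred thousand variants at once.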
Exome-Wide Signals: Known Loci and Novel Coding Associations
At the single-variant level, the study reports genome-wide significant associations between MS risk and seven coding variants in six genes outside the extended MHC locus. Two signals, TYK2 (p.Pro1104Ala) and GALC (p.Asp84Asp), fall within regions previously implicated by GWAS and are consistent with the established common-variant architecture. The remaining associations are notable precisely because they are not in substantial linkage disequilibrium with known common-variant signals and are not readily imputed from standard GWAS reference panels. This illustrates a practical route by which biologically interpretable risk alleles can evade discovery in common-variant studies. The four newly implicated genes are PRF1 (p.Ala91Val), PRKRA (p.Asp33Gly and p.Pro11Leu, which are in strong linkage disequilibrium with each other), HDAC7 (p.Arg166His), and NLRP8 (p.Ile942Met).
Partitioning Heritability: Quantifying the Contribution of Low-Frequency Coding Variation
Recognizing that single-variant significance testing will miss many real effects in the low-frequency regime, the authors also estimate variance explained. They fit restricted maximum-likelihood (REML) heritability models in each cohort and then meta-analyze those estimates. Their central quantitative claim is that low-frequency coding variants (minor allele frequency below 5%) explain a non-trivial share of MS liability beyond common variation. The mean estimate is approximately 4.1% on the liability scale, reported alongside 11.34% on the observed scale. The two figures do not conflict: observed-scale estimates from case-control samples depend on the proportion of cases, so the liability-scale value, which accounts for disease prevalence, is the more interpretable of the two. When further stratified, rare variants (minor allele frequency below 1%) account for most of this low-frequency component (mean estimate ~3.2% on the liability scale; ~9.0% on the observed scale). This supports a polygenic architecture that extends into rarer coding alleles, even though most individual variants do not reach exome-wide significance at current sample sizes.
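The relationship between the two scales can be checked with a small Python sketch of the standard observed-to-liability transformation for case-control data (Lee and colleagues, 2011): h²_L = h²_O · K²(1 − K)² / (z² · P(1 − P)). Here K is the population prevalence, P the case fraction in the sample, and z the standard normal density at the liability threshold. Two inputs are assumptions not stated in this summary. The first is an MS prevalence of 0.1%. The second is a case fraction taken from the pooled sample totals, whereas the study meta-analyzed per-cohort estimates. Under those assumptions, the reported observed-scale values convert to roughly 4% and 3% on the liability scale, in line with the figures above.

```python
from statistics import NormalDist

def observed_to_liability(h2_obs, prevalence, case_fraction):
    """Convert a case-control heritability estimate from the observed (0/1) scale
    to the liability scale under a liability-threshold model.

    prevalence:    assumed population risk K of the disease
    case_fraction: proportion P of cases in the analysed sample
    """
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)                    # standard normal density at t
    k, p = prevalence, case_fraction
    return h2_obs * (k * (1 - k)) ** 2 / (z ** 2 * p * (1 - p))

ASSUMED_PREVALENCE = 0.001              # assumption: roughly 1 in 1,000
case_fraction = 32_367 / 68_379         # pooled cases / total; an approximation
for label, h2_obs in (("MAF < 5%", 0.1134), ("MAF < 1%", 0.090)):
    h2_liab = observed_to_liability(h2_obs, ASSUMED_PREVALENCE, case_fraction)
    print(f"{label}: observed {h2_obs:.1%} -> liability {h2_liab:.1%}")
```

The conversion factor is very sensitive to the assumed prevalence, so different but still plausible values of K will move the liability-scale numbers.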
Mechanistic Signals Pointing to Immune Regulation Pathways
A key interpretive layer of the paper is that the novel coding associations are not mechanistically arbitrary. They converge on immunological processes long suspected in MS, while also broadening the set of implicated immune compartments. PRF1 encodes perforin, which is central to cytotoxic lymphocyte biology and also relevant to regulatory T cell (Treg) function. Its p.Ala91Val variant has been associated with reduced cytotoxic efficiency and altered cytokine behavior in natural killer cells, consistent with hypotheses of dysregulated IFN-γ biology. HDAC7 is described as a modulator of FOXP3-mediated transcriptional repression and of thymic T cell development, which links MS risk to pathways governing Treg lineage stability and thymic selection. PRKRA offers a plausible connection to NF-κB signaling and interferon responses through double-stranded RNA sensing, and NLRP8 is described as an innate immune receptor. Together, these genes reinforce a model in which both adaptive and innate immune regulation contribute to genetic susceptibility, not solely peripheral adaptive activation.
Why Coding Low-Frequency Variants Matter for Causal Inference
A practical implication emphasized by the authors is that low-frequency coding variants often show limited linkage disequilibrium with neighboring polymorphisms. This cuts both ways. It is why common-variant GWAS capture such variants poorly, and once a signal is found, it reduces the ambiguity that typically accompanies fine-mapping of non-coding GWAS regions. Coding changes are also more directly interpretable in terms of altered protein sequence. That makes them experimentally tractable starting points for mechanistic follow-up, such as targeted cellular assays, protein functional studies, or precise genome editing. Weak tagging by common variants explains why the study identifies genes that “would not have been found” by common-variant association alone, despite extensive prior GWAS efforts in MS. Their coding consequences then provide clearer gene attribution and more immediate functional hypotheses.
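The standard r² measure of linkage disequilibrium makes it concrete why a low-frequency variant is hard to tag with common SNPs. The Python sketch below computes the largest r² possible between a rarer variant and a more common SNP. That maximum occurs when every copy of the rarer minor allele sits on a haplotype carrying the common SNP's minor allele, giving r²_max = p₁(1 − p₂) / [p₂(1 − p₁)] for minor allele frequencies p₁ ≤ p₂ ≤ 0.5. The allele frequencies in the usage loop are illustrative, not taken from the study. Even in this best case, r² falls sharply as the two frequencies diverge.

```python
def r_squared(p_ab, p_a, p_b):
    """LD r^2 between two biallelic sites, from the frequency of the A-B haplotype
    (p_ab) and the frequencies of allele A (p_a) and allele B (p_b)."""
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))

def max_r_squared(maf_rare, maf_common):
    """Largest achievable r^2 between a rarer and a more common minor allele.

    The maximum occurs when every copy of the rarer minor allele lies on a
    haplotype that also carries the other minor allele, so p_ab == maf_rare.
    """
    if not 0 < maf_rare <= maf_common <= 0.5:
        raise ValueError("expects 0 < maf_rare <= maf_common <= 0.5")
    return r_squared(maf_rare, maf_rare, maf_common)

common_maf = 0.30  # illustrative common tag SNP
for rare_maf in (0.30, 0.05, 0.01, 0.002):
    print(f"MAF {rare_maf:<5} vs tag SNP MAF {common_maf}: "
          f"max r^2 = {max_r_squared(rare_maf, common_maf):.3f}")
```

This bound applies only to single-SNP tagging. Imputation draws on multi-marker haplotypes, but it still requires the allele to be present in the reference panel.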
Limitations and Next Steps Toward a More Complete Genetic Architecture
The study’s conclusions should be read in light of important constraints inherent to exome genotyping arrays: coverage is incomplete for the rarest alleles and is influenced by the reference populations used to design the array content, implying that the estimated contribution of rare coding variation is likely conservative. Moreover, statistical association—even when coding and low-frequency—does not specify the relevant cell type, developmental window, or directionality of immune dysregulation without dedicated functional validation. The most direct next steps follow logically from the paper’s own framing: expansion to larger and more diverse cohorts (to improve power and assess ancestry specificity), increased use of sequencing (to capture ultra-rare and population-specific coding alleles), and systematic experimental dissection of implicated variants and genes (particularly across Treg biology, IFN-γ signaling contexts, and NF-κB pathway regulation) to translate genetic signals into actionable mechanisms.
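The power argument for larger cohorts can be sketched with a normal-approximation calculation for a per-allele case-control test. This is a hypothetical illustration, not the study's own power analysis. It assumes a multiplicative allelic model and ignores covariates, mixed-model corrections, and between-cohort differences. It also uses an invented odds ratio of 1.2 and the conventional genome-wide threshold of 5 × 10⁻⁸. The sample sizes are the study's pooled case and control counts.

```python
import math
from statistics import NormalDist

def allelic_test_power(maf_controls, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided per-allele case-control test.

    Hypothetical sketch: normal approximation to the log odds ratio using
    expected allele counts under a multiplicative (per-allele) model. It ignores
    covariates, mixed-model corrections and differences between cohorts.
    maf_controls is the risk-allele frequency in controls.
    """
    # Risk-allele frequency in cases implied by the odds ratio
    odds_controls = maf_controls / (1 - maf_controls)
    odds_cases = odds_controls * odds_ratio
    maf_cases = odds_cases / (1 + odds_cases)

    # Expected allele counts (two alleles per person) in the 2x2 allelic table
    a = 2 * n_cases * maf_cases              # risk alleles, cases
    b = 2 * n_cases * (1 - maf_cases)        # other alleles, cases
    c = 2 * n_controls * maf_controls        # risk alleles, controls
    d = 2 * n_controls * (1 - maf_controls)  # other alleles, controls

    # Woolf standard error of the log odds ratio and the expected z statistic
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    expected_z = abs(math.log(odds_ratio)) / se_log_or

    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Power from the tail in the direction of the effect; the opposite tail is negligible here
    return std_normal.cdf(expected_z - z_crit)

n_cases, n_controls = 32_367, 36_012   # pooled study sample sizes
odds_ratio, alpha = 1.2, 5e-8          # hypothetical effect size; conventional GWAS threshold
for maf in (0.05, 0.01, 0.005):
    power = allelic_test_power(maf, odds_ratio, n_cases, n_controls, alpha)
    doubled = allelic_test_power(maf, odds_ratio, 2 * n_cases, 2 * n_controls, alpha)
    print(f"MAF {maf:.3f}: power {power:.2f} (current N), {doubled:.2f} (doubled N)")
```

Under these assumptions, the sketch gives high power for a 5% allele but low power for the same effect at 1%. Doubling both groups raises power at the lower frequency markedly without closing the gap, which is the arithmetic behind calls for larger cohorts and sequencing.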
Disclaimer: This blog post is based on the provided research article and is intended for informational purposes only. It is not intended to provide medical advice. Please consult with a healthcare professional for any health concerns.
References:
Mitrovič, M., Patsopoulos, N. A., Beecham, A. H., Dankowski, T., Goris, A., Dubois, B., ... & Cotsapas, C. (2018). Low-frequency and rare-coding variation contributes to multiple sclerosis risk. Cell, 175(6), 1679-1687.
