Validating MHC Tag SNPs for Multiple Sclerosis Across Global Populations: Why Ancestry Matters for HLA Inference

Genome-wide studies have consistently shown that the strongest inherited risk for multiple sclerosis (MS) resides in the Major Histocompatibility Complex (MHC), particularly classical HLA loci that shape antigen presentation and adaptive immune recognition. In this article, Boullerne and colleagues focus on three MS-relevant alleles: HLA-DRB1*15:01 (a well-established risk allele in many European-descent populations), HLA-DRB1*15:03 (a risk allele enriched in African populations), and HLA-DQB1*06:02 (often co-inherited with DRB1*15:01 in some ancestries), alongside HLA-A*02:01, a protective allele whose effects have been less explored outside selected cohorts. The central motivation is practical: HLA alleles are expensive and technically complex to type at high resolution, so many studies infer (“tag”) these alleles using nearby single nucleotide polymorphisms (SNPs) that are correlated through linkage disequilibrium (LD). Those tag relationships, however, are frequently assumed to generalize beyond the original discovery population without direct validation.

The problem with “one-size-fits-all” tag SNPs across ancestries
Tag SNP strategies exploit the fact that segments of the MHC can exhibit strong LD, allowing a small number of SNPs to act as surrogates for nearby HLA alleles. However, LD structure is highly population-dependent because it reflects demographic history, recombination, drift, and admixture; therefore, a tag SNP with excellent performance in one population can fail in another. The authors emphasize that, for MS in particular, most tag SNP discovery and validation has historically been performed in European-descent cohorts, leaving substantial uncertainty in African, East Asian, South Asian, and admixed American populations. They also highlight a key conceptual pitfall: even alleles that are strongly correlated in one ancestry (e.g., DRB1*15:01 and DQB1*06:02 within the classical DR15 haplotype) may not be correlated to the same extent elsewhere, undermining the common shortcut of treating a single SNP as a universal proxy.

Study design: multi-ancestry validation using 1000 Genomes HLA-typed samples
To address generalizability explicitly, the investigators used the 1000 Genomes reference panel, leveraging 2,504 healthy individuals distributed across 26 populations in five broad regions (Africa, the Americas, East Asia, Europe, and South Asia), with high-quality HLA typing available for HLA-DRB1, HLA-DQB1, and HLA-A in most participants after filtering for ambiguity and coverage. They evaluated 19 candidate SNPs reported or proposed to tag the target HLA alleles, drawing genotypes from Ensembl and assessing performance via three complementary indices: (i) LD strength (reported as R²), (ii) diagnostic classification metrics (sensitivity and specificity of the SNP-defined tag versus typed HLA), and (iii) concordance in minor allele frequency (MAF) between the SNP and the target HLA allele. They defined “high tagging performance” conservatively as LD R² ≥ 0.90, 100% sensitivity, and ≥95% specificity, mirroring thresholds used in clinically motivated tagging contexts such as pharmacogenomics.
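
As a rough illustration of how these three indices could be computed, the sketch below assumes phased data in which each individual contributes two haplotypes, each coded 1 if it carries the SNP tag allele (or the target HLA allele) and 0 otherwise. The function names, the toy genotypes, and the choice to score sensitivity and specificity on carrier status are illustrative assumptions, not the authors' pipeline; only the three thresholds come from the text above.

```python
from dataclasses import dataclass


@dataclass
class TagMetrics:
    r2: float           # haplotype-level LD between SNP allele and HLA allele
    sensitivity: float  # share of HLA carriers flagged by the SNP
    specificity: float  # share of HLA non-carriers not flagged by the SNP
    snp_freq: float     # frequency of the SNP tag allele
    hla_freq: float     # frequency of the target HLA allele


def tag_metrics(snp_haps, hla_haps):
    """snp_haps / hla_haps: one (hap1, hap2) pair of 0/1 flags per individual,
    listed in the same individual and phase order (assumed input format)."""
    snp = [a for pair in snp_haps for a in pair]
    hla = [a for pair in hla_haps for a in pair]
    n = len(snp)
    p_snp, p_hla = sum(snp) / n, sum(hla) / n
    p_both = sum(s * h for s, h in zip(snp, hla)) / n
    d = p_both - p_snp * p_hla  # LD coefficient D
    denom = p_snp * (1 - p_snp) * p_hla * (1 - p_hla)
    r2 = d * d / denom if denom > 0 else float("nan")

    # Classification is scored per individual on carrier status.
    snp_pos = [max(pair) for pair in snp_haps]
    hla_pos = [max(pair) for pair in hla_haps]
    tp = sum(1 for s, h in zip(snp_pos, hla_pos) if s and h)
    fn = sum(1 for s, h in zip(snp_pos, hla_pos) if not s and h)
    tn = sum(1 for s, h in zip(snp_pos, hla_pos) if not s and not h)
    fp = sum(1 for s, h in zip(snp_pos, hla_pos) if s and not h)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return TagMetrics(r2, sens, spec, p_snp, p_hla)


def is_high_performance(m):
    # Thresholds quoted in the text: R² >= 0.90, 100% sensitivity, >= 95% specificity.
    return m.r2 >= 0.90 and m.sensitivity == 1.0 and m.specificity >= 0.95


# Toy data (invented): 10 individuals, 4 HLA-carrying haplotypes.
hla_haps = [(1, 0), (0, 0), (0, 0), (1, 1), (0, 0),
            (0, 1), (0, 0), (0, 0), (0, 0), (0, 0)]
# The SNP allele sits on every HLA haplotype plus one extra haplotype.
snp_haps = [(1, 0), (1, 0), (0, 0), (1, 1), (0, 0),
            (0, 1), (0, 0), (0, 0), (0, 0), (0, 0)]

metrics = tag_metrics(snp_haps, hla_haps)
print(f"r2={metrics.r2:.2f} sens={metrics.sensitivity:.2f} spec={metrics.specificity:.2f}")
print("high tagging performance:", is_high_performance(metrics))
```

In this toy case a single extra SNP-carrying haplotype is enough to keep sensitivity at 100% while pulling R² and specificity below the thresholds, which shows why the three indices are checked together rather than in isolation.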

Key findings for HLA-DRB1*15:01: strong in several European groups, uneven elsewhere
For DRB1*15:01, multiple historically used tag SNPs performed extremely well in certain European populations. Most notably, several were in perfect LD in the British (GBR) cohort, and multiple SNPs demonstrated high sensitivity/specificity in European subgroups such as CEU, FIN, and GBR, with more variable performance in IBS and TSI. Importantly, performance was not confined to Europe: several DRB1*15:01 tags also performed well in subsets of the American region populations (e.g., MXL and PEL, and in some analyses PUR/CLM), consistent with the possibility that European ancestry contributions can preserve European-like LD blocks in admixed cohorts. In contrast, the same SNPs largely failed to achieve high tagging performance in most East Asian and South Asian populations, with a notable exception where one SNP (rs9271366) showed strong sensitivity/specificity in the Southern Han Chinese (CHS) group. Collectively, this pattern illustrates the central message of the paper: “validated in Europeans” does not imply “validated in non-Europeans,” even for a canonical MS risk allele.

HLA-DRB1*15:03 and HLA-DQB1*06:02: a cautionary tale and a partial solution
The most consequential negative result concerns DRB1*15:03: despite examining multiple candidate SNPs, the authors did not identify any SNP with high tagging performance for DRB1*15:03 in the populations where this allele is most relevant (predominantly African cohorts). Some SNPs showed relatively high sensitivity but poor specificity, an especially problematic failure mode because it can inflate apparent allele carriage and dilute true associations. For DQB1*06:02, where validated tag SNPs have been less established, the study provides a more constructive outcome: rs3135388 (already prominent as a DRB1*15:01 tag in several settings) showed high tagging performance for DQB1*06:02 in a limited set of populations spanning different regions (including one European, one American, and one South Asian population). Critically, the authors also observed that rs3135388 could be in perfect LD with DRB1*15:01 in certain groups while showing weaker LD with DQB1*06:02 in those same groups, reinforcing that DR15 haplotype correlations are not interchangeable across ancestries and that allele-level inference must be validated directly for each target.
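
To make the "high sensitivity, poor specificity" failure mode concrete, here is a small worked example. All numbers are invented for illustration (carrier rates of 30% in cases and 15% in controls, a tag with 100% sensitivity and 80% specificity), and the calculation assumes the tag misclassifies cases and controls at the same rate. Under those assumptions, false positives inflate apparent carriage in both groups and pull the observed odds ratio toward 1.

```python
def observed_positive_rate(true_carrier_rate, sensitivity, specificity):
    """Share of individuals the tag SNP flags as carriers:
    true positives plus false positives among non-carriers."""
    return (sensitivity * true_carrier_rate
            + (1 - specificity) * (1 - true_carrier_rate))


def odds_ratio(rate_cases, rate_controls):
    """Odds ratio for carriage in cases versus controls."""
    return (rate_cases / (1 - rate_cases)) / (rate_controls / (1 - rate_controls))


# Hypothetical true carrier rates and tag performance (not from the paper).
true_cases, true_controls = 0.30, 0.15
sensitivity, specificity = 1.00, 0.80

obs_cases = observed_positive_rate(true_cases, sensitivity, specificity)
obs_controls = observed_positive_rate(true_controls, sensitivity, specificity)

true_or = odds_ratio(true_cases, true_controls)
observed_or = odds_ratio(obs_cases, obs_controls)

print(f"apparent carriage: cases {obs_cases:.2f}, controls {obs_controls:.2f}")
print(f"odds ratio: true {true_or:.2f}, observed via tag {observed_or:.2f}")
```

With these inputs, apparent carriage rises from 0.30 to 0.44 in cases and from 0.15 to 0.32 in controls, and the odds ratio falls from about 2.43 to about 1.67, even though every true carrier was detected.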

Protective genetics: rs2844821 emerges as a robust tag for HLA-A*02:01 across Africa and Europe
A particularly impactful contribution is the identification of rs2844821 as a high-performing tag SNP for the protective allele A*02:01. Among seven African populations in the 1000 Genomes panel (including African Americans) and across essentially all European populations assessed, rs2844821 met stringent performance criteria (high LD and excellent sensitivity/specificity), positioning it as a practical proxy for expanding A*02:01 association studies beyond the historically narrow set of cohorts. This matters because protective effects are often more difficult to study than risk effects: they may be masked by population structure, differential allele frequencies, and inadequate tagging. By providing evidence that rs2844821 tracks A*02:01 well in both African and European populations, the study supplies a concrete tool for more inclusive MS immunogenetics. It also demonstrates that other previously used A*02:01 SNP proxies can be unreliable, underscoring the need for systematic validation rather than tradition-driven SNP selection.

Implications for MS genetics and beyond: validation, admixture, and study portability
The broader lesson is methodological: tag SNP portability cannot be presumed in the MHC, and imperfect tagging can plausibly generate null results or inconsistent effect sizes across studies by misclassifying true HLA carriage, especially in admixed populations where ancestry mosaics can disrupt canonical LD patterns. The authors support this argument by discussing how inadequate tagging has likely contributed to contradictory findings in prior disease-association contexts, including examples where a tag SNP failed in specific subpopulations despite the underlying HLA allele being genuinely associated by direct typing. Their results suggest a pragmatic hierarchy for tag SNP selection: prioritize high LD (R²) as a primary screen, and then confirm classification performance (sensitivity/specificity) and frequency concordance (MAF) within the population under study. Looking forward, the paper motivates two complementary directions: (i) deploying validated tags (such as rs2844821 for A*02:01 in multiple African and European populations, and rs3135388 for DQB1*06:02 in select groups) to enable broader epidemiologic and mechanistic studies, and (ii) discovering new tags, particularly for DRB1*15:03, in larger, more diverse datasets that couple high-quality HLA typing with genome-wide SNP data, thereby improving the equity and accuracy of immunogenetic inference in MS and other immune-mediated diseases.
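
The selection hierarchy described above could be applied population by population along the following lines. This is only a sketch: the population labels and metric values are invented placeholders, the function name is hypothetical, and the frequency tolerance `maf_tol` is an assumption of this example, since the summary above does not state a numeric cutoff for MAF concordance.

```python
def screen_tag(metrics_by_pop, r2_min=0.90, sens_min=1.0, spec_min=0.95, maf_tol=0.01):
    """Apply the checks in order: LD first, then classification, then frequency.
    Returns a verdict per population so a tag is never assumed to be portable."""
    verdicts = {}
    for pop, m in metrics_by_pop.items():
        if m["r2"] < r2_min:
            verdicts[pop] = "fail: LD"
        elif m["sensitivity"] < sens_min or m["specificity"] < spec_min:
            verdicts[pop] = "fail: classification"
        elif abs(m["snp_freq"] - m["hla_freq"]) > maf_tol:  # maf_tol is assumed
            verdicts[pop] = "fail: frequency"
        else:
            verdicts[pop] = "pass"
    return verdicts


# Invented metrics for one candidate tag in four placeholder populations.
metrics_by_pop = {
    "POP_A": {"r2": 1.00, "sensitivity": 1.00, "specificity": 1.00,
              "snp_freq": 0.12, "hla_freq": 0.12},
    "POP_B": {"r2": 0.55, "sensitivity": 1.00, "specificity": 0.80,
              "snp_freq": 0.20, "hla_freq": 0.11},
    "POP_C": {"r2": 0.92, "sensitivity": 0.90, "specificity": 0.99,
              "snp_freq": 0.10, "hla_freq": 0.10},
    "POP_D": {"r2": 0.91, "sensitivity": 1.00, "specificity": 0.96,
              "snp_freq": 0.14, "hla_freq": 0.12},
}

verdicts = screen_tag(metrics_by_pop)
for pop, verdict in verdicts.items():
    print(pop, verdict)
```

Reporting a verdict per population, rather than a single pooled score, mirrors the point that a tag validated in one group still needs checking in the next.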

Disclaimer: This blog post is based on the provided research article and is intended for informational purposes only. It is not intended to provide medical advice. Please consult with a healthcare professional for any health concerns.

References:
Boullerne, A. I., Goudey, B., Paganini, J., Erlichster, M., Gaitonde, S., & Feinstein, D. L. (2024). Validation of tag SNPs for multiple sclerosis HLA risk alleles across the 1000 genomes panel. Human Immunology, 85(3), 110790.