Polygenic Risk Score Prediction of Multiple Sclerosis in South Asian Ancestry: Performance, Portability, and Equity
Multiple sclerosis (MS) is a complex immune-mediated disease in which inherited susceptibility is distributed across many common genetic variants, with the major histocompatibility complex (MHC) locus exerting the single largest effect on risk. The growing ability to aggregate these effects into polygenic risk scores (PRS) raises the prospect of identifying individuals at higher future risk for research enrichment (e.g., prevention trials) and, potentially, earlier intervention during a prodromal phase. However, most large genome-wide association studies (GWAS) used to derive PRS have been conducted primarily in European-ancestry populations, creating a well-recognized portability problem when scores are applied to other ancestries—an issue that can exacerbate inequities if PRS become clinically influential.
Study objective and cohorts: testing PRS transferability into South Asian ancestry
Breedon and colleagues directly evaluated whether a European-derived MS PRS underperforms in individuals of South Asian ancestry compared with a European-ancestry cohort, using two large UK resources. The South Asian-ancestry analysis leveraged Genes & Health (G&H), a longitudinal cohort of British–Bangladeshi and British–Pakistani participants; after quality control, 40,532 individuals were analysed, comprising 42 MS cases and 40,490 controls ascertained from linked electronic health records. For comparison, the authors ran parallel analyses in UK Biobank (UKB), restricted to genetically European-ancestry participants (2,091 MS cases and 374,866 controls in total), and additionally resampled UKB to match the G&H case and control counts so that differences in sample size would not drive the comparison.
Methodological backbone: PRS construction, MHC stratification, and evaluation metrics
PRS were constructed using PRSice-2 with a clumping-and-thresholding framework, applying effect sizes from the International Multiple Sclerosis Genetics Consortium (IMSGC) 2019 GWAS meta-analysis. The authors generated a broad grid of scores (336 total), varying linkage disequilibrium clumping thresholds (R² from 0.001 to 0.6) and GWAS P-value thresholds (from 1×10⁻⁸ to 0.5). Importantly, they analysed PRS (i) including the MHC region, (ii) excluding the MHC region, and (iii) restricted to MHC-only variants (MHC defined as chr6: 25,000,000–35,000,000 in hg38). Predictive performance was quantified primarily via Nagelkerke’s pseudo-R² adjusted for case ascertainment (assuming MS prevalence 0.002) with covariate adjustment (age, sex, and genetic principal components 1–4), complemented by odds ratios across PRS quartiles, discrimination (AUC), and calibration plots.
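The ascertainment adjustment mentioned above can be made concrete. The sketch below assumes the adjustment is the liability-scale transformation described by Lee and colleagues (2012), which, to my understanding, is what PRSice-2 applies when a prevalence is supplied; the post itself does not spell out the formula, so treat this as an illustration rather than the authors' exact computation. The function name and the example observed R² are hypothetical.

```python
from statistics import NormalDist


def liability_scale_r2(r2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Convert an observed-scale R^2 to the liability scale.

    Assumed form (Lee et al. 2012):
        R2_liab = R2_obs * C / (1 + R2_obs * theta * C)
    where K is the population prevalence and P the case fraction in the sample.
    """
    std_normal = NormalDist()
    K, P = prevalence, case_fraction
    t = std_normal.inv_cdf(1.0 - K)  # liability threshold for being a case
    z = std_normal.pdf(t)            # normal density at the threshold
    m = z / K                        # mean liability of cases
    C = (K * (1.0 - K)) ** 2 / (z ** 2 * P * (1.0 - P))
    shift = m * (P - K) / (1.0 - K)
    theta = shift * (shift - t)
    return r2_observed * C / (1.0 + r2_observed * theta * C)


# Cohort figures quoted in the text: 42 cases, 40,490 controls, prevalence 0.002.
case_fraction = 42 / (42 + 40_490)
# Hypothetical observed-scale R^2, chosen only to exercise the function.
example = liability_scale_r2(r2_observed=1e-4, prevalence=0.002, case_fraction=case_fraction)
print(f"liability-scale R^2: {example:.4f}")
```

Because the case fraction in G&H (about 0.1%) is below the assumed prevalence (0.2%), the correction here works in the opposite direction to the more familiar case-enriched study design; the same function handles both situations.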
Core findings in Genes & Health: modest liability explained and limited MHC contribution
In G&H, European-derived PRS were statistically associated with MS status but explained only a small fraction of liability. The “optimal” PRS including the MHC explained ~1.1% of liability (adjusted pseudo-R² = 0.011; P = 0.033), while the optimal PRS excluding the MHC explained ~1.5% (adjusted pseudo-R² = 0.015; P = 0.015). Notably, an MHC-only score was not significantly associated with MS case status (P = 0.19), and including the MHC did not clearly improve performance relative to non-MHC scoring, an unexpected pattern given the MHC’s established importance in MS genetics. The figure panel summarizing G&H results (Figure 2 on page 5) shows partially separated PRS distributions for cases versus controls, quartile-based odds ratios with wide confidence intervals, ROC curves, and a calibration plot in which fitted risks tracked observed prevalence reasonably well across quartiles despite the small case count.
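To see why quartile odds ratios carry wide confidence intervals with only 42 cases, consider a minimal sketch of an unadjusted odds ratio with a Wald interval. The quartile counts below are invented for illustration and are not taken from the paper, whose estimates were also covariate-adjusted; `odds_ratio_wald_ci` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def odds_ratio_wald_ci(cases_hi, controls_hi, cases_lo, controls_lo, level=0.95):
    """Unadjusted odds ratio (high vs low group) with a Wald confidence interval."""
    odds_ratio = (cases_hi * controls_lo) / (controls_hi * cases_lo)
    # Standard error of log(OR): dominated by the smallest cell counts.
    se_log_or = math.sqrt(1 / cases_hi + 1 / controls_hi + 1 / cases_lo + 1 / controls_lo)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Invented counts: 42 cases spread over four quartiles of about 10,133 people each.
# Top quartile: 16 cases; bottom quartile: 6 cases.
or_, lo, hi = odds_ratio_wald_ci(16, 10_117, 6, 10_127)
print(f"OR = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

The control counts contribute almost nothing to the standard error; the 1/16 and 1/6 terms from the case cells dominate, which is why a cohort of more than 40,000 people can still yield imprecise quartile estimates.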
Cross-ancestry comparison in UK Biobank: stronger prediction in Europeans, even after resampling
Applying comparable methods in UKB produced substantially higher explained liability than in G&H, as expected when the discovery and target datasets are ancestry-matched. In the full European-ancestry UKB analysis, the PRS including the MHC explained 4.4% of liability (adjusted pseudo-R² = 0.044; extremely small P-value), while the non-MHC PRS explained 2.3% (adjusted pseudo-R² = 0.023). To address the possibility that sample size alone drives apparent differences, the authors repeatedly subsampled UKB to match G&H (42 cases and 40,490 controls) across 1,000 iterations and compared the resulting distribution of explained liability with the G&H point estimates. This resampling-based comparison indicated that the MHC-inclusive PRS explained more liability in the European-ancestry subsamples than in G&H (empirical P = 0.01), whereas the evidence for a difference in non-MHC PRS was suggestive but less definitive (empirical P = 0.10). Figure 3 (page 6) visually summarizes these contrasts, showing the UKB distribution sitting above the G&H vertical reference lines, especially for the MHC-inclusive score.
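The subsampling procedure described above can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' pipeline: the squared correlation between score and case status stands in for the liability-scale pseudo-R², the cohort sizes are scaled down for speed, and the empirical P-value is assumed to be the share of subsamples whose estimate falls at or below the reference value. All names and numbers in the block are hypothetical.

```python
import random


def squared_correlation(scores, labels):
    """Squared Pearson correlation between a score and 0/1 case status."""
    n = len(scores)
    mean_s = sum(scores) / n
    mean_y = sum(labels) / n
    cov = sum((s - mean_s) * (y - mean_y) for s, y in zip(scores, labels))
    var_s = sum((s - mean_s) ** 2 for s in scores)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    return cov * cov / (var_s * var_y)


def subsample_statistics(case_scores, control_scores, n_cases, n_controls, n_iter, rng):
    """Draw matched-size subsamples without replacement and score each one."""
    labels = [1] * n_cases + [0] * n_controls
    stats = []
    for _ in range(n_iter):
        drawn = rng.sample(case_scores, n_cases) + rng.sample(control_scores, n_controls)
        stats.append(squared_correlation(drawn, labels))
    return stats


def empirical_p(subsample_stats, reference):
    """Share of subsample estimates that are no larger than the reference."""
    return sum(stat <= reference for stat in subsample_stats) / len(subsample_stats)


rng = random.Random(0)
# Synthetic "large cohort": cases have a higher mean score than controls.
large_cases = [rng.gauss(0.5, 1.0) for _ in range(500)]
large_controls = [rng.gauss(0.0, 1.0) for _ in range(20_000)]

stats = subsample_statistics(
    large_cases, large_controls, n_cases=42, n_controls=2_000, n_iter=200, rng=rng
)
reference_estimate = 0.001  # hypothetical estimate from the smaller cohort
print(f"empirical P = {empirical_p(stats, reference_estimate):.3f}")
```

Fixing the subsample to the smaller cohort's case and control counts means the spread of `stats` reflects the sampling noise expected at that size, so a reference value in the lower tail is hard to attribute to sample size alone.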
Biological and statistical interpretation: LD/allele-frequency effects and the “missing” MHC boost
The authors’ interpretation aligns with a dominant explanation in cross-ancestry PRS research: when many PRS variants are not themselves causal but instead tag causal alleles through linkage disequilibrium (LD), differences in LD structure and allele frequencies between Europeans and South Asians reduce tagging efficiency and thus predictive accuracy. This framework accommodates two key observations in the paper: (i) European-derived PRS retain some predictive signal in South Asian ancestry—consistent with substantial sharing of underlying MS genetic architecture across populations—yet (ii) performance declines relative to Europeans. The particularly weak incremental value of including MHC in G&H is discussed as potentially arising from limited statistical power (only 42 cases), different causal human leukocyte antigen (HLA) configurations, and/or poor tagging of causal HLA alleles by the European GWAS variants in South Asian LD backgrounds; the authors emphasize that larger South Asian datasets are required to disambiguate these possibilities.
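A single-variant calculation illustrates the tagging argument. With one causal variant and one tag variant, a score built on the tag explains the causal variant's share of trait variance multiplied by the squared LD correlation (r²) between the two in the target population. The r² values and the 2% variance figure below are hypothetical, chosen only to show the direction of the effect; real scores combine many variants, and allele-frequency differences add further attenuation that this sketch ignores.

```python
def variance_explained_by_tag(causal_variance_explained: float, ld_r2: float) -> float:
    """Variance explained by a tag variant in LD (r^2 = ld_r2) with one causal variant."""
    if not 0.0 <= ld_r2 <= 1.0:
        raise ValueError("ld_r2 must lie in [0, 1]")
    return causal_variance_explained * ld_r2


causal_share = 0.02  # hypothetical: the causal variant explains 2% of trait variance
hypothetical_ld = {"discovery-like population": 0.9, "target population": 0.4}

for label, ld_r2 in hypothetical_ld.items():
    retained = variance_explained_by_tag(causal_share, ld_r2)
    print(f"{label}: tag explains {retained:.3%} of variance "
          f"({ld_r2:.0%} of the causal signal)")
```

The causal effect is identical in both populations in this example; only the tagging differs, which is how shared genetic architecture and reduced portability can coexist.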
Implications, limitations, and next steps: towards equitable genomic prediction in MS
This study makes a practical and policy-relevant point: if MS PRS are developed and validated mainly in Europeans, their downstream use for trial recruitment, prevention strategies, or risk stratification could systematically underperform in non-European groups, thereby reinforcing existing health inequities. At the same time, several limitations guard against overinterpretation: reliance on electronic health record coding may miss or misclassify cases; selecting the best-performing PRS and evaluating it in the same G&H dataset raises the risk of overfitting; and technical differences between cohorts (genotyping arrays and imputation panels) mean that PRS variant content is not identical even under the same clumping and thresholding settings. The authors’ central prescription is therefore methodological and infrastructural: prioritize ancestrally diverse MS GWAS (and improved multi-ancestry PRS methods) so that genomic prediction can be both accurate and equitable across populations.
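The overfitting concern noted above arises because the "optimal" score is chosen from a grid of 336 candidates and then evaluated in the same data. The simulation below is a deliberately simplified illustration: it uses a continuous synthetic phenotype and 336 independent random scores with no true signal, whereas real PRS from a threshold grid are strongly correlated with one another, so the inflation shown here is likely an upper bound on the effect. All sizes and names are hypothetical.

```python
import random


def squared_correlation(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


rng = random.Random(1)
n_people, n_scores = 400, 336

# Phenotype and scores are drawn independently, so the true R^2 of every score is zero.
phenotype = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
null_r2 = []
for _ in range(n_scores):
    score = [rng.gauss(0.0, 1.0) for _ in range(n_people)]
    null_r2.append(squared_correlation(score, phenotype))

best_r2 = max(null_r2)
average_r2 = sum(null_r2) / len(null_r2)
print(f"average null R^2: {average_r2:.4f}")
print(f"best-of-{n_scores} null R^2: {best_r2:.4f}")
```

The best of many noise-only scores looks markedly better than a typical one, which is why an estimate for a grid-selected PRS is best confirmed in an independent sample.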
Disclaimer: This blog post is based on the research article cited below and is intended for informational purposes only. It is not intended to provide medical advice. Please consult a healthcare professional about any health concerns.
References:
Breedon, J. R., Marshall, C. R., Giovannoni, G., van Heel, D. A., Dobson, R., & Jacobs, B. M. (2023). Polygenic risk score prediction of multiple sclerosis in individuals of South Asian ancestry. Brain Communications, 5(2), fcad041.
