Decoding the Genetic Puzzle: GWAS
Genome-wide association studies (GWAS) have become a critical tool in understanding the genetic architecture of complex traits and diseases. The advent of large-scale biobanks, such as the UK Biobank (UKBB), has significantly enhanced the power and scope of these studies.
In GWAS, selecting the right study population is crucial. Because the effects of individual variants on complex traits are usually small, these studies often require large sample sizes to identify genome-wide significant associations. The required sample size can be estimated with power calculations in software tools like CaTS or the Genetic Power Calculator (GPC). The design of a GWAS can vary: it may compare cases and controls for dichotomous traits, or use measured values for quantitative traits. The choice of data resource and study design depends on several factors, including the experimental question, the availability of pre-existing data, and the ease of collecting new data. GWAS often rely on pre-existing resources like biobanks or disease-focused cohorts, which come with their own biases, such as collider bias.
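To make the link between sample size and power concrete, here is a minimal sketch of a power calculation for a single SNP and a quantitative trait. It is not the method used by CaTS or GPC; it is a standard textbook approximation under stated assumptions: an additive model, a phenotype standardized to unit variance, Hardy-Weinberg proportions, and the conventional genome-wide significance threshold of 5e-8. The function name `gwas_power` and its parameters are illustrative only.

```python
from math import sqrt
from statistics import NormalDist


def gwas_power(n, maf, beta, alpha=5e-8):
    """Approximate power to detect one SNP in a quantitative-trait GWAS.

    Assumptions (illustrative, not taken from any specific tool):
    - additive effect `beta`, in phenotype standard deviations per allele
    - genotypes in Hardy-Weinberg proportions with minor allele frequency `maf`
    - a 1-degree-of-freedom test at significance level `alpha`
    """
    # Fraction of phenotypic variance explained by the SNP.
    q2 = 2 * maf * (1 - maf) * beta ** 2
    if not 0 <= q2 < 1:
        raise ValueError("variance explained must lie in [0, 1)")

    # Non-centrality parameter of the chi-square test statistic.
    ncp = n * q2 / (1 - q2)

    # A 1-df non-central chi-square is the square of a shifted normal,
    # so power can be computed from the normal distribution alone.
    normal = NormalDist()
    z_crit = -normal.inv_cdf(alpha / 2)
    shift = sqrt(ncp)
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for n in (5_000, 10_000, 50_000):
        print(n, round(gwas_power(n, maf=0.3, beta=0.1), 3))
```

Under these assumptions, power for a fixed effect size rises steeply with sample size, which is why large cohorts matter so much for detecting variants of small effect.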
Genotyping in GWAS is typically performed using microarrays for common variants or next-generation sequencing methods like whole-exome sequencing (WES) or whole-genome sequencing (WGS) for rare variants. Microarray-based genotyping is more common due to its cost-effectiveness compared to next-generation sequencing. However, the choice of genotyping platform is guided by the study's purpose, and WGS is expected to become more prevalent with the advent of lower-cost technologies.
Data processing in GWAS includes several quality control steps to ensure the reliability of the results. These steps involve removing rare or monomorphic variants, filtering out SNPs whose genotypes are missing in too large a fraction of the cohort, identifying and removing genotyping errors, and ensuring that phenotypes are correctly matched with genetic data. Software tools like PLINK are designed for analyzing genetic data and carrying out these quality control steps.
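Two of the variant-level filters described above can be sketched in a few lines of plain Python. This is only an illustration of the logic, not a replacement for PLINK, which exposes comparable filters (for example its `--maf` and `--geno` options). The function name `variant_qc`, the input layout (a dict of per-sample allele counts, with `None` for a missing call), and the default thresholds are all assumptions made for this example.

```python
def variant_qc(genotypes, min_maf=0.01, max_missing=0.02):
    """Return IDs of variants that pass simple frequency and missingness filters.

    `genotypes` maps a variant ID to a list of per-sample allele counts
    (0, 1 or 2 copies of the alternate allele; None means a missing call).
    Thresholds are illustrative defaults, not recommendations.
    """
    kept = []
    for variant_id, calls in genotypes.items():
        observed = [g for g in calls if g is not None]
        if not observed:
            continue  # no usable genotype calls at all

        # Drop variants missing in too large a fraction of the cohort.
        missing_rate = 1 - len(observed) / len(calls)
        if missing_rate > max_missing:
            continue

        # Drop rare or monomorphic variants (monomorphic means MAF == 0).
        alt_freq = sum(observed) / (2 * len(observed))
        maf = min(alt_freq, 1 - alt_freq)
        if maf < min_maf:
            continue

        kept.append(variant_id)
    return kept


if __name__ == "__main__":
    toy = {
        "rs_mono": [0] * 10,
        "rs_missing": [1, None, None, 0, 2, 1, 0, 1, 2, 0],
        "rs_ok": [0, 1, 2, 1, 0, 1, 0, 2, 1, 0],
    }
    print(variant_qc(toy, min_maf=0.05, max_missing=0.1))
```

In the toy data, the monomorphic variant and the variant with 20% missing calls are both dropped, leaving only the well-genotyped polymorphic one. Real pipelines add sample-level checks and tests for genotyping error on top of these two filters.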
The UK Biobank has made a significant contribution to the GWAS field by providing a large, well-curated, and deeply phenotyped population-based collection. This has improved the ability to estimate effect sizes and to apply epidemiological approaches in GWAS. However, the scale and complexity of the UK Biobank also bring challenges, since they magnify the complications and inferential difficulties inherent to GWAS.
Overall, the integration of GWAS with large biobanks like the UK Biobank represents a major advance in genetic research, offering deeper insights into the genetic underpinnings of complex diseases and traits. However, it also underscores the importance of careful study design, population selection, and data processing to ensure the accuracy and reliability of the findings.