MSGD: A Comprehensive Database for Multiple Sclerosis Research
Multiple sclerosis (MS) is the most prevalent inflammatory demyelinating disease of the central nervous system. 'Omics' technologies, encompassing genomics, transcriptomics, and proteomics, together with related drug information, have greatly advanced our understanding of MS. However, the volume of these data and their fragmentation across many publications make them difficult for researchers to use. To address this, the Multiple Sclerosis Gene Database (MSGD) was manually curated to provide a centralized, accessible repository of MS-related genomic, transcriptomic, proteomic, and drug data. It aims to support a deeper understanding of gene mechanisms and to improve the prevention, diagnosis, and treatment of MS.
Data Collection and Management
The development of MSGD involved several rigorous steps to ensure data quality and reliability:
Literature Screening: A comprehensive PubMed search yielded 5485 publications on multiple sclerosis, experimental autoimmune encephalomyelitis, and related genes.
Data Extraction: Information from these studies was meticulously extracted, reviewed, and categorized by a team of researchers.
Standardization: Gene names and variant descriptions were standardized against authoritative sources: Mouse Genome Informatics (MGI), the HUGO Gene Nomenclature Committee (HGNC), the Rat Genome Database (RGD), and dbSNP.
Database Compilation: The curated data was compiled into MSGD, resulting in 6255 entries encompassing gene variations, RNA, proteins, gene knockouts, drugs, and high-throughput data.
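The standardization step above can be illustrated with a minimal sketch. It assumes a small hand-written alias table and two hypothetical helpers, standardize_symbol and normalize_rsid; the curators' actual scripts are not described in the source, and a real pipeline would load its mappings from HGNC, MGI, or RGD downloads rather than hard-code them.

```python
import re

# Illustrative alias table only (assumption); not the mapping used by MSGD.
ALIAS_TO_SYMBOL = {
    "CD127": "IL7R",
    "TNFA": "TNF",
    "NR1I1": "VDR",
}

# dbSNP reference SNP identifiers have the form "rs" followed by digits.
RSID_PATTERN = re.compile(r"^rs(\d+)$", re.IGNORECASE)


def standardize_symbol(raw, alias_table=ALIAS_TO_SYMBOL):
    """Map a reported human gene name to its official symbol.

    Upper-casing suits human (HGNC-style) symbols; mouse and rat symbols
    use different capitalization and would need their own tables.
    """
    cleaned = raw.strip().upper()
    return alias_table.get(cleaned, cleaned)


def normalize_rsid(raw):
    """Return a variant identifier as lowercase 'rs<digits>', or None if malformed."""
    match = RSID_PATTERN.match(raw.strip())
    return f"rs{match.group(1)}" if match else None
```

Names that are absent from the alias table pass through unchanged (apart from trimming and upper-casing), so unknown symbols can be flagged for manual review instead of being silently altered.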
Database Construction
MSGD uses MySQL for data management, a web interface built with JavaServer Pages (JSP), and data processing scripts written in Java. The web service runs on the Apache Tomcat web server, and the database is freely accessible at http://bio-bigdata.hrbmu.edu.cn/MSGD/index.html.
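As a rough illustration of how entries of this kind might be stored relationally, the sketch below defines a single hypothetical table and a lookup by gene symbol. The table name, the column names, and the placeholder rows are all assumptions, not the MSGD schema or its records, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server.

```python
import sqlite3

# Hypothetical schema (assumption); the data_type values mirror the entry
# categories listed in the text above.
SCHEMA = """
CREATE TABLE entry (
    entry_id    INTEGER PRIMARY KEY,
    gene_symbol TEXT NOT NULL,
    species     TEXT NOT NULL,
    data_type   TEXT NOT NULL CHECK (data_type IN (
        'gene variation', 'RNA', 'protein',
        'gene knockout', 'drug', 'high-throughput')),
    pmid        INTEGER
);
CREATE INDEX idx_entry_gene ON entry (gene_symbol);
"""


def build_demo_db():
    """Create an in-memory database holding a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Placeholder rows for illustration only; not actual MSGD records.
    rows = [
        ("IL7R", "Homo sapiens", "gene variation", None),
        ("TNF", "Homo sapiens", "protein", None),
        ("Vdr", "Mus musculus", "gene knockout", None),
    ]
    conn.executemany(
        "INSERT INTO entry (gene_symbol, species, data_type, pmid) "
        "VALUES (?, ?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def entries_for_gene(conn, symbol):
    """Return (gene_symbol, species, data_type) rows, matching case-insensitively."""
    cur = conn.execute(
        "SELECT gene_symbol, species, data_type FROM entry "
        "WHERE gene_symbol = ? COLLATE NOCASE ORDER BY entry_id",
        (symbol,),
    )
    return cur.fetchall()
```

The parameterized query keeps user-supplied symbols out of the SQL string, which matters for any database exposed through a public web form.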
The Web Interface for MSGD
Search Capabilities: Users can perform searches based on gene variations and symbols, utilizing both fuzzy and advanced search options. Search results are displayed in a sortable table format, with hyperlinks to detailed information.
Browse Functionality: The 'Browse' page allows exploration of data by gene variation, transcriptomics, proteomics, and drugs. The 'Hot Points' feature highlights genes of particular interest in recent publications.
Data Download and Submission: The 'Download' section provides free access to all collected data, while the 'Submit' page allows users to contribute new MS-gene association data for future database updates.
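The source does not say how MSGD implements its fuzzy search, so the following is only one plausible sketch of fuzzy matching over gene symbols. It combines case-insensitive substring matching with the similarity ranking from Python's standard difflib module; the function name fuzzy_search and its cutoff and limit parameters are assumptions.

```python
from difflib import get_close_matches


def fuzzy_search(query, symbols, cutoff=0.6, limit=10):
    """Return symbols that contain the query or closely resemble it.

    Substring hits come first, in input order, followed by near matches
    (for example, transposed characters) ranked by difflib similarity.
    """
    q = query.strip().upper()
    by_upper = {s.upper(): s for s in symbols}

    # Pass 1: case-insensitive substring matches.
    hits = [orig for upper, orig in by_upper.items() if q in upper]

    # Pass 2: approximate matches that meet the similarity cutoff.
    for upper in get_close_matches(q, list(by_upper), n=limit, cutoff=cutoff):
        if by_upper[upper] not in hits:
            hits.append(by_upper[upper])

    return hits[:limit]
```

For example, with the five gene symbols named later in this article as the search space, the query "hla" returns both HLA genes through the substring pass, while a transposed query such as "ILR7" still reaches IL7R through the similarity pass.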
Data Statistics in MSGD
Yearly publication counts show a marked increase in MS-related genetic research over the past decade. Genetic evidence is distributed across all chromosomes, with the highest concentration on chromosome 6. The top five genes associated with MS risk are HLA-DRB1, IL7R, TNF, VDR, and HLA-DQB1. Functional enrichment analysis indicates that these genes are significantly involved in cell membrane functions, antigen presentation, and immune cell activation pathways. A bipartite network of MS genes and drugs, constructed using Cytoscape, visually represents the experimentally supported associations.
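A gene-drug network of this kind has two node types, with edges only between a gene and a drug. The sketch below, which uses placeholder node names in its test data and is not the authors' workflow, shows how such association pairs could be indexed in both directions and written out in the simple interaction format (SIF) that Cytoscape can import. The function names and the "gd" interaction label are assumptions.

```python
from collections import defaultdict


def build_bipartite(pairs):
    """Index (gene, drug) pairs in both directions.

    Returns two dictionaries: gene -> set of drugs and drug -> set of genes.
    Duplicate pairs collapse because the values are sets.
    """
    gene_to_drugs = defaultdict(set)
    drug_to_genes = defaultdict(set)
    for gene, drug in pairs:
        gene_to_drugs[gene].add(drug)
        drug_to_genes[drug].add(gene)
    return gene_to_drugs, drug_to_genes


def to_sif(pairs, interaction="gd"):
    """Serialize unique pairs as tab-delimited SIF lines: source, interaction, target."""
    lines = sorted({f"{gene}\t{interaction}\t{drug}" for gene, drug in pairs})
    return "\n".join(lines)
```

The size of each set gives the node's degree, which is a simple way to spot genes targeted by many drugs or drugs linked to many genes.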
MSGD addresses the critical need for a comprehensive, high-quality database of MS-related gene data. By integrating diverse data types and offering detailed gene information, MSGD provides a valuable resource for researchers exploring gene-MS relationships across different species. Future updates will include newly validated gene-MS associations, interface optimizations, enhanced data integration, and incorporation of gene targets for approved drugs and clinical trials.
Conclusion
MSGD stands as a significant resource for MS research, offering an extensive and accessible database of genetic, transcriptomic, proteomic, and drug information. Regular updates and continuous data integration will ensure that MSGD remains an invaluable tool for researchers, aiding in the advancement of MS diagnosis and treatment.
Data Availability
All data used in the analysis can be obtained from the MSGD database at http://bio-bigdata.hrbmu.edu.cn/MSGD/index.html.
References
Wu, T., Hou, Y., Xin, G., Niu, J., Peng, S., Xu, F., ... & Hao, J. (2024). MSGD: a manually curated database of genomic, transcriptomic, proteomic and drug information for multiple sclerosis. Database, 2024, baae037.