A 9 mRNAs-based diagnostic signature for rheumatoid arthritis by integrating bioinformatic analysis and machine-learning

Bibliographic Details
Main Authors: Jianyong Liu, Ningjie Chen
Format: Article
Language: English
Published: BMC 2021-01-01
Series: Journal of Orthopaedic Surgery and Research
Online Access: https://doi.org/10.1186/s13018-020-02180-w
Description
Summary:
Background: Rheumatoid arthritis (RA) is an autoimmune rheumatic disease that places a substantial burden on both patients and society. Early diagnosis is essential to prevent disease progression and to select an optimal therapeutic strategy. However, diagnosing RA is challenging, partly because reliable biomarkers are lacking. Here, we aimed to identify a diagnostic signature and establish a predictive model for RA.
Methods: The mRNA expression profiling data of GSE17755, comprising blood samples from 112 RA patients and 53 healthy controls, were obtained from the Gene Expression Omnibus (GEO) database and subjected to differential expression analysis, followed by Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. A protein-protein interaction (PPI) network was constructed to select candidate hub genes, and logistic regression and random forest models were then built on the identified genes.
Results: We identified 52 differentially expressed genes (DEGs), of which 16 were upregulated and 36 were downregulated in RA samples compared with control samples. GO and KEGG analysis showed that several immune-related cellular processes were particularly enriched. We identified nine hub genes in the PPI network: CFL1, COTL1, ACTG1, PFN1, LCP1, LCK, HLA-E, FYN, and HLA-DRA. The logistic regression and random forest models based on these nine genes reliably distinguished RA samples from healthy samples, with a high AUC.
Conclusion: The diagnostic logistic regression and random forest models based on nine hub genes reliably predicted the occurrence of RA. Our findings could provide new insights into RA diagnostics.
ISSN: 1749-799X
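
Illustrative sketch: the Summary describes fitting a logistic regression model on nine hub genes and judging it by AUC. The Python below shows what that kind of step can look like in general, and it is not the authors' code. Everything numeric in it is an assumption made for illustration: the expression values are simulated (not GSE17755), the per-gene shift of 1.0 in RA samples, the 70/30 hold-out split, the learning rate, and the helper names (`make_synthetic`, `fit_logistic`, `auc`) are all hypothetical. Only the gene names and the 112/53 sample counts come from the record. The random forest model mentioned in the Summary is not covered here.

```python
import math
import random

# Hub genes named in the Summary; used here only as feature labels.
HUB_GENES = ["CFL1", "COTL1", "ACTG1", "PFN1", "LCP1",
             "LCK", "HLA-E", "FYN", "HLA-DRA"]


def make_synthetic(n_ra, n_ctrl, seed=0):
    """Simulate expression values. ASSUMPTION: this is NOT the GSE17755 data."""
    rng = random.Random(seed)
    X, y = [], []
    for label, n in ((1, n_ra), (0, n_ctrl)):
        for _ in range(n):
            # Hypothetical effect: every gene is shifted by 1.0 in RA samples.
            X.append([rng.gauss(1.0 * label, 1.0) for _ in HUB_GENES])
            y.append(label)
    return X, y


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=300):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, X):
    """Predicted probability of the RA class for each sample."""
    w, b = model
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Sample counts mirror the record (112 RA, 53 controls); the values are simulated.
X, y = make_synthetic(112, 53)

# ASSUMPTION: a simple 70/30 hold-out split; the record does not state
# how the authors evaluated their models.
idx = list(range(len(y)))
random.Random(1).shuffle(idx)
cut = int(0.7 * len(idx))
train, test = idx[:cut], idx[cut:]

model = fit_logistic([X[i] for i in train], [y[i] for i in train])
test_auc = auc([y[i] for i in test],
               predict_proba(model, [X[i] for i in test]))
print(f"held-out AUC on synthetic data: {test_auc:.3f}")
```

The AUC printed here reflects only the simulated separation between groups and says nothing about the performance reported in the article. In practice a library implementation of logistic regression and random forest, applied to the real expression matrix, would replace the hand-written fitting loop.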