Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods

Abstract. Background: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. It is meaningful to...


Bibliographic Details
Main Authors: Qidong Cai, Boxue He, Pengfei Zhang, Zhenyu Zhao, Xiong Peng, Yuqian Zhang, Hui Xie, Xiang Wang
Format: Article
Language: English
Published: BMC 2020-12-01
Series: Journal of Translational Medicine
Subjects: Lung adenocarcinoma; Alternative splicing; Random forests; Splicing switch; Metastasis; Prognosis
Online Access: https://doi.org/10.1186/s12967-020-02635-y
id doaj-29a314920dba4008a0a4865dc55a70a8
record_format Article
spelling doaj-29a314920dba4008a0a4865dc55a70a8
2020-12-07T19:32:59Z
eng
BMC
Journal of Translational Medicine
1479-5876
2020-12-01
Volume 18, issue 1, pages 1-15
10.1186/s12967-020-02635-y
Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods
Qidong Cai, Boxue He, Pengfei Zhang, Zhenyu Zhao, Xiong Peng, Yuqian Zhang, Hui Xie, Xiang Wang (all authors: Department of Thoracic Surgery, The Second Xiangya Hospital, Central South University)
collection DOAJ
language English
format Article
sources DOAJ
author Qidong Cai
Boxue He
Pengfei Zhang
Zhenyu Zhao
Xiong Peng
Yuqian Zhang
Hui Xie
Xiang Wang
title Exploration of predictive and prognostic alternative splicing signatures in lung adenocarcinoma using machine learning methods
publisher BMC
series Journal of Translational Medicine
issn 1479-5876
publishDate 2020-12-01
description Abstract. Background: Alternative splicing (AS) plays critical roles in generating protein diversity and complexity. Dysregulation of AS underlies the initiation and progression of tumors. Machine learning approaches have emerged as efficient tools to identify promising biomarkers. It is meaningful to explore pivotal AS events (ASEs) to deepen understanding and improve prognostic assessments of lung adenocarcinoma (LUAD) via machine learning algorithms. Method: RNA sequencing data and AS data were extracted from The Cancer Genome Atlas (TCGA) database and the TCGA SpliceSeq database. Using several machine learning methods, we identified 24 pairs of LUAD-related ASEs implicated in splicing switches and a random forest-based classifier, consisting of 12 ASEs, for identifying lymph node metastasis (LNM). Furthermore, we identified key prognosis-related ASEs and established a 16-ASE-based prognostic model to predict overall survival for LUAD patients using a Cox regression model, random survival forest analysis, and a forward selection model. Bioinformatics analyses were also applied to identify underlying mechanisms and associated upstream splicing factors (SFs). Results: Each pair of ASEs was spliced from the same parent gene and exhibited perfect inverse intrapair correlation (correlation coefficient = −1). The 12-ASE-based classifier showed robust ability to evaluate the LNM status of LUAD patients, with an area under the receiver operating characteristic (ROC) curve (AUC) above 0.7 in fivefold cross-validation. The prognostic model performed well at 1, 3, 5, and 10 years in both the training cohort and the internal test cohort. Univariate and multivariate Cox regression indicated that the prognostic model could be used as an independent prognostic factor for patients with LUAD. Further analysis revealed correlations between the prognostic model and American Joint Committee on Cancer stage, T stage, N stage, and living status. The splicing network constructed from survival-related SFs and ASEs depicts the regulatory relationships between them. Conclusion: In summary, our study provides insight into LUAD research and management based on these AS biomarkers.
topic Lung adenocarcinoma
Alternative splicing
Random forests
Splicing switch
Metastasis
Prognosis
url https://doi.org/10.1186/s12967-020-02635-y
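note The perfect inverse intrapair correlation (correlation coefficient = −1) reported in the abstract can be illustrated with a small sketch. It assumes, as a simplification not stated in this record, that the two events in a pair are complementary isoforms whose percent-spliced-in (PSI) values sum to 1 in every sample. The PSI values are randomly generated and the names `pearson`, `psi_a`, and `psi_b` are hypothetical; this is not the authors' pipeline.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


random.seed(0)

# Hypothetical PSI values for one splicing event across 50 samples.
psi_a = [random.uniform(0.05, 0.95) for _ in range(50)]

# Assumption: the paired event is the complementary isoform of the same
# parent gene, so its PSI is 1 minus the first event's PSI in each sample.
psi_b = [1.0 - p for p in psi_a]

r = pearson(psi_a, psi_b)
print(f"intrapair correlation: {r:.6f}")
```

Under this assumption the second series is an exact decreasing linear function of the first, so the coefficient equals −1 up to floating-point rounding, whatever the individual PSI values are.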