Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles
Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target...
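Main Authors: | Vijayachitra Modhukur, Shakshi Sharma, Mainak Mondal, Ankita Lawarde, Keiu Kask, Rajesh Sharma, Andres Salumets |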
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2021-07-01 |
Series: | Cancers |
Subjects: | DNA methylation, TCGA, biomarkers, clustering, differential methylation, metastasis |
Online Access: | https://www.mdpi.com/2072-6694/13/15/3768 |
id |
doaj-e11b35165b294e2d8fb722c7a2170759 |
---|---|
record_format |
Article |
spelling |
doaj-e11b35165b294e2d8fb722c7a2170759; 2021-08-06T15:20:26Z; eng; MDPI AG; Cancers; 2072-6694; 2021-07-01; 13; 3768; 3768; 10.3390/cancers13153768; Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles; Vijayachitra Modhukur (0); Shakshi Sharma (1); Mainak Mondal (2); Ankita Lawarde (3); Keiu Kask (4); Rajesh Sharma (5); Andres Salumets (6); Competence Centre on Health Technologies, 50411 Tartu, Estonia; Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia; Competence Centre on Health Technologies, 50411 Tartu, Estonia; Competence Centre on Health Technologies, 50411 Tartu, Estonia; Competence Centre on Health Technologies, 50411 Tartu, Estonia; Institute of Computer Science, University of Tartu, 51009 Tartu, Estonia; Competence Centre on Health Technologies, 50411 Tartu, Estonia; Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to be an important mediator for the transition to metastatic cancer. In the present study, we used 24 cancer types and 9303 methylome samples downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) machine learning models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain important methylation biomarkers to classify cancer types.; https://www.mdpi.com/2072-6694/13/15/3768; DNA methylation; TCGA; biomarkers; clustering; differential methylation; metastasis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Vijayachitra Modhukur Shakshi Sharma Mainak Mondal Ankita Lawarde Keiu Kask Rajesh Sharma Andres Salumets |
spellingShingle |
Vijayachitra Modhukur Shakshi Sharma Mainak Mondal Ankita Lawarde Keiu Kask Rajesh Sharma Andres Salumets Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles Cancers DNA methylation TCGA biomarkers clustering differential methylation metastasis |
author_facet |
Vijayachitra Modhukur Shakshi Sharma Mainak Mondal Ankita Lawarde Keiu Kask Rajesh Sharma Andres Salumets |
author_sort |
Vijayachitra Modhukur |
title |
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles |
title_short |
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles |
title_full |
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles |
title_fullStr |
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles |
title_full_unstemmed |
Machine Learning Approaches to Classify Primary and Metastatic Cancers Using Tissue of Origin-Based DNA Methylation Profiles |
title_sort |
machine learning approaches to classify primary and metastatic cancers using tissue of origin-based dna methylation profiles |
publisher |
MDPI AG |
series |
Cancers |
issn |
2072-6694 |
publishDate |
2021-07-01 |
description |
Metastatic cancers account for up to 90% of cancer-related deaths. The clear differentiation of metastatic cancers from primary cancers is crucial for cancer type identification and developing targeted treatment for each cancer type. DNA methylation patterns are suggested to be an intriguing target for cancer prediction and are also considered to be an important mediator for the transition to metastatic cancer. In the present study, we used 24 cancer types and 9303 methylome samples downloaded from publicly available data repositories, including The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO). We constructed machine learning classifiers to discriminate metastatic, primary, and non-cancerous methylome samples. We applied support vector machines (SVM), Naive Bayes (NB), extreme gradient boosting (XGBoost), and random forest (RF) machine learning models to classify the cancer types based on their tissue of origin. RF outperformed the other classifiers, with an average accuracy of 99%. Moreover, we applied local interpretable model-agnostic explanations (LIME) to explain important methylation biomarkers to classify cancer types. |
topic |
DNA methylation TCGA biomarkers clustering differential methylation metastasis |
url |
https://www.mdpi.com/2072-6694/13/15/3768 |
work_keys_str_mv |
AT vijayachitramodhukur machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT shakshisharma machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT mainakmondal machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT ankitalawarde machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT keiukask machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT rajeshsharma machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles AT andressalumets machinelearningapproachestoclassifyprimaryandmetastaticcancersusingtissueoforiginbaseddnamethylationprofiles |
_version_ |
1721218860518998016 |