Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations

An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for developing effective therapeutic strategies. The large volume of available somatic mutation data provides a great opportunity to investigate cancer classification using machine lea...


Bibliographic Details
Main Authors: Yukun Chen, Jingchun Sun, Liang-Chin Huang, Hua Xu, Zhongming Zhao
Format: Article
Language: English
Published: Hindawi Limited 2015-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2015/491502
id doaj-db10468de49e415588b2c0bc88345ddd
record_format Article
spelling doaj-db10468de49e415588b2c0bc88345ddd
2020-11-24T23:02:33Z
eng
Hindawi Limited
BioMed Research International
2314-6133
2314-6141
2015-01-01
2015
10.1155/2015/491502
491502
Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations
Yukun Chen (0), Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
Jingchun Sun (1), School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Liang-Chin Huang (2), School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Hua Xu (3), School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX 77030, USA
Zhongming Zhao (4), Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, TN 37203, USA
http://dx.doi.org/10.1155/2015/491502
collection DOAJ
language English
format Article
sources DOAJ
author Yukun Chen
Jingchun Sun
Liang-Chin Huang
Hua Xu
Zhongming Zhao
title Classification of Cancer Primary Sites Using Machine Learning and Somatic Mutations
publisher Hindawi Limited
series BioMed Research International
issn 2314-6133
2314-6141
publishDate 2015-01-01
description An accurate classification of human cancer, including its primary site, is important for a better understanding of cancer and for developing effective therapeutic strategies. The large volume of available somatic mutation data provides a great opportunity to investigate cancer classification using machine learning. Here, we explored the patterns of 1,760,846 somatic mutations identified from 230,255 cancer patients, along with gene function information, using a support vector machine. Specifically, we performed a multiclass classification experiment over 17 tumor sites using the gene symbol, somatic mutation, chromosome, and gene functional pathway as predictors for 6,751 subjects. The baseline using only gene features achieved an accuracy of 0.57, which improved to 0.62 when mutation and chromosome information was added. Among the predictable primary tumor sites, five (large intestine, liver, skin, pancreas, and lung) achieved an F-measure above 0.70. The model for the large intestine ranked first, with an F-measure of 0.87. The results demonstrate that somatic mutation information is useful for predicting primary tumor sites with machine learning models. To our knowledge, this study is the first investigation of primary site classification using machine learning and somatic mutation data.
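note The description reports overall accuracy and per-site F-measure for a multiclass classifier. The following is a minimal sketch of how such scores are commonly computed, assuming "F-measure" means the balanced F1 score taken one-vs-rest per site; the record does not state the exact variant. The function names and the toy labels are hypothetical and are not data or code from the study.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of subjects whose predicted site equals the true site."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f_measure(y_true, y_pred):
    """Return {site: F1}, treating each site one-vs-rest.

    Assumption: F1 = 2*TP / (2*TP + FP + FN), the balanced F-measure.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted site gains a false positive
            fn[truth] += 1  # true site gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Toy labels invented for illustration only (not results from the study).
y_true = ["lung", "lung", "skin", "liver", "skin", "liver"]
y_pred = ["lung", "skin", "skin", "liver", "skin", "lung"]

overall = accuracy(y_true, y_pred)
scores = per_class_f_measure(y_true, y_pred)
print(f"accuracy = {overall:.2f}")
for site, f1 in scores.items():
    print(f"{site}: F1 = {f1:.2f}")
```

Reporting a score per site, rather than accuracy alone, shows which primary sites the model separates well and which it confuses, which is how the description can single out five sites above 0.70.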
url http://dx.doi.org/10.1155/2015/491502