A genetic programming approach to oral cancer prognosis

Background: The potential of genetic programming (GP) in various fields has been recognized in recent years. In the biomedical field, much GP research focuses on the recognition of cancerous cells and on gene expression profiling data. This research aims to study the performance of...

Bibliographic Details
Main Authors: Mei Sze Tan, Jing Wei Tan, Siow-Wee Chang, Hwa Jen Yap, Sameem Abdul Kareem, Rosnah Binti Zain
Format: Article
Language: English
Published: PeerJ Inc. 2016-09-01
Series: PeerJ
Subjects: Genetic Programming; Oral cancer prognosis; Machine learning; Feature selection
Online Access: https://peerj.com/articles/2482.pdf
id doaj-4b4eff8257134b548b9478ac3ebfb7a2
record_format Article
spelling doaj-4b4eff8257134b548b9478ac3ebfb7a2
2020-11-25T00:32:09Z
eng
PeerJ Inc.
PeerJ
2167-8359
2016-09-01
4
e2482
10.7717/peerj.2482
A genetic programming approach to oral cancer prognosis
Mei Sze Tan (Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia)
Jing Wei Tan (Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia)
Siow-Wee Chang (Bioinformatics Program, Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia)
Hwa Jen Yap (Department of Mechanical Engineering, Faculty of Engineering, University of Malaya, Kuala Lumpur, Malaysia)
Sameem Abdul Kareem (Department of Artificial Intelligence, Faculty of Computer Science & Information Technology, University of Malaya, Kuala Lumpur, Malaysia)
Rosnah Binti Zain (Oral Cancer Research & Coordinating Centre (OCRCC), Faculty of Dentistry, University of Malaya, Kuala Lumpur, Malaysia)
collection DOAJ
language English
format Article
sources DOAJ
author Mei Sze Tan
Jing Wei Tan
Siow-Wee Chang
Hwa Jen Yap
Sameem Abdul Kareem
Rosnah Binti Zain
title A genetic programming approach to oral cancer prognosis
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2016-09-01
description Background: The potential of genetic programming (GP) in various fields has been recognized in recent years. In the biomedical field, much GP research focuses on the recognition of cancerous cells and on gene expression profiling data. This research aims to study the performance of GP on survival prediction for a small-sample oral cancer prognosis dataset, which is the first such study in the field of oral cancer prognosis. Method: GP is applied to an oral cancer dataset of 31 cases collected from the Malaysia Oral Cancer Database and Tissue Bank System (MOCDTBS). The feature subset automatically selected by GP was noted, and its influence on the GP results was recorded. In addition, the performance of GP was compared with that of the Support Vector Machine (SVM) and logistic regression (LR) in order to verify the predictive capabilities of GP. Result: GP performed best (average accuracy of 83.87% and average AUROC of 0.8341) when the selected features were smoking, drinking, chewing, histological differentiation of SCC, and oncogene p63. In addition, based on the comparison results, GP outperformed the SVM and LR in oral cancer prognosis. Discussion: Some of the features in the dataset are found to be statistically correlated, as indicated by the drop in GP prediction accuracy when any one of the features in the best feature subset is excluded. Thus, GP provides an automatic feature selection function, which chooses features that are highly correlated with the prognosis of oral cancer. This makes GP an ideal prediction model for clinical and genomic cancer data that can be used to aid physicians in the decision-making stage of diagnosis or prognosis.
topic Genetic Programming
Oral cancer prognosis
Machine learning
Feature selection
url https://peerj.com/articles/2482.pdf