Detection of effective genes in colon cancer: A machine learning approach

Bibliographic Details
Main Authors: Mohammad Amin Fahami, Mohamad Roshanzamir, Navid Hoseini Izadi, Vahideh Keyvani, Roohallah Alizadehsani
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Informatics in Medicine Unlocked
Online Access: http://www.sciencedirect.com/science/article/pii/S2352914821000952
DOAJ Record ID: doaj-29da3e630aef4a2ca6a19adda7a1172d (last updated 2021-06-19)
ISSN: 2352-9148
Volume: 24, article 100605
Affiliations:
Mohammad Amin Fahami (corresponding author): Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
Mohamad Roshanzamir: Department of Computer Engineering, Faculty of Engineering, Fasa University, 74617-81189, Fasa, Iran
Navid Hoseini Izadi: Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran
Vahideh Keyvani: Department of Biology, Faculty of Science, Shahid Chamran University of Ahvaz, Ahvaz, Iran
Roohallah Alizadehsani: Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia

Abstract

Many types of cancer have become common in humans, and unfortunately they cause a large number of deaths. Early detection and diagnosis of cancer can substantially improve patient survival and reduce treatment costs. Among cancers, colon cancer is the third leading cause of death in women and the second in men worldwide. Hence, many researchers have been developing new methods for the early diagnosis of colon cancer. In this study, we apply statistical hypothesis tests, such as the t-test and the Mann–Whitney–Wilcoxon test, and machine learning methods, such as neural networks, k-nearest neighbors (KNN) and decision trees, to detect the most effective genes, that is, the genes most strongly associated with the vital status (alive or deceased) of colon cancer patients. We normalize the dataset using a new two-step method. In the first step, the gene expression values within each sample (patient) are normalized to zero mean and unit variance. In the second step, each gene is normalized across the whole dataset. Our analysis shows that this normalization method is more effective than other normalization methods and improves the overall performance of the study.
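
A minimal sketch of the two-step normalization described above, in plain Python on a samples × genes matrix stored as a list of lists. It assumes both steps are z-scores (zero mean, unit population variance); the abstract states this only for the first step, so the z-score in step 2, the use of the population rather than the sample standard deviation, and the helper names `zscore` and `two_step_normalize` are illustrative assumptions, not the authors' code.

```python
from statistics import mean, pstdev


def zscore(values):
    """Scale values to zero mean and unit population variance.

    A constant vector has no spread, so it is mapped to zeros rather
    than triggering a division by zero.
    """
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


def two_step_normalize(matrix):
    """Normalize a samples x genes matrix in two steps.

    Step 1: z-score each row (all genes within one sample/patient).
    Step 2: z-score each column (one gene across all samples).
    Using a z-score in step 2 is an assumption of this sketch.
    """
    step1 = [zscore(row) for row in matrix]
    # zip(*rows) transposes, so each item is one gene across all samples.
    per_gene = [zscore(list(col)) for col in zip(*step1)]
    # Transpose back to samples x genes.
    return [list(row) for row in zip(*per_gene)]


if __name__ == "__main__":
    # Made-up toy data: 3 patients x 3 genes.
    expr = [[5.0, 7.0, 9.0], [1.0, 2.0, 6.0], [4.0, 4.0, 10.0]]
    for row in two_step_normalize(expr):
        print([round(v, 3) for v in row])
```

Because step 2 rescales each gene column, the rows of the result are generally no longer exactly zero-mean; only the per-gene property from step 2 is guaranteed in the output.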
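
One way to apply the two hypothesis tests named above gene by gene is sketched below. For each gene, expression values are split by vital status (coded here as 0 = alive and 1 = deceased, an assumed encoding) and compared with Welch's t statistic (a t-test variant that does not assume equal variances; only the statistic is computed, not its p-value) and a two-sided Mann–Whitney U test using the large-sample normal approximation. Ranking genes by the U-test p-value, the helper names, and the absence of a multiple-testing correction are simplifying assumptions, not details taken from the paper.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples (each needs >= 2 values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value via the normal approximation.

    No tie or continuity correction, so p-values are somewhat
    conservative when there are ties and rough for very small groups.
    """
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    # Two-sided p = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


def rank_genes(expr, status, gene_names):
    """Compare each gene between vital-status groups and sort by U-test p-value.

    expr: samples x genes matrix; status: 0 (alive) or 1 (deceased) per
    sample, an assumed encoding. Returns (gene, t, p) tuples.
    """
    results = []
    for g, name in enumerate(gene_names):
        alive = [row[g] for row, s in zip(expr, status) if s == 0]
        dead = [row[g] for row, s in zip(expr, status) if s == 1]
        results.append((name, welch_t(alive, dead), mann_whitney_p(alive, dead)))
    return sorted(results, key=lambda r: r[2])


if __name__ == "__main__":
    # Made-up data: gene G1 separates the groups, G2 does not.
    expr = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0],
            [10.0, 3.0], [11.0, 5.0], [12.0, 2.0], [13.0, 4.0]]
    status = [0, 0, 0, 0, 1, 1, 1, 1]
    for name, t, p in rank_genes(expr, status, ["G1", "G2"]):
        print(f"{name}: t = {t:.2f}, Mann-Whitney p = {p:.3g}")
```

The rank-based U test is the safer choice when expression values are skewed or contain outliers, while the t statistic also conveys the direction of the difference (negative here means lower expression in the first group).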

Afterwards, we apply unsupervised learning methods to find meaningful structure in colon cancer gene expression data. To this end, the dimensionality of the dataset is reduced using Principal Component Analysis (PCA). Next, we cluster the patients based on the features extracted by PCA. We then validate the labels produced by the unsupervised methods using several supervised learning algorithms. Finally, we identify the genes that have a major impact on colon cancer mortality in each cluster. This is the first study to suggest that colon cancer patients can be categorized into two clusters. In each cluster, 20 effective genes were extracted that may be important for the early diagnosis of colon cancer. Many of these genes are identified here for the first time.

Keywords: Colon cancer; Machine learning; Unsupervised learning; Dimension reduction; Statistical hypothesis tests
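
The unsupervised stage of the abstract (PCA, then clustering of patients on the PCA scores) can be sketched in plain Python as follows. The abstract does not name the clustering algorithm or the number of retained components, so k-means with k = 2 (matching the two reported clusters), deterministic farthest-first initialization, PCA by power iteration with deflation, and the function names `pca_scores` and `kmeans` are all assumptions of this sketch.

```python
import math
import random


def pca_scores(matrix, n_components=2, iters=500, seed=0):
    """Project rows of a samples x features matrix onto the top principal components.

    Uses power iteration with deflation on the sample covariance matrix,
    which is fine for toy inputs but not for thousands of genes.
    """
    n, d = len(matrix), len(matrix[0])
    # Center each feature (column) at zero.
    means = [sum(col) / n for col in zip(*matrix)]
    x = [[v - m for v, m in zip(row, means)] for row in matrix]
    # Sample covariance matrix (d x d).
    cov = [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
            for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        # Random unit start vector, unlikely to be orthogonal to the target eigenvector.
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        s = math.sqrt(sum(t * t for t in v))
        v = [t / s for t in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:
                break
            v = [t / norm for t in w]
        # Rayleigh quotient estimates the eigenvalue; deflation exposes the next component.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        components.append(v)
    return [[sum(r[j] * c[j] for j in range(d)) for c in components] for r in x]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k=2, iters=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster becomes empty.
            new_centroids.append([sum(col) / len(members) for col in zip(*members)]
                                 if members else centroids[c])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


if __name__ == "__main__":
    # Made-up data: two well-separated groups of three "patients", three genes each.
    data = [[0.1, 0.0, 0.2], [0.0, 0.3, 0.1], [0.2, 0.1, 0.0],
            [10.0, 10.2, 9.9], [9.8, 10.1, 10.0], [10.1, 9.9, 10.2]]
    print(kmeans(pca_scores(data, n_components=2), k=2))
```

Building the genes × genes covariance matrix is only practical for toy inputs; for real expression data with thousands of genes, an SVD-based routine such as scikit-learn's `sklearn.decomposition.PCA` would be the usual choice.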
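
The abstract states that the cluster labels were checked with several supervised learners but does not say how. One plausible reading, assumed here, is to treat the cluster assignments as class labels and measure how well a classifier recovers them under leave-one-out cross-validation: high accuracy indicates the clusters are well separated in feature space. The sketch uses a hand-written k-nearest-neighbors classifier (one of the methods the abstract lists); k = 3, Euclidean distance, and the names `knn_predict` and `loo_accuracy` are illustrative choices.

```python
from collections import Counter


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_x)),
                     key=lambda i: squared_distance(train_x[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def loo_accuracy(points, labels, k=3):
    """Leave-one-out accuracy of KNN at recovering the given (cluster) labels."""
    correct = 0
    for i in range(len(points)):
        # Hold out sample i and train on all the others.
        train_x = points[:i] + points[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if knn_predict(train_x, train_y, points[i], k) == labels[i]:
            correct += 1
    return correct / len(points)


if __name__ == "__main__":
    # Hypothetical 1-D PCA scores and the cluster labels assigned to them.
    scores = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
    print(loo_accuracy(scores, [0, 0, 0, 1, 1, 1]))
```

For two balanced clusters, accuracy near 0.5 would suggest the clustering does not reflect stable structure, whereas accuracy near 1.0 is consistent with well-separated groups.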