Detection of effective genes in colon cancer: A machine learning approach
Cancers of many kinds have become common, and they are the cause of death for many patients. Early detection and diagnosis can significantly improve patient survival and reduce treatment costs. Among cancers, colon cancer is the third and the secon...
Main Authors: | Mohammad Amin Fahami, Mohamad Roshanzamir, Navid Hoseini Izadi, Vahideh Keyvani, Roohallah Alizadehsani |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2021-01-01 |
Series: | Informatics in Medicine Unlocked |
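Subjects: | Colon cancer; Machine learning; Unsupervised learning; Dimension reduction; Statistical hypothesis tests |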
Online Access: | http://www.sciencedirect.com/science/article/pii/S2352914821000952 |
id |
doaj-29da3e630aef4a2ca6a19adda7a1172d |
---|---|
record_format |
Article |
spelling |
doaj-29da3e630aef4a2ca6a19adda7a1172d
2021-06-19T04:55:13Z
eng
Elsevier
Informatics in Medicine Unlocked
2352-9148
2021-01-01
24
100605
Detection of effective genes in colon cancer: A machine learning approach
Mohammad Amin Fahami (Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran; corresponding author)
Mohamad Roshanzamir (Department of Computer Engineering, Faculty of Engineering, Fasa University, 74617-81189, Fasa, Iran)
Navid Hoseini Izadi (Department of Electrical and Computer Engineering, Isfahan University of Technology, Isfahan, 84156-83111, Iran)
Vahideh Keyvani (Department of Biology, Faculty of Science, Shahid Chamran University of Ahvaz, Ahvaz, Iran)
Roohallah Alizadehsani (Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Australia)
http://www.sciencedirect.com/science/article/pii/S2352914821000952
Colon cancer; Machine learning; Unsupervised learning; Dimension reduction; Statistical hypothesis tests |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mohammad Amin Fahami, Mohamad Roshanzamir, Navid Hoseini Izadi, Vahideh Keyvani, Roohallah Alizadehsani |
author_sort |
Mohammad Amin Fahami |
title |
Detection of effective genes in colon cancer: A machine learning approach |
publisher |
Elsevier |
series |
Informatics in Medicine Unlocked |
issn |
2352-9148 |
publishDate |
2021-01-01 |
description |
Cancers of many kinds have become common, and they are the cause of death for many patients. Early detection and diagnosis can significantly improve patient survival and reduce treatment costs. Among cancers, colon cancer is the third and the second leading cause of death worldwide in women and men, respectively. Hence, many researchers have been trying to develop new methods for early diagnosis of colon cancer. In this study, we apply statistical hypothesis tests such as the t-test and the Mann–Whitney–Wilcoxon test, and machine learning methods such as neural networks, KNN, and decision trees, to detect the genes most effective in determining the vital status of colon cancer patients. We normalize the dataset using a new two-step method. In the first step, the genes within each sample (patient) are normalized to have zero mean and unit variance. In the second step, each gene is normalized across the whole dataset. The results show that this normalization method is more efficient than the others and improves the overall performance of the analysis. Afterwards, we apply unsupervised learning methods to find meaningful structures in colon cancer gene expression data. To this end, the dimensionality of the dataset is reduced using Principal Component Analysis (PCA). Next, we cluster the patients according to the PCA-extracted features. We then check the labels produced by the unsupervised learning methods using different supervised learning algorithms. Finally, we determine the genes that have a major impact on the colon cancer mortality rate in each cluster. Our study is the first to suggest that colon cancer patients can be categorized into two clusters. In each cluster, 20 effective genes were extracted, which can be important for early diagnosis of colon cancer. Many of these genes have been identified for the first time. |
topic |
Colon cancer; Machine learning; Unsupervised learning; Dimension reduction; Statistical hypothesis tests |
url |
http://www.sciencedirect.com/science/article/pii/S2352914821000952 |
_version_ |
1721371717725585408 |
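
The two-step normalization, PCA reduction, and two-cluster grouping described in the abstract can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' implementation: the record does not say which clustering algorithm or how many principal components were used, so the plain k-means loop (`two_means`), the choice of two components, all function names, and the synthetic expression matrix are assumptions.

```python
import numpy as np


def two_step_normalize(expr):
    """Z-score each patient (row) across genes, then each gene (column) across patients."""
    expr = np.asarray(expr, dtype=float)
    # Step 1: zero mean and unit variance within each sample (patient).
    row_std = expr.std(axis=1, keepdims=True)
    step1 = (expr - expr.mean(axis=1, keepdims=True)) / np.where(row_std == 0, 1.0, row_std)
    # Step 2: zero mean and unit variance for each gene across the whole dataset.
    col_std = step1.std(axis=0, keepdims=True)
    return (step1 - step1.mean(axis=0, keepdims=True)) / np.where(col_std == 0, 1.0, col_std)


def pca_project(data, n_components):
    """Project the rows of `data` onto the leading principal components (via SVD)."""
    centered = data - data.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def two_means(points, n_iter=50, seed=0):
    """Assumed clustering step: plain k-means with k=2 (the abstract names no algorithm)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=2, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            points[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(2)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic stand-in for a (patients x genes) expression matrix; not real data.
rng = np.random.default_rng(0)
expr = rng.normal(size=(40, 200))
expr[:20, :50] += 3.0  # hypothetical group with higher expression in 50 genes

normalized = two_step_normalize(expr)
features = pca_project(normalized, n_components=2)
labels = two_means(features)
print(features.shape, np.bincount(labels, minlength=2))
```

After the second step every gene has zero mean and unit variance across patients, while the per-patient statistics from the first step hold only approximately. The cluster labels could then serve as targets for supervised classifiers, in the spirit of the check described in the abstract.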