Cervical Cancer Diagnosis Using Random Forest Classifier With SMOTE and Feature Reduction Techniques

Cervical cancer is the fourth most common malignant disease in women worldwide. In most cases, cervical cancer symptoms are not noticeable in its early stages. Many factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted...

Bibliographic Details
Main Authors: Sherif F. Abdoh, Mohamed Abo Rizka, Fahima A. Maghraby
Format: Article
Language: English
Published: IEEE 2018-01-01
Series: IEEE Access
Subjects: Cervical cancer; random forest; risk factors; SMOTE
Online Access: https://ieeexplore.ieee.org/document/8482260/
id doaj-6ba0c27fa8ae4dd3aa8e2d68b0f5ff32
record_format Article
spelling doaj-6ba0c27fa8ae4dd3aa8e2d68b0f5ff32
2021-03-29T21:41:10Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2018-01-01, vol. 6, pp. 59475-59485
DOI 10.1109/ACCESS.2018.2874063, document 8482260
Cervical Cancer Diagnosis Using Random Forest Classifier With SMOTE and Feature Reduction Techniques
Sherif F. Abdoh (https://orcid.org/0000-0002-5039-3354), Mohamed Abo Rizka, Fahima A. Maghraby
Department of Computer Science, Arab Academy for Science, Technology and Maritime Transport, Cairo, Egypt (all three authors)
https://ieeexplore.ieee.org/document/8482260/
Cervical cancer; random forest; risk factors; SMOTE
collection DOAJ
language English
format Article
sources DOAJ
author Sherif F. Abdoh
Mohamed Abo Rizka
Fahima A. Maghraby
title Cervical Cancer Diagnosis Using Random Forest Classifier With SMOTE and Feature Reduction Techniques
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2018-01-01
description Cervical cancer is the fourth most common malignant disease in women worldwide. In most cases, cervical cancer symptoms are not noticeable in its early stages. Many factors increase the risk of developing cervical cancer, such as human papillomavirus, sexually transmitted diseases, and smoking. Identifying those factors and building a model that classifies whether a case is cervical cancer or not is a challenging research problem. This study uses cervical cancer risk factors to build a classification model with the Random Forest (RF) classification technique, the synthetic minority oversampling technique (SMOTE), and two feature reduction techniques: recursive feature elimination (RFE) and principal component analysis (PCA). Medical data sets are often imbalanced because the number of patients is much smaller than the number of non-patients. Because the data set used here is imbalanced, SMOTE is applied to address this problem. The data set consists of 32 risk factors and four target variables: Hinselmann, Schiller, Cytology, and Biopsy. After comparing the results, we find that combining the random forest classification technique with SMOTE improves the classification performance.
topic Cervical cancer
random forest
risk factors
SMOTE
url https://ieeexplore.ieee.org/document/8482260/
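
The abstract says SMOTE is used to balance the data set before classification. As a rough illustration of the idea only, the sketch below generates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified, standard-library version and not the authors' implementation; the function name `smote`, its parameters, and the toy data are assumptions made for this example, and the Random Forest and feature reduction steps are not shown.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbours (the core SMOTE idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base sample among the other minority samples
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: two numeric risk factors per record (not from the paper).
majority = [[float(i), float(i % 7)] for i in range(40)]
minority = [[1.0, 2.0], [1.5, 2.5], [2.0, 2.0], [1.2, 3.0]]

# Oversample the minority class until both classes have the same size.
new_points = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))
```

Because every synthetic point lies on a segment between two real minority samples, the new points stay inside the region the minority class already occupies. In practice, oversampling is usually applied to the training split only, so that evaluation data contains no synthetic records.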