An investigation of XGBoost-based algorithm for breast cancer classification

Breast cancer is one of the leading cancers affecting women around the world. The Computer-Aided Diagnosis (CAD) system is a powerful tool to assist pathologists during the process of diagnosing cancer, which effectively identifies the presence of cancerous cells. A standard CAD system includes proc...

Bibliographic Details
Main Authors: Xin Yu Liew, Nazia Hameed, Jeremie Clos
Format: Article
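Language: English
Published: Elsevier 2021-12-01
Series: Machine Learning with Applications
Subjects: Deep learning; Extreme gradient boosting; XGBoost; Machine learning; Computer-aided diagnosis; Breast cancer
Online Access: http://www.sciencedirect.com/science/article/pii/S2666827021000773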
id doaj-37db60bede1e4475a25952f903db01cd
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Xin Yu Liew
Nazia Hameed
Jeremie Clos
title An investigation of XGBoost-based algorithm for breast cancer classification
publisher Elsevier
series Machine Learning with Applications
issn 2666-8270
publishDate 2021-12-01
description Breast cancer is one of the leading cancers affecting women around the world. A Computer-Aided Diagnosis (CAD) system is a powerful tool that assists pathologists in diagnosing cancer by identifying the presence of cancerous cells. A standard CAD system comprises pre-processing, feature extraction, feature selection and classification. In this paper, we propose an enhanced breast cancer classification technique called Deep Learning and eXtreme Gradient Boosting (DLXGB) for histopathology breast cancer images, using the BreaKHis dataset. The method first applies data augmentation and stain normalization for pre-processing; a pre-trained DenseNet201 then automatically learns features from each image, and these features are combined with a powerful gradient boosting classifier. The proposed technique is designed to classify breast cancer histology images as benign or malignant, and additionally into one of eight non-overlapping/overlapping categories: Adenosis (A), Fibroadenoma (F), Phyllodes Tumour (PT), Tubular Adenoma (TA), Ductal Carcinoma (DC), Lobular Carcinoma (LC), Mucinous Carcinoma (MC) and Papillary Carcinoma (PC). With DLXGB, we obtained an accuracy of 97% for both binary and multi-class classification, improving on existing work by researchers using the BreaKHis dataset. The results indicate that this method can produce a powerful prediction for breast cancer image classification.
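The second stage of the pipeline in the description (a boosted classifier trained on fixed feature vectors) can be illustrated with a small, dependency-free sketch. This is not the authors' DLXGB code. The feature vectors here are synthetic Gaussian stand-ins for DenseNet201 features, and the classifier is plain first-order gradient boosting with decision stumps on logistic loss, which is a simplification of XGBoost (no second-order terms, no regularization). All function names, the feature dimension, the number of rounds and the learning rate are assumptions chosen for illustration.

```python
import math
import random


def make_features(n_per_class, dim, seed=0):
    """Synthetic stand-in for CNN feature vectors (NOT real DenseNet201 output)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):  # 0 = benign, 1 = malignant
        for _ in range(n_per_class):
            # Class means differ by 1.5 in every dimension.
            X.append([rng.gauss(1.5 * label, 1.0) for _ in range(dim)])
            y.append(label)
    return X, y


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, residuals, n_thresholds=10):
    """Pick the one-feature threshold split that best fits the residuals."""
    step = max(1, len(X) // n_thresholds)
    best_gain, best = -1.0, None
    for j in range(len(X[0])):
        column = sorted(row[j] for row in X)
        for t in column[step::step]:  # a few quantile-like candidate thresholds
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            if not left or not right:
                continue
            # Maximizing this is equivalent to minimizing squared error.
            gain = sum(left) ** 2 / len(left) + sum(right) ** 2 / len(right)
            if gain > best_gain:
                best_gain = gain
                best = (j, t, sum(left) / len(left), sum(right) / len(right))
    return best


def stump_predict(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosting(X, y, n_rounds=40, learning_rate=0.3):
    """Gradient boosting on logistic loss: each stump fits y - sigmoid(score)."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, row)
                  for s, row in zip(scores, X)]
    return stumps, learning_rate


def predict(model, row):
    stumps, learning_rate = model
    score = sum(learning_rate * stump_predict(s, row) for s in stumps)
    return 1 if score > 0 else 0


# Shuffle, then hold out 100 of the 300 synthetic samples for evaluation.
X, y = make_features(n_per_class=150, dim=6, seed=0)
order = list(range(len(X)))
random.Random(1).shuffle(order)
train, test = order[:200], order[200:]

model = fit_boosting([X[i] for i in train], [y[i] for i in train])
accuracy = sum(predict(model, X[i]) == y[i] for i in test) / len(test)
print(f"held-out accuracy on synthetic features: {accuracy:.2f}")
```

In a real setting, the rows of `X` would be replaced by features extracted from augmented, stain-normalized images, and the hand-written booster by an XGBoost classifier. The accuracy printed here reflects only the synthetic data and says nothing about the 97% reported in the abstract.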
topic Deep learning
Extreme gradient boosting
XGBoost
Machine learning
Computer-aided diagnosis
Breast cancer
url http://www.sciencedirect.com/science/article/pii/S2666827021000773