A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data

With the help of machine learning (ML) techniques, possible errors made by pathologists and physicians, such as those caused by inexperience, fatigue, or stress, can be avoided, and medical data can be examined in less time and in greater detail. However, while the con...


Bibliographic Details
Main Authors: Na Liu, Xiaomei Li, Ershi Qi, Man Xu, Ling Li, Bo Gao
Format: Article
Language: English
Published: IEEE 2020-01-01
Series: IEEE Access
Subjects: Support vector machine, imbalanced data, ensemble learning, medical diagnosis
Online Access: https://ieeexplore.ieee.org/document/9159642/
id doaj-a65fec755a564a379ba17f99ce6833b5
record_format Article
spelling doaj-a65fec755a564a379ba17f99ce6833b5
2021-03-30T03:58:04Z
eng
IEEE
IEEE Access, ISSN 2169-3536
2020-01-01, vol. 8, pp. 171263-171280
DOI 10.1109/ACCESS.2020.3014362
9159642
A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data
Na Liu (College of Management and Economics, Tianjin University, Tianjin, China)
Xiaomei Li (College of Management and Economics, Tianjin University, Tianjin, China)
Ershi Qi (College of Management and Economics, Tianjin University, Tianjin, China)
Man Xu (Business School, Nankai University, Tianjin, China), https://orcid.org/0000-0003-4256-6304
Ling Li (School of Political and Law, Shihezi University, Shihezi, China)
Bo Gao (School of Computer Science and Technology, Anhui University, Hefei, China)
https://ieeexplore.ieee.org/document/9159642/
Support vector machine
imbalanced data
ensemble learning
medical diagnosis
collection DOAJ
language English
format Article
sources DOAJ
author Na Liu
Xiaomei Li
Ershi Qi
Man Xu
Ling Li
Bo Gao
title A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2020-01-01
description With the help of machine learning (ML) techniques, possible errors made by pathologists and physicians, such as those caused by inexperience, fatigue, or stress, can be avoided, and medical data can be examined in less time and in greater detail. However, while conventional ML techniques, such as classification, achieve excellent classification accuracy when applied to medical diagnosis, they perform poorly on imbalanced datasets, especially in detecting the minority category. To tackle this shortcoming of conventional classification approaches, this study proposes a novel ensemble learning paradigm for medical diagnosis with imbalanced data, which consists of three phases: data pre-processing, base classifier training, and final ensemble. In the data pre-processing phase, we extend the Synthetic Minority Oversampling Technique (SMOTE) by integrating it with the cross-validated committees filter (CVCF) technique, which not only synthesizes minority samples and thereby balances the input instances, but also filters out noisy examples so that classification performs well. In the classification phase, we introduce the ensemble support vector machine (ESVM) classification technique, which is constructed from multiple structurally diverse SVM classifiers and thus has the advantages of strong generalization performance and classification precision. In the final ensemble phase, we adopt a weighted majority voting strategy and use a simulated annealing genetic algorithm (SAGA) to optimize the weight vector and thereby enhance the overall classification performance. The proposed ensemble learning method was tested on nine imbalanced medical datasets, and the experimental results clearly indicate that it outperforms other state-of-the-art classification models. Promisingly, the proposed ensemble learning paradigm can effectively facilitate medical decision making for physicians.
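The following is a minimal Python sketch of two building blocks named in the abstract: SMOTE-style interpolation of a synthetic minority sample, and weighted majority voting over base-classifier predictions. It is not the authors' implementation. The function names smote_like_sample and weighted_majority_vote, the parameter k, and the fixed example weights are assumptions made for illustration; the CVCF noise filter, the SVM base classifiers, and the SAGA weight optimization described in the abstract are not reproduced here.

```python
import math
import random
from collections import defaultdict


def smote_like_sample(minority, k=3, rng=None):
    """Create one synthetic minority point by SMOTE-style interpolation.

    Assumes `minority` holds at least two points given as numeric tuples.
    """
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    # k nearest minority neighbours of the base point (the base itself excluded)
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    target = rng.choice(neighbours)
    u = rng.random()  # interpolation factor in [0, 1)
    # The synthetic point lies on the segment between base and target.
    return tuple(b + u * (t - b) for b, t in zip(base, target))


def weighted_majority_vote(predictions, weights):
    """Return the label whose supporting classifiers have the largest total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical usage: three base classifiers vote on one patient record.
# In the paper the weights are optimized by SAGA; here they are fixed by hand.
label = weighted_majority_vote([1, 0, 0], [0.6, 0.2, 0.1])
```

With these hand-picked weights the single classifier voting for label 1 outweighs the two voting for label 0, which is the situation weighted voting is meant to handle when base classifiers differ in reliability.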
topic Support vector machine
imbalanced data
ensemble learning
medical diagnosis
url https://ieeexplore.ieee.org/document/9159642/