A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data
Machine learning (ML) techniques can help avoid the errors that pathologists and physicians make through inexperience, fatigue, stress and similar causes, and they allow medical data to be examined faster and in more detail. However, while the con...
Main Authors: | Na Liu, Xiaomei Li, Ershi Qi, Man Xu, Ling Li, Bo Gao |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Support vector machine; imbalanced data; ensemble learning; medical diagnosis |
Online Access: | https://ieeexplore.ieee.org/document/9159642/ |
id |
doaj-a65fec755a564a379ba17f99ce6833b5 |
---|---|
record_format |
Article |
spelling |
doaj-a65fec755a564a379ba17f99ce6833b5; 2021-03-30T03:58:04Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2020-01-01; vol. 8; pp. 171263-171280; DOI 10.1109/ACCESS.2020.3014362; document 9159642
A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data
Na Liu (College of Management and Economics, Tianjin University, Tianjin, China); Xiaomei Li (College of Management and Economics, Tianjin University, Tianjin, China); Ershi Qi (College of Management and Economics, Tianjin University, Tianjin, China); Man Xu (https://orcid.org/0000-0003-4256-6304; Business School, Nankai University, Tianjin, China); Ling Li (School of Political and Law, Shihezi University, Shihezi, China); Bo Gao (School of Computer Science and Technology, Anhui University, Hefei, China)
https://ieeexplore.ieee.org/document/9159642/
Support vector machine; imbalanced data; ensemble learning; medical diagnosis |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Na Liu, Xiaomei Li, Ershi Qi, Man Xu, Ling Li, Bo Gao |
spellingShingle |
Na Liu, Xiaomei Li, Ershi Qi, Man Xu, Ling Li, Bo Gao; A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data; IEEE Access; Support vector machine; imbalanced data; ensemble learning; medical diagnosis |
author_facet |
Na Liu, Xiaomei Li, Ershi Qi, Man Xu, Ling Li, Bo Gao |
author_sort |
Na Liu |
title |
A Novel Ensemble Learning Paradigm for Medical Diagnosis With Imbalanced Data |
title_sort |
novel ensemble learning paradigm for medical diagnosis with imbalanced data |
publisher |
IEEE |
series |
IEEE Access |
issn |
2169-3536 |
publishDate |
2020-01-01 |
description |
Machine learning (ML) techniques can help avoid the errors that pathologists and physicians make through inexperience, fatigue, stress and similar causes, and they allow medical data to be examined faster and in more detail. However, while conventional ML techniques such as classification achieve excellent accuracy when applied to medical diagnosis, they have a serious shortcoming: they perform poorly on imbalanced datasets, especially in detecting the minority category. To address this shortcoming of conventional classification approaches, this study proposes a novel ensemble learning paradigm for medical diagnosis with imbalanced data. It consists of three phases: data pre-processing, base classifier training and final ensemble. In the data pre-processing phase, we extend the Synthetic Minority Oversampling Technique (SMOTE) by integrating it with the cross-validated committees filter (CVCF) technique, which not only synthesizes minority samples to balance the input instances but also filters out noisy examples so that classification performs well. In the classification phase, we introduce the ensemble support vector machine (ESVM) classification technique, which is built from multiple structurally diverse SVM classifiers and therefore offers strong generalization performance and classification precision. In the final ensemble phase, we adopt a weighted majority voting strategy and use a simulated annealing genetic algorithm (SAGA) to optimize the weight vector and thereby enhance the overall classification performance.
The proposed ensemble learning method was tested on nine imbalanced medical datasets, and the experimental results indicate that it outperforms other state-of-the-art classification models. The proposed ensemble learning paradigm can therefore effectively support physicians in medical decision making. |
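The final ensemble phase described above combines the base classifiers' outputs by weighted majority voting. As a minimal sketch of that one step only, the Python below sums the weights behind each predicted label and returns the heaviest label. The function name `weighted_majority_vote`, the example votes, and the hand-picked weights are all illustrative assumptions, not taken from the article; in particular, the article optimizes the weight vector with SAGA, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def weighted_majority_vote(
    predictions: Sequence[Hashable], weights: Sequence[float]
) -> Hashable:
    """Return the label whose supporting classifiers carry the most total weight."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per base classifier")
    if not predictions:
        raise ValueError("need at least one prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    # Ties go to the label that was seen first (dicts keep insertion order).
    return max(totals, key=totals.get)


# Hypothetical votes from three base classifiers for one patient record
# (1 = minority/positive class, 0 = majority/negative class).
votes = [1, 0, 0]
# Hand-picked illustrative weights; the article tunes these with SAGA instead.
weights = [0.6, 0.25, 0.15]
decision = weighted_majority_vote(votes, weights)
```

With these assumed weights the single high-weight classifier outvotes the other two, which is the behaviour that distinguishes weighted voting from a plain majority vote.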
topic |
Support vector machine; imbalanced data; ensemble learning; medical diagnosis |
url |
https://ieeexplore.ieee.org/document/9159642/ |