Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification

Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a co...


Bibliographic Details
Main Authors: Chunye Wu, Nan Wang, Yu Wang
Format: Article
Language: English
Published: Hindawi Limited 2021-01-01
Series: Discrete Dynamics in Nature and Society
Online Access: http://dx.doi.org/10.1155/2021/6647557
id doaj-78a97848831b4828bbebfa6292e5cecf
record_format Article
spelling doaj-78a97848831b4828bbebfa6292e5cecf; 2021-05-17T00:01:00Z; eng; Hindawi Limited; Discrete Dynamics in Nature and Society; 1607-887X; 2021-01-01; 2021; 10.1155/2021/6647557; Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification; Chunye Wu (School of Mathematical Science), Nan Wang (School of Mathematical Science), Yu Wang (School of Mathematical Science); http://dx.doi.org/10.1155/2021/6647557
collection DOAJ
language English
format Article
sources DOAJ
author Chunye Wu
Nan Wang
Yu Wang
title Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification
publisher Hindawi Limited
series Discrete Dynamics in Nature and Society
issn 1607-887X
publishDate 2021-01-01
description Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a cost-sensitive support vector machine to improve the minority class recall rate to 1 because the misclassification of even a few samples can cause serious losses in some physical problems. In the proposed method, the modification employs a margin compensation to make the margin lopsided, enabling decision boundary drift. When the boundary reaches a certain position, the minority class samples will be more generalized to achieve the requirement of a recall rate of 1. In the experiments, the effects of different parameters on the performance of the algorithm were analyzed, and the optimal parameters for a recall rate of 1 were determined. The experimental results reveal that, for the imbalanced data classification problem, the traditional definite cost classification scheme and the models classified using the area under the receiver operating characteristic curve criterion rarely produce results such as a recall rate of 1. The new strategy can yield a minority recall of 1 for imbalanced data as the loss of the majority class is acceptable; moreover, it improves the g-means index. The proposed algorithm provides superior performance in minority recall compared to the conventional methods. The proposed method has important practical significance in credit card fraud, medical diagnosis, and other areas.
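The two quantities the description refers to, minority recall and the g-means index, can be illustrated with a small sketch. This is not the authors' margin-compensation algorithm: it assumes a classifier has already produced decision scores, and it uses a plain threshold shift as a stand-in for moving the decision boundary. The scores, labels, and function names below are hypothetical, and the g-means index is taken here in its usual sense as the geometric mean of minority recall and majority specificity.

```python
import math

def recall_and_gmean(y_true, y_pred, minority=1):
    """Return (minority recall, majority specificity, g-mean) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, specificity, math.sqrt(recall * specificity)

def threshold_for_full_recall(scores, y_true, minority=1):
    """Largest threshold that still places every minority sample on the minority side."""
    return min(s for s, t in zip(scores, y_true) if t == minority)

def predict(scores, threshold, minority=1, majority=0):
    """Label a sample as minority when its score reaches the threshold."""
    return [minority if s >= threshold else majority for s in scores]

# Hypothetical decision scores (higher means more minority-like) and true labels.
scores = [-2.0, -1.5, -1.0, -0.9, -0.8, 0.3, -0.6, 0.8, 1.2]
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1]

baseline = predict(scores, 0.0)  # boundary left at the default position
shifted = predict(scores, threshold_for_full_recall(scores, y_true))

print("baseline:", recall_and_gmean(y_true, baseline))
print("shifted: ", recall_and_gmean(y_true, shifted))
```

On this toy data the shifted boundary recovers the one minority sample the default boundary misses, so minority recall rises to 1. Whether the g-mean rises as well depends on how many majority samples lie between the two boundary positions, which is the trade-off the description alludes to when it says the loss of the majority class must be acceptable.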
url http://dx.doi.org/10.1155/2021/6647557