Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification

Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a co...

Bibliographic Details
Main Authors:	Chunye Wu, Nan Wang, Yu Wang
Format:	Article
Language:	English
Published:	Hindawi Limited 2021-01-01
Series:	Discrete Dynamics in Nature and Society
Online Access:	http://dx.doi.org/10.1155/2021/6647557

id	doaj-78a97848831b4828bbebfa6292e5cecf
record_format	Article
spelling	doaj-78a97848831b4828bbebfa6292e5cecf; 2021-05-17T00:01:00Z; eng; Hindawi Limited; Discrete Dynamics in Nature and Society; 1607-887X; 2021-01-01; 2021; 10.1155/2021/6647557; Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification; Chunye Wu (School of Mathematical Science), Nan Wang (School of Mathematical Science), Yu Wang (School of Mathematical Science); http://dx.doi.org/10.1155/2021/6647557
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chunye Wu Nan Wang Yu Wang
title	Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification
publisher	Hindawi Limited
series	Discrete Dynamics in Nature and Society
issn	1607-887X
publishDate	2021-01-01
description	Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a cost-sensitive support vector machine to improve the minority class recall rate to 1 because the misclassification of even a few samples can cause serious losses in some physical problems. In the proposed method, the modification employs a margin compensation to make the margin lopsided, enabling decision boundary drift. When the boundary reaches a certain position, the minority class samples will be more generalized to achieve the requirement of a recall rate of 1. In the experiments, the effects of different parameters on the performance of the algorithm were analyzed, and the optimal parameters for a recall rate of 1 were determined. The experimental results reveal that, for the imbalanced data classification problem, the traditional definite cost classification scheme and the models classified using the area under the receiver operating characteristic curve criterion rarely produce results such as a recall rate of 1. The new strategy can yield a minority recall of 1 for imbalanced data as the loss of the majority class is acceptable; moreover, it improves the g-means index. The proposed algorithm provides superior performance in minority recall compared to the conventional methods. The proposed method has important practical significance in credit card fraud, medical diagnosis, and other areas.
url	http://dx.doi.org/10.1155/2021/6647557
_version_	1721438784498696192
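
The description above speaks of letting the decision boundary drift until the minority recall reaches 1, and of the g-means index. The sketch below is only a rough illustration of that trade-off, not the authors' margin-compensation algorithm: it assumes a classifier that already outputs signed scores, and it moves the boundary by adding a constant offset. The function names (`recall_specificity`, `g_mean`, `shift_for_full_recall`) and the toy scores are hypothetical.

```python
import math


def recall_specificity(scores, labels, shift=0.0):
    """Return (minority recall, majority specificity) for sign(score + shift).

    Labels are +1 for the minority class and -1 for the majority class.
    """
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted = 1 if score + shift >= 0 else -1
        if label == 1:
            if predicted == 1:
                tp += 1
            else:
                fn += 1
        else:
            if predicted == -1:
                tn += 1
            else:
                fp += 1
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return recall, specificity


def g_mean(recall, specificity):
    """Geometric mean of minority recall and majority specificity."""
    return math.sqrt(recall * specificity)


def shift_for_full_recall(scores, labels):
    """Smallest non-negative offset that puts every minority sample at score >= 0.

    Assumption: a plain offset stands in for the boundary drift described above.
    """
    lowest_minority = min(s for s, y in zip(scores, labels) if y == 1)
    return max(0.0, -lowest_minority)


# Hypothetical decision scores: 3 minority (+1) and 5 majority (-1) samples.
scores = [1.2, 0.4, -0.3, -0.1, -0.9, -1.4, -2.0, -2.5]
labels = [1, 1, 1, -1, -1, -1, -1, -1]

recall_before, spec_before = recall_specificity(scores, labels)
shift = shift_for_full_recall(scores, labels)
recall_after, spec_after = recall_specificity(scores, labels, shift)

print("before:", recall_before, spec_before, g_mean(recall_before, spec_before))
print("after: ", recall_after, spec_after, g_mean(recall_after, spec_after))
```

On this toy data the offset recovers the one missed minority sample at the cost of one majority sample, which is the kind of exchange the description calls acceptable. Real data need not show a g-mean improvement.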

Similar Items