Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification
Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a co...
Main Authors: | Chunye Wu, Nan Wang, Yu Wang |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2021-01-01 |
Series: | Discrete Dynamics in Nature and Society |
Online Access: | http://dx.doi.org/10.1155/2021/6647557 |
id |
doaj-78a97848831b4828bbebfa6292e5cecf |
---|---|
record_format |
Article |
spelling |
doaj-78a97848831b4828bbebfa6292e5cecf; 2021-05-17T00:01:00Z; eng; Hindawi Limited; Discrete Dynamics in Nature and Society; 1607-887X; 2021-01-01; 2021; 10.1155/2021/6647557; Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification; Chunye Wu 0; Nan Wang 1; Yu Wang 2; School of Mathematical Science; School of Mathematical Science; School of Mathematical Science; Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a cost-sensitive support vector machine to improve the minority class recall rate to 1 because the misclassification of even a few samples can cause serious losses in some physical problems. In the proposed method, the modification employs a margin compensation to make the margin lopsided, enabling decision boundary drift. When the boundary reaches a certain position, the minority class samples will be more generalized to achieve the requirement of a recall rate of 1. In the experiments, the effects of different parameters on the performance of the algorithm were analyzed, and the optimal parameters for a recall rate of 1 were determined. The experimental results reveal that, for the imbalanced data classification problem, the traditional definite cost classification scheme and the models classified using the area under the receiver operating characteristic curve criterion rarely produce results such as a recall rate of 1. The new strategy can yield a minority recall of 1 for imbalanced data as the loss of the majority class is acceptable; moreover, it improves the g-means index. The proposed algorithm provides superior performance in minority recall compared to the conventional methods. The proposed method has important practical significance in credit card fraud, medical diagnosis, and other areas.; http://dx.doi.org/10.1155/2021/6647557 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chunye Wu Nan Wang Yu Wang |
title |
Increasing Minority Recall Support Vector Machine Model for Imbalanced Data Classification |
publisher |
Hindawi Limited |
series |
Discrete Dynamics in Nature and Society |
issn |
1607-887X |
publishDate |
2021-01-01 |
description |
Imbalanced data classification is gaining importance in data mining and machine learning. The minority class recall rate requires special treatment in fields such as medical diagnosis, information security, industry, and computer vision. This paper proposes a new strategy and algorithm based on a cost-sensitive support vector machine to improve the minority class recall rate to 1 because the misclassification of even a few samples can cause serious losses in some physical problems. In the proposed method, the modification employs a margin compensation to make the margin lopsided, enabling decision boundary drift. When the boundary reaches a certain position, the minority class samples will be more generalized to achieve the requirement of a recall rate of 1. In the experiments, the effects of different parameters on the performance of the algorithm were analyzed, and the optimal parameters for a recall rate of 1 were determined. The experimental results reveal that, for the imbalanced data classification problem, the traditional definite cost classification scheme and the models classified using the area under the receiver operating characteristic curve criterion rarely produce results such as a recall rate of 1. The new strategy can yield a minority recall of 1 for imbalanced data as the loss of the majority class is acceptable; moreover, it improves the g-means index. The proposed algorithm provides superior performance in minority recall compared to the conventional methods. The proposed method has important practical significance in credit card fraud, medical diagnosis, and other areas. |
url |
http://dx.doi.org/10.1155/2021/6647557 |
_version_ |
1721438784498696192 |
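
The description above mentions decision boundary drift, a minority recall rate of 1, and the g-means index. The following is a minimal Python sketch of how those quantities relate. It is not the authors' margin-compensation algorithm: it swaps in a simple post-hoc shift of the decision threshold, and the decision values, labels, and function names (`recall_and_g_mean`, `shift_for_full_recall`) are hypothetical and chosen only for illustration.

```python
import math

def recall_and_g_mean(y_true, y_pred, minority=1):
    """Return (minority recall, majority recall, g-mean) for binary labels.

    Assumes both classes occur in y_true.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == minority and p == minority)
    fn = sum(1 for t, p in pairs if t == minority and p != minority)
    tn = sum(1 for t, p in pairs if t != minority and p != minority)
    fp = sum(1 for t, p in pairs if t != minority and p == minority)
    recall = tp / (tp + fn)          # minority class recall
    specificity = tn / (tn + fp)     # majority class recall
    return recall, specificity, math.sqrt(recall * specificity)

def shift_for_full_recall(scores, y_true, minority=1):
    """Smallest non-negative offset that lifts every minority score to >= 0."""
    lowest = min(s for s, t in zip(scores, y_true) if t == minority)
    return max(0.0, -lowest)

# Hypothetical decision values f(x) from some trained classifier (assumption).
scores = [-2.1, -1.4, -0.9, -0.6, -0.3, 0.2, -0.5, 0.8, 1.5]
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1]   # 1 marks the minority class

# Predictions with the original boundary f(x) = 0.
before = [1 if s >= 0 else 0 for s in scores]
recall_before, spec_before, g_before = recall_and_g_mean(y_true, before)

# Drift the boundary toward the majority class until no minority sample is missed.
offset = shift_for_full_recall(scores, y_true)
after = [1 if s + offset >= 0 else 0 for s in scores]
recall_after, spec_after, g_after = recall_and_g_mean(y_true, after)

print(f"before: recall={recall_before:.3f} g-mean={g_before:.3f}")
print(f"after:  recall={recall_after:.3f} g-mean={g_after:.3f} offset={offset}")
```

On this toy data the shift trades one additional majority-class error for full minority recall. Whether the g-mean rises or falls after such a shift depends on how many majority samples lie between the old and new boundary, so the improvement seen here should not be read as a general result.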