Handling class imbalance in credit card fraud using resampling methods


Bibliographic Details
Main Authors: Hordri, Nur Farhana (Author), Yuhaniz, Siti Sophiayati (Author), Mohd. Azmi, Nurulhuda Firdaus (Author), Shamsuddin, Siti Mariyam (Author)
Format: Article
Language: English
Published: Science and Information Organization, 2018.
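Subjects: T Technology (General)
Online Access: http://eprints.utm.my/id/eprint/86470/1/NurFarhanaHordri2018_HandlingClassImbalanceinCreditCard.pdf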
LEADER 01824 am a22001693u 4500
001 86470
042 |a dc 
100 1 0 |a Hordri, Nur Farhana  |e author 
700 1 0 |a Yuhaniz, Siti Sophiayati  |e author 
700 1 0 |a Mohd. Azmi, Nurulhuda Firdaus  |e author 
700 1 0 |a Shamsuddin, Siti Mariyam  |e author 
245 0 0 |a Handling class imbalance in credit card fraud using resampling methods 
260 |b Science and Information Organization,   |c 2018. 
856 |z Get fulltext  |u http://eprints.utm.my/id/eprint/86470/1/NurFarhanaHordri2018_HandlingClassImbalanceinCreditCard.pdf 
520 |a Credit card based online payments have grown rapidly, compelling financial organisations to implement and continuously improve their fraud detection systems. However, credit card fraud datasets are heavily imbalanced, and different types of misclassification errors may carry different costs, so it is essential to control the trade-off between those errors to a certain degree. Classification techniques are promising solutions for distinguishing fraud from non-fraud transactions. Unfortunately, classification techniques do not perform well when the numbers of minority and majority cases differ greatly. Hence, in this study, three resampling methods, Random Under Sampling, Random Over Sampling and Synthetic Minority Oversampling Technique, were applied to the credit card dataset to address the rarity of fraud events. The three resampled datasets were then classified using classification techniques. Performance was measured by sensitivity, specificity, accuracy, precision, area under the curve (AUC) and error rate. The findings showed that, after resampling the dataset, the models were more practicable, gave better performance and were statistically better. 
546 |a en 
650 0 4 |a T Technology (General)
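
Illustrative note (not part of the catalogued record): the abstract names three resampling methods and several evaluation measures without showing them. The following is a minimal Python sketch of what they might look like, using only the standard library. It is not the authors' implementation. All function names are hypothetical, the SMOTE variant is a simplified interpolation that assumes a binary problem with at least two minority rows, and AUC is omitted because it needs ranked scores rather than hard labels.

```python
import random


def _group(X, y):
    """Group feature rows by class label."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop rows so every class matches the smallest class."""
    rng = random.Random(seed)
    by_class = _group(X, y)
    target = min(len(rows) for rows in by_class.values())
    pairs = [(row, label) for label, rows in by_class.items()
             for row in rng.sample(rows, target)]
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows so every class matches the largest class."""
    rng = random.Random(seed)
    by_class = _group(X, y)
    target = max(len(rows) for rows in by_class.values())
    pairs = []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        pairs.extend((row, label) for row in rows + extra)
    rng.shuffle(pairs)
    return [r for r, _ in pairs], [l for _, l in pairs]


def smote_like(X, y, minority, k=2, seed=0):
    """Simplified SMOTE (assumption: binary labels, >= 2 minority rows).

    Each synthetic row is a random point on the segment between a minority
    row and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    rows = [row for row, label in zip(X, y) if label == minority]
    n_new = len(y) - 2 * len(rows)  # majority count minus minority count
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(rows)
        neighbours = sorted(
            (r for r in rows if r is not base),
            key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return list(X) + synthetic, list(y) + [minority] * n_new


def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix measures listed in the abstract (AUC excluded)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "precision": ratio(tp, tp + fp),
        "accuracy": ratio(tp + tn, len(pairs)),
        "error_rate": ratio(fp + fn, len(pairs)),
    }
```

In a sketch like this, resampling would normally be applied to the training split only, so that the evaluation measures are computed on data with the original class ratio.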