A framework to select a classification algorithm in electricity fraud detection

In the electrical domain, a non-technical loss often refers to energy used but not paid for by a consumer. The identification and detection of this loss is important as the financial loss by the electricity supplier has a negative impact on revenue. Several statistical and machine learning classification algorithms have been developed to identify customers who use energy without paying. These algorithms are generally assessed and compared using results from a confusion matrix. We propose that the data for the performance metrics from the confusion matrix be resampled to improve the comparison methods of the algorithms. We use the results from three classification algorithms, namely a support vector machine, k-nearest neighbour and naïve Bayes procedure, to demonstrate how the methodology identifies the best classifier. The case study is of electrical consumption data for a large municipality in South Africa.

Significance:
• The methodology provides data analysts with a procedure for analysing electricity consumption in an attempt to identify abnormal usage.
• The resampling procedure provides a method for assessing performance measures in fraud detection systems.
• The results show that no single metric is best, and that the selected metric is dependent on the objective of the analysis.

Bibliographic Details
Main Authors:	Sisa Pazi, Chantelle M. Clohessy, Gary D. Sharp
Affiliation:	Department of Statistics, Nelson Mandela University, Port Elizabeth, South Africa (all authors)
Format:	Article
Language:	English
Published:	Academy of Science of South Africa 2020-09-01
Series:	South African Journal of Science
Volume/Issue:	116 (9/10)
ISSN:	1996-7489
DOI:	10.17159/sajs.2020/8189
Subjects:	electricity fraud detection, confusion matrix, classification algorithms
Online Access:	https://www.sajs.co.za/article/view/8189
Source:	DOAJ