Class center-based firefly algorithm for handling missing data

Bibliographic Details
Main Authors: Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro
Format: Article
Language: English
Published: SpringerOpen 2021-02-01
Series: Journal of Big Data
Online Access: https://doi.org/10.1186/s40537-021-00424-y
Affiliation (all authors): School of Electrical Engineering and Informatics, Institut Teknologi
Record ID: doaj-98742ed3694f48b6930981a1a423ef3d
Source: DOAJ
ISSN: 2196-1115

Abstract
Estimating missing data is a significant step in the data cleaning stage, and studies have shown that handling missing data improperly leads to inaccurate analysis. Most existing studies, however, impute missing values without regard to the correlation between attributes. When those correlations are taken into account, an adaptive search procedure can help determine the estimates. The Firefly Algorithm (FA) provides such a procedure: it imputes a missing value by searching for the estimate closest to the values of other records. This study therefore proposes a class center-based adaptive model that retrieves missing data while considering attribute correlation during imputation (C3-FA). The results show that the class center-based FA recovers the actual values efficiently, with a Pearson correlation coefficient (r) close to 1 and a root mean squared error (RMSE) close to 0. The method also preserves the true distribution of the data values: the Kolmogorov–Smirnov statistic (D_KS) for most attributes in the dataset is close to 0. Finally, an accuracy evaluation using three classifiers showed that the proposed method yields good classification accuracy.

Keywords: Missing data; Correlation; Imputation; Firefly algorithm; Class center
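
The abstract describes FA-based imputation only at a high level. The following is a minimal Python sketch of how a firefly search could estimate one missing attribute with the help of a class center. It is not the authors' C3-FA and it leaves out the attribute-correlation step. The objective (the filled-in record should sit about as far from its class center as the complete records of that class do on average), the search bounds, and the parameter defaults (`n_fireflies`, `beta0`, `gamma`, `alpha`) are assumptions made for illustration, and `class_center`, `euclidean`, and `firefly_impute` are hypothetical helper names.

```python
import math
import random


def class_center(rows):
    """Attribute-wise mean of the complete rows of one class (its class center)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def firefly_impute(record, missing_idx, same_class_rows, n_fireflies=15,
                   n_iter=100, beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Estimate record[missing_idx] (marked None) with a basic firefly search.

    Hypothetical objective: the filled-in record should lie about as far from
    its class center as the complete rows of that class do on average.
    """
    rng = random.Random(seed)
    center = class_center(same_class_rows)
    target = sum(euclidean(r, center) for r in same_class_rows) / len(same_class_rows)

    # Assumption: search only within the observed range of this attribute in the class.
    observed = [r[missing_idx] for r in same_class_rows]
    lo, hi = min(observed), max(observed)
    span = (hi - lo) or 1.0

    def cost(v):
        filled = list(record)
        filled[missing_idx] = v
        return abs(euclidean(filled, center) - target)

    pos = [rng.uniform(lo, hi) for _ in range(n_fireflies)]
    costs = [cost(p) for p in pos]
    best_cost = min(costs)
    best = pos[costs.index(best_cost)]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # lower cost = brighter firefly
                    r = (pos[i] - pos[j]) / span  # distance normalised to the search range
                    beta = beta0 * math.exp(-gamma * r * r)  # attractiveness decays with distance
                    step = alpha * span * (rng.random() - 0.5)  # random walk term
                    pos[i] = min(max(pos[i] + beta * (pos[j] - pos[i]) + step, lo), hi)
                    costs[i] = cost(pos[i])
                    if costs[i] < best_cost:
                        best, best_cost = pos[i], costs[i]
        alpha *= 0.97  # shrink the random step so the swarm settles
    return best


if __name__ == "__main__":
    rows = [[1.0, 0.0], [1.0, 4.0], [3.0, 2.0], [-1.0, 2.0]]  # toy class, center (1, 2)
    # For this toy class the objective is zero at 0.0 and at 4.0.
    print(firefly_impute([1.0, None], 1, rows))
```

Each dimmer firefly moves toward every brighter one with attractiveness `beta0 * exp(-gamma * r**2)` plus a shrinking random step, which is the usual FA movement rule; in this simplified version the brightest firefly does not move, and the best position seen so far is returned.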
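
The abstract reports three measures: Pearson's r and RMSE between actual and imputed values, and the Kolmogorov–Smirnov statistic D_KS for comparing distributions. The sketch below shows how these could be computed with the Python standard library. Applying D_KS to an attribute's original values versus its values after imputation is an assumption about the evaluation setup, and the function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(actual, imputed):
    """Root mean squared error between actual and imputed values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(actual, imputed)) / len(actual))


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    na, nb = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < na and j < nb:
        x = min(a[i], b[j])
        # Advance past every value <= x so i/na and j/nb are the ECDFs at x.
        while i < na and a[i] <= x:
            i += 1
        while j < nb and b[j] <= x:
            j += 1
        d = max(d, abs(i / na - j / nb))
    return d


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]    # true values that were masked
    imputed = [1.0, 2.0, 3.0, 5.0]   # values produced by an imputer
    print(pearson_r(actual, imputed))     # about 0.983
    print(rmse(actual, imputed))          # 0.5
    print(ks_statistic(actual, imputed))  # 0.25
```

Values of r near 1 and RMSE near 0 mean the imputed values track the true ones, while a D_KS near 0 means the two empirical distributions nearly coincide.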