Class center-based firefly algorithm for handling missing data
Main Authors: | Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro |
---|---|
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2021-02-01 |
Series: | Journal of Big Data |
Subjects: | |
Online Access: | https://doi.org/10.1186/s40537-021-00424-y |
id | doaj-98742ed3694f48b6930981a1a423ef3d |
---|---|
record_format | Article |
collection | DOAJ |
sources | DOAJ |
author | Heru Nugroho, Nugraha Priya Utama, Kridanto Surendro |
issn | 2196-1115 |
description | Estimating missing data is a significant step in the data cleaning stage. Studies have shown that improper handling of missing data leads to inaccurate analysis. Furthermore, most studies treat missing data irrespective of the correlation between attributes. However, an adaptive search procedure helps determine estimates of the missing data when correlations between attributes are considered in the process. The Firefly Algorithm (FA) implements an adaptive search procedure for imputing missing data by determining the estimated value closest to the other values. Therefore, this study proposes a class center-based adaptive approach model that retrieves missing data by considering attribute correlation in the imputation process (C3-FA). The results showed that the class center-based firefly algorithm is an efficient technique for recovering the actual values of missing data, with a Pearson correlation coefficient (r) close to 1 and a root mean squared error (RMSE) close to 0. In addition, the proposed method preserves the true distribution of data values, as indicated by the Kolmogorov–Smirnov test: the statistic D_KS for most attributes in the dataset is close to 0. Furthermore, an accuracy evaluation using three classifiers showed that the proposed method produces good accuracy. |
topic | Missing data; Correlation; Imputation; Firefly algorithm; Class center |
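The record's description reports imputation quality as Pearson's r, RMSE, and the Kolmogorov–Smirnov statistic D_KS. The following Python sketch is a minimal illustration under stated assumptions, not the authors' C3-FA implementation: it fills each missing cell with a hypothetical "class center" (the per-class mean of the observed values of that attribute), skipping both the firefly search and the attribute-correlation step, and then scores the imputed cells with the three metrics. All function names and the toy data are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean


def class_center_impute(rows, labels):
    """Replace None with the per-class mean of that attribute.

    Hypothetical simplification: the "class center" is taken to be the mean of
    the observed values in each class. This is NOT the firefly search of C3-FA.
    Assumes every class has at least one observed value for every attribute.
    """
    n_attrs = len(rows[0])
    centers = {}
    for label in set(labels):
        members = [r for r, y in zip(rows, labels) if y == label]
        centers[label] = [
            mean(r[j] for r in members if r[j] is not None) for j in range(n_attrs)
        ]
    return [
        [centers[y][j] if v is None else v for j, v in enumerate(r)]
        for r, y in zip(rows, labels)
    ]


def pearson_r(x, y):
    """Pearson correlation coefficient; 1.0 means a perfect positive linear match."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError if either input is constant


def rmse(x, y):
    """Root mean squared error; 0.0 means the imputed values equal the true ones."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D: the largest gap between ECDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    # Both ECDFs are right-continuous step functions, so the supremum of the
    # gap is reached at one of the sample points.
    for v in set(xs) | set(ys):
        gap = abs(bisect_right(xs, v) / len(xs) - bisect_right(ys, v) / len(ys))
        d = max(d, gap)
    return d


# Toy usage with made-up data: two attributes, two classes, three missing cells.
rows = [[1.0, 2.0], [3.0, None], [None, 6.0], [10.0, 20.0], [12.0, None]]
labels = ["a", "a", "a", "b", "b"]
filled = class_center_impute(rows, labels)

imputed = [filled[1][1], filled[2][0], filled[4][1]]  # cells that were missing
true_vals = [4.5, 2.5, 21.0]  # hypothetical ground truth for those cells
print(pearson_r(true_vals, imputed), rmse(true_vals, imputed))

# Distribution check for attribute 0: observed values vs. values after imputation.
observed_0 = [r[0] for r in rows if r[0] is not None]
print(ks_statistic(observed_0, [r[0] for r in filled]))
```

Pooling r and RMSE over every imputed cell is only one possible convention; per-attribute scores are equally reasonable, and this record does not state the paper's exact evaluation protocol.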