Generating Synthetic Missing Data: A Review by Missing Mechanism

The performance evaluation of imputation algorithms often involves the generation of missing values. Missing values can be inserted in only one feature (univariate configuration) or in several features (multivariate configuration) at different percentages (missing rates) and according to distinct missing mechanisms, namely, missing completely at random, missing at random, and missing not at random. Since the missing data generation process defines the basis for the imputation experiments (configuration, missing rate, and missing mechanism), it is essential that it is appropriately applied; otherwise, conclusions derived from ill-defined setups may be invalid. The goal of this paper is to review the different approaches to synthetic missing data generation found in the literature and discuss their practical details, elaborating on their strengths and weaknesses. Our analysis revealed that creating missing at random and missing not at random scenarios in datasets comprising qualitative features is the most challenging issue in the related work and, therefore, should be the focus of future work in the field.

Bibliographic Details
Main Authors:	Miriam Seoane Santos, Ricardo Cardoso Pereira, Adriana Fonseca Costa, Jastin Pompeu Soares, Joao Santos, Pedro Henriques Abreu
Affiliations:	Department of Informatics Engineering, Centre for Informatics and Systems, University of Coimbra, Coimbra, Portugal (Miriam Seoane Santos, Ricardo Cardoso Pereira, Adriana Fonseca Costa, Jastin Pompeu Soares, Pedro Henriques Abreu); Medical Physics, Radiobiology and Radiation Protection Group, IPO Porto Research Center (CI-IPOP), Porto, Portugal (Joao Santos)
ORCID:	0000-0002-5912-963X (Miriam Seoane Santos)
Format:	Article
Language:	English
Published:	IEEE 2019-01-01
Series:	IEEE Access
Volume:	7
Pages:	11651-11667
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2019.2891360
Subjects:	Data preprocessing; missing data; missing data generation; missing data mechanisms
Source:	DOAJ
Online Access:	https://ieeexplore.ieee.org/document/8605316/

