Comparison of Missing Data Infilling Mechanisms for Recovering a Real-World Single Station Streamflow Observation


Bibliographic Details
Main Authors: Thelma Dede Baddoo, Zhijia Li, Samuel Nii Odai, Kenneth Rodolphe Chabi Boni, Isaac Kwesi Nooni, Samuel Ato Andam-Akorful
Format: Article
Language: English
Published: MDPI AG 2021-08-01
Series: International Journal of Environmental Research and Public Health
Subjects: R
Online Access: https://www.mdpi.com/1660-4601/18/16/8375
Description
Summary: Reconstructing missing streamflow data can be challenging when additional data are not available, and studies that impute real-world datasets to investigate how the accuracy of imputation algorithms can be ascertained for such datasets are lacking. This study investigated how complex a missing data reconstruction scheme needs to be to obtain relevant results for a real-world single-station streamflow observation and so facilitate its further use. The investigation applied different missing data infilling mechanisms, ranging from univariate algorithms to multiple imputation methods designed for multivariate data, with time taken as an explicit variable. The performance accuracy of these schemes was assessed using the total error measurement (TEM) and a localized error measurement (LEM) recommended in this study. The results show that univariate missing value algorithms, which are specially developed to handle univariate time series, provide satisfactory results, but those that provide the best results are usually time- and computationally intensive. In addition, multiple imputation algorithms that consider the surrounding observed values and/or can capture the characteristics of the data provide results similar to those of the univariate algorithms and, in some cases, perform better without the added time and computational cost when time is taken as an explicit variable. Furthermore, the LEM would be especially useful when the missing data are in specific portions of the dataset or where very large gaps of ‘missingness’ occur. Finally, proper handling of missing values in real-world hydroclimatic datasets depends on imputing and extensively studying the particular dataset in question.
ISSN: 1661-7827, 1660-4601
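
Illustrative note: the summary contrasts a total error measurement (TEM) with a localized error measurement (LEM), but this record does not give their formulas. The sketch below is therefore only an assumption-laden illustration of the idea. It treats the "total" error as an RMSE over all artificially removed points and the "localized" error as an RMSE restricted to one gap, and it uses plain linear interpolation as a stand-in univariate infilling method. The function names, the toy flow values, and the gap positions are hypothetical and are not taken from the article.

```python
import math


def linear_interpolate(series):
    """Fill None entries by linear interpolation between the nearest
    observed neighbours; gaps at either edge take the nearest observed value.
    This is a stand-in univariate method, not the article's algorithm."""
    observed = [i for i, v in enumerate(series) if v is not None]
    if not observed:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        left = max((j for j in observed if j < i), default=None)
        right = min((j for j in observed if j > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = series[left] + weight * (series[right] - series[left])
    return filled


def rmse(truth, estimate, indices):
    """Root-mean-square error restricted to the given positions."""
    indices = list(indices)
    squared = sum((truth[i] - estimate[i]) ** 2 for i in indices)
    return math.sqrt(squared / len(indices))


# Hypothetical daily flows with a single peak (values are made up).
truth = [10.0, 12.0, 15.0, 30.0, 55.0, 40.0, 25.0, 18.0, 14.0, 12.0]
gap_a = [2]        # short gap on the rising limb
gap_b = [4, 5, 6]  # longer gap spanning the peak

# Remove the gap values artificially so the error can be measured afterwards.
masked = [None if i in gap_a + gap_b else v for i, v in enumerate(truth)]
imputed = linear_interpolate(masked)

total_error = rmse(truth, imputed, gap_a + gap_b)  # assumed TEM analogue
local_error_a = rmse(truth, imputed, gap_a)        # assumed LEM analogue, gap A
local_error_b = rmse(truth, imputed, gap_b)        # assumed LEM analogue, gap B

print(f"total: {total_error:.2f}")
print(f"gap A: {local_error_a:.2f}")
print(f"gap B: {local_error_b:.2f}")
```

In this toy series the error within the long gap over the peak is larger than the total figure, while the short gap's error is smaller. This is the kind of difference a per-gap measure can expose and a single aggregate score can hide.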