A Method for Improving Imputation and Prediction Accuracy of Highly Seasonal Univariate Data with Large Periods of Missingness

Bibliographic Details
Main Authors: Aizaz Chaudhry, Wei Li, Amir Basri, François Patenaude
Format: Article
Language: English
Published: Hindawi-Wiley 2019-01-01
Series: Wireless Communications and Mobile Computing
Online Access: http://dx.doi.org/10.1155/2019/4039758
Description
Summary: Imputation of missing data in datasets with high seasonality plays an important role in data analysis and prediction. Failure to appropriately account for missing data may lead to erroneous findings, false conclusions, and inaccurate predictions. The essence of a good imputation method is its missingness-recovery-ability, i.e., its ability to deal with large periods of missing data and to extract the right characteristics (e.g., the seasonality pattern) buried in the dataset being analyzed. Univariate imputation is usually incapable of providing a reasonable imputation for a variable when the periods of missing values are large. The default multivariate imputation approach, on the other hand, cannot provide an accurate imputation for a variable when the correlated variables used for imputation have missing values at exactly the same time intervals. To address these drawbacks and to provide feasible imputations in such scenarios, we propose a novel method that converts a single variable into a multivariate form by exploiting the high seasonality and random missingness of this variable. After this conversion, multivariate imputation can be applied. We test the proposed method on an LTE spectrum dataset by imputing a single variable, such as the average cell throughput, and compare its performance with that of Kalman filtering and of the default method for multivariate imputation. The performance evaluation results clearly show that the proposed method significantly outperforms both Kalman filtering and the default method in terms of imputation and prediction accuracy.
ISSN: 1530-8669, 1530-8677
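
The conversion described in the summary, turning one seasonal variable into a multivariate form, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `to_seasonal_matrix` and `impute_by_seasonal_position` are hypothetical, the period is assumed to be known and to divide the series length evenly, and a plain mean across cycles stands in for the multivariate imputation step that the article applies after the conversion.

```python
from statistics import mean


def to_seasonal_matrix(series, period):
    """Split a univariate series into rows of length `period`,
    one row per seasonal cycle (hypothetical helper)."""
    if len(series) % period != 0:
        raise ValueError("series length must be a multiple of period")
    return [series[i:i + period] for i in range(0, len(series), period)]


def impute_by_seasonal_position(series, period):
    """Fill None gaps using values observed at the same seasonal
    position in other cycles.

    Assumption: a simple mean is used here as a stand-in for a real
    multivariate imputation method.
    """
    rows = to_seasonal_matrix(series, period)
    filled = []
    for row in rows:
        for pos, value in enumerate(row):
            if value is None:
                observed = [r[pos] for r in rows if r[pos] is not None]
                # Leave the gap in place if no cycle observed this position.
                value = mean(observed) if observed else None
            filled.append(value)
    return filled


# Three cycles of period 4; gaps fall at different positions in each cycle.
series = [1.0, 2.0, 3.0, 4.0,
          None, None, 3.0, 4.0,
          1.0, 2.0, None, 4.0]
result = impute_by_seasonal_position(series, period=4)
```

Because the gaps in one cycle rarely coincide with the gaps at the same position in every other cycle, each missing value has observed counterparts to draw on, which is the property the summary attributes to combining high seasonality with random missingness.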