Summary: | Fine particulate matter (PM<sub>2.5</sub>) is one of the main air pollution problems in major cities around the world. A country’s PM<sub>2.5</sub> concentration is affected not only by domestic factors but also by air quality in neighboring countries. Forecasting PM<sub>2.5</sub> to support policies and plans therefore requires data collected from outside the country as well as from within. A data set with many variables but relatively few observations, however, can cause a dimensionality problem and limit the performance of a deep learning model. This study used five years of daily data to predict PM<sub>2.5</sub> concentrations in eight Korean cities with deep learning models. PM<sub>2.5</sub> data from China were collected and used as input variables, and principal component analysis (PCA) was applied to address the dimensionality problem. The deep learning models were a recurrent neural network (RNN), long short-term memory (LSTM), and bidirectional LSTM (BiLSTM). The performance of the models with and without PCA was compared using root-mean-square error (RMSE) and mean absolute error (MAE). Applying PCA improved the performance of LSTM and BiLSTM, but not the RNN, with decreases of up to 16.6% and 33.3% in RMSE and MAE, respectively. The results indicate that applying PCA in deep learning time series prediction can contribute to practical performance improvements, even with a small number of observations. They also provide a more accurate basis for establishing PM<sub>2.5</sub> reduction policy in the country.
|
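
The PCA step and the two error metrics named in the summary can be illustrated with a minimal NumPy sketch. This is not the study's implementation: the function names, the 95% explained-variance threshold, and the synthetic input matrix are all assumptions made for illustration, and the deep learning models themselves are omitted.

```python
import numpy as np


def pca_reduce(X, var_threshold=0.95):
    """Project X (n_samples, n_features) onto the fewest principal components
    whose cumulative explained variance reaches var_threshold.

    The 0.95 threshold is an assumed value, not one taken from the study.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0          # avoid division by zero for constant columns
    Z = (X - mean) / std         # standardize each input variable

    # SVD of the standardized data; rows of Vt are the principal directions.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    k = int(np.searchsorted(np.cumsum(explained), var_threshold)) + 1
    k = min(k, len(S))
    return Z @ Vt[:k].T, k


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff**2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


# Synthetic stand-in for "many variables, few observations":
# 40 correlated input columns driven by 3 hidden factors, 200 daily rows.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 40))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 40))

scores, k = pca_reduce(X)
print(f"reduced {X.shape[1]} input variables to {k} principal components")
```

In a pipeline like the one summarized, the reduced `scores` matrix would replace the raw input variables fed to the RNN, LSTM, or BiLSTM, and `rmse` and `mae` would be computed on held-out predictions with and without the PCA step.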