Variable Selection in Time Series Forecasting Using Random Forests

Time series forecasting using machine learning algorithms has gained popularity in recent years. Random forests (RF) is a machine learning algorithm that has been applied to time series forecasting; however, most of its forecasting properties remain unexplored. Here we assess the performance of random forests in one-step-ahead forecasting using two large datasets of short time series, with the aim of suggesting an optimal set of predictor variables, and we compare its performance to benchmark methods. The first dataset is composed of 16,000 time series simulated from a variety of Autoregressive Fractionally Integrated Moving Average (ARFIMA) models. The second dataset consists of 135 mean annual temperature time series. The highest predictive performance of RF is observed when a small number of recent lagged predictor variables is used. This outcome could be useful in relevant future applications, with the prospect of achieving higher predictive accuracy.
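The abstract describes one-step-ahead forecasting with a random forest trained on a small number of recent lagged values of the series. Below is a minimal sketch of that setup, assuming Python with scikit-learn's RandomForestRegressor; the lag depth, hyperparameters, the helper function, and the toy autoregressive series are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch (not the authors' code): one-step-ahead forecasting of a
# univariate series with a random forest trained on a few recent lagged values.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged_matrix(series, n_lags):
    """Build predictor rows [x_{t-n_lags}, ..., x_{t-1}] with target x_t."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return np.asarray(X), np.asarray(y)

rng = np.random.default_rng(0)
# Toy short series with some autocorrelation (a stand-in for one simulated series).
x = np.zeros(100)
for t in range(1, 100):
    x[t] = 0.7 * x[t - 1] + rng.normal()

n_lags = 3                                   # a low number of recent lagged predictors
X, y = make_lagged_matrix(x[:-1], n_lags)    # hold out the last value for testing

rf = RandomForestRegressor(n_estimators=500, random_state=0)
rf.fit(X, y)

last_window = x[-1 - n_lags:-1].reshape(1, -1)   # the most recent n_lags observations
forecast = rf.predict(last_window)[0]            # one-step-ahead forecast
print(f"forecast: {forecast:.3f}, actual: {x[-1]:.3f}")
```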

Bibliographic Details
Main Authors: Hristos Tyralis, Georgia Papacharalampous
Format: Article
Language: English
Published: MDPI AG 2017-10-01
Series: Algorithms
Subjects: ARFIMA; ARMA; machine learning; one-step ahead forecasting; random forests; time series forecasting; variable selection
Online Access: https://www.mdpi.com/1999-4893/10/4/114
Record ID: doaj-379545664de44f2485d38e3f9da8bfc3
DOI: 10.3390/a10040114
ISSN: 1999-4893
Author Affiliations: Department of Water Resources and Environmental Engineering, School of Civil Engineering, National Technical University of Athens, Iroon Polytechniou 5, 157 80 Zografou, Greece (both authors)