Machine learning algorithms for predicting roadside fine particulate matter concentration level in Hong Kong Central

Bibliographic Details
Main Authors: Yin Zhao, Yahya Abu Hasan
Format: Article
Language: English
Published: International Academy of Ecology and Environmental Sciences 2013-09-01
Series: Computational Ecology and Software
Online Access: http://www.iaees.org/publications/journals/ces/articles/2013-3(3)/machine-learning-algorithms-for-predicting-roadside-particulate-matter.pdf
Description
Summary: Data mining is an approach to discovering knowledge from large data sets. Pollutant forecasting is an important problem in the environmental sciences. This paper uses data mining methods to forecast the concentration level of fine particles (PM2.5) in Hong Kong Central, a well-known business centre in Asia. Several classification algorithms are available in data mining, such as the Artificial Neural Network (ANN) and the Support Vector Machine (SVM); both are machine learning algorithms used in a variety of areas. This paper builds predictive models of PM2.5 concentration level based on ANN and SVM using R packages. The data set comprises meteorological data and PM2.5 data for the period 2008-2011. The PM2.5 concentration is divided into two levels, low and high. The critical point is 40 µg/m³ (24-hour mean), which is based on the standard of the US Environmental Protection Agency (EPA). The parameters of both models are selected by multiple rounds of cross-validation. Over 100 repetitions of 10-fold cross-validation, the testing accuracy of SVM is around 0.803-0.820, higher than that of ANN, whose accuracy is around 0.746-0.793.
ISSN: 2220-721X
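
The evaluation protocol described in the summary (a two-level label split at 40 µg/m³, scored by 100 repetitions of 10-fold cross-validation) can be sketched roughly as follows. This is a minimal illustration in standard-library Python, not the authors' R code. All function names are hypothetical, treating a value exactly at the threshold as "low" is an assumption, the folds are shuffled without stratification, and the majority-class model is only a placeholder where an ANN or SVM would be fitted.

```python
import random
from collections import Counter

THRESHOLD_UG_M3 = 40.0  # critical point given in the summary (24-hour mean)


def label_level(pm25_24h_mean):
    """Map a 24-hour mean PM2.5 value to 'low' or 'high'.

    Assumption: a value exactly at the threshold counts as 'low'.
    """
    return "high" if pm25_24h_mean > THRESHOLD_UG_M3 else "low"


def repeated_kfold(n_samples, k=10, repeats=100, seed=0):
    """Yield (repeat, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for r in range(repeats):
        idx = list(range(n_samples))
        rng.shuffle(idx)  # fresh random partition for every repetition
        folds = [idx[i::k] for i in range(k)]
        for f in range(k):
            test = folds[f]
            train = [j for g in range(k) if g != f for j in folds[g]]
            yield r, train, test


def majority_baseline(train_features, train_labels):
    """Placeholder model (NOT an ANN or SVM): predicts the most common label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda _features: most_common


def repeated_cv_accuracy(features, labels, fit, k=10, repeats=100, seed=0):
    """Return one pooled test accuracy per repetition."""
    correct = [0] * repeats
    for r, train, test in repeated_kfold(len(labels), k, repeats, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct[r] += sum(model(features[i]) == labels[i] for i in test)
    # Every sample is tested exactly once per repetition.
    return [c / len(labels) for c in correct]
```

Calling `repeated_cv_accuracy(features, labels, fit)` with a real classifier in place of `majority_baseline` returns 100 accuracies, one per repetition. The minimum and maximum of that list give a range of the kind the summary reports, although the summary does not say exactly how its ranges were computed.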