Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine Hybrid Model

Bibliographic Details
Main Authors: Bingchun Liu, Hui Wang, Arihant Binaykia, Chuanchuan Fu and Bingpeng Xiang
Format: Article
Language: English
Published: Technoscience Publications 2019-09-01
Series: Nature Environment and Pollution Technology
Subjects:
Online Access: http://neptjournal.com/upload-images/NL-69-4-(2)-D-905.pdf
Description
Summary: Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables and is generally preferred over regression from an operational point of view. Because air pollution has increased enormously in many countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a new hybrid classification model based on information theory and the support vector machine (SVM), using air quality data for four cities in China (Beijing, Guangzhou, Shanghai and Tianjin) from January 1, 2014 to April 30, 2016. China’s Ministry of Environmental Protection classifies daily air quality into six levels based on air quality index (AQI) values: serious pollution, severe pollution, moderate pollution, light pollution, good and excellent. Information gain (IG) is calculated from information theory and used for feature selection on both categorical and continuous numeric features. The SVM algorithm is then applied to the selected features with cross-validation. The final evaluation shows that the IG and SVM hybrid model performs better than SVM alone, artificial neural network (ANN) and K-nearest neighbours (KNN) models in terms of both accuracy and complexity.
ISSN: 0972-6268, 2395-3454
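
Illustration: The summary describes selecting features by information gain before training an SVM. The following is a minimal sketch of that feature-selection step only, not the authors' implementation. The equal-width binning of continuous features, the 0.1 threshold, the function names and the toy data are all assumptions made for illustration; this record does not state how the paper discretizes features or chooses its cut-off.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v p(X=v) * H(Y | X=v) for a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def equal_width_bins(values, n_bins=4):
    """Discretize a continuous feature into equal-width bins.

    Assumption: the binning scheme is illustrative, not taken from the paper.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def select_features(features, labels, continuous=(), threshold=0.1, n_bins=4):
    """Return {name: IG} for the features whose IG exceeds the threshold."""
    scores = {}
    for name, column in features.items():
        if name in continuous:
            column = equal_width_bins(column, n_bins)
        scores[name] = information_gain(column, labels)
    return {name: s for name, s in scores.items() if s > threshold}


# Hypothetical toy data: three AQI levels, one continuous and one
# categorical feature. These values are invented for the example.
labels = ["good", "good", "light", "light", "moderate", "moderate"]
features = {
    "pm25": [20.0, 30.0, 80.0, 90.0, 130.0, 140.0],
    "weekday": ["mon", "tue", "mon", "tue", "mon", "tue"],
}
selected = select_features(features, labels, continuous={"pm25"})
print(selected)
```

In this toy data the binned `pm25` column separates the three levels completely, so it is kept, while `weekday` carries no information about the label and is dropped. In a full pipeline the retained columns would then be passed to an SVM classifier evaluated with cross-validation, for example with scikit-learn's `SVC` and `cross_val_score`.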