Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine Hybrid Model

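Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique for predicting qualitative variables, and from an operational point of view it is generally preferred over regression. Because air pollution has increased sharply in many countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a new hybrid classification model based on information theory and the support vector machine (SVM), using air quality data from four Chinese cities (Beijing, Guangzhou, Shanghai and Tianjin) from January 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified daily air quality into six levels based on the air quality index (AQI) value: serious pollution, severe pollution, moderate pollution, light pollution, good and excellent. Using information theory, the information gain (IG) is calculated and feature selection is performed for both categorical and continuous numeric features. The SVM algorithm is then trained on the selected features with cross-validation. The final evaluation shows that the IG and SVM hybrid model outperforms SVM alone, an artificial neural network (ANN) and K-nearest neighbours (KNN) in terms of both accuracy and complexity.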
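
The abstract names the six AQI-based levels but not the AQI bands behind them. As a minimal sketch, the function below maps a daily AQI value to the class label a classifier would be trained to predict. It assumes the bands of China's HJ 633-2012 technical regulation (0-50, 51-100, 101-150, 151-200, 201-300, above 300) and assumes, from the descending order in which the abstract lists the levels, that "severe pollution" is the 201-300 band and "serious pollution" the band above 300. The names `AQI_LEVELS` and `aqi_level` are hypothetical.

```python
# (upper AQI bound, level) pairs in ascending order. The bands follow HJ 633-2012;
# the "severe"/"serious" naming is an assumption inferred from the abstract's ordering.
AQI_LEVELS = [
    (50, "excellent"),
    (100, "good"),
    (150, "light pollution"),
    (200, "moderate pollution"),
    (300, "severe pollution"),
]


def aqi_level(aqi: int) -> str:
    """Return the daily air-quality level (the classifier's target) for an AQI value."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    for upper, level in AQI_LEVELS:
        if aqi <= upper:
            return level
    return "serious pollution"  # every AQI above 300


print([aqi_level(v) for v in (35, 80, 120, 175, 260, 420)])
# ['excellent', 'good', 'light pollution', 'moderate pollution', 'severe pollution', 'serious pollution']
```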

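The IG feature-selection step can be sketched as follows. Information gain measures how much knowing a feature X reduces the entropy of the air-quality level Y: IG(Y; X) = H(Y) - sum over v of P(X = v) H(Y | X = v). The abstract does not say how continuous features were handled, so this sketch assumes a C4.5-style binary split at the midpoint threshold that maximizes IG. The helper names, the feature names `day`, `weather` and `pm25`, and the four-day toy values are all hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Set, Tuple


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy H(Y), in bits, of a non-empty sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain_categorical(feature: Sequence[Hashable], labels: Sequence[Hashable]) -> float:
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    groups: Dict[Hashable, List[Hashable]] = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    n = len(labels)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def info_gain_continuous(feature: Sequence[float], labels: Sequence[Hashable]) -> Tuple[float, float]:
    """Best IG over binary splits X <= t, with t at midpoints of sorted distinct values.

    Returns (gain, threshold). The thresholding scheme is an assumption of this sketch.
    """
    values = sorted(set(feature))
    best_gain, best_t = 0.0, values[0]
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        gain = info_gain_categorical([x <= t for x in feature], labels)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t


def rank_features(columns: Dict[str, Sequence], labels: Sequence[Hashable], continuous: Set[str]) -> List[str]:
    """Order feature names by decreasing IG; a selector would keep the top-ranked ones."""
    scores = {
        name: info_gain_continuous(values, labels)[0] if name in continuous
        else info_gain_categorical(values, labels)
        for name, values in columns.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: four days, one air-quality level per day.
level = ["good", "good", "light pollution", "light pollution"]
columns = {
    "day": ["mon", "tue", "mon", "tue"],
    "weather": ["rain", "rain", "sun", "sun"],
    "pm25": [30.0, 45.0, 120.0, 150.0],
}
print(info_gain_continuous(columns["pm25"], level))  # (1.0, 82.5)
print(rank_features(columns, level, continuous={"pm25"}))  # ['weather', 'pm25', 'day']
```

Here `day` carries no information about the level (IG = 0), so it ranks last; ties keep their original order because Python's sort is stable.
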

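The abstract says only that the SVM is trained on the IG-selected features with cross-validation; it gives neither the kernel, the multiclass strategy nor the number of folds. The sketch below fills those gaps with assumptions: a linear SVM whose binary subproblems are solved by the Pegasos stochastic sub-gradient method, combined one-vs-rest across the classes and scored with plain k-fold cross-validation (k = 5). The function names, lambda = 0.1, the epoch count and the clustered toy data are hypothetical, and inputs are assumed to be standardized already. A practical replication would more likely use a kernel SVM from an established library such as LIBSVM.

```python
import random
from typing import Dict, Hashable, List, Sequence

Model = Dict[Hashable, List[float]]


def _dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.1, epochs=50, seed=0) -> Model:
    """One-vs-rest linear SVM; each binary problem is solved with Pegasos sub-gradient steps.

    A constant 1.0 is appended to every sample so the bias is learned as an extra weight.
    """
    rng = random.Random(seed)
    data = [list(x) + [1.0] for x in X]
    models: Model = {}
    for cls in sorted(set(y), key=str):
        targets = [1.0 if label == cls else -1.0 for label in y]
        w = [0.0] * len(data[0])
        order = list(range(len(data)))
        t = 0
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t += 1
                eta = 1.0 / (lam * t)  # Pegasos step size
                violated = targets[i] * _dot(w, data[i]) < 1.0  # hinge loss active?
                w = [(1.0 - eta * lam) * wj for wj in w]  # L2-regularization shrink
                if violated:
                    w = [wj + eta * targets[i] * xj for wj, xj in zip(w, data[i])]
        models[cls] = w
    return models


def predict(models: Model, x: Sequence[float]) -> Hashable:
    """Pick the class whose one-vs-rest score w . [x, 1] is largest."""
    xa = list(x) + [1.0]
    return max(models, key=lambda cls: _dot(models[cls], xa))


def cross_val_accuracy(X, y, k=5, seed=0, **svm_kwargs) -> float:
    """Mean accuracy over k folds; each fold is held out once while the SVM trains on the rest."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    folds = [order[f::k] for f in range(k)]
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        models = train_linear_svm([X[i] for i in train_idx], [y[i] for i in train_idx], **svm_kwargs)
        correct = sum(predict(models, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Hypothetical toy data: three well-separated clusters of already-scaled features.
rng = random.Random(42)
centers = {"excellent": (0.0, 0.0), "good": (3.0, 0.0), "light pollution": (0.0, 3.0)}
X, y = [], []
for label, (cx, cy) in centers.items():
    for _ in range(20):
        X.append((cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)))
        y.append(label)

acc = cross_val_accuracy(X, y, k=5)
print(f"5-fold CV accuracy: {acc:.2f}")
```

Swapping `train_linear_svm` and `predict` for another learner (for example a K-nearest-neighbour classifier) inside `cross_val_accuracy` gives a like-for-like comparison of the kind the abstract reports, since every model is then scored on the same folds.
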
Bibliographic Details
Authors: Bingchun Liu, Hui Wang, Arihant Binaykia, Chuanchuan Fu and Bingpeng Xiang
Format: Article
Language: English
Published: Technoscience Publications, 2019-09-01
Series: Nature Environment and Pollution Technology
Subjects: Environment
Online Access:http://neptjournal.com/upload-images/NL-69-4-(2)-D-905.pdf
ISSN: 0972-6268 (print); 2395-3454 (online)
Volume/Issue: 18(3)
Pages: 697-708