Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China

With rapid economic growth, urbanization and industrialization, fine particulate matter with aerodynamic diameters ≤ 2.5 µm (PM2.5) has become a major pollutant and shows adverse effects on both human health and the atmospheric environment. Many studies on estimating PM2.5 concentrations have been p...

Bibliographic Details
Main Authors: Ping Zhang, Wenjie Ma, Feng Wen, Lei Liu, Lianwei Yang, Jia Song, Ning Wang, Qi Liu
Format: Article
Language: English
Published: Elsevier 2021-12-01
Series: Ecotoxicology and Environmental Safety
Subjects: PM2.5; Machine learning; GA-SVM; Land use regression; Method improvement; Spatial clustering
Online Access: http://www.sciencedirect.com/science/article/pii/S0147651321008848
id doaj-8c2623f2496e4215a8c23b3366b378b6
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Ping Zhang
Wenjie Ma
Feng Wen
Lei Liu
Lianwei Yang
Jia Song
Ning Wang
Qi Liu
title Estimating PM2.5 concentration using the machine learning GA-SVM method to improve the land use regression model in Shaanxi, China
publisher Elsevier
series Ecotoxicology and Environmental Safety
issn 0147-6513
publishDate 2021-12-01
description With rapid economic growth, urbanization and industrialization, fine particulate matter with an aerodynamic diameter ≤ 2.5 µm (PM2.5) has become a major pollutant, with adverse effects on both human health and the atmospheric environment. Many studies have estimated PM2.5 concentrations using statistical regression models and satellite remote sensing. However, traditional regression models limit the accuracy of PM2.5 concentration estimates; machine learning methods have high predictive power, but few studies have examined the complementary advantages of the different approaches. This study estimates PM2.5 concentrations in Shaanxi, China, in 2015 from satellite remote sensing-derived aerosol optical depth (AOD) products, meteorological data, terrain data and other predictors, using a combined genetic algorithm-support vector machine (GA-SVM) method, and then explores the spatial clustering pattern at the seasonal and annual levels. The results indicate that temperature (r = −0.684), precipitation (r = −0.602) and the normalized difference vegetation index (NDVI) (r = −0.523) were significantly negatively correlated with the PM2.5 concentration, while AOD (r = 0.337) was significantly positively correlated with it. Compared to conventional land use regression (LUR) and SVM models and to previous related studies, the GA-SVM method predicted PM2.5 concentration significantly more accurately, with a higher 10-fold cross-validation coefficient of determination (R²) of 0.84 and lower root mean square error (RMSE) and mean absolute error (MAE) of 12.1 μg/m³ and 10.07 μg/m³, respectively. A Y-scrambling test shows that the models have no chance correlation. The central and southern parts of Shaanxi have high PM2.5 concentrations, mainly because of the pollutant emissions and the meteorological and topographical conditions in those areas. Regional PM2.5 pollution shows positive spatial agglomeration, and a spatial spillover effect is present in both the seasonal and annual variations. In general, the GA-SVM method is robust and accurately estimates PM2.5 concentrations by applying a novel modeling framework to high-quality spatiotemporal information. It is also significant for the exploration of PM2.5 pollution estimation and high-precision mapping methods, especially for early warning in high-risk areas. Finally, the prevention and control of atmospheric pollution should apply control measures across major cities and their surrounding cities, and focus on joint pollution control for cities on the plain.
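The validation steps named in the description (10-fold cross-validation, R², RMSE, MAE and a Y-scrambling test) can be illustrated with a minimal sketch. This is not the authors' code: the data are synthetic, a one-variable least-squares line stands in for the GA-SVM model, and every function name below is hypothetical. It only shows how the metrics are computed and why a model with no chance correlation should lose its cross-validated R² once the response values are shuffled.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(x, y):
    """Least-squares fit of y = a + b*x (assumed stand-in for the real model)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def cross_val_predict(x, y, k=10, seed=0):
    """Out-of-fold predictions from k-fold cross-validation."""
    idx = list(range(len(x)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(x)
    for fold in range(k):
        test = idx[fold::k]          # every k-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([x[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = a + b * x[i]
    return preds


def y_scramble_r2(x, y, n_rounds=20, seed=1):
    """Mean cross-validated R² after randomly permuting the response."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        scores.append(r2(y_perm, cross_val_predict(x, y_perm)))
    return sum(scores) / len(scores)


# Synthetic example: one predictor negatively related to the response.
rng = random.Random(42)
x = [rng.uniform(0.0, 1.0) for _ in range(200)]
y = [80.0 - 50.0 * xi + rng.gauss(0.0, 5.0) for xi in x]

pred = cross_val_predict(x, y)
cv_r2, cv_rmse, cv_mae = r2(y, pred), rmse(y, pred), mae(y, pred)
scrambled_r2 = y_scramble_r2(x, y)

print(f"10-fold CV: R2={cv_r2:.2f} RMSE={cv_rmse:.2f} MAE={cv_mae:.2f}")
print(f"Y-scrambled mean R2={scrambled_r2:.2f}")
```

In a real workflow the stand-in `fit_line` would be replaced by the tuned model, and the scrambled R² would be compared against the unscrambled one; a scrambled score close to the real score would suggest chance correlation.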
topic PM2.5
Machine learning
GA-SVM
Land use regression
Method improvement
Spatial clustering
url http://www.sciencedirect.com/science/article/pii/S0147651321008848
spelling doaj-8c2623f2496e4215a8c23b3366b378b6; 2021-10-01T04:44:11Z; eng; Elsevier; Ecotoxicology and Environmental Safety; ISSN 0147-6513; 2021-12-01; vol. 225; art. 112772
affiliation Ping Zhang (corresponding author): School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China; Shaanxi Key Laboratory of Land Consolidation, Xi’an 710075, China; State Key Laboratory of Urban and Regional Ecology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China
Wenjie Ma: School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China
Feng Wen: School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China
Lei Liu: School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China
Lianwei Yang: School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China
Jia Song: School of Information Science and Technology, Yunnan Normal University, Kunming 650000, China
Ning Wang (corresponding author): School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China
Qi Liu: School of Environmental and Chemical Engineering, Xi’an Polytechnic University, Xi’an 710048, China