Evaluation of Machine Learning Models for Estimating PM<sub>2.5</sub> Concentrations across Malaysia
Southeast Asia (SEA) is a hotspot region for atmospheric pollution and haze conditions, due to extensive forest, agricultural and peat fires. This study aims to estimate the PM<sub>2.5</sub> concentrations across Malaysia using machine-learning (ML) models like Random Forest (RF) and Sup...
Main Authors: | Nurul Amalin Fatihah Kamarul Zaman, Kasturi Devi Kanniah, Dimitris G. Kaskaoutis, Mohd Talib Latif |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-08-01 |
Series: | Applied Sciences |
Subjects: | PM<sub>2.5</sub>, Himawari-8, random forest, support vector regression, air pollution, Malaysia |
Online Access: | https://www.mdpi.com/2076-3417/11/16/7326 |
id |
doaj-ff0f588e1d35459e97f71fa6ddd6d5ca |
---|---|
record_format |
Article |
spelling |
doaj-ff0f588e1d35459e97f71fa6ddd6d5ca; 2021-08-26T13:29:34Z; eng; MDPI AG; Applied Sciences; ISSN 2076-3417; 2021-08-01; volume 11, article 7326; DOI 10.3390/app11167326. Authors and affiliations: Nurul Amalin Fatihah Kamarul Zaman and Kasturi Devi Kanniah, Tropical Map Research Group, Faculty of Built Environment & Surveying, Universiti Teknologi Malaysia, Skudai 81310, Johor, Malaysia; Dimitris G. Kaskaoutis, Institute for Environmental Research and Sustainable Development, National Observatory of Athens, 15236 Athens, Greece; Mohd Talib Latif, Department of Earth Sciences and Environment, Faculty of Science and Technology, Universiti Kebangsaan Malaysia, Bangi 43600, Selangor, Malaysia. |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nurul Amalin Fatihah Kamarul Zaman, Kasturi Devi Kanniah, Dimitris G. Kaskaoutis, Mohd Talib Latif |
title |
Evaluation of Machine Learning Models for Estimating PM<sub>2.5</sub> Concentrations across Malaysia |
publisher |
MDPI AG |
series |
Applied Sciences |
issn |
2076-3417 |
publishDate |
2021-08-01 |
description |
Southeast Asia (SEA) is a hotspot region for atmospheric pollution and haze conditions, due to extensive forest, agricultural and peat fires. This study aims to estimate PM<sub>2.5</sub> concentrations across Malaysia using machine-learning (ML) models such as Random Forest (RF) and Support Vector Regression (SVR), based on satellite AOD (aerosol optical depth) observations, ground-measured air pollutants (NO<sub>2</sub>, SO<sub>2</sub>, CO, O<sub>3</sub>) and meteorological parameters (air temperature, relative humidity, wind speed and direction). The estimated PM<sub>2.5</sub> concentrations for a two-year period (2018–2019) are evaluated against measurements performed at 65 air-quality monitoring stations located at urban, industrial, suburban and rural sites. PM<sub>2.5</sub> concentrations varied widely between the stations, with higher values (mean of 24.2 ± 21.6 µg m<sup>−3</sup>) at urban/industrial stations and lower values (mean of 21.3 ± 18.4 µg m<sup>−3</sup>) at suburban/rural sites. Furthermore, pronounced seasonal variability in PM<sub>2.5</sub> is recorded across Malaysia, with the highest concentrations during the dry season (June–September). Seven models were developed for PM<sub>2.5</sub> prediction: separate models for urban/industrial and suburban/rural sites, one for each of the four dominant seasons (dry, wet and two inter-monsoon), and an overall model; these displayed accuracies on the order of R<sup>2</sup> = 0.46–0.76. The validation analysis reveals that the RF model (R<sup>2</sup> = 0.53–0.76) performs slightly better than SVR, except for the overall model. This is the first study conducted in Malaysia for PM<sub>2.5</sub> estimation at a national scale combining satellite aerosol retrievals with ground-based pollutants, meteorological factors and ML techniques. The satisfactory prediction of PM<sub>2.5</sub> concentrations across Malaysia allows continuous monitoring of pollution levels in remote areas that lack measurement networks. |
topic |
PM<sub>2.5</sub>, Himawari-8, random forest, support vector regression, air pollution, Malaysia |
url |
https://www.mdpi.com/2076-3417/11/16/7326 |