Prediction of Aquatic Ecosystem Health Indices through Machine Learning Models Using the WGAN-Based Data Augmentation Method

Bibliographic Details
Main Authors: Seoro Lee, Jonggun Kim, Gwanjae Lee, Jiyeong Hong, Joo Hyun Bae, Kyoung Jae Lim
Affiliations:
Seoro Lee, Jonggun Kim, Gwanjae Lee, Kyoung Jae Lim: Department of Regional Infrastructure Engineering, Kangwon National University, Chuncheon-si 24341, Korea
Jiyeong Hong: Department of Earth and Environment, Boston University, Boston, MA 02215, USA
Joo Hyun Bae: Korea Water Environment Research Institute, Chuncheon-si 24408, Korea
Format: Article
Language: English
Published: MDPI AG, 2021-09-01
Series: Sustainability, Vol. 13, No. 18, Article 10435
ISSN: 2071-1050
DOI: 10.3390/su131810435
Subjects: aquatic ecosystem health; machine learning models; WGAN; data augmentation
Online Access: https://www.mdpi.com/2071-1050/13/18/10435

Abstract

Changes in hydrological characteristics and increases in pollutant loadings driven by rapid climate change and urbanization contribute significantly to the deterioration of aquatic ecosystem health (AEH). It is therefore important to evaluate AEH effectively in advance and to establish appropriate strategic plans. Machine learning (ML) models have recently been used widely to solve hydrological and environmental problems. However, collecting enough data to train ML models is generally time-consuming and labor-intensive, and in classification problems in particular, data imbalance can bias ML models toward the more frequent classes and lead to erroneous predictions. In this study, we proposed a method that addresses the data imbalance problem through data augmentation based on a Wasserstein Generative Adversarial Network (WGAN) and then uses ML models to predict the grades (A to E) of three AEH indices: the Benthic Macroinvertebrate Index (BMI), the Trophic Diatom Index (TDI), and the Fish Assessment Index (FAI). Raw datasets for the AEH indices, consisting of physicochemical factors (water temperature (WT), dissolved oxygen (DO), 5-day biochemical oxygen demand (BOD₅), suspended solids (SS), total nitrogen (TN), total phosphorus (TP), and flow) together with AEH grades, were built and then augmented with the WGAN. The performance of each ML model was evaluated through 10-fold cross-validation (CV), and the ML models trained on the raw and WGAN-based training sets were compared by predicting AEH grades on the test sets. The ML models trained on the WGAN-based training set achieved an average F1-score of 0.9 or greater across the grades of each AEH index on the test set, outperforming the models trained only on the raw training set, which contained fewer samples.

These results confirm that ML models trained on a WGAN-augmented dataset can predict AEH grades better than models trained on limited datasets, reducing the effort required for field data collection in rivers, which demands considerable time and cost. In the future, the results of this study can serve as baseline data for building the large aquatic-ecosystem datasets needed to evaluate and predict AEH in rivers efficiently with ML models.
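The abstract reports model performance through 10-fold cross-validation and an average F1-score over the A to E grades, comparing WGAN-augmented and raw training sets. The pure-Python sketch below illustrates that kind of evaluation loop under stated assumptions; it is not the authors' code. It assumes the folds are stratified by grade and that "average F1-score" means the unweighted (macro) mean of per-grade F1 scores, neither of which the record confirms. `nearest_centroid` is a hypothetical stand-in for the study's ML models, and `augment` is a hypothetical hook where samples from a trained WGAN generator would be added to the training folds only, one common way to keep synthetic rows out of the evaluation data.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Assign sample indices to k folds so each grade is spread evenly.

    Assumption: folds are stratified by grade; the record only says "10-fold CV".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[(offset + j) % k].append(i)
        # Continue the round-robin where the previous grade stopped,
        # so fold sizes stay balanced.
        offset += len(idx)
    return folds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-grade F1 over grades seen in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        scores.append(2 * tp / (2 * tp + fp + fn))  # equals 2PR / (P + R)
    return sum(scores) / len(scores) if scores else 0.0


def nearest_centroid(X_train, y_train, X_test):
    """Hypothetical stand-in classifier for illustration (not one of the study's models)."""
    sums, counts = {}, defaultdict(int)
    for x, c in zip(X_train, y_train):
        s = sums.setdefault(c, [0.0] * len(x))
        for d, v in enumerate(x):
            s[d] += v
        counts[c] += 1
    centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}

    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))

    return [min(centroids, key=lambda c: dist2(x, centroids[c])) for x in X_test]


def cross_validate(X, y, fit_predict, k=10, augment=None, seed=0):
    """Mean macro-F1 over k folds.

    fit_predict(X_train, y_train, X_test) -> predicted labels.
    augment(X_train, y_train) -> (X_extra, y_extra): hypothetical hook, e.g. a
    wrapper that samples a trained WGAN generator. Synthetic rows are added to
    the training folds only, so every score is measured on real observations.
    """
    folds = stratified_kfold(y, k, seed)
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(y)) if i not in held_out]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        if augment is not None:
            X_extra, y_extra = augment(X_tr, y_tr)
            X_tr, y_tr = X_tr + list(X_extra), y_tr + list(y_extra)
        y_hat = fit_predict(X_tr, y_tr, [X[i] for i in fold])
        scores.append(macro_f1([y[i] for i in fold], y_hat))
    return sum(scores) / len(scores)
```

Calling `cross_validate(X, y, model)` and `cross_validate(X, y, model, augment=sampler)` with the same `seed` scores both variants on identical real-data folds, because the fold assignment depends only on the labels and the seed; `sampler` here would be a hypothetical function returning generated physicochemical records with their grades. Stratification matters when grades are imbalanced, since an unstratified split can leave a rare grade out of some test folds entirely.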