Prediction of Aquatic Ecosystem Health Indices through Machine Learning Models Using the WGAN-Based Data Augmentation Method
Changes in hydrological characteristics and increases in various pollutant loadings due to rapid climate change and urbanization have a significant impact on the deterioration of aquatic ecosystem health (AEH). Therefore, it is important to effectively evaluate the AEH in advance and establish appro...
Main Authors: | Seoro Lee, Jonggun Kim, Gwanjae Lee, Jiyeong Hong, Joo Hyun Bae, Kyoung Jae Lim |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-09-01 |
Series: | Sustainability |
Subjects: | aquatic ecosystem health; machine learning models; WGAN; data augmentation |
Online Access: | https://www.mdpi.com/2071-1050/13/18/10435 |
id |
doaj-875b83ccedb34cc6a49d5377fc2cd0ea |
---|---|
record_format |
Article |
spelling |
doaj-875b83ccedb34cc6a49d5377fc2cd0ea; 2021-09-26T01:29:59Z; eng; MDPI AG; Sustainability; ISSN 2071-1050; 2021-09-01; vol. 13, article 10435; DOI 10.3390/su131810435
Title: Prediction of Aquatic Ecosystem Health Indices through Machine Learning Models Using the WGAN-Based Data Augmentation Method
Authors: Seoro Lee (Department of Regional Infrastructure Engineering, Kangwon National University, Chuncheon-si 24341, Korea); Jonggun Kim (Department of Regional Infrastructure Engineering, Kangwon National University, Chuncheon-si 24341, Korea); Gwanjae Lee (Department of Regional Infrastructure Engineering, Kangwon National University, Chuncheon-si 24341, Korea); Jiyeong Hong (Department of Earth and Environment, Boston University, Boston, MA 02215, USA); Joo Hyun Bae (Korea Water Environment Research Institute, Chuncheon-si 24408, Korea); Kyoung Jae Lim (Department of Regional Infrastructure Engineering, Kangwon National University, Chuncheon-si 24341, Korea)
URL: https://www.mdpi.com/2071-1050/13/18/10435
Keywords: aquatic ecosystem health; machine learning models; WGAN; data augmentation |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Seoro Lee Jonggun Kim Gwanjae Lee Jiyeong Hong Joo Hyun Bae Kyoung Jae Lim |
author_sort |
Seoro Lee |
title |
Prediction of Aquatic Ecosystem Health Indices through Machine Learning Models Using the WGAN-Based Data Augmentation Method |
publisher |
MDPI AG |
series |
Sustainability |
issn |
2071-1050 |
publishDate |
2021-09-01 |
description |
Changes in hydrological characteristics and increased pollutant loadings caused by rapid climate change and urbanization significantly degrade aquatic ecosystem health (AEH). It is therefore important to evaluate AEH in advance and establish appropriate strategic plans. Machine learning (ML) models have recently been widely used to solve hydrological and environmental problems. However, collecting sufficient data for ML training is generally time-consuming and labor-intensive, and in classification problems in particular, data imbalance can lead to erroneous predictions. In this study, we propose a method that addresses data imbalance through data augmentation based on a Wasserstein Generative Adversarial Network (WGAN) and efficiently predicts the grades (A to E) of three AEH indices, the Benthic Macroinvertebrate Index (BMI), Trophic Diatom Index (TDI), and Fish Assessment Index (FAI), using ML models. Raw datasets comprising physicochemical factors (WT, DO, BOD5, SS, TN, TP, and Flow) and AEH grades were built and augmented with the WGAN. Each ML model was evaluated with 10-fold cross-validation (CV), and models trained on the raw and WGAN-based training sets were compared by predicting AEH grades on the test sets. The ML models trained on the WGAN-based training set achieved an average F1-score across grades of 0.9 or greater for each AEH index on the test set, outperforming models trained only on the smaller raw training set. These results confirm that ML models trained on a WGAN-augmented dataset can predict AEH grades better than models trained on limited datasets, and that the approach reduces the effort needed for field data collection from rivers, which requires enormous time and cost. In the future, the results of this study can serve as basic data for constructing aquatic ecosystem big data, which is needed to evaluate and predict AEH in rivers efficiently with ML models. |
topic |
aquatic ecosystem health machine learning models WGAN data augmentation |
url |
https://www.mdpi.com/2071-1050/13/18/10435 |
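The abstract reports an average F1-score over the AEH grades (A to E) for each index. As a rough illustration of that metric only, the sketch below computes a one-vs-rest F1 per grade and then an unweighted mean. The record does not say how the authors averaged the per-grade scores, so the unweighted (macro) mean is an assumption, and the function names `per_grade_f1` and `macro_f1` and the sample labels are hypothetical, not taken from the paper.

```python
from collections import Counter

# Grade labels as described in the abstract (A to E).
GRADES = ["A", "B", "C", "D", "E"]


def per_grade_f1(y_true, y_pred, grades=GRADES):
    """Return {grade: F1}, treating each grade one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this grade
        else:
            fp[pred] += 1    # predicted grade was wrong
            fn[truth] += 1   # true grade was missed
    scores = {}
    for g in grades:
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the grade never occurs.
        denom = 2 * tp[g] + fp[g] + fn[g]
        scores[g] = 2 * tp[g] / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred, grades=GRADES):
    """Unweighted mean of the per-grade F1 scores (an assumed averaging scheme)."""
    scores = per_grade_f1(y_true, y_pred, grades)
    return sum(scores.values()) / len(scores)


# Made-up labels for illustration; these are not data from the study.
observed = ["A", "A", "B", "C", "D", "E"]
predicted = ["A", "B", "B", "C", "D", "E"]
print(per_grade_f1(observed, predicted))
print(round(macro_f1(observed, predicted), 4))
```

An unweighted mean gives rare grades the same weight as common ones, which is why it is often preferred when the class distribution is imbalanced, as the abstract says it is here.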