USING MACHINE LEARNING TO UNDERSTAND THE SPATIOTEMPORAL VARIABILITY OF HARMFUL ALGAE BLOOMS IN ILLINOIS WATERS
Harmful Algae Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broader spatiotemporal scales. This study utilizes Chlorophyll-a (Chl-a) concentratio...
Main Author: | Sarkar, Supria |
---|---|
Format: | Others |
Published: | OpenSIUC 2021 |
Subjects: | Cyanobacteria; Geographic Information System; Remote Sensing; Water Quality |
Online Access: | https://opensiuc.lib.siu.edu/theses/2873 https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3887&context=theses |
id |
ndltd-siu.edu-oai-opensiuc.lib.siu.edu-theses-3887 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-siu.edu-oai-opensiuc.lib.siu.edu-theses-3887 2021-09-22T05:13:30Z Sarkar, Supria 2021-09-01T07:00:00Z text application/pdf Theses OpenSIUC |
collection |
NDLTD |
format |
Others |
sources |
NDLTD |
topic |
Cyanobacteria Geographic Information System Remote Sensing Water Quality |
description |
Harmful Algae Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broader spatiotemporal scales. This study uses Chlorophyll-a (Chl-a) concentration in water samples collected from lakes in Illinois as an indirect measurement of HABs. The major objectives were to assess the spatiotemporal pattern of HABs across Illinois regions in recent decades, and to examine different machine learning models for predicting Chl-a concentration from publicly available water quality datasets. The Chl-a dataset was compiled from two sources: the regular monitoring program of the Illinois Environmental Protection Agency (IEPA) and the Voluntary Lake Monitoring Program (VLMP), with data from the latter collected primarily by citizen participants. Seven environmental and water quality zones were selected for spatial analyses. Additionally, the temporal patterns were assessed using time-series decomposition of monthly Chl-a concentration datasets. The machine learning pipeline includes two tasks: a regression task for predicting Chl-a concentration, and a classification task for estimating lake trophic status. Meteorological, land use and land cover, and lake morphometry variables were used as independent variables. Four regression models, i.e., Partial Least Squares Regression (PLSR), Support Vector Machine Regression (SVR), Artificial Neural Network Regression (ANNR), and Random Forest Regression (RFR), were used for the first task of the modeling pipeline, and four classification models, i.e., Logistic Regression Classification (LRC), Support Vector Machine Classification (SVC), Artificial Neural Network Classification (ANNC), and Random Forest Classification (RFC), were used for the second task. Results indicate that: a) the Collinsville region in the southwestern part of Illinois exhibited a higher mean Chl-a concentration in its lakes than any other region from 1998 to 2018; b) the lakes that showed increasing trends in their monthly mean Chl-a concentrations were also clustered in the southwestern region; c) Random Forest outperformed all other models in both classification (Accuracy = 60.06%) and regression (R² = 38.88%); and d) the land use and land cover variables were the most important set of variables in the Random Forest models. |
author |
Sarkar, Supria |
title |
USING MACHINE LEARNING TO UNDERSTAND THE SPATIOTEMPORAL VARIABILITY OF HARMFUL ALGAE BLOOMS IN ILLINOIS WATERS |
publisher |
OpenSIUC |
publishDate |
2021 |
url |
https://opensiuc.lib.siu.edu/theses/2873 https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3887&context=theses |
_version_ |
1719483026417647616 |
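
The abstract mentions time-series decomposition of monthly Chl-a concentrations but does not say which method was used. As an illustration only, the sketch below assumes a classical additive decomposition (centered moving-average trend plus a repeating 12-month seasonal pattern). The function names and the input series are hypothetical; the numbers are synthetic and are not data from the thesis.

```python
from statistics import mean


def centered_moving_average(values, period=12):
    """Centered moving average for an even period; None where undefined."""
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        # The two end points get half weight so the window spans one full period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[i] = total / period
    return trend


def decompose_additive(values, period=12):
    """Split a series into trend, seasonal and residual parts (additive model).

    Assumes an even period and at least two full periods of observations.
    """
    trend = centered_moving_average(values, period)
    detrended = [
        v - t if t is not None else None for v, t in zip(values, trend)
    ]
    # Average the detrended values for each position in the cycle (e.g. each month).
    raw = []
    for k in range(period):
        same_phase = [
            d for i, d in enumerate(detrended) if d is not None and i % period == k
        ]
        raw.append(mean(same_phase))
    # Center the seasonal pattern so it sums to zero over one period.
    offset = mean(raw)
    pattern = [s - offset for s in raw]
    seasonal = [pattern[i % period] for i in range(len(values))]
    residual = [
        v - t - s if t is not None else None
        for v, t, s in zip(values, trend, seasonal)
    ]
    return trend, seasonal, residual


# Synthetic three-year monthly series: slow upward drift plus a summer peak.
summer_peak = [-6, -5, -3, 0, 3, 6, 8, 7, 3, -2, -5, -6]
chl_a = [20 + 0.1 * month + summer_peak[month % 12] for month in range(36)]

trend, seasonal, residual = decompose_additive(chl_a, period=12)
print("trend at month 6:", trend[6])
print("seasonal pattern:", [round(s, 2) for s in seasonal[:12]])
```

A rising trend component for a given lake is the kind of signal that finding (b) describes; a multiplicative or LOESS-based decomposition would be an equally plausible choice and may give different components.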