USING MACHINE LEARNING TO UNDERSTAND THE SPATIOTEMPORAL VARIABILITY OF HARMFUL ALGAE BLOOMS IN ILLINOIS WATERS

Harmful Algae Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broader spatiotemporal scales. This study utilizes Chlorophyll-a (Chl-a) concentratio...

Bibliographic Details
Main Author: Sarkar, Supria
Format: Others
Published: OpenSIUC 2021
Subjects:
Online Access: https://opensiuc.lib.siu.edu/theses/2873
https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3887&context=theses
id ndltd-siu.edu-oai-opensiuc.lib.siu.edu-theses-3887
record_format oai_dc
spelling ndltd-siu.edu-oai-opensiuc.lib.siu.edu-theses-3887 2021-09-22T05:13:30Z USING MACHINE LEARNING TO UNDERSTAND THE SPATIOTEMPORAL VARIABILITY OF HARMFUL ALGAE BLOOMS IN ILLINOIS WATERS Sarkar, Supria 2021-09-01T07:00:00Z text application/pdf https://opensiuc.lib.siu.edu/theses/2873 https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3887&context=theses Theses OpenSIUC Cyanobacteria Geographic Information System Remote Sensing Water Quality
collection NDLTD
format Others
sources NDLTD
topic Cyanobacteria
Geographic Information System
Remote Sensing
Water Quality
description Harmful Algae Blooms (HABs) in inland waterbodies (e.g., lakes and ponds) pose a serious threat to human health and natural ecosystems. Thus, it is imperative to assess HABs and their potential triggering factors over broader spatiotemporal scales. This study uses the Chlorophyll-a (Chl-a) concentration in water samples collected from lakes in Illinois as an indirect measurement of HABs. The major objectives were to assess the spatiotemporal pattern of HABs across Illinois regions in recent decades, and to examine different machine learning models for predicting Chl-a concentration from publicly available water quality datasets. The Chl-a dataset was compiled from two sources: the regular monitoring program of the Illinois Environmental Protection Agency (IEPA) and the Voluntary Lake Monitoring Program (VLMP), whose data were collected primarily by citizen participants. Seven environmental and water quality zones were selected for spatial analyses. Additionally, temporal patterns were assessed using time-series decomposition of monthly Chl-a concentration datasets. The machine learning pipeline includes two tasks: a regression task for predicting Chl-a concentration, and a classification task for estimating lake trophic status. Meteorological, land use and land cover, and lake morphometry variables were used as independent variables. Four regression models, i.e., Partial Least Squares Regression (PLSR), Support Vector Machine Regression (SVR), Artificial Neural Network Regression (ANNR), and Random Forest Regression (RFR), were used for the first task of the modeling pipeline, and four classification models, i.e., Logistic Regression Classification (LRC), Support Vector Machine Classification (SVC), Artificial Neural Network Classification (ANNC), and Random Forest Classification (RFC), were used for the second task.
Results indicate that: a) lakes in the Collinsville region, in the southwestern part of Illinois, exhibited a higher mean Chl-a concentration than those in any other region from 1998 to 2018; b) the lakes that showed increasing trends in their monthly mean Chl-a concentrations were also clustered in the southwestern region; c) Random Forest outperformed all other models in both classification (accuracy = 60.06%) and regression (R² = 38.88%); and d) the land use and land cover variables were the most important set of variables in the Random Forest models.
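The abstract mentions time-series decomposition of monthly Chl-a concentrations but does not say which method was used. The following is only a minimal sketch of one common choice, a classical additive decomposition (centered moving-average trend plus a repeating monthly component). The function name decompose_monthly, the variable names, and the synthetic series are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def decompose_monthly(values, period=12):
    """Classical additive decomposition: value = trend + seasonal + residual.

    Assumption: this is an illustrative method, not necessarily the one
    used in the thesis. `period` must be even (12 for monthly data).
    """
    n = len(values)
    if n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2

    # Trend: centered moving average with half weight on the two end points,
    # so that every calendar month contributes equally to each window.
    trend = [None] * n
    for t in range(half, n - half):
        window = values[t - half : t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period

    # Seasonal: mean detrended value per calendar position, centered on zero.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(values[t] - trend[t])
    raw = [mean(b) for b in buckets]
    offset = mean(raw)
    cycle = [s - offset for s in raw]
    seasonal = [cycle[t % period] for t in range(n)]

    # Residual: whatever trend and seasonal components leave unexplained.
    residual = [
        None if trend[t] is None else values[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, residual


# Synthetic example (NOT real Chl-a data): a slow upward drift plus a
# summer peak that repeats every year, for three years of monthly values.
pattern = [-3, -3, -2, -1, 1, 3, 5, 4, 1, -1, -2, -2]  # sums to zero
series = [10 + 0.1 * t + pattern[t % 12] for t in range(36)]

trend, seasonal, residual = decompose_monthly(series)
print("trend at month 6:", round(trend[6], 3))
print("seasonal cycle:", [round(s, 3) for s in seasonal[:12]])
```

In a sketch like this, a lake with a rising trend component would correspond to the "increasing trends in monthly mean Chl-a" described in the results, while the seasonal component captures the within-year bloom cycle. The first and last six months have no trend estimate because the centered window does not fit there.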
author Sarkar, Supria
title USING MACHINE LEARNING TO UNDERSTAND THE SPATIOTEMPORAL VARIABILITY OF HARMFUL ALGAE BLOOMS IN ILLINOIS WATERS
publisher OpenSIUC
publishDate 2021
url https://opensiuc.lib.siu.edu/theses/2873
https://opensiuc.lib.siu.edu/cgi/viewcontent.cgi?article=3887&context=theses