Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes

The size of phytoplankton not only influences its physiology, metabolic rates and marine food web, but also serves as an indicator of phytoplankton functional roles in ecological and biogeochemical processes. Therefore, some algorithms have been developed to infer the synoptic distribution of phytop...


Bibliographic Details
Main Authors: Shuibo Hu, Huizeng Liu, Wenjing Zhao, Tiezhu Shi, Zhongwen Hu, Qingquan Li, Guofeng Wu
Format: Article
Language: English
Published: MDPI AG 2018-03-01
Series: Remote Sensing
Subjects: phytoplankton size classes; machine learning; feature selection; random forest; remote sensing
Online Access: http://www.mdpi.com/2072-4292/10/3/191
id doaj-9c5f2bd462944dce97e181a0bd387e8f
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Shuibo Hu
Huizeng Liu
Wenjing Zhao
Tiezhu Shi
Zhongwen Hu
Qingquan Li
Guofeng Wu
spellingShingle Shuibo Hu
Huizeng Liu
Wenjing Zhao
Tiezhu Shi
Zhongwen Hu
Qingquan Li
Guofeng Wu
Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
Remote Sensing
phytoplankton size classes
machine learning
feature selection
random forest
remote sensing
author_facet Shuibo Hu
Huizeng Liu
Wenjing Zhao
Tiezhu Shi
Zhongwen Hu
Qingquan Li
Guofeng Wu
author_sort Shuibo Hu
title Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
title_short Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
title_full Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
title_fullStr Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
title_full_unstemmed Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
title_sort comparison of machine learning techniques in inferring phytoplankton size classes
publisher MDPI AG
series Remote Sensing
issn 2072-4292
publishDate 2018-03-01
description The size of phytoplankton not only influences their physiology and metabolic rates and the marine food web, but also serves as an indicator of phytoplankton functional roles in ecological and biogeochemical processes. Therefore, some algorithms have been developed to infer the synoptic distribution of phytoplankton cell size, denoted as phytoplankton size classes (PSCs), in surface ocean waters by means of remotely sensed variables. This study, using the NASA bio-Optical Marine Algorithm Data set (NOMAD) high-performance liquid chromatography (HPLC) database and satellite match-ups, aimed to compare the effectiveness of modeling techniques, including partial least squares (PLS), artificial neural networks (ANN), support vector machine (SVM) and random forests (RF), and of feature selection techniques, including genetic algorithm (GA), successive projection algorithm (SPA) and recursive feature elimination based on support vector machine (SVM-RFE), for inferring PSCs from remote sensing data. Results showed that: (1) SVM-RFE worked better than GA and SPA in selecting sensitive features; (2) RF performed better than PLS, ANN and SVM in calibrating PSC retrieval models; (3) machine learning techniques performed better than the chlorophyll-a-based three-component method; (4) sea surface temperature, wind stress, and spectral curvature derived from the remote sensing reflectance at 490, 510, and 555 nm were among the features most sensitive to PSCs; and (5) the combination of SVM-RFE feature selection and random forests regression is recommended for inferring PSCs. This study demonstrated the effectiveness of machine learning techniques in selecting sensitive features and calibrating models for PSC estimation with remote sensing.
topic phytoplankton size classes
machine learning
feature selection
random forest
remote sensing
url http://www.mdpi.com/2072-4292/10/3/191
work_keys_str_mv AT shuibohu comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT huizengliu comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT wenjingzhao comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT tiezhushi comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT zhongwenhu comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT qingquanli comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
AT guofengwu comparisonofmachinelearningtechniquesininferringphytoplanktonsizeclasses
_version_ 1725695943851900928
spelling doaj-9c5f2bd462944dce97e181a0bd387e8f
Record timestamp: 2020-11-24T22:43:26Z
Language: eng
Publisher: MDPI AG
Series: Remote Sensing
ISSN: 2072-4292
Published: 2018-03-01
Volume/issue/article: 10/3/191
DOI: 10.3390/rs10030191
Article id: rs10030191
Title: Comparison of Machine Learning Techniques in Inferring Phytoplankton Size Classes
Authors: Shuibo Hu (0), Huizeng Liu (1), Wenjing Zhao (2), Tiezhu Shi (3), Zhongwen Hu (4), Qingquan Li (5), Guofeng Wu (6)
Affiliation (authors 0, 1, 3, 4, 5, 6): Key Laboratory for Geo-Environmental Monitoring of Coastal Zone of the National Administration of Surveying, Mapping and GeoInformation & Shenzhen Key Laboratory of Spatial Smart Sensing and Services & Research Institute for Smart Cities & Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China
Affiliation (author 2): South China Institute of Environmental Sciences, the Ministry of Environmental Protection of RPC, Guangzhou 510535, China
URL: http://www.mdpi.com/2072-4292/10/3/191
Keywords: phytoplankton size classes; machine learning; feature selection; random forest; remote sensing
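The workflow recommended in the description field (SVM-RFE feature selection followed by random forests regression) could be sketched roughly as below. This is a minimal illustration using scikit-learn, not the authors' implementation: the data are synthetic stand-ins rather than the NOMAD match-ups, and the number of features kept, the linear SVR kernel, the train/test split, and all variable names are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption, NOT the NOMAD match-ups):
# 300 samples, 8 candidate features, only columns 0, 1 and 2 drive the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 1.0 * X[:, 2] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Step 1: SVM-RFE. A linear-kernel SVR exposes coefficients, so RFE can
# repeatedly drop the feature with the smallest weight until 3 remain.
# Features are standardized first so the weights are comparable.
scaler = StandardScaler().fit(X_train)
selector = RFE(SVR(kernel="linear"), n_features_to_select=3, step=1)
selector.fit(scaler.transform(X_train), y_train)
selected = np.flatnonzero(selector.support_)

# Step 2: random forest regression calibrated on the selected features only.
rf = RandomForestRegressor(n_estimators=200, random_state=0)
rf.fit(X_train[:, selected], y_train)
r2 = rf.score(X_test[:, selected], y_test)

print("selected feature indices:", selected.tolist())
print("held-out R^2:", round(r2, 3))
```

In a real application the columns of `X` would be the remotely sensed candidate features (for example sea surface temperature or reflectance-derived quantities) and `y` one of the size-class quantities; both mappings are hypothetical here.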