Developing an Optimal Spatial Predictive Model for Seabed Sand Content Using Machine Learning, Geostatistics, and Their Hybrid Methods

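Seabed sediment predictions at regional and national scales in Australia are mainly based on bathymetry-related variables because backscatter-derived data are lacking. In this study, we applied random forests (RF), hybrid methods of RF and geostatistics, and generalized boosted regression modelling (GBM) to seabed sand content point data, acoustic multibeam data, and their derived variables, to develop an accurate model for predicting seabed sand content at a local scale. We also addressed relevant issues with variable selection.

We found that: (1) backscatter-related variables are more important than bathymetry-related variables for sand predictive modelling; (2) including highly correlated predictors can improve predictive accuracy; (3) for RF, the rank orders of averaged variable importance (AVI) and of accuracy contribution change with the input predictors and do not necessarily match; (4) a knowledge-informed AVI method (KIAVI2) is recommended for RF; (5) the hybrid methods and their averaging can significantly improve predictive accuracy and are recommended; (6) the relationships between sand content and the predictors are non-linear; and (7) variable selection methods for GBM need further study. The resulting accuracy-improved predictions of sand content, generated at high resolution, provide important baseline information for environmental management and conservation.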
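The abstract names hybrid methods of RF and geostatistics without describing their mechanics. A common pattern for such hybrids is to add a spatial interpolation of the base model's residuals at the sample locations to the base model's own prediction, and then optionally average several hybrid outputs. The Python sketch below illustrates that pattern under loudly stated assumptions: it uses inverse distance weighting (IDW) rather than kriging, accepts any fitted model as a plain callable (a hypothetical linear stand-in replaces a random forest in the usage example), uses in-sample residuals for simplicity, and averages predictions with a simple point-wise mean. The names `idw`, `hybrid_predict`, and `average_predictions` are hypothetical; this is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
Features = Sequence[float]


def idw(target: Point, points: Sequence[Point], values: Sequence[float], power: float = 2.0) -> float:
    """Inverse-distance-weighted interpolation of `values` at `target` (hypothetical helper)."""
    num = 0.0
    den = 0.0
    for p, v in zip(points, values):
        d = math.dist(target, p)
        if d == 0.0:
            # Exact interpolator: return the observed residual at a sample location.
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def hybrid_predict(
    base_predict: Callable[[Features], float],
    train_xy: Sequence[Point],
    train_features: Sequence[Features],
    train_y: Sequence[float],
    new_xy: Sequence[Point],
    new_features: Sequence[Features],
    power: float = 2.0,
) -> List[float]:
    """Base-model prediction plus IDW-interpolated residuals (assumed hybrid scheme)."""
    # Residuals of the base model at the sample locations (in-sample, for simplicity).
    residuals = [y - base_predict(f) for f, y in zip(train_features, train_y)]
    # Trend from the base model plus a spatial correction from the interpolated residuals.
    return [
        base_predict(f) + idw(xy, train_xy, residuals, power)
        for xy, f in zip(new_xy, new_features)
    ]


def average_predictions(*prediction_sets: Sequence[float]) -> List[float]:
    """Point-wise mean of several prediction sets (simple model averaging)."""
    return [sum(vals) / len(vals) for vals in zip(*prediction_sets)]


if __name__ == "__main__":
    # Hypothetical linear stand-in for a fitted random forest.
    base = lambda f: 50.0 + 10.0 * f[0]
    preds = hybrid_predict(
        base,
        train_xy=[(0.0, 0.0), (1.0, 0.0)],
        train_features=[[0.0], [1.0]],
        train_y=[52.0, 58.0],
        new_xy=[(0.0, 0.0), (0.5, 0.0)],
        new_features=[[0.0], [0.5]],
    )
    print(preds)  # [52.0, 55.0]
    print(average_predictions(preds, [54.0, 57.0]))  # [53.0, 56.0]
```

Replacing `idw` with ordinary kriging of the residuals would give an RF-plus-kriging variant; that substitution, which needs a fitted variogram, is not shown here.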

Bibliographic Details
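Main Authors: Jin Li, Justy Siwabessy, Zhi Huang, Scott Nichol
Affiliation: National Earth and Marine Observations Branch, Environmental Geoscience Division, Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia
Format: Article
Language: English
Published: MDPI AG, 2019-04-01
Series: Geosciences, vol. 9, no. 4, article 180 (ISSN 2076-3263)
DOI: 10.3390/geosciences9040180
Subjects: machine learning; variable importance; variable selection; model selection; predictive accuracy; spatial predictive model; acoustic multibeam data; spatial predictions
Online Access: https://www.mdpi.com/2076-3263/9/4/180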