Developing an Optimal Spatial Predictive Model for Seabed Sand Content Using Machine Learning, Geostatistics, and Their Hybrid Methods
Seabed sediment predictions at regional and national scales in Australia are mainly based on bathymetry-related variables due to the lack of backscatter-derived data. In this study, we applied random forests (RFs), hybrid methods of RF and geostatistics, and generalized boosted regression modelling...
Main Authors: | Jin Li, Justy Siwabessy, Zhi Huang, Scott Nichol |
---|---|
Format: | Article |
Language: | English |
Published: |
MDPI AG
2019-04-01
|
Series: | Geosciences |
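Subjects: | machine learning; variable importance; variable selection; model selection; predictive accuracy; spatial predictive model; acoustic multibeam data; spatial predictions |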
Online Access: | https://www.mdpi.com/2076-3263/9/4/180 |
id |
doaj-d229005947d0444eac07feb6e0a97a55 |
---|---|
record_format |
Article |
spelling |
doaj-d229005947d0444eac07feb6e0a97a55
2020-11-24T22:19:42Z
eng
MDPI AG
Geosciences
2076-3263
2019-04-01
9
4
180
10.3390/geosciences9040180
geosciences9040180
Developing an Optimal Spatial Predictive Model for Seabed Sand Content Using Machine Learning, Geostatistics, and Their Hybrid Methods
Jin Li (0), Justy Siwabessy (1), Zhi Huang (2), Scott Nichol (3)
National Earth and Marine Observations Branch, Environmental Geoscience Division, Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia
National Earth and Marine Observations Branch, Environmental Geoscience Division, Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia
National Earth and Marine Observations Branch, Environmental Geoscience Division, Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia
National Earth and Marine Observations Branch, Environmental Geoscience Division, Geoscience Australia, GPO Box 378, Canberra, ACT 2601, Australia
Seabed sediment predictions at regional and national scales in Australia are mainly based on bathymetry-related variables due to the lack of backscatter-derived data. In this study, we applied random forests (RFs), hybrid methods of RF and geostatistics, and generalized boosted regression modelling (GBM), to seabed sand content point data and acoustic multibeam data and their derived variables, to develop an accurate model to predict seabed sand content at a local scale. We also addressed relevant issues with variable selection. It was found that: (1) backscatter-related variables are more important than bathymetry-related variables for sand predictive modelling; (2) the inclusion of highly correlated predictors can improve predictive accuracy; (3) the rank orders of averaged variable importance (AVI) and accuracy contribution change with input predictors for RF and are not necessarily matched; (4) a knowledge-informed AVI method (KIAVI2) is recommended for RF; (5) the hybrid methods and their averaging can significantly improve predictive accuracy and are recommended; (6) relationships between sand and predictors are non-linear; and (7) variable selection methods for GBM need further study. Accuracy-improved predictions of sand content are generated at high resolution, which provide important baseline information for environmental management and conservation.
https://www.mdpi.com/2076-3263/9/4/180
machine learning; variable importance; variable selection; model selection; predictive accuracy; spatial predictive model; acoustic multibeam data; spatial predictions |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jin Li Justy Siwabessy Zhi Huang Scott Nichol |
author_sort |
Jin Li |
title |
Developing an Optimal Spatial Predictive Model for Seabed Sand Content Using Machine Learning, Geostatistics, and Their Hybrid Methods |
publisher |
MDPI AG |
series |
Geosciences |
issn |
2076-3263 |
publishDate |
2019-04-01 |
description |
Seabed sediment predictions at regional and national scales in Australia are mainly based on bathymetry-related variables due to the lack of backscatter-derived data. In this study, we applied random forests (RFs), hybrid methods of RF and geostatistics, and generalized boosted regression modelling (GBM), to seabed sand content point data and acoustic multibeam data and their derived variables, to develop an accurate model to predict seabed sand content at a local scale. We also addressed relevant issues with variable selection. It was found that: (1) backscatter-related variables are more important than bathymetry-related variables for sand predictive modelling; (2) the inclusion of highly correlated predictors can improve predictive accuracy; (3) the rank orders of averaged variable importance (AVI) and accuracy contribution change with input predictors for RF and are not necessarily matched; (4) a knowledge-informed AVI method (KIAVI2) is recommended for RF; (5) the hybrid methods and their averaging can significantly improve predictive accuracy and are recommended; (6) relationships between sand and predictors are non-linear; and (7) variable selection methods for GBM need further study. Accuracy-improved predictions of sand content are generated at high resolution, which provide important baseline information for environmental management and conservation. |
topic |
machine learning; variable importance; variable selection; model selection; predictive accuracy; spatial predictive model; acoustic multibeam data; spatial predictions |
url |
https://www.mdpi.com/2076-3263/9/4/180 |