Assessing the geographic specificity of pH prediction by classification and regression trees.
Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The cu...
Main Authors: | Jacob Egelberg, Nina Pena, Rachel Rivera, Christina Andruk |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2021-01-01 |
Series: | PLoS ONE |
Online Access: | https://doi.org/10.1371/journal.pone.0255119 |
id |
doaj-dc5a9484cb8f4f3b928cfad97cd70d4b |
---|---|
record_format |
Article |
spelling |
doaj-dc5a9484cb8f4f3b928cfad97cd70d4b; 2021-08-17T04:31:13Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2021-01-01; 16; 8; e0255119; 10.1371/journal.pone.0255119; Assessing the geographic specificity of pH prediction by classification and regression trees.; Jacob Egelberg; Nina Pena; Rachel Rivera; Christina Andruk; Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York's Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.52%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage the restricted application of relevant machine learning models to regions from which training data was collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH.; https://doi.org/10.1371/journal.pone.0255119 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jacob Egelberg, Nina Pena, Rachel Rivera, Christina Andruk |
author_sort |
Jacob Egelberg |
title |
Assessing the geographic specificity of pH prediction by classification and regression trees. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2021-01-01 |
description |
Soil pH affects a wide range of critical biogeochemical processes that dictate plant growth and diversity. Previous literature has established the capacity of classification and regression trees (CARTs) to predict soil pH, but limitations of CARTs in this context have not been fully explored. The current study collected soil pH, climatic, and topographic data from 100 locations across New York's Temperate Deciduous Forests (in the United States of America) to investigate the extrapolative capacity of a previously developed CART model as compared to novel CART and random forest (RF) models. Results showed that the previously developed CART underperformed in terms of predictive accuracy (RRMSE = 14.52%) when compared to a novel tree (RRMSE = 9.33%), and that a novel random forest outperformed both models (RRMSE = 8.88%), though its predictions did not differ significantly from the novel tree (p = 0.26). The most important predictors for model construction were climatic factors. These findings confirm existing reports that CART models are constrained by the spatial autocorrelation of geographic data and encourage the restricted application of relevant machine learning models to regions from which training data was collected. They also contradict previous literature implying that random forests should meaningfully boost the predictive accuracy of CARTs in the context of soil pH. |
url |
https://doi.org/10.1371/journal.pone.0255119 |
_version_ |
1721205589605875712 |
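
The description above compares models by RRMSE (relative root mean square error). The record does not state the formula the authors used, so the following is only a minimal sketch under one common convention: RMSE divided by the mean of the observed values, expressed as a percentage. The function name `rrmse` and the sample pH values are hypothetical and are not taken from the study.

```python
import math


def rrmse(observed, predicted):
    """Relative RMSE in percent: RMSE divided by the mean observed value.

    Assumption: this is one common definition of RRMSE; the catalogued
    article may normalise differently.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over paired observations and predictions.
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    mean_observed = sum(observed) / len(observed)
    return 100.0 * math.sqrt(mse) / mean_observed


# Made-up pH values for illustration only; not data from the study.
observed_ph = [5.0, 6.0, 7.0]
predicted_ph = [5.5, 6.0, 6.5]
example_rrmse = rrmse(observed_ph, predicted_ph)
print(f"RRMSE = {example_rrmse:.2f}%")
```

Under this convention, a lower percentage means predictions sit closer to the measured pH relative to its typical magnitude, which is how the reported values (14.52%, 9.33%, 8.88%) rank the three models.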