Using clustered data to develop biomass allometric models: The consequences of ignoring the clustered data structure.
Main Authors: | Ioan Dutcă, Petru Tudor Stăncioiu, Ioan Vasile Abrudan, Florin Ioraș |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2018-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC6071979?pdf=render |
id |
doaj-1f6c52dbb79e4d4885bffd892fede53b |
---|---|
citation |
PLoS ONE 13(8): e0200123. doi:10.1371/journal.pone.0200123 |
collection |
DOAJ |
author |
Ioan Dutcă Petru Tudor Stăncioiu Ioan Vasile Abrudan Florin Ioraș |
issn |
1932-6203 |
description |
This paper investigates the consequences of ignoring the clustered data structure when developing allometric models. Clustered data, in the form of multiple trees sampled from multiple forest stands, are commonly used to develop biomass allometric models. Of 102 reviewed papers published between 2012 and 2016 that reported biomass allometric models, 84 (82%) used a clustered sampling design. However, in as many as 80% of these, the clustered data structure was ignored, potentially violating the independence assumption of ordinary least squares. The consequences of ignoring the clustered data structure were empirically validated using two clustered biomass datasets (110 and 220 trees, with cluster sizes of 5 and 10 trees, respectively). We showed that when the intraclass correlation coefficient (ICC) was higher than zero, ignoring the clustered data structure produced underestimated standard errors, which in turn affected the confidence intervals and t-test results. The degree of underestimation depended on the ICC (the proportion of variance attributable to the forest stand) and on the cluster size (the number of trees sampled from one forest stand). We also showed that using first-order autocorrelation tests, such as the traditional Durbin-Watson statistic, to detect autocorrelation caused by the clustered structure can be misleading, as the test may indicate no autocorrelation even though the ICC differs from zero. In conclusion, when the ICC is higher than zero, ignoring the clustered data structure yields over-confident biomass predictions (due to underestimated confidence intervals) and/or incorrect research conclusions (due to overstated evidence against the null hypothesis in t-tests). Therefore, a modelling approach that accounts for the hierarchical structure of the data is highly recommended whenever any form of clustering can be identified, even if the autocorrelation is not significant. |
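The two quantitative claims in the abstract, that standard-error underestimation grows with both the ICC and the cluster size and that a first-order test can miss clustering, can be illustrated with a short simulation. This is a minimal sketch, not the authors' method. It assumes balanced stands, a random-intercept error model, and an ICC of 0.3 chosen purely for illustration. It uses the classic Kish design effect, 1 + (m - 1) x ICC, which gives the variance inflation for a mean under equal cluster sizes. For regression coefficients, the inflation also depends on how strongly the predictor itself varies between stands, so treat the printed ratio as indicative only. The function names (`design_effect`, `se_ratio`, `anova_icc`, `simulate_stands`) are hypothetical.

The Durbin-Watson part shows one possible reason a first-order test can mislead: it only compares adjacent rows. As a result, the same residuals can look autocorrelated when the rows are grouped by stand and look uncorrelated once the row order is shuffled.

```python
import math
import random
import statistics


def design_effect(cluster_size: int, icc: float) -> float:
    """Kish design effect for equal cluster sizes: 1 + (m - 1) * ICC."""
    return 1.0 + (cluster_size - 1) * icc


def se_ratio(cluster_size: int, icc: float) -> float:
    """Naive (independence-assuming) SE divided by the cluster-aware SE."""
    return 1.0 / math.sqrt(design_effect(cluster_size, icc))


def anova_icc(clusters: list[list[float]]) -> float:
    """One-way ANOVA estimator of the ICC for balanced clusters (can be negative)."""
    k, m = len(clusters), len(clusters[0])
    grand = statistics.fmean(v for c in clusters for v in c)
    means = [statistics.fmean(c) for c in clusters]
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)
    msw = sum((v - mu) ** 2 for c, mu in zip(clusters, means) for v in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def durbin_watson(residuals: list[float]) -> float:
    """DW = sum (e_t - e_{t-1})^2 / sum e_t^2; values near 2 suggest no lag-1 autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    return num / sum(e * e for e in residuals)


def simulate_stands(n_stands: int, trees_per_stand: int, icc: float,
                    rng: random.Random) -> list[list[float]]:
    """Hypothetical random-intercept residuals e_ij = u_j + eps_ij with total variance 1."""
    sd_stand, sd_tree = math.sqrt(icc), math.sqrt(1.0 - icc)
    stands = []
    for _ in range(n_stands):
        u = rng.gauss(0.0, sd_stand)  # effect shared by every tree in the stand
        stands.append([u + rng.gauss(0.0, sd_tree) for _ in range(trees_per_stand)])
    return stands


if __name__ == "__main__":
    rng = random.Random(42)
    m, true_icc = 10, 0.3  # assumed for illustration, not estimates from the paper
    stands = simulate_stands(22, m, true_icc, rng)  # 22 stands x 10 trees = 220 trees
    icc_hat = max(0.0, anova_icc(stands))  # negative estimates are truncated at zero
    print(f"estimated ICC          = {icc_hat:.3f}")
    print(f"design effect          = {design_effect(m, icc_hat):.2f}")
    print(f"naive SE / cluster SE  = {se_ratio(m, icc_hat):.2f}")

    grouped = [e for s in stands for e in s]  # rows ordered stand by stand
    shuffled = grouped[:]
    rng.shuffle(shuffled)  # same residuals, rows in random order
    print(f"DW, grouped by stand   = {durbin_watson(grouped):.2f}")
    print(f"DW, shuffled rows      = {durbin_watson(shuffled):.2f}")
```

With an ICC of zero, the design effect is 1 and the SE ratio is 1. As the ICC or the number of trees per stand increases, the ratio falls below 1. This sketch only examines the error structure. It does not fit the hierarchical (mixed-effects) model that the abstract recommends.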