Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models.

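Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.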
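
The abstract names three ensembling methods. The Python sketch below is a minimal illustration of each under stated assumptions, not the authors' code. The function names (`soft_vote`, `weighted_average`, `fit_stacker`, `predict_stacker`) are hypothetical; the stacker is assumed to be a logistic regression with an L2 (ridge) penalty on logit-transformed base-model probabilities, fitted by plain gradient descent, because this record does not say which penalty the authors used; and the probabilities in the example are toy values, not FHB data.

```python
import math
from collections.abc import Sequence

# probs[m][i] is base model m's predicted epidemic probability for observation i.
Probs = Sequence[Sequence[float]]


def soft_vote(probs: Probs) -> list[float]:
    """Soft voting: unweighted mean of all base-model probabilities."""
    n_obs = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_obs)]


def weighted_average(probs: Probs, subset: Sequence[int],
                     weights: Sequence[float]) -> list[float]:
    """Weighted mean over a chosen subset of base models (weights rescaled to sum to 1)."""
    total = sum(weights)
    n_obs = len(probs[0])
    return [sum(w * probs[m][i] for m, w in zip(subset, weights)) / total
            for i in range(n_obs)]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _logit(p: float, eps: float = 1e-6) -> float:
    # Clip so that probabilities of exactly 0 or 1 do not produce infinities.
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def _features(probs: Probs) -> list[list[float]]:
    # One row per observation, one logit-transformed column per base model.
    return [[_logit(p[i]) for p in probs] for i in range(len(probs[0]))]


def fit_stacker(probs: Probs, y: Sequence[int], lam: float = 0.1,
                lr: float = 0.1, n_iter: int = 2000) -> tuple[float, list[float]]:
    """Assumed stacking meta-learner: L2-penalized (ridge) logistic regression.

    Minimizes mean log-loss + (lam / 2) * ||w||^2 by full-batch gradient descent.
    The intercept b0 is not penalized.
    """
    X = _features(probs)
    n = len(y)
    b0, w = 0.0, [0.0] * len(probs)
    for _ in range(n_iter):
        g0, gw = 0.0, [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            g0 += err
            for j, xj in enumerate(xi):
                gw[j] += err * xj
        # Simultaneous update of intercept and (penalized) weights.
        b0 -= lr * g0 / n
        w = [wj - lr * (gj / n + lam * wj) for wj, gj in zip(w, gw)]
    return b0, w


def predict_stacker(b0: float, w: Sequence[float], probs: Probs) -> list[float]:
    """Ensemble probability for each observation from the fitted meta-learner."""
    return [_sigmoid(b0 + sum(wj * xj for wj, xj in zip(w, xi)))
            for xi in _features(probs)]


if __name__ == "__main__":
    # Toy, illustrative predictions from three highly correlated base models.
    probs = [
        [0.10, 0.20, 0.70, 0.80, 0.30, 0.60],
        [0.15, 0.25, 0.65, 0.85, 0.35, 0.55],
        [0.05, 0.30, 0.75, 0.90, 0.25, 0.50],
    ]
    y = [0, 0, 1, 1, 0, 1]  # 1 = epidemic observed
    print("soft vote:", soft_vote(probs))
    print("weighted (models 0 and 2):", weighted_average(probs, [0, 2], [0.7, 0.3]))
    b0, w = fit_stacker(probs, y)
    print("stacked:", predict_stacker(b0, w, probs))
```

Soft voting and weighted averaging have no trainable meta-model, whereas stacking learns how much to trust each base model. In practice the meta-learner should be fitted on out-of-fold (cross-validated) base-model predictions so that it does not reward base models that overfit the training data; the sketch skips that step for brevity.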


Bibliographic Details
Main Authors:	Denis A Shah, Erick D De Wolf, Pierce A Paul, Laurence V Madden
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2021-03-01
Series:	PLoS Computational Biology
Online Access:	https://doi.org/10.1371/journal.pcbi.1008831

id	doaj-16b25fceac14438794d7433fffaa3106
record_format	Article
spelling	doaj-16b25fceac14438794d7433fffaa3106; 2021-08-01T04:30:59Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; ISSN 1553-734X, 1553-7358; 2021-03-01; vol. 17, no. 3, e1008831; doi:10.1371/journal.pcbi.1008831
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Denis A Shah Erick D De Wolf Pierce A Paul Laurence V Madden
title	Accuracy in the prediction of disease epidemics when ensembling simple but highly correlated models.
publisher	Public Library of Science (PLoS)
series	PLoS Computational Biology
issn	1553-734X 1553-7358
publishDate	2021-03-01
description	Ensembling combines the predictions made by individual component base models with the goal of achieving a predictive accuracy that is better than that of any one of the constituent member models. Diversity among the base models in terms of predictions is a crucial criterion in ensembling. However, there are practical instances when the available base models produce highly correlated predictions, because they may have been developed within the same research group or may have been built from the same underlying algorithm. We investigated, via a case study on Fusarium head blight (FHB) on wheat in the U.S., whether ensembles of simple yet highly correlated models for predicting the risk of FHB epidemics, all generated from logistic regression, provided any benefit to predictive performance, despite relatively low levels of base model diversity. Three ensembling methods were explored: soft voting, weighted averaging of smaller subsets of the base models, and penalized regression as a stacking algorithm. Soft voting and weighted model averages were generally better at classification than the base models, though not universally so. The performances of stacked regressions were superior to those of the other two ensembling methods we analyzed in this study. Ensembling simple yet correlated models is computationally feasible and is therefore worth pursuing for models of epidemic risk.
url	https://doi.org/10.1371/journal.pcbi.1008831


Similar Items