Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water

Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studie...

Bibliographic Details
Main Authors:	Daniel L. Weller, Tanzy M. T. Love, Martin Wiedmann
Format:	Article
Language:	English
Published:	Frontiers Media S.A. 2021-05-01
Series:	Frontiers in Artificial Intelligence
Subjects:	E. coli; machine learning; predictive model; food safety; water quality
Online Access:	https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full

id	doaj-2b686710169a45d28511dabe1839814b
record_format	Article
spelling	doaj-2b686710169a45d28511dabe1839814b; 2021-05-14T08:21:03Z; eng; Frontiers Media S.A.; Frontiers in Artificial Intelligence; 2624-8212; 2021-05-01; 4; 10.3389/frai.2021.628441; 628441; Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water; Daniel L. Weller (0); Daniel L. Weller (1); Daniel L. Weller (2); Tanzy M. T. Love (3); Martin Wiedmann (4); Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States; Department of Food Science, Cornell University, Ithaca, NY, United States; Current Affiliation, Department of Environmental and Forest Biology, SUNY College of Environmental Science and Forestry, Syracuse, NY, United States; Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States; Department of Food Science, Cornell University, Ithaca, NY, United States; Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in agricultural water may provide a mechanism for overcoming these obstacles. Thus, this proof-of-concept study uses previously published datasets to train, test, and compare E. coli predictive models using multiple algorithms and performance measures. Since the collection of different feature data carries specific costs for growers, predictive performance was compared for models built using different feature types [geospatial, water quality, stream traits, and/or weather features]. Model performance was assessed against baseline regression models. Model performance varied considerably with root-mean-squared errors and Kendall’s Tau ranging between 0.37 and 1.03, and 0.07 and 0.55, respectively. Overall, models that included turbidity, rain, and temperature outperformed all other models regardless of the algorithm used. Turbidity and weather factors were also found to drive model accuracy even when other feature types were included in the model. These findings confirm previous conclusions that machine learning models may be useful for predicting when, where, and at what level E. coli (and associated hazards) are likely to be present in preharvest agricultural water sources. This study also identifies specific algorithm-predictor combinations that should be the foci of future efforts to develop deployable models (i.e., models that can be used to guide on-farm decision-making and risk mitigation). When deploying E. coli predictive models in the field, it is important to note that past research indicates an inconsistent relationship between E. coli levels and foodborne pathogen presence. Thus, models that predict E. coli levels in agricultural water may be useful for assessing fecal contamination status and ensuring compliance with regulations but should not be used to assess the risk that specific pathogens of concern (e.g., Salmonella, Listeria) are present.; https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full; E. coli; machine learning; predictive model; food safety; water quality
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Daniel L. Weller; Tanzy M. T. Love; Martin Wiedmann
spellingShingle	Daniel L. Weller; Tanzy M. T. Love; Martin Wiedmann; Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water; Frontiers in Artificial Intelligence; E. coli; machine learning; predictive model; food safety; water quality
author_facet	Daniel L. Weller; Tanzy M. T. Love; Martin Wiedmann
author_sort	Daniel L. Weller
title	Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
title_short	Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
title_full	Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
title_fullStr	Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
title_full_unstemmed	Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
title_sort	interpretability versus accuracy: a comparison of machine learning models built using different algorithms, performance measures, and features to predict e. coli levels in agricultural water
publisher	Frontiers Media S.A.
series	Frontiers in Artificial Intelligence
issn	2624-8212
publishDate	2021-05-01
description	Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in agricultural water may provide a mechanism for overcoming these obstacles. Thus, this proof-of-concept study uses previously published datasets to train, test, and compare E. coli predictive models using multiple algorithms and performance measures. Since the collection of different feature data carries specific costs for growers, predictive performance was compared for models built using different feature types [geospatial, water quality, stream traits, and/or weather features]. Model performance was assessed against baseline regression models. Model performance varied considerably with root-mean-squared errors and Kendall’s Tau ranging between 0.37 and 1.03, and 0.07 and 0.55, respectively. Overall, models that included turbidity, rain, and temperature outperformed all other models regardless of the algorithm used. Turbidity and weather factors were also found to drive model accuracy even when other feature types were included in the model. These findings confirm previous conclusions that machine learning models may be useful for predicting when, where, and at what level E. coli (and associated hazards) are likely to be present in preharvest agricultural water sources. This study also identifies specific algorithm-predictor combinations that should be the foci of future efforts to develop deployable models (i.e., models that can be used to guide on-farm decision-making and risk mitigation). When deploying E. coli predictive models in the field, it is important to note that past research indicates an inconsistent relationship between E. coli levels and foodborne pathogen presence. Thus, models that predict E. coli levels in agricultural water may be useful for assessing fecal contamination status and ensuring compliance with regulations but should not be used to assess the risk that specific pathogens of concern (e.g., Salmonella, Listeria) are present.
topic	E. coli; machine learning; predictive model; food safety; water quality
url	https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full

Similar Items