Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water

Bibliographic Details
Main Authors: Daniel L. Weller, Tanzy M. T. Love, Martin Wiedmann
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-05-01
Series: Frontiers in Artificial Intelligence
Subjects: E. coli; machine learning; predictive model; food safety; water quality
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full
id doaj-2b686710169a45d28511dabe1839814b
record_format Article
spelling doaj-2b686710169a45d28511dabe1839814b
2021-05-14T08:21:03Z
eng
Frontiers Media S.A.
Frontiers in Artificial Intelligence
2624-8212
2021-05-01
4
10.3389/frai.2021.628441
628441
Daniel L. Weller (Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States; Department of Food Science, Cornell University, Ithaca, NY, United States; Current Affiliation, Department of Environmental and Forest Biology, SUNY College of Environmental Science and Forestry, Syracuse, NY, United States)
Tanzy M. T. Love (Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States)
Martin Wiedmann (Department of Food Science, Cornell University, Ithaca, NY, United States)
collection DOAJ
language English
format Article
sources DOAJ
author Daniel L. Weller
Tanzy M. T. Love
Martin Wiedmann
title Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
publisher Frontiers Media S.A.
series Frontiers in Artificial Intelligence
issn 2624-8212
publishDate 2021-05-01
description Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in agricultural water may provide a mechanism for overcoming these obstacles. Thus, this proof-of-concept study uses previously published datasets to train, test, and compare E. coli predictive models using multiple algorithms and performance measures. Since the collection of different feature data carries specific costs for growers, predictive performance was compared for models built using different feature types [geospatial, water quality, stream traits, and/or weather features]. Model performance was assessed against baseline regression models. Model performance varied considerably, with root-mean-squared errors ranging from 0.37 to 1.03 and Kendall’s Tau ranging from 0.07 to 0.55. Overall, models that included turbidity, rain, and temperature outperformed all other models regardless of the algorithm used. Turbidity and weather factors were also found to drive model accuracy even when other feature types were included in the model. These findings confirm previous conclusions that machine learning models may be useful for predicting when, where, and at what level E. coli (and associated hazards) are likely to be present in preharvest agricultural water sources. This study also identifies specific algorithm-predictor combinations that should be the foci of future efforts to develop deployable models (i.e., models that can be used to guide on-farm decision-making and risk mitigation). When deploying E. coli predictive models in the field, it is important to note that past research indicates an inconsistent relationship between E. coli levels and foodborne pathogen presence. Thus, models that predict E. coli levels in agricultural water may be useful for assessing fecal contamination status and ensuring compliance with regulations but should not be used to assess the risk that specific pathogens of concern (e.g., Salmonella, Listeria) are present.
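The two performance measures named in the description, root-mean-squared error and Kendall's Tau, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the E. coli values are invented, the baseline is a simple mean-only predictor rather than the baseline regression models used in the study, and the Tau-a variant is chosen for simplicity because the record does not say which variant the authors used.

```python
import math
from itertools import combinations


def rmse(observed, predicted):
    """Root-mean-squared error between paired observations and predictions."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_a(observed, predicted):
    """Kendall's Tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(observed)), 2))
    for i, j in pairs:
        sign = (observed[i] - observed[j]) * (predicted[i] - predicted[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / len(pairs)


# Hypothetical log10 E. coli levels, invented for illustration only.
observed = [1.2, 2.5, 1.8, 3.1, 2.0]
model_predictions = [1.5, 2.2, 1.9, 2.8, 2.4]

# Assumed stand-in baseline: always predict the mean observed level.
mean_level = sum(observed) / len(observed)
baseline_predictions = [mean_level] * len(observed)

model_rmse = rmse(observed, model_predictions)
baseline_rmse = rmse(observed, baseline_predictions)
model_tau = kendall_tau_a(observed, model_predictions)
baseline_tau = kendall_tau_a(observed, baseline_predictions)

print(f"model:    RMSE={model_rmse:.3f}  tau={model_tau:.2f}")
print(f"baseline: RMSE={baseline_rmse:.3f}  tau={baseline_tau:.2f}")
```

Lower RMSE means predictions sit closer to the observed levels, while a higher Tau means the model ranks samples in an order closer to the observed one. A constant baseline ties every pair, so its Tau is zero, which is one reason a rank-based measure is a useful complement to RMSE.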
topic E. coli
machine learning
predictive model
food safety
water quality
url https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full