Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water
Main Authors: | Daniel L. Weller, Tanzy M. T. Love, Martin Wiedmann |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-05-01 |
Series: | Frontiers in Artificial Intelligence |
Subjects: | E. coli; machine learning; predictive model; food safety; water quality |
Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full |
id |
doaj-2b686710169a45d28511dabe1839814b |
---|---|
record_format |
Article |
spelling |
doaj-2b686710169a45d28511dabe1839814b; indexed 2021-05-14T08:21:03Z; language: eng; publisher: Frontiers Media S.A.; journal: Frontiers in Artificial Intelligence (ISSN 2624-8212); published 2021-05-01; volume 4; article 628441; DOI 10.3389/frai.2021.628441. Authors and affiliations: Daniel L. Weller (Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States; Department of Food Science, Cornell University, Ithaca, NY, United States; current affiliation: Department of Environmental and Forest Biology, SUNY College of Environmental Science and Forestry, Syracuse, NY, United States); Tanzy M. T. Love (Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY, United States); Martin Wiedmann (Department of Food Science, Cornell University, Ithaca, NY, United States). |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Daniel L. Weller, Tanzy M. T. Love, Martin Wiedmann |
title |
Interpretability Versus Accuracy: A Comparison of Machine Learning Models Built Using Different Algorithms, Performance Measures, and Features to Predict E. coli Levels in Agricultural Water |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Artificial Intelligence |
issn |
2624-8212 |
publishDate |
2021-05-01 |
description |
Since E. coli is considered a fecal indicator in surface water, government water quality standards and industry guidance often rely on E. coli monitoring to identify when there is an increased risk of pathogen contamination of water used for produce production (e.g., for irrigation). However, studies have indicated that E. coli testing can present an economic burden to growers and that time lags between sampling and obtaining results may reduce the utility of these data. Models that predict E. coli levels in agricultural water may provide a mechanism for overcoming these obstacles. Thus, this proof-of-concept study uses previously published datasets to train, test, and compare E. coli predictive models using multiple algorithms and performance measures. Since the collection of different feature data carries specific costs for growers, predictive performance was compared for models built using different feature types (geospatial, water quality, stream traits, and/or weather features). Model performance was assessed against baseline regression models. Model performance varied considerably, with root-mean-squared errors (RMSE) ranging from 0.37 to 1.03 and Kendall’s tau ranging from 0.07 to 0.55. Overall, models that included turbidity, rain, and temperature outperformed all other models regardless of the algorithm used. Turbidity and weather factors were also found to drive model accuracy even when other feature types were included in the model. These findings confirm previous conclusions that machine learning models may be useful for predicting when, where, and at what level E. coli (and associated hazards) are likely to be present in preharvest agricultural water sources. This study also identifies specific algorithm-predictor combinations that should be the foci of future efforts to develop deployable models (i.e., models that can be used to guide on-farm decision-making and risk mitigation). When deploying E. coli predictive models in the field, it is important to note that past research indicates an inconsistent relationship between E. coli levels and foodborne pathogen presence. Thus, models that predict E. coli levels in agricultural water may be useful for assessing fecal contamination status and ensuring compliance with regulations but should not be used to assess the risk that specific pathogens of concern (e.g., Salmonella, Listeria) are present. |
topic |
E. coli; machine learning; predictive model; food safety; water quality |
url |
https://www.frontiersin.org/articles/10.3389/frai.2021.628441/full |
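The abstract compares models using two performance measures: root-mean-squared error (RMSE), which is in the units of the response and where lower is better, and Kendall's tau, a rank correlation in [-1, 1] where higher is better. The sketch below shows how each measure is computed. It is a minimal illustration, not the authors' code. The function names `rmse` and `kendall_tau_b` are hypothetical. The record does not say which tau variant the study used, so the tie-corrected tau-b is an assumption. The observed and predicted values are invented for illustration (loosely imagined as log-scale E. coli concentrations) and are not data from the study.

```python
import math
from typing import Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-squared error: sqrt(mean((observed - predicted)^2))."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b (tie-corrected rank correlation), O(n^2) pair count."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length >= 2")
    concordant = discordant = ties_x = ties_y = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0:
                ties_x += 1  # pair tied in x
            if dy == 0:
                ties_y += 1  # pair tied in y
            if dx * dy > 0:
                concordant += 1  # both orderings agree
            elif dx * dy < 0:
                discordant += 1  # orderings disagree
    n0 = n * (n - 1) // 2  # total number of pairs
    denom = math.sqrt((n0 - ties_x) * (n0 - ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one input is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    observed = [1.2, 2.5, 0.8, 3.1, 1.9]
    model = [1.0, 2.7, 1.1, 2.8, 2.0]
    baseline = [1.8, 1.9, 1.7, 2.0, 2.1]
    for name, pred in [("model", model), ("baseline", baseline)]:
        print(f"{name}: RMSE={rmse(observed, pred):.3f}, "
              f"tau={kendall_tau_b(observed, pred):.3f}")
    # model: RMSE=0.232, tau=0.800
    # baseline: RMSE=0.746, tau=0.600
```

The two measures capture different things. RMSE penalizes large errors in predicted levels. Tau only asks whether the model ranks samples in the right order, which can matter more when the goal is to flag which samples are likely to have the highest E. coli levels.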