dtoolAI: Reproducibility for Deep Learning

Summary: Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution.
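
The summary notes that dtoolAI automatically captures provenance information during model training. As a rough sketch of that general idea only, and not of dtoolAI's actual interface (the function names, the ".provenance.json" file layout, and the toy artefacts below are invented for illustration), the following standard-library Python snippet records training-data hashes, hyperparameters, and environment details alongside a saved model file:

# Illustrative sketch only; not dtoolAI's API. Function names and file
# layout here are hypothetical.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path

def file_sha256(path: Path) -> str:
    # Hash a training-data file so the exact inputs can be verified later.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_provenance(model_path: Path, data_files: list, hyperparams: dict) -> Path:
    # Record what produced the model: data hashes, settings, and environment.
    record = {
        "model_file": model_path.name,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "hyperparameters": hyperparams,
        "training_data": [
            {"file": str(p), "sha256": file_sha256(Path(p))} for p in data_files
        ],
    }
    out = model_path.parent / (model_path.stem + ".provenance.json")
    out.write_text(json.dumps(record, indent=2))
    return out

if __name__ == "__main__":
    # Toy stand-ins for real training data and saved weights.
    data = Path("train.csv")
    data.write_text("x,y\n1,2\n")
    weights = Path("model.bin")
    weights.write_bytes(b"\x00" * 16)
    print(write_provenance(weights, [data], {"epochs": 10, "lr": 1e-3}))

A fuller implementation would also record the model architecture, the exact training code version, and a persistent identifier for the training data, so that a model file alone is enough to trace how it was produced.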

Bibliographic Details
Main Authors: Matthew Hartley, Tjelvar S.G. Olsson
Format: Article
Language: English
Published: Elsevier 2020-08-01
Series: Patterns
Subjects: data, data management, AI, artificial intelligence, deep learning, machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2666389920300933
id doaj-1757b00476594568963cda57215b00a1
record_format Article
spelling Patterns, volume 1, issue 5, article 100073, published 2020-08-01 (ISSN 2666-3899). Affiliation for both authors: Computational Systems Biology, John Innes Centre, Norwich, Norfolk NR4 7UH, UK; Matthew Hartley is the corresponding author.
collection DOAJ
language English
format Article
sources DOAJ
author Matthew Hartley
Tjelvar S.G. Olsson
title dtoolAI: Reproducibility for Deep Learning
publisher Elsevier
series Patterns
issn 2666-3899
publishDate 2020-08-01
description Summary: Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution. The Bigger Picture: Science has made use of machine learning, a way of teaching computers to understand patterns in data, for a long time. Deep learning, based on the way that real brains process data, has brought enormous improvements in the speed and accuracy of image and language processing over the last few years. However, the “black box” nature of deep-learning models makes scientific analyses that make use of them difficult to reproduce. In this work, we show how we might be able to improve long-term reproducibility for data analyses that rely on deep-learning models. We do this by giving guidance on how specific aspects of the FAIR principles for data management can be applied to training and using these models. We also present dtoolAI, a software tool and code library we have developed. We hope that in the future, adoption of our guidelines or similar principles will improve our collective trust in results that arise from deep learning.
topic data
data management
AI
artificial intelligence
deep learning
machine learning
url http://www.sciencedirect.com/science/article/pii/S2666389920300933