dtoolAI: Reproducibility for Deep Learning

Summary: Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution.
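
The summary notes that dtoolAI automatically captures provenance information during model training. As a rough sketch of that general idea only, and not of dtoolAI's actual interface (the function names, the ".provenance.json" file layout, and the toy artefacts below are invented for illustration), the following standard-library Python snippet records training-data hashes, hyperparameters, and environment details alongside a saved model file:

# Illustrative sketch only; not dtoolAI's API. Function names and file
# layout here are hypothetical.
import hashlib
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path

def file_sha256(path: Path) -> str:
    # Hash a training-data file so the exact inputs can be verified later.
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def write_provenance(model_path: Path, data_files: list, hyperparams: dict) -> Path:
    # Record what produced the model: data hashes, settings, and environment.
    record = {
        "model_file": model_path.name,
        "trained_at": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version,
        "platform": platform.platform(),
        "hyperparameters": hyperparams,
        "training_data": [
            {"file": str(p), "sha256": file_sha256(Path(p))} for p in data_files
        ],
    }
    out = model_path.parent / (model_path.stem + ".provenance.json")
    out.write_text(json.dumps(record, indent=2))
    return out

if __name__ == "__main__":
    # Toy stand-ins for real training data and saved weights.
    data = Path("train.csv")
    data.write_text("x,y\n1,2\n")
    weights = Path("model.bin")
    weights.write_bytes(b"\x00" * 16)
    print(write_provenance(weights, [data], {"epochs": 10, "lr": 1e-3}))

A fuller implementation would also record the model architecture, the exact training code version, and a persistent identifier for the training data, so that a model file alone is enough to trace how it was produced.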

Bibliographic Details
Main Authors: Matthew Hartley, Tjelvar S.G. Olsson
Format: Article
Language: English
Published: Elsevier 2020-08-01
Series: Patterns
Subjects: data, data management, AI, artificial intelligence, deep learning, machine learning
Online Access: http://www.sciencedirect.com/science/article/pii/S2666389920300933
id doaj-1757b00476594568963cda57215b00a1
record_format Article
spelling Patterns, volume 1, issue 5, article 100073, published 2020-08-01 (ISSN 2666-3899). Affiliation for both authors: Computational Systems Biology, John Innes Centre, Norwich, Norfolk NR4 7UH, UK; Matthew Hartley is the corresponding author.
collection DOAJ
language English
format Article
sources DOAJ
author Matthew Hartley
Tjelvar S.G. Olsson
title dtoolAI: Reproducibility for Deep Learning
publisher Elsevier
series Patterns
issn 2666-3899
publishDate 2020-08-01
description Summary: Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution. The Bigger Picture: Science has made use of machine learning, a way of teaching computers to understand patterns in data, for a long time. Deep learning, based on the way that real brains process data, has brought enormous improvements in the speed and accuracy of image and language processing over the last few years. However, the “black box” nature of deep-learning models makes scientific analyses that make use of them difficult to reproduce. In this work, we show how we might be able to improve long-term reproducibility for data analyses that rely on deep-learning models. We do this by giving guidance on how specific aspects of the FAIR principles for data management can be applied to training and using these models. We also present dtoolAI, a software tool and code library we have developed. We hope that in the future, adoption of our guidelines or similar principles will improve our collective trust in results that arise from deep learning.
topic data
data management
AI
artificial intelligence
deep learning
machine learning
url http://www.sciencedirect.com/science/article/pii/S2666389920300933