dtoolAI: Reproducibility for Deep Learning
Summary: Deep learning, a set of approaches using artificial neural networks, has generated rapid recent advancements in machine learning. Deep learning does, however, have the potential to reduce the reproducibility of scientific results. Model outputs are critically dependent on the data and processing approach used to initially generate the model, but this provenance information is usually lost during model training. To avoid a future reproducibility crisis, we need to improve our deep-learning model management. The FAIR principles for data stewardship and software/workflow implementation give excellent high-level guidance on ensuring effective reuse of data and software. We suggest some specific guidelines for the generation and use of deep-learning models in science and explain how these relate to the FAIR principles. We then present dtoolAI, a Python package that we have developed to implement these guidelines. The package implements automatic capture of provenance information during model training and simplifies model distribution.

The Bigger Picture: Science has made use of machine learning, a way of teaching computers to understand patterns in data, for a long time. Deep learning, based on the way that real brains process data, has brought enormous improvements in the speed and accuracy of image and language processing over the last few years. However, the “black box” nature of deep-learning models makes scientific analyses that make use of them difficult to reproduce. In this work, we show how we might be able to improve long-term reproducibility for data analyses that rely on deep-learning models. We do this by giving guidance on how specific aspects of the FAIR principles for data management can be applied to training and using these models. We also present dtoolAI, a software tool and code library we have developed. We hope that in the future, adoption of our guidelines or similar principles will improve our collective trust in results that arise from deep learning.
Main Authors: | Matthew Hartley, Tjelvar S.G. Olsson |
---|---|
Author Affiliation: | Computational Systems Biology, John Innes Centre, Norwich, Norfolk NR4 7UH, UK |
Format: | Article |
Language: | English |
Published: | Elsevier, 2020-08-01 |
Series: | Patterns (ISSN 2666-3899) |
Subjects: | data; data management; AI; artificial intelligence; deep learning; machine learning |
Online Access: | http://www.sciencedirect.com/science/article/pii/S2666389920300933 |
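The abstract describes dtoolAI's central idea: recording, at training time, which data and parameters produced a model, so the model can later be traced back to its inputs. The sketch below is not dtoolAI's actual API; it is a minimal, hypothetical illustration of that provenance-capture pattern in plain Python, writing a `provenance.json` record (dataset path and hash, hyperparameters, timestamp) alongside saved model weights. All file names, function names, and parameter values are invented for the example.

```python
"""Illustrative sketch only: this is NOT the dtoolAI API.

It shows the general idea of keeping provenance next to trained model weights.
"""
import hashlib
import json
import time
from pathlib import Path


def sha256_of_file(path):
    """Hash the training data file so the exact inputs can be verified later."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_model_with_provenance(weights, data_path, params, out_dir):
    """Write model weights next to a JSON record of how they were produced."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "model.bin").write_bytes(weights)
    provenance = {
        "training_data": str(data_path),
        "training_data_sha256": sha256_of_file(data_path),
        "hyperparameters": params,  # e.g. learning rate, number of epochs
        "trained_at": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    }
    (out / "provenance.json").write_text(json.dumps(provenance, indent=2))


if __name__ == "__main__":
    # Hypothetical data file and weights, purely for illustration.
    Path("data.csv").write_text("x,y\n1,2\n")
    save_model_with_provenance(b"\x00\x01", "data.csv",
                               {"lr": 0.01, "epochs": 5}, "model_out")
```

Keeping the hash of the training data next to the weights means that anyone who later receives the model can check that they hold exactly the inputs used to train it, which is the kind of provenance the paper argues is usually lost during model training.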