The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved]

As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platf...

Bibliographic Details
Main Authors: Brian D. O'Connor, Denis Yuen, Vincent Chung, Andrew G. Duncan, Xiang Kun Liu, Janice Patricia, Benedict Paten, Lincoln Stein, Vincent Ferretti
Format: Article
Language: English
Published: F1000 Research Ltd 2017-01-01
Series: F1000Research
Subjects: Bioinformatics
Online Access: https://f1000research.com/articles/6-52/v1
id doaj-dde76a7db51b43209a4b37b03553209b
record_format Article
spelling doaj-dde76a7db51b43209a4b37b03553209b 2020-11-25T04:04:04Z eng F1000 Research Ltd F1000Research 2046-1402 2017-01-01 6 10.12688/f1000research.10137.1 10919
Brian D. O'Connor (UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA)
Denis Yuen (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Vincent Chung (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Andrew G. Duncan (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Xiang Kun Liu (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Janice Patricia (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Benedict Paten (UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA)
Lincoln Stein (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
Vincent Ferretti (Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada)
collection DOAJ
language English
format Article
sources DOAJ
author Brian D. O'Connor
Denis Yuen
Vincent Chung
Andrew G. Duncan
Xiang Kun Liu
Janice Patricia
Benedict Paten
Lincoln Stein
Vincent Ferretti
author_sort Brian D. O'Connor
title The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved]
publisher F1000 Research Ltd
series F1000Research
issn 2046-1402
publishDate 2017-01-01
description As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).
topic Bioinformatics
url https://f1000research.com/articles/6-52/v1