The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved]
As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platf...
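Main Authors: | Brian D. O'Connor, Denis Yuen, Vincent Chung, Andrew G. Duncan, Xiang Kun Liu, Janice Patricia, Benedict Paten, Lincoln Stein, Vincent Ferretti |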
---|---|
Format: | Article |
Language: | English |
Published: | F1000 Research Ltd, 2017-01-01 |
Series: | F1000Research |
Subjects: | Bioinformatics |
Online Access: | https://f1000research.com/articles/6-52/v1 |
id |
doaj-dde76a7db51b43209a4b37b03553209b |
---|---|
record_format |
Article |
spelling |
doaj-dde76a7db51b43209a4b37b03553209b
2020-11-25T04:04:04Z
eng
F1000 Research Ltd
F1000Research
2046-1402
2017-01-01
6
10.12688/f1000research.10137.1
10919
The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved]
Brian D. O'Connor [0], Denis Yuen [1], Vincent Chung [2], Andrew G. Duncan [3], Xiang Kun Liu [4], Janice Patricia [5], Benedict Paten [6], Lincoln Stein [7], Vincent Ferretti [8]
[0] UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
[1] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[2] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[3] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[4] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[5] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[6] UC Santa Cruz Genomics Institute, University of California, Santa Cruz, CA, USA
[7] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
[8] Ontario Institute for Cancer Research, MaRS Centre, Toronto, Canada
As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH).
https://f1000research.com/articles/6-52/v1
Bioinformatics |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Brian D. O'Connor Denis Yuen Vincent Chung Andrew G. Duncan Xiang Kun Liu Janice Patricia Benedict Paten Lincoln Stein Vincent Ferretti |
author_sort |
Brian D. O'Connor |
title |
The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved] |
title_sort |
dockstore: enabling modular, community-focused sharing of docker-based genomics tools and workflows [version 1; referees: 2 approved] |
publisher |
F1000 Research Ltd |
series |
F1000Research |
issn |
2046-1402 |
publishDate |
2017-01-01 |
description |
As genomic datasets continue to grow, the feasibility of downloading data to a local organization and running analysis on a traditional compute environment is becoming increasingly problematic. Current large-scale projects, such as the ICGC PanCancer Analysis of Whole Genomes (PCAWG), the Data Platform for the U.S. Precision Medicine Initiative, and the NIH Big Data to Knowledge Center for Translational Genomics, are using cloud-based infrastructure to both host and perform analysis across large data sets. In PCAWG, over 5,800 whole human genomes were aligned and variant called across 14 cloud and HPC environments; the processed data was then made available on the cloud for further analysis and sharing. If run locally, an operation at this scale would have monopolized a typical academic data centre for many months, and would have presented major challenges for data storage and distribution. However, this scale is increasingly typical for genomics projects and necessitates a rethink of how analytical tools are packaged and moved to the data. For PCAWG, we embraced the use of highly portable Docker images for encapsulating and sharing complex alignment and variant calling workflows across highly variable environments. While successful, this endeavor revealed a limitation in Docker containers, namely the lack of a standardized way to describe and execute the tools encapsulated inside the container. As a result, we created the Dockstore (https://dockstore.org), a project that brings together Docker images with standardized, machine-readable ways of describing and running the tools contained within. This service greatly improves the sharing and reuse of genomics tools and promotes interoperability with similar projects through emerging web service standards developed by the Global Alliance for Genomics and Health (GA4GH). |
topic |
Bioinformatics |
url |
https://f1000research.com/articles/6-52/v1 |
_version_ |
1724437948972138496 |