Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 1; peer review: 2 approved]

Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open...


Bibliographic Details
Main Authors: Maxime Garcia, Szilveszter Juhos, Malin Larsson, Pall I. Olason, Marcel Martin, Jesper Eisfeldt, Sebastian DiLorenzo, Johanna Sandgren, Teresita Díaz De Ståhl, Philip Ewels, Valtteri Wirta, Monica Nistér, Max Käller, Björn Nystedt
Format: Article
Language: English
Published: F1000 Research Ltd 2020-01-01
Series: F1000Research
Online Access: https://f1000research.com/articles/9-63/v1
id doaj-73960a360a4448db9833dbe29df58531
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author Maxime Garcia
Szilveszter Juhos
Malin Larsson
Pall I. Olason
Marcel Martin
Jesper Eisfeldt
Sebastian DiLorenzo
Johanna Sandgren
Teresita Díaz De Ståhl
Philip Ewels
Valtteri Wirta
Monica Nistér
Max Käller
Björn Nystedt
author_sort Maxime Garcia
title Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 1; peer review: 2 approved]
publisher F1000 Research Ltd
series F1000Research
issn 2046-1402
publishDate 2020-01-01
description Whole-genome sequencing (WGS) is a fundamental technology for research to advance precision medicine, but the limited availability of portable and user-friendly workflows for WGS analyses poses a major challenge for many research groups and hampers scientific progress. Here we present Sarek, an open-source workflow to detect germline variants and somatic mutations based on sequencing data from WGS, whole-exome sequencing (WES), or gene panels. Sarek features (i) easy installation, (ii) robust portability across different computer environments, (iii) comprehensive documentation, (iv) transparent and easy-to-read code, and (v) extensive quality metrics reporting. Sarek is implemented in the Nextflow workflow language and supports both Docker and Singularity containers as well as Conda environments, making it ideal for easy deployment on any POSIX-compatible computer or cloud compute environment. Sarek follows the GATK best-practice recommendations for read alignment and pre-processing, and includes a wide range of software for the identification and annotation of germline and somatic single-nucleotide variants, insertion and deletion variants, structural variants, tumour sample purity, and variations in ploidy and copy number. Sarek offers easy, efficient, and reproducible WGS analyses, and can readily be used both as a production workflow at sequencing facilities and as a powerful stand-alone tool for individual research groups. The Sarek source code, documentation, and installation instructions are freely available at https://github.com/nf-core/sarek and at https://nf-co.re/sarek/.
url https://f1000research.com/articles/9-63/v1
spelling doaj-73960a360a4448db9833dbe29df58531 2020-11-25T03:55:44Z eng F1000 Research Ltd F1000Research 2046-1402 2020-01-01 9 10.12688/f1000research.16665.1 18214
Sarek: A portable workflow for whole-genome sequencing analysis of germline and somatic variants [version 1; peer review: 2 approved]
Maxime Garcia 0, Szilveszter Juhos 1, Malin Larsson 2, Pall I. Olason 3, Marcel Martin 4, Jesper Eisfeldt 5, Sebastian DiLorenzo 6, Johanna Sandgren 7, Teresita Díaz De Ståhl 8, Philip Ewels 9, Valtteri Wirta 10, Monica Nistér 11, Max Käller 12, Björn Nystedt 13
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Department of Physics, Chemistry and Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Linköping University, Linköping, 58183, Sweden
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Department of Biochemistry and Biophysics, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Stockholm University, Box 1031, Solna, 17121, Sweden
Clinical Genetics, Department of Molecular Medicine and Surgery, Karolinska Institutet, MMK L1:00, Karolinska University Hospital at Solna, Stockholm, 171 76, Sweden
Department of Medical Sciences, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Box 1031, Solna, 17121, Sweden
Department of Microbiology, Tumor and Cell Biology, Clinical Genomics Facility, Science for Life Laboratory, Karolinska Institutet, Box 1031, Solna, 171 21, Sweden
Department of Oncology-Pathology, Karolinska Institutet, J5:30 BioClinicum, Visionsgatan 4, Karolinska University Hospital at Solna, Solna, 17164, Sweden
School of Engineering Sciences in Chemistry, Biotechnology and Health, Science for Life Laboratory, KTH Royal Institute of Technology, Box 1031, Solna, 17121, Sweden
Department of Cell and Molecular Biology, National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, Uppsala University, Husargatan 3, Uppsala, 752 37, Sweden