Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms
Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware us...
Main Authors: | Karl R. Franke, Erin L. Crowgey |
---|---|
Format: | Article |
Language: | English |
Published: | Korea Genome Organization 2020-03-01 |
Series: | Genomics & Informatics |
Subjects: | clinical genomics, genome analysis toolkit, gpus, next generation sequencing, variant detection |
Online Access: | http://genominfo.org/upload/pdf/gi-2020-18-1-e10.pdf |
id |
doaj-a323f86f8f3a41dca865dee463a61c28 |
---|---|
record_format |
Article |
spelling |
doaj-a323f86f8f3a41dca865dee463a61c28; 2020-11-25T02:40:35Z; eng; Korea Genome Organization; Genomics & Informatics; 2234-0742; 2020-03-01; 18; 1; 10.5808/GI.2020.18.1.e10; 598; Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms; Karl R. Franke (0), Erin L. Crowgey (1); (0) Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA; (1) Department of Pediatrics, Nemours Alfred I duPont Hospital for Children, Wilmington, DE 19803, USA; Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware used to compute large-scale human genomic datasets (NGS), the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon’s somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping.; http://genominfo.org/upload/pdf/gi-2020-18-1-e10.pdf; clinical genomics; genome analysis toolkit; gpus; next generation sequencing; variant detection |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Karl R. Franke, Erin L. Crowgey |
title |
Accelerating next generation sequencing data analysis: an evaluation of optimized best practices for Genome Analysis Toolkit algorithms |
publisher |
Korea Genome Organization |
series |
Genomics & Informatics |
issn |
2234-0742 |
publishDate |
2020-03-01 |
description |
Advancements in next generation sequencing (NGS) technologies have significantly increased the translational use of genomics data in the medical field as well as the demand for computational infrastructure capable of processing that data. To enhance the current understanding of software and hardware used to compute large-scale human genomic datasets (NGS), the performance and accuracy of optimized versions of GATK algorithms, including Parabricks and Sentieon, were compared to the results of the original application (GATK v4.1.0, Intel x86 CPUs). Parabricks was able to process a 50× whole-genome sequencing library in under 3 h and Sentieon finished in under 8 h, whereas GATK v4.1.0 needed nearly 24 h. These results were achieved while maintaining greater than 99% accuracy and precision compared to stock GATK. Sentieon’s somatic pipeline achieved similar results, also greater than 99%. Additionally, the IBM POWER9 CPU performed well on bioinformatic workloads when tested with 10 different tools for alignment/mapping. |
topic |
clinical genomics, genome analysis toolkit, gpus, next generation sequencing, variant detection |
url |
http://genominfo.org/upload/pdf/gi-2020-18-1-e10.pdf |