BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs

Abstract. Background: The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources, for which a number of efficient yet complex algorithms have been proposed. Results: We propose BLAMM, a simple and efficient tool...

Bibliographic Details
Main Author: Jan Fostier
Format: Article
Language: English
Published: BMC 2020-03-01
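Series: BMC Bioinformatics
Subjects: Position weight matrix (PWM); High performance computing (HPC); Basic linear algebra subprograms (BLAS); Graphics processing units (GPUs)
Online Access: http://link.springer.com/article/10.1186/s12859-020-3348-6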
id doaj-75eedc7fcddc4ad88dd5a15791b7f380
record_format Article
spelling doaj-75eedc7fcddc4ad88dd5a15791b7f380 | 2020-11-25T02:56:33Z | eng | BMC | BMC Bioinformatics | 1471-2105 | 2020-03-01 | 21 | S2 | 1 | 13 | 10.1186/s12859-020-3348-6 | BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs | Jan Fostier | Department of Information Technology - IDLab, Ghent University - imec | http://link.springer.com/article/10.1186/s12859-020-3348-6 | Position weight matrix (PWM) | High performance computing (HPC) | Basic linear algebra subprograms (BLAS) | Graphics processing units (GPUs)
collection DOAJ
language English
format Article
sources DOAJ
author Jan Fostier
title BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2020-03-01
description Abstract. Background: The identification of all matches of a large set of position weight matrices (PWMs) in long DNA sequences requires significant computational resources, for which a number of efficient yet complex algorithms have been proposed. Results: We propose BLAMM, a simple and efficient tool inspired by high-performance computing techniques. The workload is expressed in terms of matrix-matrix products that are evaluated with high efficiency using optimized BLAS library implementations. The algorithm is easy to parallelize and implement on CPUs and GPUs and has a runtime that is independent of the selected p-value. In terms of single-core performance, it is competitive with state-of-the-art software for PWM matching while being much more efficient when using multithreading. Additionally, BLAMM requires negligible memory. For example, both strands of the entire human genome can be scanned for 1404 PWMs in the JASPAR database in 13 min with a p-value of 10⁻⁴ using a 36-core machine. On a dual-GPU system, the same task can be performed in under 5 min. Conclusions: BLAMM is an efficient tool for identifying PWM matches in large DNA sequences. Its C++ source code is available under the GNU General Public License Version 3 at https://github.com/biointec/blamm.
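The idea of expressing PWM scanning as a matrix-matrix product can be illustrated with a small sketch. This is not BLAMM's C++ code: the function names (`one_hot_windows`, `flatten_pwm`, `matmul`, `scan`), the one-hot window layout, and the zero-padding of shorter PWMs are all assumptions made for illustration, and the naive `matmul` stands in for the optimized BLAS GEMM call that a real implementation would use.

```python
BASES = "ACGT"

def one_hot_windows(seq, width):
    """One row per window: the flattened one-hot encoding (4 * width entries)."""
    rows = []
    for start in range(len(seq) - width + 1):
        row = [0.0] * (4 * width)
        for offset, base in enumerate(seq[start:start + width]):
            row[4 * offset + BASES.index(base)] = 1.0
        rows.append(row)
    return rows

def flatten_pwm(pwm, width):
    """Flatten a PWM given as per-position [A, C, G, T] scores, zero-padded to `width`."""
    flat = [score for position in pwm for score in position]
    return flat + [0.0] * (4 * (width - len(pwm)))

def matmul(a, b):
    """Naive matrix product (hypothetical stand-in for a BLAS GEMM call)."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def scan(seq, pwms, thresholds):
    """Score every window against every PWM with one product; report hits.

    Windows are as wide as the longest PWM, so trailing positions where
    only a shorter PWM would still fit are skipped in this sketch.
    """
    width = max(len(p) for p in pwms)
    windows = one_hot_windows(seq, width)                 # (n - width + 1) x 4*width
    flat = [flatten_pwm(p, width) for p in pwms]          # k x 4*width
    pwm_matrix = [list(col) for col in zip(*flat)]        # 4*width x k
    scores = matmul(windows, pwm_matrix)                  # (n - width + 1) x k
    return [(pos, k, s)
            for pos, row in enumerate(scores)
            for k, s in enumerate(row)
            if s >= thresholds[k]]

# Two toy PWMs that score 1 per matching base: "AC" and "GTA".
pwm_ac = [[1, 0, 0, 0], [0, 1, 0, 0]]
pwm_gta = [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]
hits = scan("ACGTAC", [pwm_ac, pwm_gta], thresholds=[2, 3])
```

Because every window is scored against every PWM regardless of the threshold, the cost of the product does not depend on the chosen cutoff, which matches the abstract's remark that the runtime is independent of the selected p-value.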
topic Position weight matrix (PWM)
High performance computing (HPC)
Basic linear algebra subprograms (BLAS)
Graphics processing units (GPUs)
url http://link.springer.com/article/10.1186/s12859-020-3348-6