HCudaBLAST: an implementation of BLAST on Hadoop and Cuda

Abstract: DNA sequencing has been a difficult field since its inception, and it is growing at an exponential rate. The amount of data involved in DNA searching is huge, so conventional tools and algorithms cannot handle processing at this scale...

Bibliographic Details
Main Authors: Nilay Khare, Alind Khare, Farhan Khan
Format: Article
Language: English
Published: SpringerOpen 2017-11-01
Series: Journal of Big Data
Subjects: DNA Searching, BLAST, CUDA, Hadoop
Online Access: http://link.springer.com/article/10.1186/s40537-017-0102-7
id doaj-c5ebb937e92e48beaf051d8f3f9d9575
record_format Article
spelling doaj-c5ebb937e92e48beaf051d8f3f9d9575; 2020-11-25T00:39:34Z; eng; SpringerOpen; Journal of Big Data; 2196-1115; 2017-11-01; 4118; 10.1186/s40537-017-0102-7; HCudaBLAST: an implementation of BLAST on Hadoop and Cuda; Nilay Khare (Maulana Azad National Institute of Technology); Alind Khare (IIIT); Farhan Khan (Maulana Azad National Institute of Technology); http://link.springer.com/article/10.1186/s40537-017-0102-7; DNA Searching; BLAST; CUDA; Hadoop
collection DOAJ
language English
format Article
sources DOAJ
author Nilay Khare
Alind Khare
Farhan Khan
title HCudaBLAST: an implementation of BLAST on Hadoop and Cuda
publisher SpringerOpen
series Journal of Big Data
issn 2196-1115
publishDate 2017-11-01
description Abstract: DNA sequencing has been a difficult field since its inception, and it is growing at an exponential rate. The amount of data involved in DNA searching is huge, so conventional tools and algorithms cannot handle processing at this scale. BLAST is a tool provided by the National Center for Biotechnology Information (NCBI) that compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of matches. Variants of BLAST such as blastn, blastp, and blastx are used to search nucleotide, protein, and nucleotide-to-protein sequences, respectively. GPU-BLAST and HBLAST have already been proposed to handle the vast amount of data involved in DNA sequence searching, and they also speed up the search. In this article, we propose a new model for searching DNA sequences, HCudaBLAST, which combines CUDA processing with Hadoop for efficient searching. We report the results recorded after implementing HCudaBLAST. The solution combines the multi-core parallelism of GPGPUs with the scalability of the Hadoop framework.
topic DNA Searching
BLAST
CUDA
Hadoop
url http://link.springer.com/article/10.1186/s40537-017-0102-7