HCudaBLAST: an implementation of BLAST on Hadoop and Cuda
Abstract
DNA sequencing has been a difficult field ever since work on it began, and it is also growing at an exponential rate. The amount of data involved in DNA searching is huge, so conventional tools and algorithms are not suited to processing it at this scale. BLAST is a tool provided by the National Center for Biotechnology Information (NCBI) that compares nucleotide or protein sequences against sequence databases and calculates the statistical significance of the matches. Variants of BLAST such as blastn, blastp, and blastx are used for nucleotide, protein, and nucleotide-to-protein searches, respectively. GPU-BLAST and HBLAST have already been proposed to handle the vast amount of data involved in DNA sequence search, and they also speed up the search process. In this article, we propose HCudaBLAST, a new model for searching DNA sequences that combines CUDA processing and Hadoop for efficient searching. The results recorded after implementing HCudaBLAST are presented. The solution combines the multi-core parallelism of GPGPUs with the scalability provided by the Hadoop framework.
Main Authors: | Nilay Khare, Alind Khare, Farhan Khan |
---|---|
Affiliations: | Maulana Azad National Institute of Technology; IIIT |
Format: | Article |
Language: | English |
Published: | SpringerOpen, 2017-11-01 |
Series: | Journal of Big Data |
ISSN: | 2196-1115 |
DOI: | 10.1186/s40537-017-0102-7 |
Subjects: | DNA Searching; BLAST; CUDA; Hadoop |
Online Access: | http://link.springer.com/article/10.1186/s40537-017-0102-7 |
Record ID: | doaj-c5ebb937e92e48beaf051d8f3f9d9575 |
Source: | DOAJ |