Modified HuffBit Compress Algorithm – An Application of R

The databases of genomic sequences are growing at an exponential rate because of the increasing growth of living organisms. Compressing deoxyribonucleic acid (DNA) sequences is a momentous task as the databases are approaching their threshold. Various compression algorithms have been developed for DN...

Bibliographic Details
Main Authors:	Habib Nahida, Ahmed Kawsar, Jabin Iffat, Rahman Mohammad Motiur
Format:	Article
Language:	English
Published:	De Gruyter 2018-02-01
Series:	Journal of Integrative Bioinformatics
Subjects:	compression and decompression; compression ratio; extended binary tree; huffbit compress; 2-bits encoding method
Online Access:	https://doi.org/10.1515/jib-2017-0057

id	doaj-0f5e35edb42343a89cb67bf678c6182e
record_format	Article
spelling	doaj-0f5e35edb42343a89cb67bf678c6182e | 2021-09-06T19:40:32Z | eng | De Gruyter | Journal of Integrative Bioinformatics | 1613-4516 | 2018-02-01 | 15309758887 | 10.1515/jib-2017-0057 | jib-2017-0057 | Modified HuffBit Compress Algorithm – An Application of R | Habib Nahida [0] | Ahmed Kawsar [1] | Jabin Iffat [2] | Rahman Mohammad Motiur [3] | Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Santosh, Tangail 1902, Bangladesh | Department of Information and Communication Technology (ICT), Mawlana Bhashani Science and Technology University (MBSTU), Tangail, Bangladesh | Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Tangail, Bangladesh | Department of Computer Science and Engineering (CSE), Mawlana Bhashani Science and Technology University (MBSTU), Tangail, Bangladesh | The databases of genomic sequences are growing at an exponential rate because of the increasing growth of living organisms. Compressing deoxyribonucleic acid (DNA) sequences is a momentous task as the databases are approaching their threshold. Various compression algorithms have been developed for DNA sequence compression. “HuffBit Compress”, an efficient DNA compression algorithm that works on both repetitive and non-repetitive sequences, is based on the concept of the extended binary tree. This paper proposes and develops a modified version of the “HuffBit Compress” algorithm, implemented in the R language, to compress and decompress DNA sequences. The modified version always achieves the best-case compression ratio, at the cost of 6 extra bits compared with the best case of “HuffBit Compress”, and is named the “Modified HuffBit Compress Algorithm”. The algorithm builds an extended binary tree based on Huffman codes and the most frequently occurring bases (A, C, G, T). In experiments on 6 sequences, the proposed algorithm gives approximately 16.18% improvement in compression ratio over the “HuffBit Compress” algorithm and 11.12% improvement in compression ratio over the “2-Bits Encoding Method”. | https://doi.org/10.1515/jib-2017-0057 | compression and decompression; compression ratio; extended binary tree; huffbit compress; 2-bits encoding method
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Habib Nahida, Ahmed Kawsar, Jabin Iffat, Rahman Mohammad Motiur
author_sort	Habib Nahida
title	Modified HuffBit Compress Algorithm – An Application of R
publisher	De Gruyter
series	Journal of Integrative Bioinformatics
issn	1613-4516
publishDate	2018-02-01
description	The databases of genomic sequences are growing at an exponential rate because of the increasing growth of living organisms. Compressing deoxyribonucleic acid (DNA) sequences is a momentous task as the databases are approaching their threshold. Various compression algorithms have been developed for DNA sequence compression. “HuffBit Compress”, an efficient DNA compression algorithm that works on both repetitive and non-repetitive sequences, is based on the concept of the extended binary tree. This paper proposes and develops a modified version of the “HuffBit Compress” algorithm, implemented in the R language, to compress and decompress DNA sequences. The modified version always achieves the best-case compression ratio, at the cost of 6 extra bits compared with the best case of “HuffBit Compress”, and is named the “Modified HuffBit Compress Algorithm”. The algorithm builds an extended binary tree based on Huffman codes and the most frequently occurring bases (A, C, G, T). In experiments on 6 sequences, the proposed algorithm gives approximately 16.18% improvement in compression ratio over the “HuffBit Compress” algorithm and 11.12% improvement in compression ratio over the “2-Bits Encoding Method”.
topic	compression and decompression; compression ratio; extended binary tree; huffbit compress; 2-bits encoding method
url	https://doi.org/10.1515/jib-2017-0057
work_keys_str_mv	AT habibnahida modifiedhuffbitcompressalgorithmanapplicationofr AT ahmedkawsar modifiedhuffbitcompressalgorithmanapplicationofr AT jabiniffat modifiedhuffbitcompressalgorithmanapplicationofr AT rahmanmohammadmotiur modifiedhuffbitcompressalgorithmanapplicationofr
_version_	1717768208195256320
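
The abstract describes building an extended binary tree from Huffman codes and the frequencies of the four bases. The following is a minimal sketch of that general idea, not the authors' R implementation. It assumes a fixed skewed code shape (0, 10, 110, 111) assigned to bases in descending order of frequency, with ties broken alphabetically. The names `CODE_SHAPES`, `build_code_table`, `encode`, and `decode` are hypothetical, and the 6-bit overhead mentioned in the abstract is not modelled. The input is assumed to contain only the letters A, C, G, and T.

```python
from collections import Counter

# Assumed code shapes for a 4-leaf skewed (extended) binary tree.
# The shortest code goes to the most frequent base. Illustrative only.
CODE_SHAPES = ["0", "10", "110", "111"]


def build_code_table(seq):
    """Map each base to a prefix code, most frequent base first."""
    counts = Counter(seq)
    # Sort by descending count; ties are broken alphabetically (an assumption).
    ranked = sorted("ACGT", key=lambda base: (-counts[base], base))
    return dict(zip(ranked, CODE_SHAPES))


def encode(seq, table):
    """Concatenate the code of every base into one bit string."""
    return "".join(table[base] for base in seq)


def decode(bits, table):
    """Walk the bit string and emit a base whenever a full code is read."""
    inverse = {code: base for base, code in table.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # codes are prefix-free, so this is unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)


seq = "AAAAACCCGT"
table = build_code_table(seq)
bits = encode(seq, table)
print(table)
print(len(bits), "bits, versus", 2 * len(seq), "bits with 2 bits per base")
```

A variable-length code of this shape only beats a fixed 2 bits per base when the base frequencies are skewed, as they are in the example sequence; for a uniform distribution it needs more bits.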

Similar Items