Sparse and skew hashing of K-mers

MOTIVATION: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and supports membership queries. This data structure is at the heart of many important tasks in computational biology. High-throughput sequencing of DNA can produce very large k-mer sets, in the size of bi...
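To make the notion of a k-mer dictionary concrete, here is a minimal Python sketch, not the paper's method. It assumes the common 2-bit nucleotide encoding (A=0, C=1, G=2, T=3) and keeps the packed k-mers in an ordinary Python set, so it answers membership queries but has none of the compression the paper targets. The names encode_kmer, kmers and KmerSet are hypothetical.

```python
# Assumed 2-bit code per nucleotide (a common convention, not taken from the paper).
ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer: str) -> int:
    """Pack a k-mer into an integer, 2 bits per nucleotide."""
    value = 0
    for base in kmer:
        value = (value << 2) | ENCODE[base]
    return value


def kmers(sequence: str, k: int):
    """Yield every substring of length k of the sequence."""
    for i in range(len(sequence) - k + 1):
        yield sequence[i:i + k]


class KmerSet:
    """Hypothetical toy dictionary of distinct k-mers with membership queries."""

    def __init__(self, sequence: str, k: int):
        self.k = k
        # A Python set of packed integers; duplicates collapse automatically.
        self.codes = {encode_kmer(x) for x in kmers(sequence, k)}

    def __contains__(self, kmer: str) -> bool:
        # Reject wrong lengths and non-ACGT characters before encoding.
        if len(kmer) != self.k or not all(b in ENCODE for b in kmer):
            return False
        return encode_kmer(kmer) in self.codes

    def __len__(self) -> int:
        return len(self.codes)
```

For example, KmerSet("ACGTACGTT", 3) holds the 5 distinct 3-mers ACG, CGT, GTA, TAC and GTT, so "GTA" in it is True and "AAA" in it is False.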


Bibliographic Details
Main Author:	Pibiri, G.E. (Author)
Format:	Article
Language:	English
Published:	NLM (Medline) 2022
Subjects:	article; bioinformatics; time trade-off method
Online Access:	View Fulltext in Publisher


LEADER	01625nam a2200169Ia 4500
001	10.1093-bioinformatics-btac245
008	220706s2022 CNT 000 0 und d
020			\|a 13674811 (ISSN)
245	1	0	\|a Sparse and skew hashing of K-mers
260		0	\|b NLM (Medline) \|c 2022
856			\|z View Fulltext in Publisher \|u https://doi.org/10.1093/bioinformatics/btac245
520	3		\|a MOTIVATION: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and supports membership queries. This data structure is at the heart of many important tasks in computational biology. High-throughput sequencing of DNA can produce very large k-mer sets, in the size of billions of strings; in such cases, the memory consumption and query efficiency of the data structure are a concrete challenge. RESULTS: To tackle this problem, we describe a compressed and associative dictionary for k-mers, that is: a data structure where strings are represented in compact form and each of them is associated with a unique integer identifier in the range [0,n). We show that some statistical properties of k-mer minimizers can be exploited by minimal perfect hashing to substantially improve the space/time trade-off of the dictionary compared to the best-known solutions. AVAILABILITY AND IMPLEMENTATION: https://github.com/jermp/sshash. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2022. Published by Oxford University Press.
650	0	4	\|a article
650	0	4	\|a bioinformatics
650	0	4	\|a time trade-off method
700	1		\|a Pibiri, G.E. \|e author
773			\|t Bioinformatics (Oxford, England)
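The abstract mentions bucketing by k-mer minimizers and minimal perfect hashing but gives no layout details. The Python sketch below only illustrates the general idea of an associative dictionary, under stated assumptions. k-mers are grouped by their minimizer (here the lexicographically smallest m-mer; practical tools often order m-mers by a random hash instead). Buckets are laid out one after another, and a k-mer's ID in [0, n) is its bucket's offset plus its position inside the bucket. A plain Python dict stands in for the minimal perfect hash function over minimizers. This is not the SSHash data structure, and minimizer, AssociativeKmerDict and lookup are hypothetical names.

```python
def minimizer(kmer: str, m: int) -> str:
    """Smallest m-mer of the k-mer; lexicographic order is an assumption here."""
    return min(kmer[i:i + m] for i in range(len(kmer) - m + 1))


class AssociativeKmerDict:
    """Toy map from n distinct k-mers to unique IDs in [0, n), bucketed by minimizer.

    Hypothetical illustration only; kmer_set is assumed to hold distinct k-mers.
    """

    def __init__(self, kmer_set, m: int):
        self.m = m
        buckets = {}
        for kmer in sorted(kmer_set):  # sorted input gives deterministic IDs
            buckets.setdefault(minimizer(kmer, m), []).append(kmer)
        # A plain dict over minimizers stands in for a minimal perfect hash function.
        self.buckets = {}
        self.offsets = {}
        next_id = 0
        for mini in sorted(buckets):  # lay buckets out consecutively
            self.buckets[mini] = buckets[mini]
            self.offsets[mini] = next_id
            next_id += len(buckets[mini])
        self.n = next_id  # number of distinct k-mers stored

    def lookup(self, kmer: str) -> int:
        """Return the ID of kmer in [0, n), or -1 if it is not in the set."""
        if len(kmer) < self.m:
            return -1
        mini = minimizer(kmer, self.m)
        bucket = self.buckets.get(mini)
        if bucket is None or kmer not in bucket:
            return -1
        # ID = start offset of the bucket + position of the k-mer inside it.
        return self.offsets[mini] + bucket.index(kmer)
```

With the set {"ACGTA", "CGTAC", "GTACG", "TTTTT"} and m = 3, ACGTA and GTACG share the minimizer ACG and receive IDs 0 and 1, CGTAC gets 2 and TTTTT gets 3, so every stored k-mer maps to a distinct integer in [0, 4). The linear scan inside a bucket keeps the sketch short; it says nothing about how the paper handles bucket sizes.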

