Sparse and skew hashing of K-mers

MOTIVATION: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and supports membership queries. This data structure is at the heart of many important tasks in computational biology. High-throughput sequencing of DNA can produce very large k-mer sets, in the size of bi...


Bibliographic Details
Main Author: Pibiri, G.E. (Author)
Format: Article
Language: English
Published: NLM (Medline) 2022
Subjects: article; bioinformatics; time trade-off method
Online Access: View Fulltext in Publisher
LEADER 01625nam a2200169Ia 4500
001 10.1093-bioinformatics-btac245
008 220706s2022 CNT 000 0 und d
020 |a 13674811 (ISSN) 
245 1 0 |a Sparse and skew hashing of K-mers 
260 0 |b NLM (Medline)  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1093/bioinformatics/btac245 
520 3 |a MOTIVATION: A dictionary of k-mers is a data structure that stores a set of n distinct k-mers and supports membership queries. This data structure is at the heart of many important tasks in computational biology. High-throughput sequencing of DNA can produce very large k-mer sets, in the size of billions of strings; in such cases, the memory consumption and query efficiency of the data structure are a concrete challenge. RESULTS: To tackle this problem, we describe a compressed and associative dictionary for k-mers, that is, a data structure where strings are represented in compact form and each of them is associated with a unique integer identifier in the range [0,n). We show that some statistical properties of k-mer minimizers can be exploited by minimal perfect hashing to substantially improve the space/time trade-off of the dictionary compared to the best-known solutions. AVAILABILITY AND IMPLEMENTATION: https://github.com/jermp/sshash. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. © The Author(s) 2022. Published by Oxford University Press. 
650 0 4 |a article 
650 0 4 |a bioinformatics 
650 0 4 |a time trade-off method 
700 1 |a Pibiri, G.E.  |e author 
773 |t Bioinformatics (Oxford, England)