VariantStore: an index for large-scale genomic variant search

Bibliographic Details
Main Authors: Prashant Pandey, Yinjie Gao, Carl Kingsford
Format: Article
Language: English
Published: BMC 2021-08-01
Series: Genome Biology
Subjects:
Online Access: https://doi.org/10.1186/s13059-021-02442-8
Description
Summary: Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging because multiple coordinate systems are needed to avoid reference bias. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and from the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds while using memory equal to only 10% of the size of the full representation.
ISSN: 1474-760X