VariantStore: an index for large-scale genomic variant search

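Abstract: Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory only 10% of the size of the full representation.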

Bibliographic Details
Main Authors: Prashant Pandey, Yinjie Gao, Carl Kingsford
Format: Article
Language: English
Published: BMC 2021-08-01
Series: Genome Biology
Subjects: Variation graph; Graph genomes; Pangenomes
Online Access: https://doi.org/10.1186/s13059-021-02442-8
id doaj-704b27b4a9634b5ca4dd152c991eb668
record_format Article
spelling doaj-704b27b4a9634b5ca4dd152c991eb668; 2021-08-22T11:46:50Z; eng; BMC; Genome Biology; 1474-760X; 2021-08-01; 22; 1; 1; 25; 10.1186/s13059-021-02442-8; VariantStore: an index for large-scale genomic variant search; Prashant Pandey [0]; Yinjie Gao [1]; Carl Kingsford [2]; [0] Computational Biology Department, School of Computer Science, Carnegie Mellon University; [1] Computational Biology Department, School of Computer Science, Carnegie Mellon University; [2] Computational Biology Department, School of Computer Science, Carnegie Mellon University; Abstract: Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory only 10% of the size of the full representation.; https://doi.org/10.1186/s13059-021-02442-8; Variation graph; Graph genomes; Pangenomes
collection DOAJ
language English
format Article
sources DOAJ
author Prashant Pandey
Yinjie Gao
Carl Kingsford
author_facet Prashant Pandey
Yinjie Gao
Carl Kingsford
author_sort Prashant Pandey
title VariantStore: an index for large-scale genomic variant search
title_sort variantstore: an index for large-scale genomic variant search
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2021-08-01
description Abstract Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory only 10% of the size of the full representation.
topic Variation graph
Graph genomes
Pangenomes
url https://doi.org/10.1186/s13059-021-02442-8