VariantStore: an index for large-scale genomic variant search
Abstract Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation gra...
Main Authors: | Prashant Pandey, Yinjie Gao, Carl Kingsford |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2021-08-01 |
Series: | Genome Biology |
Subjects: | Variation graph, Graph genomes, Pangenomes |
Online Access: | https://doi.org/10.1186/s13059-021-02442-8 |
id | doaj-704b27b4a9634b5ca4dd152c991eb668 |
---|---|
record_format | Article |
spelling | doaj-704b27b4a9634b5ca4dd152c991eb668; 2021-08-22T11:46:50Z; eng; BMC; Genome Biology; 1474-760X; 2021-08-01; 221125; 10.1186/s13059-021-02442-8; VariantStore: an index for large-scale genomic variant search; Prashant Pandey 0; Yinjie Gao 1; Carl Kingsford 2; Computational Biology Department, School of Computer Science, Carnegie Mellon University; Computational Biology Department, School of Computer Science, Carnegie Mellon University; Computational Biology Department, School of Computer Science, Carnegie Mellon University; Abstract Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory only 10% of the size of the full representation.; https://doi.org/10.1186/s13059-021-02442-8; Variation graph; Graph genomes; Pangenomes |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Prashant Pandey Yinjie Gao Carl Kingsford |
spellingShingle | Prashant Pandey Yinjie Gao Carl Kingsford VariantStore: an index for large-scale genomic variant search Genome Biology Variation graph Graph genomes Pangenomes |
author_facet | Prashant Pandey Yinjie Gao Carl Kingsford |
author_sort | Prashant Pandey |
title | VariantStore: an index for large-scale genomic variant search |
title_short | VariantStore: an index for large-scale genomic variant search |
title_full | VariantStore: an index for large-scale genomic variant search |
title_fullStr | VariantStore: an index for large-scale genomic variant search |
title_full_unstemmed | VariantStore: an index for large-scale genomic variant search |
title_sort | variantstore: an index for large-scale genomic variant search |
publisher | BMC |
series | Genome Biology |
issn | 1474-760X |
publishDate | 2021-08-01 |
description | Abstract Efficiently scaling genomic variant search indexes to thousands of samples is computationally challenging due to the presence of multiple coordinate systems to avoid reference biases. We present VariantStore, a system that indexes genomic variants from multiple samples using a variation graph and enables variant queries across any sample-specific coordinate system. We show the scalability of VariantStore by indexing genomic variants from the TCGA project in 4 h and the 1000 Genomes project in 3 h. Querying for variants in a gene takes between 0.002 and 3 seconds using memory only 10% of the size of the full representation. |
topic | Variation graph Graph genomes Pangenomes |
url | https://doi.org/10.1186/s13059-021-02442-8 |
work_keys_str_mv | AT prashantpandey variantstoreanindexforlargescalegenomicvariantsearch AT yinjiegao variantstoreanindexforlargescalegenomicvariantsearch AT carlkingsford variantstoreanindexforlargescalegenomicvariantsearch |
_version_ | 1721199328709574656 |