Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem



Bibliographic Details
Main Authors: Geryk, J. (Author), Korabecna, M. (Author), Simková, H. (Author), Stenzl, V. (Author), Zedníková, I. (Author), Zinkova, A. (Author)
Format: Article
Language: English
Published: BioMed Central Ltd 2021
Online Access: View Fulltext in Publisher
LEADER 03450nam a2200637Ia 4500
001 10.1186-s12859-021-04374-3
008 220427s2021 CNT 000 0 und d
020 |a 14712105 (ISSN) 
245 1 0 |a Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem 
260 0 |b BioMed Central Ltd  |c 2021 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1186/s12859-021-04374-3 
520 3 |a Background: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, which arises from the inability to determine their exact genomic position. Breakpoint uncertainty is a characteristic issue of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy reduces this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. Results: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. Conclusions: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used. © 2021, The Author(s). 
650 0 4 |a article 
650 0 4 |a Breakpoint 
650 0 4 |a Breakpoint uncertainty problem 
650 0 4 |a Breakpoints uncertainty problem 
650 0 4 |a cluster analysis 
650 0 4 |a Cluster analysis 
650 0 4 |a Cluster Analysis 
650 0 4 |a Clustering strategy 
650 0 4 |a Clusterings 
650 0 4 |a Constrained clustering 
650 0 4 |a Dissimilarity measures 
650 0 4 |a Genes 
650 0 4 |a genetic variation 
650 0 4 |a Genome, Human 
650 0 4 |a Genomic Structural Variation 
650 0 4 |a genomics 
650 0 4 |a Genomics 
650 0 4 |a high throughput sequencing 
650 0 4 |a High-Throughput Nucleotide Sequencing 
650 0 4 |a human 
650 0 4 |a human genome 
650 0 4 |a Humans 
650 0 4 |a Mendelian inheritance 
650 0 4 |a Mendelian inheritance error 
650 0 4 |a prediction 
650 0 4 |a Structural variant 
650 0 4 |a Structural variants 
650 0 4 |a uncertainty 
650 0 4 |a Uncertainty 
650 0 4 |a Uncertainty analysis 
650 0 4 |a Uncertainty problems 
650 0 4 |a whole genome sequencing 
650 0 4 |a Whole genome sequencing 
700 1 |a Geryk, J.  |e author 
700 1 |a Korabecna, M.  |e author 
700 1 |a Simková, H.  |e author 
700 1 |a Stenzl, V.  |e author 
700 1 |a Zedníková, I.  |e author 
700 1 |a Zinkova, A.  |e author 
773 |t BMC Bioinformatics
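
The abstract above contrasts two kinds of dissimilarity measure for SV clustering: one based on the distance between breakpoints and one based on overlap. The record does not give the exact definitions used in the article, so the following is only a minimal sketch under stated assumptions: the `SV` type, the use of the larger of the two breakpoint offsets as the distance, and the use of one minus reciprocal overlap as the overlap measure are illustrative choices, not the authors' formulas.

```python
from typing import NamedTuple


class SV(NamedTuple):
    """Hypothetical SV record: half-open interval [start, end) on one chromosome."""
    chrom: str
    start: int
    end: int
    svtype: str


def comparable(a: SV, b: SV) -> bool:
    # Only SVs of the same type on the same chromosome are candidates for merging.
    return a.chrom == b.chrom and a.svtype == b.svtype


def breakpoint_distance(a: SV, b: SV) -> float:
    """Assumed distance-based measure: the larger of the start and end offsets."""
    if not comparable(a, b):
        return float("inf")
    return float(max(abs(a.start - b.start), abs(a.end - b.end)))


def overlap_dissimilarity(a: SV, b: SV) -> float:
    """Assumed overlap-based measure: 1 minus reciprocal overlap, in [0, 1]."""
    if not comparable(a, b):
        return 1.0
    overlap = max(0, min(a.end, b.end) - max(a.start, b.start))
    longest = max(a.end - a.start, b.end - b.start)
    if longest == 0:
        # Zero-length calls: identical position counts as a perfect match.
        return 0.0 if a.start == b.start else 1.0
    return 1.0 - overlap / longest


a = SV("chr1", 1000, 2000, "DEL")
b = SV("chr1", 1100, 2050, "DEL")
print(breakpoint_distance(a, b))    # larger breakpoint offset, in base pairs
print(overlap_dissimilarity(a, b))  # fraction of the longer SV not shared
```

Under these assumptions the distance measure is expressed in base pairs and does not depend on SV length, while the overlap measure is relative to SV length, so the same breakpoint shift counts for more on a short SV than on a long one.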