LEADER |
03450nam a2200637Ia 4500 |
001 |
10.1186-s12859-021-04374-3 |
008 |
220427s2021 CNT 000 0 und d |
020 |
|
|
|a 14712105 (ISSN)
|
245 |
1 |
0 |
|a Improving structural variant clustering to reduce the negative effect of the breakpoint uncertainty problem
|
260 |
|
0 |
|b BioMed Central Ltd
|c 2021
|
856 |
|
|
|z View Fulltext in Publisher
|u https://doi.org/10.1186/s12859-021-04374-3
|
520 |
3 |
|
|a Background: Structural variants (SVs) represent an important source of genetic variation. One of the most critical problems in their detection is breakpoint uncertainty, that is, the inability to determine their exact genomic position. Breakpoint uncertainty is characteristic of structural variants detected via short-read sequencing methods and complicates subsequent population analyses. The commonly used heuristic strategy mitigates this issue by clustering/merging nearby structural variants of the same type before the data from individual samples are merged. Results: We compared the two most commonly used dissimilarity measures for SV clustering in terms of Mendelian inheritance errors (MIE), kinship prediction, and deviation from Hardy–Weinberg equilibrium. We analyzed the occurrence of Mendelian-inconsistent SV clusters that can be collapsed into one Mendelian-consistent SV as a new measure of dataset consistency. We also developed a new method based on constrained clustering that explicitly identifies these types of clusters. Conclusions: We found that the dissimilarity measure based on the distance between SV breakpoints produces slightly better results than the measure based on SV overlap. This difference is evident in the trivial and corrected clustering strategies, but not in the constrained clustering strategy. However, the constrained clustering strategy provided the best results in all aspects, regardless of the dissimilarity measure used. © 2021, The Author(s).
|
650 |
0 |
4 |
|a article
|
650 |
0 |
4 |
|a Breakpoint
|
650 |
0 |
4 |
|a Breakpoint uncertainty problem
|
650 |
0 |
4 |
|a Breakpoints uncertainty problem
|
650 |
0 |
4 |
|a cluster analysis
|
650 |
0 |
4 |
|a Cluster analysis
|
650 |
0 |
4 |
|a Cluster Analysis
|
650 |
0 |
4 |
|a Clustering strategy
|
650 |
0 |
4 |
|a Clusterings
|
650 |
0 |
4 |
|a Constrained clustering
|
650 |
0 |
4 |
|a Dissimilarity measures
|
650 |
0 |
4 |
|a Genes
|
650 |
0 |
4 |
|a genetic variation
|
650 |
0 |
4 |
|a Genome, Human
|
650 |
0 |
4 |
|a Genomic Structural Variation
|
650 |
0 |
4 |
|a genomics
|
650 |
0 |
4 |
|a Genomics
|
650 |
0 |
4 |
|a high throughput sequencing
|
650 |
0 |
4 |
|a High-Throughput Nucleotide Sequencing
|
650 |
0 |
4 |
|a human
|
650 |
0 |
4 |
|a human genome
|
650 |
0 |
4 |
|a Humans
|
650 |
0 |
4 |
|a Mendelian inheritance
|
650 |
0 |
4 |
|a Mendelian inheritance error
|
650 |
0 |
4 |
|a prediction
|
650 |
0 |
4 |
|a Structural variant
|
650 |
0 |
4 |
|a Structural variants
|
650 |
0 |
4 |
|a uncertainty
|
650 |
0 |
4 |
|a Uncertainty
|
650 |
0 |
4 |
|a Uncertainty analysis
|
650 |
0 |
4 |
|a Uncertainty problems
|
650 |
0 |
4 |
|a whole genome sequencing
|
650 |
0 |
4 |
|a Whole genome sequencing
|
700 |
1 |
|
|a Geryk, J.
|e author
|
700 |
1 |
|
|a Korabecna, M.
|e author
|
700 |
1 |
|
|a Simková, H.
|e author
|
700 |
1 |
|
|a Stenzl, V.
|e author
|
700 |
1 |
|
|a Zedníková, I.
|e author
|
700 |
1 |
|
|a Zinkova, A.
|e author
|
773 |
|
|
|t BMC Bioinformatics
|