Regional sequence expansion or collapse in heterozygous genome assemblies.
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algo...
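Main Authors: | Kathryn C Asalone, Kara M Ryan, Maryam Yamadi, Annastelle L Cohen, William G Farmer, Deborah J George, Claudia Joppert, Kaitlyn Kim, Madeeha Froze Mughal, Rana Said, Metin Toksoz-Exley, Evgeny Bisk, John R Bracht |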
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2020-07-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1008104 |
id |
doaj-b63829ddde4e460a8e2769cba77236d5 |
---|---|
record_format |
Article |
spelling |
doaj-b63829ddde4e460a8e2769cba77236d5 2021-04-21T15:16:29Z eng Public Library of Science (PLoS) PLoS Computational Biology 1553-734X 1553-7358 2020-07-01 16 7 e1008104 10.1371/journal.pcbi.1008104 Regional sequence expansion or collapse in heterozygous genome assemblies. Kathryn C Asalone Kara M Ryan Maryam Yamadi Annastelle L Cohen William G Farmer Deborah J George Claudia Joppert Kaitlyn Kim Madeeha Froze Mughal Rana Said Metin Toksoz-Exley Evgeny Bisk John R Bracht High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by assembly methodologies tested, the assemblies generated varied widely in fragmentation level and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. https://doi.org/10.1371/journal.pcbi.1008104 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Kathryn C Asalone Kara M Ryan Maryam Yamadi Annastelle L Cohen William G Farmer Deborah J George Claudia Joppert Kaitlyn Kim Madeeha Froze Mughal Rana Said Metin Toksoz-Exley Evgeny Bisk John R Bracht |
spellingShingle |
Kathryn C Asalone Kara M Ryan Maryam Yamadi Annastelle L Cohen William G Farmer Deborah J George Claudia Joppert Kaitlyn Kim Madeeha Froze Mughal Rana Said Metin Toksoz-Exley Evgeny Bisk John R Bracht Regional sequence expansion or collapse in heterozygous genome assemblies. PLoS Computational Biology |
author_facet |
Kathryn C Asalone Kara M Ryan Maryam Yamadi Annastelle L Cohen William G Farmer Deborah J George Claudia Joppert Kaitlyn Kim Madeeha Froze Mughal Rana Said Metin Toksoz-Exley Evgeny Bisk John R Bracht |
author_sort |
Kathryn C Asalone |
title |
Regional sequence expansion or collapse in heterozygous genome assemblies. |
title_short |
Regional sequence expansion or collapse in heterozygous genome assemblies. |
title_full |
Regional sequence expansion or collapse in heterozygous genome assemblies. |
title_fullStr |
Regional sequence expansion or collapse in heterozygous genome assemblies. |
title_full_unstemmed |
Regional sequence expansion or collapse in heterozygous genome assemblies. |
title_sort |
regional sequence expansion or collapse in heterozygous genome assemblies. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS Computational Biology |
issn |
1553-734X 1553-7358 |
publishDate |
2020-07-01 |
description |
High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by assembly methodologies tested, the assemblies generated varied widely in fragmentation level and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies, and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameter, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than N50 value, including methods for assessing local expansion and collapse. |
url |
https://doi.org/10.1371/journal.pcbi.1008104 |
work_keys_str_mv |
AT kathryncasalone regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT karamryan regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT maryamyamadi regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT annastellelcohen regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT williamgfarmer regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT deborahjgeorge regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT claudiajoppert regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT kaitlynkim regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT madeehafrozemughal regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT ranasaid regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT metintoksozexley regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT evgenybisk regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies AT johnrbracht regionalsequenceexpansionorcollapseinheterozygousgenomeassemblies |
_version_ |
1714667558731251712 |