Regional sequence expansion or collapse in heterozygous genome assemblies.

High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algo...

Bibliographic Details
Main Authors: Kathryn C Asalone, Kara M Ryan, Maryam Yamadi, Annastelle L Cohen, William G Farmer, Deborah J George, Claudia Joppert, Kaitlyn Kim, Madeeha Froze Mughal, Rana Said, Metin Toksoz-Exley, Evgeny Bisk, John R Bracht
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2020-07-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008104
Volume/Issue: 16(7), e1008104
ISSN: 1553-734X, 1553-7358
Description: High levels of heterozygosity present a unique genome assembly challenge and can adversely impact downstream analyses, yet are common in sequencing datasets obtained from non-model organisms. Here we show that by re-assembling a heterozygous dataset with variant parameters and different assembly algorithms, we are able to generate assemblies whose protein annotations are statistically enriched for specific gene ontology categories. While total assembly length was not significantly affected by the assembly methodologies tested, the assemblies generated varied widely in fragmentation level, and we show local assembly collapse or expansion underlying the enrichment or depletion of specific protein functional groups. We show that these statistically significant deviations in gene ontology groups can occur in seemingly high-quality assemblies and result from difficult-to-detect local sequence expansions or contractions. Given the unpredictable interplay between assembly algorithm, parameters, and biological sequence data heterozygosity, we highlight the need for better measures of assembly quality than the N50 value, including methods for assessing local expansion and collapse.