GMASS: a novel measure for genome assembly structural similarity

Bibliographic Details
Main Authors: Daehong Kwon, Jongin Lee, Jaebum Kim
Format: Article
Language: English
Published: BMC 2019-03-01
Series: BMC Bioinformatics
Subjects: Measure; Genome; Assembly; Structural similarity
Online Access: http://link.springer.com/article/10.1186/s12859-019-2710-z
Affiliation: Department of Biomedical Science and Engineering, Konkuk University (all three authors)
DOI: 10.1186/s12859-019-2710-z
ISSN: 1471-2105

Abstract

Background: Thanks to recent advances in next-generation sequencing (NGS) technologies, a large amount of genomic data, in the form of short DNA sequences known as reads, has been accumulating. Diverse assemblers have been developed to generate high-quality de novo assemblies from NGS reads, but their outputs differ considerably because of algorithmic differences. However, there are no well-structured measures for showing the similarity or difference between assemblies.

Results: We developed a new measure, called the GMASS score, for comparing the structure of two genome assemblies. The GMASS score is based on the distribution pattern of the number and coverage of similar regions between a pair of assemblies. When evaluated on simulated assembly datasets, the new measure was able to capture structural similarity between assemblies. Applying the GMASS score to assemblies in recently published benchmark datasets revealed divergent performance among current assemblers and demonstrated the measure's ability to compare assemblies.

Conclusion: The GMASS score is a novel measure for representing structural similarity between two assemblies. It will contribute to understanding assembly output and to developing de novo assemblers.
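
The abstract says the score is built from the number and coverage of similar regions between two assemblies, but it does not give the formula. As a hedged illustration of only those two input quantities (this is not the GMASS score itself), the Python sketch below counts similar regions per sequence and computes the fraction of each sequence they cover. It assumes the similar regions have already been found (for example by a whole-genome aligner) and are supplied as hypothetical half-open, 0-based (seq_id, start, end) tuples on one assembly; the function names `merged_length` and `region_stats` are illustrative only.

```python
from collections import defaultdict


def merged_length(intervals):
    """Total bases covered by half-open (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new run.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching the current run: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def region_stats(regions, seq_lengths):
    """Per-sequence (number of similar regions, fraction of the sequence covered).

    regions: iterable of (seq_id, start, end), half-open and 0-based
             (a hypothetical input format, not one defined by GMASS).
    seq_lengths: dict mapping seq_id -> sequence length in bases.
    """
    by_seq = defaultdict(list)
    for seq_id, start, end in regions:
        by_seq[seq_id].append((start, end))

    stats = {}
    for seq_id, length in seq_lengths.items():
        intervals = by_seq.get(seq_id, [])
        # Guard against zero-length sequences to avoid division by zero.
        coverage = merged_length(intervals) / length if length else 0.0
        stats[seq_id] = (len(intervals), coverage)
    return stats
```

Overlapping regions are merged before measuring coverage so that a base hit by several regions is counted once, while the region count still reflects every region. Sequences with no similar regions are reported as (0, 0.0), so the resulting per-sequence values can be examined as a distribution; how GMASS turns such a distribution into a single score is described in the full article, not here.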