Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome.

Transcriptomes are one of the first sources of high-throughput genomic data that have benefitted from the introduction of Next-Gen Sequencing. As sequencing technology becomes more accessible, transcriptome sequencing is applicable to multiple organisms for which genome sequences are unavailable. Cu...


Bibliographic Details
Main Authors:	Andrey Ptitsyn, Ramzi Temanni, Christelle Bouchard, Peter A V Anderson
Format:	Article
Language:	English
Published:	Public Library of Science (PLoS) 2015-01-01
Series:	PLoS ONE
Online Access:	http://europepmc.org/articles/PMC4578894?pdf=render

id	doaj-a119408790ad48f4b2f70f9a98d718df
record_format	Article
spelling	doaj-a119408790ad48f4b2f70f9a98d718df; 2020-11-25T01:24:09Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2015-01-01; 10(9): e0138006; 10.1371/journal.pone.0138006; Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome.; Andrey Ptitsyn, Ramzi Temanni, Christelle Bouchard, Peter A V Anderson
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Andrey Ptitsyn Ramzi Temanni Christelle Bouchard Peter A V Anderson
author_sort	Andrey Ptitsyn
title	Semantic Assembly and Annotation of Draft RNAseq Transcripts without a Reference Genome.
publisher	Public Library of Science (PLoS)
series	PLoS ONE
issn	1932-6203
publishDate	2015-01-01
description	Transcriptomes are among the first sources of high-throughput genomic data to have benefited from the introduction of next-generation sequencing. As sequencing technology becomes more accessible, transcriptome sequencing can be applied to many organisms for which genome sequences are unavailable. Currently, all methods for de novo assembly are based on matching the overlapping nucleotide context between short fragments (reads). However, even short reads may contain biologically relevant information that can be used as hints to guide the assembly process. We propose a computational workflow for the reconstruction and functional annotation of expressed gene transcripts that does not require a reference genome sequence and tolerates low coverage, high error rates and other issues that often lead to poor de novo assembly results in studies of non-model organisms. We start with either raw sequences or the output of a context-based de novo transcriptome assembly. Instead of mapping reads to a reference genome or creating a completely unsupervised clustering of reads, we assemble the unknown transcriptome using nearest homologs from a public database as seeds. We consider even distant relations, indirectly linking protein-coding fragments to entire gene families in multiple distantly related genomes. The intended application of the proposed method is an additional step of semantic scaffolding (based on relations between protein-coding fragments) following traditional de novo assembly (based on sequence overlap). The method was effective in the analysis of the jellyfish Cyanea capillata transcriptome and may be applicable to other studies of gene expression in species lacking a high-quality reference genome sequence. Our algorithms are implemented in C and designed for parallel computation on a high-performance computer. The software is available free of charge under an open source license.
url	http://europepmc.org/articles/PMC4578894?pdf=render
_version_	1725118505011904512
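
The seed-based scaffolding idea in the description can be illustrated with a minimal sketch. It is not the authors' C implementation: the `Hit` record, the `scaffold_by_seed` function, and the sample identifiers are hypothetical, and the sketch assumes that homology search results (fragment, seed protein, aligned seed coordinates, score) are already available. Each fragment is assigned to its best-scoring seed, and fragments sharing a seed are ordered by where they align along it.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """One homology hit (hypothetical record, not the authors' format)."""
    fragment_id: str  # draft contig or read identifier
    seed_id: str      # nearest homolog found in a public protein database
    seed_start: int   # first aligned residue on the seed protein
    seed_end: int     # last aligned residue on the seed protein
    score: float      # alignment score; higher is better


def scaffold_by_seed(hits):
    """Group fragments by their best-scoring seed and order them along it."""
    # Keep only the best hit per fragment so each fragment joins one scaffold.
    best = {}
    for hit in hits:
        current = best.get(hit.fragment_id)
        if current is None or hit.score > current.score:
            best[hit.fragment_id] = hit

    # Collect fragments that share the same seed protein.
    groups = defaultdict(list)
    for hit in best.values():
        groups[hit.seed_id].append(hit)

    # Order each group by the aligned position on the seed.
    return {
        seed: [h.fragment_id
               for h in sorted(group, key=lambda h: (h.seed_start, h.seed_end))]
        for seed, group in groups.items()
    }


# Made-up example hits for illustration only.
hits = [
    Hit("contig_7", "P1", 120, 210, 88.0),
    Hit("contig_2", "P1", 1, 95, 140.0),
    Hit("contig_2", "P9", 10, 60, 35.0),
    Hit("contig_5", "P9", 30, 150, 72.0),
]
scaffolds = scaffold_by_seed(hits)
```

Here `contig_2` and `contig_7` do not need to overlap at the nucleotide level to be placed in the same scaffold; their shared seed `P1` supplies the ordering. Linking fragments to whole gene families through distant homologs, as the description mentions, would require additional grouping of seeds and is not shown.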


Similar Items