A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model

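The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads at the mapping step, while an inaccurate definition of gene features induces imprecise read counts at the assignment step. Both steps can significantly bias the interpretation of RNA-seq data.

Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% compared with using only the chicken reference annotation, thereby contributing to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with a partial genome sequence and/or lacking a manually curated reference annotation in order to improve the accuracy of gene expression studies.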
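The abstract attributes part of the error in transcript abundance estimates to the read-assignment step: mapped reads that fall outside every annotated gene feature are never counted. The sketch below is a toy Python illustration of that effect, not the authors' pipeline. All coordinates, the scaffold name `chrUn_scaffold` and the helper functions `overlaps` and `assignment_rate` are hypothetical, and assignment is reduced to a simple "overlaps at least one base" test.

```python
"""Toy illustration (hypothetical data): how extra gene features raise the
fraction of mapped reads that can be assigned to a feature."""

from typing import List, Tuple

# A feature or a read is (chromosome, start, end), half-open: [start, end).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def assignment_rate(reads: List[Interval], features: List[Interval]) -> float:
    """Fraction of mapped reads overlapping at least one gene feature.

    Real read-counting tools also handle strandedness, reads overlapping
    several features and multi-mapping reads; this sketch ignores all of that.
    """
    if not reads:
        return 0.0
    assigned = sum(1 for r in reads if any(overlaps(r, f) for f in features))
    return assigned / len(reads)


# Hypothetical coordinates, not taken from the chicken annotation.
reference = [("chr1", 1_000, 2_000), ("chr2", 5_000, 6_000)]
# Hypothetical extra features standing in for transcripts recovered by
# genome-guided prediction and de novo assembly.
discovered = [("chr1", 3_000, 3_500), ("chrUn_scaffold", 0, 800)]

reads = [
    ("chr1", 1_100, 1_200),        # inside a reference gene
    ("chr2", 5_500, 5_600),        # inside a reference gene
    ("chr1", 3_100, 3_200),        # only inside a discovered transcript
    ("chrUn_scaffold", 100, 200),  # on an uncharacterized scaffold
    ("chr1", 9_000, 9_100),        # intergenic in both annotations
]

print(f"reference only:       {assignment_rate(reads, reference):.0%}")
print(f"reference+discovered: {assignment_rate(reads, reference + discovered):.0%}")
```

Running it prints an assignment rate of 40% with the reference features alone and 80% once the hypothetical discovered features are added. These figures reflect only the toy data and say nothing about the gain reported for the chicken data.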

Bibliographic Details
Main Authors: Mickael Orgeur, Marvin Martens, Stefan T. Börno, Bernd Timmermann, Delphine Duprez, Sigmar Stricker
Format: Article
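Language: English
Published: The Company of Biologists, 2018-01-01
Series: Biology Open
Subjects: Chicken genome annotation; Gallus gallus; Gene prediction; Genome-guided transcript discovery; RNA sequencing; Transcriptome reconstruction
Online Access: http://bio.biologists.org/content/7/1/bio028498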
ISSN: 2046-6390
Volume/Issue: 7(1)
DOI: 10.1242/bio.028498
Affiliations: Freie Universität Berlin, Institut für Chemie und Biochemie, Thielallee 63, 14195 Berlin, Germany; Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France; Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany