A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model
The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (R...
Main Authors: | Mickael Orgeur, Marvin Martens, Stefan T. Börno, Bernd Timmermann, Delphine Duprez, Sigmar Stricker |
---|---|
Format: | Article |
Language: | English |
Published: | The Company of Biologists, 2018-01-01 |
Series: | Biology Open |
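Subjects: | Chicken genome annotation; Gallus gallus; Gene prediction; Genome-guided transcript discovery; RNA sequencing; Transcriptome reconstruction |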
Online Access: | http://bio.biologists.org/content/7/1/bio028498 |
id |
doaj-04174ccd027e45ad83a82ab1091f204b |
---|---|
record_format |
Article |
spelling |
doaj-04174ccd027e45ad83a82ab1091f204b; 2021-06-02T19:02:31Z; eng; The Company of Biologists; Biology Open; 2046-6390; 2018-01-01; 7; 1; 10.1242/bio.028498; 028498; A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model; Mickael Orgeur (0), Marvin Martens (1), Stefan T. Börno (2), Bernd Timmermann (3), Delphine Duprez (4), Sigmar Stricker (5); Freie Universität Berlin, Institut für Chemie und Biochemie, Thielallee 63, 14195 Berlin, Germany; Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France; Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany; Max Planck Institute for Molecular Genetics, Development and Disease Group, Ihnestrasse 63-73, 14195 Berlin, Germany; Sorbonne Universités, UPMC Univ. Paris 06, CNRS UMR 7622, Inserm U1156, IBPS-Developmental Biology Laboratory, 9 Quai Saint-Bernard, 75252 Paris Cedex 05, France; Freie Universität Berlin, Institut für Chemie und Biochemie, Thielallee 63, 14195 Berlin, Germany; The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies.; http://bio.biologists.org/content/7/1/bio028498; Chicken genome annotation; Gallus gallus; Gene prediction; Genome-guided transcript discovery; RNA sequencing; Transcriptome reconstruction |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Mickael Orgeur Marvin Martens Stefan T. Börno Bernd Timmermann Delphine Duprez Sigmar Stricker |
spellingShingle |
Mickael Orgeur Marvin Martens Stefan T. Börno Bernd Timmermann Delphine Duprez Sigmar Stricker A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model Biology Open Chicken genome annotation Gallus gallus Gene prediction Genome-guided transcript discovery RNA sequencing Transcriptome reconstruction |
author_facet |
Mickael Orgeur Marvin Martens Stefan T. Börno Bernd Timmermann Delphine Duprez Sigmar Stricker |
author_sort |
Mickael Orgeur |
title |
A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model |
title_short |
A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model |
title_full |
A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model |
title_fullStr |
A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model |
title_full_unstemmed |
A dual transcript-discovery approach to improve the delimitation of gene features from RNA-seq data in the chicken model |
title_sort |
dual transcript-discovery approach to improve the delimitation of gene features from rna-seq data in the chicken model |
publisher |
The Company of Biologists |
series |
Biology Open |
issn |
2046-6390 |
publishDate |
2018-01-01 |
description |
The sequence of the chicken genome, like several other draft genome sequences, is presently not fully covered. Gaps, contigs assigned with low confidence and uncharacterized chromosomes result in gene fragmentation and imprecise gene annotation. Transcript abundance estimation from RNA sequencing (RNA-seq) data relies on read quality, library complexity and expression normalization. In addition, the quality of the genome sequence used to map sequencing reads, and the gene annotation that defines gene features, must also be taken into account. A partially covered genome sequence causes the loss of sequencing reads from the mapping step, while an inaccurate definition of gene features induces imprecise read counts from the assignment step. Both steps can significantly bias interpretation of RNA-seq data. Here, we describe a dual transcript-discovery approach combining a genome-guided gene prediction and a de novo transcriptome assembly. This dual approach enabled us to increase the assignment rate of RNA-seq data by nearly 20% as compared to when using only the chicken reference annotation, contributing therefore to a more accurate estimation of transcript abundance. More generally, this strategy could be applied to any organism with partial genome sequence and/or lacking a manually-curated reference annotation in order to improve the accuracy of gene expression studies. |
topic |
Chicken genome annotation Gallus gallus Gene prediction Genome-guided transcript discovery RNA sequencing Transcriptome reconstruction |
url |
http://bio.biologists.org/content/7/1/bio028498 |
_version_ |
1721401908042661888 |