SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups

Abstract. Background: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron...


Bibliographic Details
Main Authors: Safa Jammali, Jean-David Aguilar, Esaie Kuitche, Aïda Ouangraoua
Format: Article
Language: English
Published: BMC 2019-03-01
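Series: BMC Bioinformatics
Subjects: Spliced alignment; Transcript orthology groups; Splicing structure; Splicing orthology; Gene family
Online Access: http://link.springer.com/article/10.1186/s12859-019-2647-2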
id doaj-4d6b5c6b3dfd48ea818baeef0c6a2fc8
record_format Article
spelling doaj-4d6b5c6b3dfd48ea818baeef0c6a2fc8; 2020-11-25T03:51:10Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2019-03-01; vol. 20, issue S3, pp. 37-52; doi:10.1186/s12859-019-2647-2
Safa Jammali, Jean-David Aguilar, Esaie Kuitche, Aïda Ouangraoua (all: Department of Computer science, Faculty of Science, Université de Sherbrooke)
collection DOAJ
language English
format Article
sources DOAJ
author Safa Jammali
Jean-David Aguilar
Esaie Kuitche
Aïda Ouangraoua
title SplicedFamAlign: CDS-to-gene spliced alignment and identification of transcript orthology groups
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2019-03-01
description Background: The inference of splicing orthology relationships between gene transcripts is a basic step for the prediction of transcripts and the annotation of gene structures in genomes. The splicing structure of a sequence refers to the exon extremity information in a CDS or the exon-intron extremity information in a gene sequence. Splicing orthologous CDS are pairs of CDS from orthologous genes that have similar sequences and conserved splicing structures. Spliced alignment, which consists of aligning a spliced cDNA sequence against an unspliced genomic sequence, is a promising yet unexplored approach for identifying splicing orthology relationships. Existing spliced alignment algorithms do not exploit information on the splicing structure of the input sequences, namely the exon structure of the cDNA sequence and the exon-intron structure of the genomic sequences. Yet this information is often available for coding DNA sequences (CDS) and gene sequences annotated in databases, and it can help improve the accuracy of the computed spliced alignments. To address this issue, we introduce a new spliced alignment problem and a method called SplicedFamAlign (SFA) that computes the alignment of a spliced CDS against a gene sequence while accounting for the splicing structures of the input sequences, and then infers transcript splicing orthology groups in a gene family from the spliced alignments. Results: The experimental results show that SFA outperforms existing spliced alignment methods in terms of accuracy and execution time for CDS-to-gene alignment. We also show that the performance of SFA remains high across various levels of sequence similarity between input sequences, because it accounts for their splicing structure.
Notably, whereas all current spliced alignment methods are meant for cDNA-to-genome alignments but can also be used for CDS-to-gene alignments, SFA is the first method specifically designed for CDS-to-gene alignments. Conclusion: We show the usefulness of SFA for comparing genes and transcripts within a gene family in order to analyze splicing orthologies. It can also be used for gene structure annotation and alternative splicing analyses. SplicedFamAlign is implemented in Python. Source code is freely available at https://github.com/UdeS-CoBIUS/SpliceFamAlign.
topic Spliced alignment
Transcript orthology groups
Splicing structure
Splicing orthology
Gene family
url http://link.springer.com/article/10.1186/s12859-019-2647-2