EXFI: Exon and splice graph prediction without a reference genome

Bibliographic Details
Main Authors: Jorge Langa, Andone Estonba, Darrell Conklin
Format: Article
Language: English
Published: Wiley, 2020-08-01
Series: Ecology and Evolution
Online Access: https://doi.org/10.1002/ece3.6587
Citation: Ecology and Evolution, 10(16), 8880–8893
ISSN: 2045-7758
Keywords: exome sequencing; exon; sequence capture; SNP discovery; splice graph; transcriptome
Affiliations: Jorge Langa and Andone Estonba, Department of Genetics, Physical Anthropology and Animal Physiology, Faculty of Science and Technology, University of the Basque Country, Leioa, Spain; Darrell Conklin, Department of Computer Science and Artificial Intelligence, Faculty of Computer Science, University of the Basque Country UPV/EHU, San Sebastián, Spain
Source: DOAJ (record doaj-185b7b7ae6f6495cb1a21286036dbfb4, updated 2021-04-02)

Abstract
For population genetic studies in nonmodel organisms, it is important to use every available source of genomic information. This paper presents EXFI, a Python pipeline that predicts the splice graph and exon sequences from an assembled transcriptome and raw whole‐genome sequencing reads. The main algorithm uses Bloom filters to remove reads that are not part of the transcriptome, predict intron–exon boundaries, call exons from the assembly, and generate the underlying splice graph. The results are returned in GFA1 format, which encodes both the predicted exon sequences and how they connect to form transcripts. EXFI is written in Python, has been tested on Linux, and its source code is available under the MIT License at https://github.com/jlanga/exfi.
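The abstract names three stages: Bloom-filter read filtering, exon calling at intron–exon boundaries, and splice-graph output in GFA1. The sketch below shows one way these stages can fit together on toy data. It is not EXFI's implementation or API: the function names (`filter_reads`, `call_exons`, `to_gfa1`), the SHA-256 hashing scheme, the filter size, k = 11, and the rule that each maximal run of transcript k-mers found in the genomic reads corresponds to one exon are all illustrative assumptions.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

# Hypothetical toy sketch of Bloom-filter exon calling; NOT EXFI's actual code.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Smaller of a k-mer and its reverse complement, so read strand does not matter."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmers(seq: str, k: int) -> Iterator[str]:
    """All overlapping k-mers of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class BloomFilter:
    """Minimal Bloom filter: a bit array plus n_hashes salted SHA-256 hashes."""

    def __init__(self, n_bits: int = 1 << 20, n_hashes: int = 3) -> None:
        self.n_bits, self.n_hashes = n_bits, n_hashes
        self.bits = bytearray(n_bits // 8 + 1)

    def _positions(self, item: str) -> Iterator[int]:
        for seed in range(self.n_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # May give a false positive, never a false negative.
        return all((self.bits[pos // 8] >> (pos % 8)) & 1 for pos in self._positions(item))


def build_filter(seqs: Iterable[str], k: int) -> BloomFilter:
    """Insert the canonical k-mers of every sequence into a new filter."""
    bf = BloomFilter()
    for seq in seqs:
        for km in kmers(seq, k):
            bf.add(canonical(km))
    return bf


def filter_reads(reads: Iterable[str], transcript_bf: BloomFilter, k: int) -> List[str]:
    """Step 1: keep only reads that share at least one k-mer with the transcriptome."""
    return [r for r in reads if any(canonical(km) in transcript_bf for km in kmers(r, k))]


def call_exons(transcript: str, genome_bf: BloomFilter, k: int) -> List[Tuple[int, int]]:
    """Step 2: maximal runs of transcript k-mers that occur in the genomic reads.

    A k-mer spanning an exon-exon junction is absent from genomic DNA, where an
    intron separates the two exons, so each run marks one exon. Returns half-open
    (start, end) intervals on the transcript.
    """
    n = len(transcript) - k + 1
    exons, run_start = [], None
    for i in range(n + 1):
        present = i < n and canonical(transcript[i:i + k]) in genome_bf
        if present and run_start is None:
            run_start = i
        elif not present and run_start is not None:
            exons.append((run_start, i - 1 + k))  # last present k-mer starts at i - 1
            run_start = None
    return exons


def to_gfa1(name: str, transcript: str, exons: List[Tuple[int, int]]) -> str:
    """Step 3: exons as GFA1 segments (S), consecutive exons as links (L), transcript as a path (P)."""
    seg_ids = [f"{name}_exon{i}" for i in range(1, len(exons) + 1)]
    lines = ["H\tVN:Z:1.0"]
    lines += [f"S\t{sid}\t{transcript[s:e]}" for sid, (s, e) in zip(seg_ids, exons)]
    lines += [f"L\t{a}\t+\t{b}\t+\t0M" for a, b in zip(seg_ids, seg_ids[1:])]
    if seg_ids:
        lines.append(f"P\t{name}\t{','.join(sid + '+' for sid in seg_ids)}\t*")
    return "\n".join(lines)


if __name__ == "__main__":
    k = 11
    exon1, intron, exon2 = "ATGGCTAGCAAGGAGAAC", "GTAAGTCTTTCTCTTTACAG", "TTCACTGGAGTTGTCCCAAT"
    genome = exon1 + intron + exon2
    transcript = exon1 + exon2
    # Toy "WGS reads": every 25-bp window of the genome plus one unrelated read.
    reads = [genome[i:i + 25] for i in range(len(genome) - 24)] + ["G" * 40]
    kept = filter_reads(reads, build_filter([transcript], k), k)
    exons = call_exons(transcript, build_filter(kept, k), k)
    print(to_gfa1("tx1", transcript, exons))
```

In this toy, the run rule recovers the two exons because only the junction-spanning k-mers are missing from the genomic filter. The sketch ignores several things a real tool must handle: Bloom-filter false positives can create short spurious fragments, exons shorter than k cannot be detected, and a genuine splice graph would merge identical exons shared by several transcripts rather than emit one path per transcript in isolation.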