gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks

Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designe...

Bibliographic Details
Main Authors: Madison Caballero, Jill Wegrzyn
Format: Article
Language: English
Published: Elsevier 2019-06-01
Series: Genomics, Proteomics & Bioinformatics
Online Access: http://www.sciencedirect.com/science/article/pii/S1672022919301202
id doaj-dd060ba4f85e44eb95774b11916614ca
record_format Article
spelling doaj-dd060ba4f85e44eb95774b11916614ca
2020-11-24T21:37:11Z
eng
Elsevier
Genomics, Proteomics & Bioinformatics
1672-0229
2019-06-01
17 3 305 310
gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks
Madison Caballero 0
Jill Wegrzyn 1
Corresponding authors.; Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
Corresponding authors.; Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA
Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs.
Keywords: Genome annotation, Bioinformatics, Protein annotation, Gene prediction, Alignment
http://www.sciencedirect.com/science/article/pii/S1672022919301202
collection DOAJ
language English
format Article
sources DOAJ
author Madison Caballero
Jill Wegrzyn
spellingShingle Madison Caballero
Jill Wegrzyn
gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks
Genomics, Proteomics & Bioinformatics
author_facet Madison Caballero
Jill Wegrzyn
author_sort Madison Caballero
title gFACs: Gene Filtering, Analysis, and Conversion to Unify Genome Annotations Across Alignment and Gene Prediction Frameworks
title_sort gfacs: gene filtering, analysis, and conversion to unify genome annotations across alignment and gene prediction frameworks
publisher Elsevier
series Genomics, Proteomics & Bioinformatics
issn 1672-0229
publishDate 2019-06-01
description Published genomes frequently contain erroneous gene models that represent issues associated with identification of open reading frames, start sites, splice sites, and related structural features. The source of these inconsistencies is often traced back to integration across text file formats designed to describe long read alignments and predicted gene structures. In addition, the majority of gene prediction frameworks do not provide robust downstream filtering to remove problematic gene annotations, nor do they represent these annotations in a format consistent with current file standards. These frameworks also lack consideration for functional attributes, such as the presence or absence of protein domains that can be used for gene model validation. To provide oversight to the increasing number of published genome annotations, we present a software package, the Gene Filtering, Analysis, and Conversion (gFACs), to filter, analyze, and convert predicted gene models and alignments. The software operates across a wide range of alignment, analysis, and gene prediction files with a flexible framework for defining gene models with reliable structural and functional attributes. gFACs supports common downstream applications, including genome browsers, and generates extensive details on the filtering process, including distributions that can be visualized to further assess the proposed gene space. gFACs is freely available and implemented in Perl with support from BioPerl libraries at https://gitlab.com/PlantGenomicsLab/gFACs. Keywords: Genome annotation, Bioinformatics, Protein annotation, Gene prediction, Alignment
url http://www.sciencedirect.com/science/article/pii/S1672022919301202
work_keys_str_mv AT madisoncaballero gfacsgenefilteringanalysisandconversiontounifygenomeannotationsacrossalignmentandgenepredictionframeworks
AT jillwegrzyn gfacsgenefilteringanalysisandconversiontounifygenomeannotationsacrossalignmentandgenepredictionframeworks
_version_ 1725937792472580096