Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline

Abstract Background Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous m...

Bibliographic Details
Main Authors: Shujun Ou, Weija Su, Yi Liao, Kapeel Chougule, Jireh R. A. Agda, Adam J. Hellinga, Carlos Santiago Blanco Lugo, Tyler A. Elliott, Doreen Ware, Thomas Peterson, Ning Jiang, Candice N. Hirsch, Matthew B. Hufford
Format: Article
Language: English
Published: BMC 2019-12-01
Series: Genome Biology
Subjects: Transposable element; Annotation; Genome; Benchmarking; Pipeline
Online Access: https://doi.org/10.1186/s13059-019-1905-y
id doaj-e5d6f89b34fa4f84a53ead58e3f77d1c
record_format Article
spelling doaj-e5d6f89b34fa4f84a53ead58e3f77d1c 2020-12-20T12:39:43Z eng BMC Genome Biology 1474-760X 2019-12-01 20(1):1-18 10.1186/s13059-019-1905-y
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
Shujun Ou, Department of Ecology, Evolution, and Organismal Biology, Iowa State University
Weija Su, Department of Genetics, Development, and Cell Biology, Iowa State University
Yi Liao, Department of Ecology and Evolutionary Biology, University of California
Kapeel Chougule, Cold Spring Harbor Laboratory
Jireh R. A. Agda, Centre for Biodiversity Genomics, University of Guelph
Adam J. Hellinga, Centre for Biodiversity Genomics, University of Guelph
Carlos Santiago Blanco Lugo, Centre for Biodiversity Genomics, University of Guelph
Tyler A. Elliott, Centre for Biodiversity Genomics, University of Guelph
Doreen Ware, Cold Spring Harbor Laboratory
Thomas Peterson, Department of Genetics, Development, and Cell Biology, Iowa State University
Ning Jiang, Department of Horticulture, Michigan State University
Candice N. Hirsch, Department of Agronomy and Plant Genetics, University of Minnesota
Matthew B. Hufford, Department of Ecology, Evolution, and Organismal Biology, Iowa State University
collection DOAJ
language English
format Article
sources DOAJ
author Shujun Ou
Weija Su
Yi Liao
Kapeel Chougule
Jireh R. A. Agda
Adam J. Hellinga
Carlos Santiago Blanco Lugo
Tyler A. Elliott
Doreen Ware
Thomas Peterson
Ning Jiang
Candice N. Hirsch
Matthew B. Hufford
title Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
publisher BMC
series Genome Biology
issn 1474-760X
publishDate 2019-12-01
description Abstract
Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations.
Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, false discovery rate (FDR), and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species.
Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.
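The abstract names six performance metrics without giving their formulas. As a minimal sketch, the block below computes them from a confusion matrix using the standard textbook definitions. The class and function names (`Confusion`, `benchmark_metrics`), the toy counts, and the idea that the counts are bases compared against a curated reference annotation are all assumptions made for illustration; none of them come from this record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts comparing a test annotation with a curated reference.

    Assumption for illustration: each count is a number of genomic bases.
    """
    tp: int  # called TE by the program, TE in the reference
    fp: int  # called TE by the program, non-TE in the reference
    tn: int  # called non-TE by the program, non-TE in the reference
    fn: int  # called non-TE by the program, TE in the reference


def _ratio(numerator: float, denominator: float) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def benchmark_metrics(c: Confusion) -> dict:
    """Standard confusion-matrix definitions of the six metrics."""
    return {
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "fdr": _ratio(c.fp, c.tp + c.fp),  # equals 1 - precision
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }


if __name__ == "__main__":
    # Hypothetical toy counts, not results from the paper.
    toy = Confusion(tp=80, fp=20, tn=880, fn=20)
    for name, value in benchmark_metrics(toy).items():
        print(f"{name}: {value:.4f}")
```

Sensitivity and specificity describe how much of the reference TE and non-TE space is recovered, while precision and FDR describe how trustworthy the program's own calls are. F1 combines sensitivity and precision into a single number, which is useful when TE and non-TE sequence are unevenly represented in a genome.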
topic Transposable element
Annotation
Genome
Benchmarking
Pipeline
url https://doi.org/10.1186/s13059-019-1905-y