Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline
Abstract. Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous m...
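Main Authors: | Shujun Ou, Weija Su, Yi Liao, Kapeel Chougule, Jireh R. A. Agda, Adam J. Hellinga, Carlos Santiago Blanco Lugo, Tyler A. Elliott, Doreen Ware, Thomas Peterson, Ning Jiang, Candice N. Hirsch, Matthew B. Hufford |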
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2019-12-01 |
Series: | Genome Biology |
Subjects: | Transposable element; Annotation; Genome; Benchmarking; Pipeline |
Online Access: | https://doi.org/10.1186/s13059-019-1905-y |
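
The abstract of this record names six performance metrics used in the benchmark: sensitivity, specificity, accuracy, precision, FDR, and F1. The record does not say how they were computed, so the following is only a minimal sketch using the standard confusion-matrix definitions. The function name `annotation_metrics`, the helper `_ratio`, and the example counts are hypothetical, and the choice of counting unit (for example nucleotides or whole elements) is an assumption left to the caller.

```python
import math


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


def annotation_metrics(tp, fp, tn, fn):
    """Compute standard classification metrics from confusion-matrix counts.

    tp, fp, tn, fn are true/false positive/negative counts. What is being
    counted (bases, elements, ...) is an assumption made by the caller.
    """
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": _ratio(tp, tp + fn),               # true positive rate
        "specificity": _ratio(tn, tn + fp),               # true negative rate
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),   # overall agreement
        "precision": precision,                           # positive predictive value
        "fdr": _ratio(fp, tp + fp),                       # false discovery rate = 1 - precision
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),           # harmonic mean of precision and sensitivity
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    for name, value in annotation_metrics(tp=80, fp=20, tn=880, fn=20).items():
        print(f"{name}: {value:.3f}")
```

Returning NaN for an undefined ratio, rather than raising an error, keeps a comparison across several programs from failing when one of them reports no positives.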
id |
doaj-e5d6f89b34fa4f84a53ead58e3f77d1c |
---|---|
record_format |
Article |
spelling |
doaj-e5d6f89b34fa4f84a53ead58e3f77d1c; 2020-12-20T12:39:43Z; eng; BMC; Genome Biology; 1474-760X; 2019-12-01; vol. 20, no. 1, pp. 1-18; 10.1186/s13059-019-1905-y; Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline; Shujun Ou (Department of Ecology, Evolution, and Organismal Biology, Iowa State University); Weija Su (Department of Genetics, Development, and Cell Biology, Iowa State University); Yi Liao (Department of Ecology and Evolutionary Biology, University of California); Kapeel Chougule (Cold Spring Harbor Laboratory); Jireh R. A. Agda (Centre for Biodiversity Genomics, University of Guelph); Adam J. Hellinga (Centre for Biodiversity Genomics, University of Guelph); Carlos Santiago Blanco Lugo (Centre for Biodiversity Genomics, University of Guelph); Tyler A. Elliott (Centre for Biodiversity Genomics, University of Guelph); Doreen Ware (Cold Spring Harbor Laboratory); Thomas Peterson (Department of Genetics, Development, and Cell Biology, Iowa State University); Ning Jiang (Department of Horticulture, Michigan State University); Candice N. Hirsch (Department of Agronomy and Plant Genetics, University of Minnesota); Matthew B. Hufford (Department of Ecology, Evolution, and Organismal Biology, Iowa State University); Abstract. Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA.; https://doi.org/10.1186/s13059-019-1905-y; Transposable element; Annotation; Genome; Benchmarking; Pipeline |
collection |
DOAJ |
sources |
DOAJ |
author |
Shujun Ou; Weija Su; Yi Liao; Kapeel Chougule; Jireh R. A. Agda; Adam J. Hellinga; Carlos Santiago Blanco Lugo; Tyler A. Elliott; Doreen Ware; Thomas Peterson; Ning Jiang; Candice N. Hirsch; Matthew B. Hufford |
author_sort |
Shujun Ou |
title |
Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline |
publisher |
BMC |
series |
Genome Biology |
issn |
1474-760X |
publishDate |
2019-12-01 |
description |
Abstract. Background: Sequencing technology and assembly algorithms have matured to the point that high-quality de novo assembly is possible for large, repetitive genomes. Current assemblies traverse transposable elements (TEs) and provide an opportunity for comprehensive annotation of TEs. Numerous methods exist for annotation of each class of TEs, but their relative performances have not been systematically compared. Moreover, a comprehensive pipeline is needed to produce a non-redundant library of TEs for species lacking this resource to generate whole-genome TE annotations. Results: We benchmark existing programs based on a carefully curated library of rice TEs. We evaluate the performance of methods annotating long terminal repeat (LTR) retrotransposons, terminal inverted repeat (TIR) transposons, short TIR transposons known as miniature inverted transposable elements (MITEs), and Helitrons. Performance metrics include sensitivity, specificity, accuracy, precision, FDR, and F1. Using the most robust programs, we create a comprehensive pipeline called Extensive de-novo TE Annotator (EDTA) that produces a filtered non-redundant TE library for annotation of structurally intact and fragmented elements. EDTA also deconvolutes nested TE insertions frequently found in highly repetitive genomic regions. Using other model species with curated TE libraries (maize and Drosophila), EDTA is shown to be robust across both plant and animal species. Conclusions: The benchmarking results and pipeline developed here will greatly facilitate TE annotation in eukaryotic genomes. These annotations will promote a much more in-depth understanding of the diversity and evolution of TEs at both intra- and inter-species levels. EDTA is open-source and freely available: https://github.com/oushujun/EDTA. |
topic |
Transposable element; Annotation; Genome; Benchmarking; Pipeline |
url |
https://doi.org/10.1186/s13059-019-1905-y |