TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach

Abstract. Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potentia...

Bibliographic Details
Main Authors: Goesmann Alexander, Krause Lutz, Diaz Naryttza N, Niehaus Karsten, Nattkemper Tim W
Format: Article
Language: English
Published: BMC 2009-02-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/10/56
id doaj-b4f99be8d4554dfa84c7ab01829b054e
record_format Article
spelling doaj-b4f99be8d4554dfa84c7ab01829b054e; 2020-11-24T22:16:23Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2009-02-01; 10; 1; 56; 10.1186/1471-2105-10-56; TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach; Goesmann Alexander; Krause Lutz; Diaz Naryttza N; Niehaus Karsten; Nattkemper Tim W; http://www.biomedcentral.com/1471-2105/10/56
collection DOAJ
language English
format Article
sources DOAJ
author Goesmann Alexander
Krause Lutz
Diaz Naryttza N
Niehaus Karsten
Nattkemper Tim W
title TACOA – Taxonomic classification of environmental genomic fragments using a kernelized nearest neighbor approach
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2009-02-01
description Abstract. Background: Metagenomics, or the sequencing and analysis of collective genomes (metagenomes) of microorganisms isolated from an environment, promises direct access to the "unculturable majority". This emerging field offers the potential to lay a solid basis for our understanding of the entire living world. However, taxonomic classification, an essential task in the analysis of metagenomic data sets, is still far from being solved. We present a novel strategy to predict the taxonomic origin of environmental genomic fragments. The proposed classifier combines the idea of the k-nearest neighbor with strategies from kernel-based learning. Results: Our strategy was extensively evaluated using leave-one-out cross-validation on fragments of variable length (800 bp – 50 Kbp) from 373 completely sequenced genomes. TACOA classifies genomic fragments of length 800 bp and 1 Kbp with high accuracy down to the rank class. For longer fragments (≥ 3 Kbp), accurate predictions are made at even deeper taxonomic ranks (order and genus). Remarkably, TACOA also produces reliable results when the taxonomic origin of a fragment is not represented in the reference set, classifying each such fragment to its known broader taxonomic class or simply as "unknown". We compared the classification accuracy of TACOA with that of the latest intrinsic classifier, PhyloPythia, using 63 recently published complete genomes. For fragments of length 800 bp and 1 Kbp, the overall accuracy of TACOA is higher than that obtained by PhyloPythia at all taxonomic ranks. For all fragment lengths, both methods achieved comparably high specificity up to the rank class, with low false negative rates. Conclusion: An accurate multi-class taxonomic classifier was developed for environmental genomic fragments. TACOA can predict with high reliability the taxonomic origin of genomic fragments as short as 800 bp. The proposed method is transparent, fast, and accurate, and the reference set can be easily updated as newly sequenced genomes become available. Moreover, the method proved competitive with the most current classifier, PhyloPythia, and has the advantage that it can be installed locally and its reference set kept up to date.
url http://www.biomedcentral.com/1471-2105/10/56