ENVirT: inference of ecological characteristics of viruses from metagenomic data

Abstract Background Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community bef...


Bibliographic Details
Main Authors: Duleepa Jayasundara, Damayanthi Herath, Damith Senanayake, Isaam Saeed, Cheng-Yu Yang, Yuan Sun, Bill C. Chang, Sen-Lin Tang, Saman K. Halgamuge
Format: Article
Language: English
Published: BMC 2019-02-01
Series: BMC Bioinformatics
Subjects: Richness estimation, Viral metagenomics, Average genome length
Online Access: http://link.springer.com/article/10.1186/s12859-018-2398-5
id doaj-a343cc90248342818e4e36ba821b3e3d
doi 10.1186/s12859-018-2398-5
affiliation Duleepa Jayasundara: School of Public Health and Community Medicine, University of New South Wales
Damayanthi Herath, Damith Senanayake, Isaam Saeed, Yuan Sun, Saman K. Halgamuge: Optimisation and Pattern Recognition Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne
Cheng-Yu Yang, Sen-Lin Tang: Biodiversity Research Center, Academia Sinica
Bill C. Chang: Yourgene Bioscience
collection DOAJ
issn 1471-2105
description Background: Metagenomic approaches make it possible to estimate the parameters that describe the ecology of viruses, particularly those that are novel. However, the best-performing existing methods rely on databases to first estimate the average genome length of a viral community before they can estimate other parameters, such as viral richness. Although widely used, this approach can skew results, because the majority of viruses have yet to be catalogued in databases.
Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we show that it is possible to simultaneously estimate the average genome length without a priori information. This is a significant improvement over database-dependent methods, because samples that may include novel viral types under-represented in current databases can now be analyzed robustly. When benchmarked against simulated data, the viral richness estimates produced by ENVirT are several orders of magnitude more accurate than those produced by the existing methods PHACCS and CatchAll. We also reanalyzed 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses.
Conclusions: These insights were not captured by previous computational methods. ENVirT is therefore shown to be an essential tool for enhancing our understanding of novel viral populations.
topic Richness estimation
Viral metagenomics
Average genome length