ENVirT: inference of ecological characteristics of viruses from metagenomic data
Abstract. Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community bef...
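Main Authors: | Duleepa Jayasundara, Damayanthi Herath, Damith Senanayake, Isaam Saeed, Cheng-Yu Yang, Yuan Sun, Bill C. Chang, Sen-Lin Tang, Saman K. Halgamuge |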
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2019-02-01 |
Series: | BMC Bioinformatics |
Subjects: | Richness estimation, Viral metagenomics, Average genome length |
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2398-5 |
id |
doaj-a343cc90248342818e4e36ba821b3e3d |
---|---|
doi |
10.1186/s12859-018-2398-5 |
affiliations |
Duleepa Jayasundara: School of Public Health and Community Medicine, University of New South Wales; Damayanthi Herath, Damith Senanayake, Isaam Saeed, Yuan Sun, Saman K. Halgamuge: Optimisation and Pattern Recognition Research Group, Department of Mechanical Engineering, Melbourne School of Engineering, The University of Melbourne; Cheng-Yu Yang, Sen-Lin Tang: Biodiversity Research Center, Academia Sinica; Bill C. Chang: Yourgene Bioscience |
collection |
DOAJ |
author |
Duleepa Jayasundara, Damayanthi Herath, Damith Senanayake, Isaam Saeed, Cheng-Yu Yang, Yuan Sun, Bill C. Chang, Sen-Lin Tang, Saman K. Halgamuge |
issn |
1471-2105 |
description |
Abstract. Background: Estimating the parameters that describe the ecology of viruses, particularly those that are novel, can be made possible using metagenomic approaches. However, the best-performing existing methods require databases to first estimate an average genome length of a viral community before being able to estimate other parameters, such as viral richness. Although this approach has been widely used, it can adversely skew results since the majority of viruses are yet to be catalogued in databases. Results: In this paper, we present ENVirT, a method for estimating the richness of novel viral mixtures, and for the first time we also show that it is possible to simultaneously estimate the average genome length without a priori information. This is shown to be a significant improvement over database-dependent methods, since we can now robustly analyze samples that may include novel viral types under-represented in current databases. We demonstrate that, when benchmarked against simulated data, the viral richness estimates produced by ENVirT are several orders of magnitude more accurate than those produced by the existing methods PHACCS and CatchAll. We also reanalyzed 20 metavirome samples using ENVirT, which produced results in close agreement with complementary in vitro analyses. Conclusions: These insights had not been captured by existing computational methods. As such, ENVirT is shown to be an essential tool for enhancing our understanding of novel viral populations. |
topic |
Richness estimation, Viral metagenomics, Average genome length |