A new pipeline for structural characterization and classification of RNA-Seq microbiome data

Abstract Background High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the...


Bibliographic Details
Main Authors:	Sebastian Racedo, Ivan Portnoy, Jorge I. Vélez, Homero San-Juan-Vergara, Marco Sanjuan, Eduardo Zurek
Format:	Article
Language:	English
Published:	BMC 2021-07-01
Series:	BioData Mining
Subjects:	Microbial communities; Compositional nature; Classification method; 16 rRNA sequencing
Online Access:	https://doi.org/10.1186/s13040-021-00266-7

id	doaj-9840b2dce01448abbc30df90ac3e6a54
record_format	Article
spelling	doaj-9840b2dce01448abbc30df90ac3e6a54; 2021-07-11T11:04:28Z; eng; BMC; BioData Mining; 1756-0381; 2021-07-01; 14; 1; 1; 18; 10.1186/s13040-021-00266-7; A new pipeline for structural characterization and classification of RNA-Seq microbiome data; Sebastian Racedo (0), Ivan Portnoy (1), Jorge I. Vélez (2), Homero San-Juan-Vergara (3), Marco Sanjuan (4), Eduardo Zurek (5); Universidad del Norte; Universidad del Norte; Universidad del Norte; Universidad del Norte; Universidad del Norte; Universidad del Norte; Abstract. Background: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs. Results: Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods. Conclusions: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.; https://doi.org/10.1186/s13040-021-00266-7; Microbial communities; Compositional nature; Classification method; 16 rRNA sequencing
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Sebastian Racedo, Ivan Portnoy, Jorge I. Vélez, Homero San-Juan-Vergara, Marco Sanjuan, Eduardo Zurek
spellingShingle	Sebastian Racedo, Ivan Portnoy, Jorge I. Vélez, Homero San-Juan-Vergara, Marco Sanjuan, Eduardo Zurek; A new pipeline for structural characterization and classification of RNA-Seq microbiome data; BioData Mining; Microbial communities; Compositional nature; Classification method; 16 rRNA sequencing
author_facet	Sebastian Racedo, Ivan Portnoy, Jorge I. Vélez, Homero San-Juan-Vergara, Marco Sanjuan, Eduardo Zurek
author_sort	Sebastian Racedo
title	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
title_short	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
title_full	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
title_fullStr	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
title_full_unstemmed	A new pipeline for structural characterization and classification of RNA-Seq microbiome data
title_sort	new pipeline for structural characterization and classification of rna-seq microbiome data
publisher	BMC
series	BioData Mining
issn	1756-0381
publishDate	2021-07-01
description	Abstract. Background: High-throughput sequencing enables the analysis of the composition of numerous biological systems, such as microbial communities. The identification of dependencies within these systems requires the analysis and assimilation of the underlying interaction patterns between all the variables that make up that system. However, this task poses a challenge when considering the compositional nature of the data coming from DNA-sequencing experiments because traditional interaction metrics (e.g., correlation) produce unreliable results when analyzing relative fractions instead of absolute abundances. The compositionality-associated challenges extend to the classification task, as it usually involves the characterization of the interactions between the principal descriptive variables of the datasets. The classification of new samples/patients into binary categories corresponding to dissimilar biological settings or phenotypes (e.g., control and cases) could help researchers in the development of treatments/drugs. Results: Here, we develop and exemplify a new approach, applicable to compositional data, for the classification of new samples into two groups with different biological settings. We propose a new metric to characterize and quantify the overall correlation structure deviation between these groups and a technique for dimensionality reduction to facilitate graphical representation. We conduct simulation experiments with synthetic data to assess the proposed method’s classification accuracy. Moreover, we illustrate the performance of the proposed approach using Operational Taxonomic Unit (OTU) count tables obtained through 16S rRNA gene sequencing data from two microbiota experiments. We also compare our method’s performance with that of two state-of-the-art methods. Conclusions: Simulation experiments show that our method achieves a classification accuracy equal to or greater than 98% when using synthetic data. Finally, our method outperforms the other classification methods with real datasets from gene sequencing experiments.
topic	Microbial communities; Compositional nature; Classification method; 16 rRNA sequencing
url	https://doi.org/10.1186/s13040-021-00266-7
_version_	1721309357333807104
