Carotta: Revealing Hidden Confounder Markers in Metabolic Breath Profiles


Bibliographic Details
Main Authors: Anne-Christin Hauschild, Tobias Frisch, Jörg Ingo Baumbach, Jan Baumbach
Format: Article
Language: English
Published: MDPI AG 2015-06-01
Series: Metabolites
Online Access: http://www.mdpi.com/2218-1989/5/2/344
id doaj-f8063b7554f5486e81b2ecd649c9e5e4
doi 10.3390/metabo5020344
issn 2218-1989
volume 5
issue 2
pages 344-363
author_affiliation Anne-Christin Hauschild: Computational Systems Biology Group, Max Planck Institute for Informatics, Saarbrücken 66123, Germany
Tobias Frisch: Computational Systems Biology Group, Max Planck Institute for Informatics, Saarbrücken 66123, Germany
Jörg Ingo Baumbach: Faculty of Applied Chemistry, Reutlingen University, Reutlingen 72762, Germany
Jan Baumbach: Computational Biology Group, Department of Mathematics and Computer Science, University of Southern Denmark, Odense 5230, Denmark
description Computational breath analysis is a growing research area that aims to identify volatile organic compounds (VOCs) in human breath to support next-generation medical diagnostics. Inexpensive, non-invasive bioanalytical technologies for detecting metabolites in exhaled air and in bacterial/fungal vapor exist, and first studies have examined the power of supervised machine learning methods for profiling the resulting data; however, we lack methods to extract hidden data features that emerge from confounding factors. Here, we present Carotta, a new cluster analysis framework dedicated to uncovering such hidden substructures with unsupervised statistical learning methods. We study the power of transitivity clustering and hierarchical clustering to identify groups of VOCs with similar expression behavior over most patient breath samples and/or groups of patients with a similar VOC intensity pattern. This enables the discovery of dependencies between metabolites. On the one hand, it allows us to eliminate the effect of potential confounding factors that hinder disease classification, such as smoking. On the other hand, we may also identify VOCs associated with disease subtypes or concomitant diseases. Carotta is open-source software with an intuitive graphical user interface for data handling, analysis and visualization. The back-end is modular, so that it can be extended in the future with plugins such as new clustering methods and statistics. It does not require much prior knowledge or technical skill to operate. We demonstrate its power and applicability on one artificial dataset, which serves as a proof of concept, and on a real-world example dataset on chronic obstructive pulmonary disease (COPD). On the real dataset, we demonstrate how Carotta finds candidate markers associated with confounders rather than with the primary disease (COPD) and bronchial carcinoma (BC). Carotta is publicly available at http://carotta.compbio.sdu.dk [1].
topic breathomics
multicapillary column/ion mobility spectrometry
clustering
breath analysis