Differential networks (and other statistical issues) for the analysis of metabolomic data

Coronary heart disease (CHD) is the leading cause of death in the UK. Recent technological advances in metabolomics have the potential to further the understanding of CHD, especially because they are facilitating the collection of metabolomics data in large observational studies. Howev...

Bibliographic Details
Main Author: Macleod, D.
Other Authors: De Stavola, B. L.
Published: London School of Hygiene and Tropical Medicine (University of London) 2017
Online Access: http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442
id ndltd-bl.uk-oai-ethos.bl.uk-713442
record_format oai_dc
spelling ndltd-bl.uk-oai-ethos.bl.uk-713442; 2018-08-21T03:29:50Z; Differential networks (and other statistical issues) for the analysis of metabolomic data; Macleod, D.; De Stavola, B. L.; 2017; 616.1; London School of Hygiene and Tropical Medicine (University of London); 10.17037/PUBS.03817570; http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442; http://researchonline.lshtm.ac.uk/3817570/; Electronic Thesis or Dissertation
collection NDLTD
sources NDLTD
topic 616.1
description Coronary heart disease (CHD) is the leading cause of death in the UK. Recent technological advances in metabolomics have the potential to further the understanding of CHD, especially because they are facilitating the collection of metabolomics data in large observational studies. However, the high dimensionality of this type of information and its strong interdependencies raise several analytical difficulties. These difficulties were investigated, motivated by the study of 228 metabolites acquired from blood samples as part of the British Women's Heart and Health Study (BWHHS). Issues regarding transformations of the metabolomics data and their reliability were examined. Analytical methods typically adopted with high-dimensional data were reviewed, and then a more recently developed method, differential networks, was examined in detail. When investigating differential networks using simulations of three alternative data-generating scenarios, it was found that an edge between two nodes can be induced if the effect of one node on disease is modified by another node, or if the disease causes (or is associated with) a "breaking down" in the relationship between the two nodes. The simulations focused on simplified settings but exemplify the difficulties in interpreting differential networks and helped elucidate the sample sizes required. Further algebraic examination of likely data-generating mechanisms identified the potential pitfalls of relying on partial correlations in building differential networks. This shows that, when important nodes influencing the correlation structure are not measured, irrelevant edges may be selected, while relevant ones may be missed. Analysis of the BWHHS metabolite data flagged a small number of metabolites that could potentially be associated with CHD, with small VLDL triglycerides being the strongest candidate. Comparisons were made with the results obtained using regression-based methods as these are more easily accessible to epidemiologists. The fact that there was little overlap in identified biomarkers is an indication of the complexity of this field of research.
author2 De Stavola, B. L.
author Macleod, D.
title Differential networks (and other statistical issues) for the analysis of metabolomic data
title_sort differential networks (and other statistical issues) for the analysis of metabolomic data
publisher London School of Hygiene and Tropical Medicine (University of London)
publishDate 2017
url http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442