Differential networks (and other statistical issues) for the analysis of metabolomic data
Coronary heart disease (CHD) is the leading cause of death in the UK. Recent technological advances in metabolomics have the potential to further the understanding of CHD, especially because they facilitate the collection of metabolomics data in large observational studies. Howev...
Main Author: | Macleod, D. |
---|---|
Other Authors: | De Stavola, B. L. |
Published: | London School of Hygiene and Tropical Medicine (University of London), 2017 |
Subjects: | 616.1 |
Online Access: | http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442 |
id |
ndltd-bl.uk-oai-ethos.bl.uk-713442 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bl.uk-oai-ethos.bl.uk-713442 2018-08-21T03:29:50Z Differential networks (and other statistical issues) for the analysis of metabolomic data Macleod, D. De Stavola, B. L. 2017 616.1 London School of Hygiene and Tropical Medicine (University of London) 10.17037/PUBS.03817570 http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442 http://researchonline.lshtm.ac.uk/3817570/ Electronic Thesis or Dissertation |
collection |
NDLTD |
sources |
NDLTD |
topic |
616.1 |
spellingShingle |
616.1 Macleod, D. Differential networks (and other statistical issues) for the analysis of metabolomic data |
description |
Coronary heart disease (CHD) is the leading cause of death in the UK. Recent technological advances in metabolomics have the potential to further the understanding of CHD, especially because they facilitate the collection of metabolomics data in large observational studies. However, the high dimensionality of this type of information and its strong interdependencies raise several analytical difficulties. These difficulties were investigated, motivated by the study of 228 metabolites acquired from blood samples as part of the British Women's Heart and Health Study (BWHHS). Issues regarding transformations of the metabolomics data and their reliability were examined. Analytical methods typically adopted with high-dimensional data were reviewed, and then a more recently developed method, differential networks, was examined in detail. When investigating differential networks using simulations of three alternative data-generating scenarios, it was found that an edge between two nodes can be induced if the effect of one node on disease is modified by another node, or if the disease causes (or is associated with) a "breaking down" in the relationship between the two nodes. The simulations focused on simplified settings but exemplify the difficulties in interpreting differential networks and helped elucidate the sample sizes required. Further algebraic examination of likely data-generating mechanisms identified the potential pitfalls of relying on partial correlations in building differential networks. This shows that, when important nodes influencing the correlation structure are not measured, irrelevant edges may be selected, while relevant ones may be missed. Analysis of the BWHHS metabolite data flagged a small number of metabolites that could potentially be associated with CHD, with small VLDL triglycerides being the strongest candidate. Comparisons were made with the results obtained using regression-based methods, as these are more easily accessible to epidemiologists. The fact that there was little overlap in the identified biomarkers is an indication of the complexity of this field of research. |
author2 |
De Stavola, B. L. |
author_facet |
De Stavola, B. L. Macleod, D. |
author |
Macleod, D. |
author_sort |
Macleod, D. |
title |
Differential networks (and other statistical issues) for the analysis of metabolomic data |
publisher |
London School of Hygiene and Tropical Medicine (University of London) |
publishDate |
2017 |
url |
http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.713442 |