An exploratory analysis of large health cohort study using Bayesian networks
Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2006. === Includes bibliographical references (p. 91-98). === Large health cohort studies are among the most effective ways of studying the causes, treatments and outcomes of diseases by systematically collecting a wide range o...
Main Author: | Shen, Delin |
---|---|
Other Authors: | Peter Szolovits |
Format: | Others |
Language: | English |
Published: | Massachusetts Institute of Technology 2007 |
Subjects: | Harvard University--MIT Division of Health Sciences and Technology. |
Online Access: | http://dspace.mit.edu/handle/1721.1/34478 http://hdl.handle.net/1721.1/34478 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-34478 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-MIT-oai-dspace.mit.edu-1721.1-34478 2019-05-02T16:06:51Z An exploratory analysis of large health cohort study using Bayesian networks Shen, Delin Peter Szolovits. Harvard University--MIT Division of Health Sciences and Technology. Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2006. Includes bibliographical references (p. 91-98). Large health cohort studies are among the most effective ways of studying the causes, treatments and outcomes of diseases by systematically collecting a wide range of data over long periods. The wealth of data in such studies may yield important results in addition to the already numerous findings, especially when subjected to newer analytical methods. Bayesian Networks (BN) provide a relatively new method of representing uncertain relationships among variables, using the tools of probability and graph theory, and have been widely used in analyzing dependencies and the interplay between variables. We used BN to perform an exploratory analysis on a rich collection of data from one large health cohort study, the Nurses' Health Study (NHS), with a focus on breast cancer. We explored the data from the NHS using BN to look for breast cancer risk factors, including a group of Single Nucleotide Polymorphisms (SNPs). We found no association between the SNPs and breast cancer, but found a dependency between clomid and breast cancer. We evaluated clomid as a potential risk factor after matching on age and number of children. Our results showed for clomid an increased risk of estrogen receptor positive breast cancer (odds ratio 1.52, 95% CI 1.11-2.09) and a decreased risk of estrogen receptor negative breast cancer (odds ratio 0.46, 95% CI 0.22-0.97). We developed breast cancer risk models using BN. We trained models on 75% of the data, and evaluated them on the remaining 25%.
Because of the clinical importance of predicting risks for Estrogen Receptor positive and Progesterone Receptor positive breast cancer, we focused on this specific type of breast cancer to predict two-year, four-year, and six-year risks. The concordance statistics of the prediction results on test sets are 0.70 (95% CI: 0.67-0.74), 0.68 (95% CI: 0.64-0.72), and 0.66 (95% CI: 0.62-0.69) for the two-, four-, and six-year models, respectively. We also evaluated the calibration performance of the models, and applied a filter to the output, based on Agglomerative Information Bottleneck clustering, to improve the linear relationship between predicted and observed risks without sacrificing much discrimination performance. by Delin Shen. Ph.D. 2007-10-22T16:24:19Z 2007-10-22T16:24:19Z 2006 2006 Thesis http://dspace.mit.edu/handle/1721.1/34478 http://hdl.handle.net/1721.1/34478 70784336 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/34478 http://dspace.mit.edu/handle/1721.1/7582 98 p. application/pdf Massachusetts Institute of Technology |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Harvard University--MIT Division of Health Sciences and Technology. |
spellingShingle |
Harvard University--MIT Division of Health Sciences and Technology. Shen, Delin An exploratory analysis of large health cohort study using Bayesian networks |
description |
Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2006. === Includes bibliographical references (p. 91-98). === Large health cohort studies are among the most effective ways of studying the causes, treatments and outcomes of diseases by systematically collecting a wide range of data over long periods. The wealth of data in such studies may yield important results in addition to the already numerous findings, especially when subjected to newer analytical methods. Bayesian Networks (BN) provide a relatively new method of representing uncertain relationships among variables, using the tools of probability and graph theory, and have been widely used in analyzing dependencies and the interplay between variables. We used BN to perform an exploratory analysis on a rich collection of data from one large health cohort study, the Nurses' Health Study (NHS), with a focus on breast cancer. We explored the data from the NHS using BN to look for breast cancer risk factors, including a group of Single Nucleotide Polymorphisms (SNPs). We found no association between the SNPs and breast cancer, but found a dependency between clomid and breast cancer. We evaluated clomid as a potential risk factor after matching on age and number of children. Our results showed for clomid an increased risk of estrogen receptor positive breast cancer (odds ratio 1.52, 95% CI 1.11-2.09) and a decreased risk of estrogen receptor negative breast cancer (odds ratio 0.46, 95% CI 0.22-0.97). === We developed breast cancer risk models using BN. We trained models on 75% of the data, and evaluated them on the remaining 25%. Because of the clinical importance of predicting risks for Estrogen Receptor positive and Progesterone Receptor positive breast cancer, we focused on this specific type of breast cancer to predict two-year, four-year, and six-year risks.
The concordance statistics of the prediction results on test sets are 0.70 (95% CI: 0.67-0.74), 0.68 (95% CI: 0.64-0.72), and 0.66 (95% CI: 0.62-0.69) for the two-, four-, and six-year models, respectively. We also evaluated the calibration performance of the models, and applied a filter to the output, based on Agglomerative Information Bottleneck clustering, to improve the linear relationship between predicted and observed risks without sacrificing much discrimination performance. === by Delin Shen. === Ph.D. |
author2 |
Peter Szolovits. |
author_facet |
Peter Szolovits. Shen, Delin |
author |
Shen, Delin |
author_sort |
Shen, Delin |
title |
An exploratory analysis of large health cohort study using Bayesian networks |
title_sort |
exploratory analysis of large health cohort study using bayesian networks |
publisher |
Massachusetts Institute of Technology |
publishDate |
2007 |
url |
http://dspace.mit.edu/handle/1721.1/34478 http://hdl.handle.net/1721.1/34478 |
work_keys_str_mv |
AT shendelin anexploratoryanalysisoflargehealthcohortstudyusingbayesiannetworks AT shendelin exploratoryanalysisoflargehealthcohortstudyusingbayesiannetworks |
_version_ |
1719034809832964096 |
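
The abstract in this record reports concordance statistics (0.70, 0.68, and 0.66) for the thesis's risk models. As a rough illustration of what that measure captures, the sketch below computes a concordance statistic for a binary outcome as the fraction of (case, non-case) pairs in which the case received the higher predicted risk, counting ties as one half. This is a generic textbook formulation, not the thesis's own code; the function name `concordance_statistic` and the toy risks and outcomes are hypothetical and do not come from the Nurses' Health Study.

```python
from itertools import product


def concordance_statistic(predicted_risks, outcomes):
    """Return the fraction of (case, non-case) pairs ranked correctly.

    A pair counts 1 if the case has the higher predicted risk,
    0.5 if the two risks are tied, and 0 otherwise.
    """
    cases = [p for p, y in zip(predicted_risks, outcomes) if y == 1]
    non_cases = [p for p, y in zip(predicted_risks, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    score = 0.0
    for case_risk, non_case_risk in product(cases, non_cases):
        if case_risk > non_case_risk:
            score += 1.0
        elif case_risk == non_case_risk:
            score += 0.5
    return score / (len(cases) * len(non_cases))


# Hypothetical toy data: predicted risks and observed outcomes (1 = case).
toy_risks = [0.10, 0.40, 0.35, 0.80, 0.20]
toy_outcomes = [0, 0, 1, 1, 0]
c_stat = concordance_statistic(toy_risks, toy_outcomes)
print(f"concordance statistic: {c_stat:.3f}")
```

A value of 0.5 corresponds to ranking no better than chance and 1.0 to perfect ranking. The pairwise loop is quadratic in the number of subjects, so for a cohort-sized test set a rank-based computation would be the more practical choice.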