Summary: | Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Chemical Engineering, 2008. === MIT Science Library copy printed in leaves. === Includes bibliographical references (p. 281-299). === Human societies face diverse health challenges including a rapidly aging population, rising incidence of metabolic disease, and increasing antibiotic resistance. These problems involve complex interactions between genes and environment and are often not well understood. To address these challenges, high-throughput and reproducible techniques for genome sequencing, transcript measurement, and protein measurement have been developed; the information resulting from these techniques has led to an increased understanding of cellular function and the identification of a number of novel biomarkers for a variety of diseases. In recent years, the monitoring of such systems-level cellular behavior has naturally extended to the metabolite level, leading to the study of metabolomics. The rise of metabolomics goes hand in hand with the desire to address some of the phenotypic informational gaps left behind by genomics, transcriptomics, and proteomics. The study of metabolites carries several advantages. First, the number of metabolites in the human "metabolome," estimated at 2500, remains a tractable number for analysis as compared to the 35,000 genes and 100,000-1,000,000 proteins. Metabolites also reliably provide an instantaneous "downstream" biochemical snapshot of a cell, and the typical metabolomics analysis is carried out on relatively noninvasive patient fluids such as urine or plasma. The goal of this thesis is to design, develop, and apply methods for the metabolomic analysis of blood via gas chromatography-mass spectrometry (GC-MS) instrumentation. Despite initial successes, methods in metabolomics vary widely and have not been standardized.
This lack of standardization was first addressed via the optimization of the instrumentation itself, a topic rarely addressed in the literature but crucial to the reliable identification of biomarkers. We investigated the GC-MS parameters found to have the largest impact on data quality and employed D-optimal design to pare down the search space to a feasible number of experiments. These parameters were then optimized via response surface estimation to ensure maximum reproducibility and sensitivity across the entire metabolite mixture. The results from this optimization constitute a significant improvement upon existing methods in the literature. Next, methods were developed for the bioinformatics analysis of raw GC-MS data. Current techniques for metabolite tracking are non-systematic and typically require the laborious use of reference libraries. We developed a method to track conserved metabolites across GC-MS replicates and conditions with the optional use of reference libraries, and validated it on an E. coli dataset and on the differential detection of metabolites in a spiked mixture. In addition, we investigated the best methods for the imputation of missing data as applied to three different metabolomics datasets; to date, missing data imputation has not been comprehensively addressed in the metabolomics literature, and many methods currently used are needlessly inaccurate. After investigating eight different imputation methods via three deletion methods, it was concluded that k-nearest neighbor algorithms were the most accurate method for data imputation. Finally, the instrumental parameter optimization and metabolite tracking methods were applied to the problem of predicting patient mortality in end-stage renal disease (ESRD).
Although ESRD is a complex and well-studied disease, known risk factors account for only 50% of patient deaths, and prediction accuracies for the disease remain relatively low; in addition, mortality rates in the first 90 days of dialysis treatment are double those after 90 days. We sought to investigate whether the addition of metabolomic information would result in increased accuracy of mortality prediction. One hundred twenty patient samples were obtained from a national dialysis study (equally representing death and survival within 90 days of starting dialysis) and analyzed according to our protocol. Two feature selection algorithms were applied to identify significant metabolites distinguishing death and survival, and the corresponding models resulted in improved receiver-operating characteristic (ROC) curve areas of 0.85 and 0.93. This result constitutes a significant improvement over existing clinical models, which at best result in ROC curve areas of 0.80. Based on this work, we hypothesize that our observed differential fatty acid concentrations are indicative of impaired fatty acid oxidation, leading to insulin resistance in ESRD patients (regardless of Type II diabetes status) and, eventually, patient mortality. === by Lily Victoria Tong. === Ph.D.
|