id ndltd-OhioLink-oai-etd.ohiolink.edu-ohiou1289950347
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-ohiou12899503472021-08-03T05:46:45Z A SNP Microarray Analysis Pipeline Using Machine Learning Techniques Evans, Daniel T. Bioinformatics Biology Computer Science Genetics genome-wide association machine learning predictive model machine learning support vector machines single nucleotide polymorphisms <p>A software pipeline has been developed to aid in SNP microarray analysis in case/control genome-wide association (GWA) studies. The pipeline uses data taken from previous GWA studies from the NCBI Gene Expression Omnibus website and analyzes the SNP information from these studies to create predictive classifiers. These classifiers attempt to accurately predict if individuals have a particular phenotype based on their genotypes. Two different methods were used to create these predictive models. One makes use of a popular machine learning technique, support vector machines, and the other is a simpler method that uses genotype total differences between cases and controls. One major benefit of using the support vector machine method is the ability to integrate and consider many combinations of SNPs in a computationally inexpensive manner. </p><p>The GSE13117 dataset, which consists of mentally retarded children and their parents, and the GSE9222 dataset, which consists of autistic patients and their parents, were used to test the software pipeline. A Bayesian confidence interval was used in reporting classifier performance in addition to 5-repeated 10-fold cross-validation (5r-10cv). For the GSE9222 data set, the top performing model achieved a balanced accuracy of 70.8% and a normal accuracy of 71.7% using 5r-10cv. The model that had the distribution with the highest upper bound had a 95% confidence balanced accuracy interval of 62.1% to 75.3%. For the GSE13117 data set, the top performing classifier achieved a balanced accuracy of 56.2% and a normal accuracy of 65.7% using 5r-10cv. 
The model that had the distribution with the highest upper bound for the GSE13117 data set had a 95% confidence balanced accuracy interval of 49.6% to 68.3%. Such classifiers will eventually lead to new insights into disease and allow for simpler and more accurate diagnoses in the future.</p><p>This thesis continues ideas and work from previously published abstracts and poster presentations [1, 2], unpublished class reports [3, 4], and unpublished project reports from personal correspondence [5].</p> 2010 English text Ohio University / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347 http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347 unrestricted This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection NDLTD
language English
sources NDLTD
topic Bioinformatics
Biology
Computer Science
Genetics
genome-wide association
machine learning
predictive model
machine learning
support vector machines
single nucleotide polymorphisms
spellingShingle Bioinformatics
Biology
Computer Science
Genetics
genome-wide association
machine learning
predictive model
machine learning
support vector machines
single nucleotide polymorphisms
Evans, Daniel T.
A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
author Evans, Daniel T.
author_facet Evans, Daniel T.
author_sort Evans, Daniel T.
title A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
title_short A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
title_full A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
title_fullStr A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
title_full_unstemmed A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
title_sort snp microarray analysis pipeline using machine learning techniques
publisher Ohio University / OhioLINK
publishDate 2010
url http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347
work_keys_str_mv AT evansdanielt asnpmicroarrayanalysispipelineusingmachinelearningtechniques
AT evansdanielt snpmicroarrayanalysispipelineusingmachinelearningtechniques
_version_ 1719425158462046208