A SNP Microarray Analysis Pipeline Using Machine Learning Techniques
Main Author: | Evans, Daniel T. |
---|---|
Language: | English |
Published: | Ohio University / OhioLINK, 2010 |
Subjects: | Bioinformatics; Biology; Computer Science; Genetics; genome-wide association; machine learning predictive model; machine learning; support vector machines; single nucleotide polymorphisms |
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1289950347 |
Rights: | unrestricted; this thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws. |

Abstract:

A software pipeline has been developed to aid in SNP microarray analysis for case/control genome-wide association (GWA) studies. The pipeline takes data from previous GWA studies deposited in the NCBI Gene Expression Omnibus (GEO) and analyzes their SNP information to create predictive classifiers. These classifiers attempt to predict whether individuals have a particular phenotype based on their genotypes. Two different methods were used to build the predictive models: one uses a popular machine learning technique, support vector machines (SVMs); the other is a simpler method based on the total genotype differences between cases and controls. A major benefit of the SVM method is its ability to integrate and consider many combinations of SNPs in a computationally inexpensive manner.

The software pipeline was tested on two datasets: GSE13117, which consists of children with intellectual disability (mental retardation) and their parents, and GSE9222, which consists of patients with autism and their parents. Classifier performance was estimated with 5-repeated 10-fold cross-validation (5r-10cv) and also reported as a Bayesian confidence interval. For the GSE9222 dataset, the top-performing model achieved a balanced accuracy of 70.8% and a standard (unbalanced) accuracy of 71.7% under 5r-10cv, and the model whose distribution had the highest upper bound had a 95% confidence interval for balanced accuracy of 62.1% to 75.3%. For the GSE13117 dataset, the top-performing classifier achieved a balanced accuracy of 56.2% and a standard accuracy of 65.7% under 5r-10cv, and the model whose distribution had the highest upper bound had a 95% confidence interval for balanced accuracy of 49.6% to 68.3%. Such classifiers will eventually lead to new insights into disease and allow simpler and more accurate diagnoses.

This thesis continues ideas and work from previously published abstracts and poster presentations [1, 2], unpublished class reports [3, 4], and unpublished project reports from personal correspondence [5].
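The abstract credits SVMs with considering many SNP combinations cheaply but does not name the kernel used. One standard mechanism behind that kind of claim is the kernel trick. The sketch below assumes a homogeneous degree-2 polynomial kernel and genotypes coded as 0/1/2 minor-allele counts (both assumptions, not details taken from the thesis); it checks that the kernel, computed in O(d) time for d SNPs, equals the inner product over all d² pairwise SNP products.

```python
from itertools import product
from typing import List, Sequence

# Assumption: each genotype is coded 0, 1 or 2 (copies of the minor allele).


def poly2_kernel(x: Sequence[int], z: Sequence[int]) -> int:
    """Homogeneous degree-2 polynomial kernel (x . z)^2: O(d) work for d SNPs."""
    return sum(a * b for a, b in zip(x, z)) ** 2


def pair_features(x: Sequence[int]) -> List[int]:
    """Explicit feature map: x_i * x_j for every ordered SNP pair (i, j), d^2 features."""
    return [x[i] * x[j] for i, j in product(range(len(x)), repeat=2)]


def explicit_kernel(x: Sequence[int], z: Sequence[int]) -> int:
    """Same kernel value computed the expensive way, through the d^2 pair features."""
    return sum(a * b for a, b in zip(pair_features(x), pair_features(z)))


if __name__ == "__main__":
    # Toy genotype vectors, not data from GSE9222 or GSE13117.
    x = [0, 1, 2, 1]
    z = [2, 2, 1, 0]
    print(poly2_kernel(x, z), explicit_kernel(x, z))  # prints "16 16"
```

Because an SVM only needs kernel values between pairs of samples, it can weigh every SNP pair implicitly without ever building the d² pairwise features.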
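The abstract does not define how the simpler "genotype total differences" method scores an individual. A minimal sketch, assuming genotypes coded 0/1/2 per SNP and a hypothetical scoring rule: for each SNP, compute the fraction of cases and of controls carrying each genotype, then score a new individual by summing the case-minus-control frequency differences for the genotypes they carry, predicting "case" when the total is positive. All function names and the zero threshold are illustrative assumptions, not the thesis's implementation.

```python
from collections import Counter
from typing import Dict, List, Sequence

GENOTYPES = (0, 1, 2)  # assumed coding: copies of the minor allele


def genotype_frequencies(samples: Sequence[Sequence[int]]) -> List[Dict[int, float]]:
    """Per-SNP relative frequency of each genotype across one group of samples."""
    n = len(samples)
    freqs = []
    for j in range(len(samples[0])):
        counts = Counter(sample[j] for sample in samples)
        freqs.append({g: counts[g] / n for g in GENOTYPES})
    return freqs


def fit_difference_model(cases, controls) -> List[Dict[int, float]]:
    """Hypothetical model: per SNP and genotype, case frequency minus control frequency."""
    case_f = genotype_frequencies(cases)
    ctrl_f = genotype_frequencies(controls)
    return [{g: cf[g] - kf[g] for g in GENOTYPES} for cf, kf in zip(case_f, ctrl_f)]


def score(model, genotypes: Sequence[int]) -> float:
    """Total difference summed over the genotypes this individual carries."""
    return sum(snp_diff[g] for snp_diff, g in zip(model, genotypes))


def predict(model, genotypes: Sequence[int]) -> bool:
    """True means 'case' under this hypothetical decision rule (threshold 0)."""
    return score(model, genotypes) > 0


if __name__ == "__main__":
    # Toy data (3 SNPs per individual), not taken from either GEO dataset.
    cases = [[2, 1, 0], [2, 2, 0], [1, 2, 1]]
    controls = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
    model = fit_difference_model(cases, controls)
    print(predict(model, [2, 2, 0]), predict(model, [0, 0, 0]))  # prints "True False"
```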
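Balanced accuracy is the mean of sensitivity and specificity, BA = (TP/(TP+FN) + TN/(TN+FP)) / 2, while standard accuracy is (TP+TN)/N; the two can diverge when cases and controls differ in number, which is one way a gap like the GSE13117 figures (56.2% versus 65.7%) can arise. The sketch below computes both metrics under 5-repeated 10-fold cross-validation for any classifier supplied as `fit`/`predict` callables (a hypothetical interface). It assumes unstratified folds, pools confusion counts over the 10 folds of each repeat, and averages over the 5 repeats; the abstract does not specify these details.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical classifier interface: fit(X, y) -> model; predict(model, x) -> bool (True = case).
Fit = Callable[[List[Sequence[int]], List[bool]], object]
Predict = Callable[[object, Sequence[int]], bool]


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity TP/(TP+FN) and specificity TN/(TN+FP)."""
    return (tp / (tp + fn) + tn / (tn + fp)) / 2


def accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Standard accuracy: fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fn + tn + fp)


def repeated_kfold(X: List[Sequence[int]], y: List[bool], fit: Fit, predict: Predict,
                   k: int = 10, repeats: int = 5, seed: int = 0) -> Tuple[float, float]:
    """Mean (balanced accuracy, accuracy) over `repeats` shuffled runs of k-fold CV.

    Folds are not stratified; confusion counts are pooled over the k test folds
    of each run. Both classes must be present in y, or the metrics divide by zero.
    """
    rng = random.Random(seed)
    ba_scores, acc_scores = [], []
    for _ in range(repeats):
        order = list(range(len(X)))
        rng.shuffle(order)
        tp = fn = tn = fp = 0
        for fold in range(k):
            test_idx = order[fold::k]  # every k-th shuffled index forms one fold
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:
                pred = predict(model, X[i])
                if y[i] and pred:
                    tp += 1
                elif y[i]:
                    fn += 1
                elif pred:
                    fp += 1
                else:
                    tn += 1
        ba_scores.append(balanced_accuracy(tp, fn, tn, fp))
        acc_scores.append(accuracy(tp, fn, tn, fp))
    return sum(ba_scores) / repeats, sum(acc_scores) / repeats
```

For example, a classifier that always predicts "case" on 15 cases and 5 controls scores 75% accuracy but only 50% balanced accuracy, which is why balanced accuracy is the more informative figure on imbalanced data.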
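The abstract does not describe how the Bayesian confidence interval was constructed. One common formulation, assumed here, places independent uniform Beta(1, 1) priors on sensitivity and specificity, so their posteriors are Beta(TP+1, FN+1) and Beta(TN+1, FP+1), and treats the balanced-accuracy posterior as the distribution of their mean. The sketch estimates a central 95% interval by Monte Carlo sampling with the standard library's `random.betavariate`; the function name and defaults are illustrative.

```python
import random
from typing import Tuple


def balanced_accuracy_interval(tp: int, fn: int, tn: int, fp: int,
                               level: float = 0.95, draws: int = 100_000,
                               seed: int = 0) -> Tuple[float, float]:
    """Central credible interval for balanced accuracy (assumed Beta-posterior model).

    Sensitivity ~ Beta(tp + 1, fn + 1) and specificity ~ Beta(tn + 1, fp + 1),
    sampled independently; each draw of balanced accuracy is their mean.
    """
    rng = random.Random(seed)
    samples = sorted(
        (rng.betavariate(tp + 1, fn + 1) + rng.betavariate(tn + 1, fp + 1)) / 2
        for _ in range(draws)
    )
    tail = (1 - level) / 2
    # Empirical quantiles of the sorted Monte Carlo draws.
    lower = samples[int(tail * (draws - 1))]
    upper = samples[int((1 - tail) * (draws - 1))]
    return lower, upper
```

Selecting the model whose interval has the highest upper bound, as the abstract reports, then amounts to computing this interval from each model's pooled confusion counts and comparing the `upper` values.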