Regulatory Genes Prediction with Microarray Data and Ontology

Doctoral thesis === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === Academic year 99 === Microarray technology provides an opportunity for scientists to analyze thousands of gene expression profiles simultaneously. However, microarray gene expression data often contain multiple missing expression values for many reasons. Effective methods to imput...

Bibliographic Details
Main Authors: Chao-Hsun Yang, 楊朝勛
Other Authors: 許輝煌
Format: Others
Language: en_US
Published: 2011
Online Access: http://ndltd.ncl.edu.tw/handle/96290397700854506547
id ndltd-TW-099TKU05392007
record_format oai_dc
spelling ndltd-TW-099TKU05392007 2015-10-30T04:10:10Z http://ndltd.ncl.edu.tw/handle/96290397700854506547 Regulatory Genes Prediction with Microarray Data and Ontology 使用微陣列資料與本體論於基因調控關係預測 Chao-Hsun Yang 楊朝勛 Doctoral thesis, Tamkang University, Doctoral Program, Department of Computer Science and Information Engineering, academic year 99 許輝煌 2011 thesis 121 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Doctoral thesis === Tamkang University === Doctoral Program, Department of Computer Science and Information Engineering === Academic year 99 === Microarray technology provides an opportunity for scientists to analyze thousands of gene expression profiles simultaneously. However, microarray gene expression data often contain multiple missing expression values for many reasons. Effective methods to impute these missing values are needed because many algorithms for microarray data analysis require a complete matrix of gene expression values. In addition, selecting informative genes from microarray gene expression data is essential when analyzing such large amounts of data. A number of methods have been proposed from various points of view to meet this need, but most existing methods have limitations and disadvantages. In this dissertation, we propose a novel approach to predicting potential regulatory gene pairs through a distance measurement that effectively estimates the distance between the genes in a pair. The distance measurement is based on the dynamic time warping (DTW) algorithm and the well-defined Gene Ontology (GO) structure for genes or proteins. GO contains definitions (annotations) that describe the biological meaning of genes. The semantic distance between two genes in biological terms can be measured by quantitatively assessing their corresponding GO annotations. Our distance measurement takes into consideration both the DTW distance between the expression values and the GO semantic distance of a gene pair. We also propose a novel missing value imputation approach that combines our distance measurement with the k-nearest neighbor (KNN) method. Experimental results show that our missing value imputation approach outperforms other major methods in terms of the commonly used assessment. After the missing values in raw microarray time series data have been estimated with our imputation approach, we apply our gene regulation prediction approach. According to the experimental results, our approach discovers more known regulatory gene pairs than other methods. Research on microarray time series data can hence be improved and facilitated with our approaches.
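The abstract describes a gene-pair distance that combines a DTW distance over expression time series with a GO semantic distance, and a KNN imputation built on that distance. The record does not give the thesis's actual formulas, so the following is only a minimal sketch under stated assumptions: the GO distance is a stand-in (1 minus the Jaccard similarity of annotation sets), the combination is a weighted sum with a hypothetical weight `alpha`, and the function names, the parameter `k`, and the toy data are all illustrative rather than taken from the thesis.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def go_distance(terms_a, terms_b):
    """ASSUMPTION: stand-in semantic distance, 1 - Jaccard similarity of GO term sets.

    The thesis's own GO-based measure is not given in this record.
    """
    union = terms_a | terms_b
    if not union:
        return 1.0  # no annotations on either gene: treat as maximally distant
    return 1.0 - len(terms_a & terms_b) / len(union)


def combined_distance(expr_a, expr_b, terms_a, terms_b, alpha=0.5):
    """ASSUMPTION: weighted sum of the two distances; alpha is a hypothetical weight."""
    return alpha * dtw_distance(expr_a, expr_b) + (1 - alpha) * go_distance(terms_a, terms_b)


def knn_impute(name, t, expr, go, k=2, alpha=0.5):
    """Estimate the missing value of gene `name` at time index `t`.

    Averages the value at `t` over the k complete genes nearest to `name`
    under combined_distance. Assumes `t` is the only missing point of `name`.
    """
    observed = [v for i, v in enumerate(expr[name]) if i != t]
    scored = []
    for other, values in expr.items():
        if other == name or None in values:
            continue  # only complete profiles can act as neighbors
        rest = [v for i, v in enumerate(values) if i != t]
        dist = combined_distance(observed, rest, go[name], go[other], alpha)
        scored.append((dist, values[t]))
    if not scored:
        raise ValueError("no complete gene profile available as a neighbor")
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(value for _, value in nearest) / len(nearest)


# Toy data: gene g1 is missing its value at time index 1.
toy_expr = {
    "g1": [1.0, None, 3.0],
    "g2": [1.0, 2.0, 3.0],
    "g3": [1.0, 4.0, 3.0],
    "g4": [9.0, 9.0, 9.0],
}
toy_go = {gene: set() for gene in toy_expr}
imputed = knn_impute("g1", 1, toy_expr, toy_go, k=2)
```

In this sketch the DTW term is unnormalized while the GO term lies in [0, 1], so in practice the expression distance would likely need rescaling before the two are weighted against each other; how the thesis balances them is not stated here.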
author2 許輝煌
author_facet 許輝煌
Chao-Hsun Yang
楊朝勛
author Chao-Hsun Yang
楊朝勛
title Regulatory Genes Prediction with Microarray Data and Ontology
publishDate 2011
url http://ndltd.ncl.edu.tw/handle/96290397700854506547