Efficient Clustering Methods for Gene Expression Mining : a Performance Evaluation

Master's === National Cheng Kung University === Institute of Computer Science and Information Engineering === 89 === This research presents a new and efficient clustering analysis approach that is suitable for gene expression analysis. The proposed approach is primarily based on similarity-matrix based clustering techniques and complemented by new heuristics for reducing comput...

Bibliographic Details
Main Authors: Ching-Pin Kao, 高慶斌
Other Authors: Shin-Mu Tseng
Format: Others
Language: zh-TW
Published: 2001
Online Access: http://ndltd.ncl.edu.tw/handle/75711478633378758641
id ndltd-TW-089NCKU0392040
record_format oai_dc
spelling ndltd-TW-089NCKU0392040 2016-01-29T04:27:54Z http://ndltd.ncl.edu.tw/handle/75711478633378758641 Efficient Clustering Methods for Gene Expression Mining : a Performance Evaluation 應用於基因表現探勘之高效率叢集方法及其效能評估 Ching-Pin Kao 高慶斌 Master's National Cheng Kung University Institute of Computer Science and Information Engineering 89

This research presents a new and efficient clustering analysis approach that is suitable for gene expression analysis. The proposed approach builds primarily on similarity-matrix-based clustering techniques, complemented by new heuristics that reduce computation. In addition, a validation technique is integrated to verify the quality of the clustering results. The main features of the proposed approach are as follows: 1) High cluster quality: integrating a validation technique with the clustering methods yields a near-optimal clustering result; 2) High efficiency: new heuristics and a threshold range reduction method make the clustering process very efficient; 3) Automation: the whole analysis runs automatically, without requiring users to adjust any parameters. In experimental evaluation, the proposed approach outperforms other approaches in both execution efficiency and clustering quality across various kinds of gene expression data. The experimental results also show that the quality of the clustering result generated by our approach is very close to the optimal value. This research therefore offers valuable studies and methods for biologists conducting gene expression analysis.

1.1 Introduction to Gene Expression Mining
1.2 Research Motivation
1.3 Research Objectives
1.4 Thesis Content and Organization
Chapter 2 Related Work
2.1 Similarity Measures
2.1.1 Distance Measures
2.1.2 Correlation Coefficients
2.1.3 Association Coefficients
2.1.4 Probabilistic Similarity Coefficients
2.2 Clustering Methods
2.2.1 Hierarchical Clustering Methods
2.2.1.1 Agglomerative Hierarchical Clustering
2.2.2 Partitioning Clustering Methods
2.2.2.1 K-Means
2.2.2.2 K-Medoids
2.2.3 Density-Based Clustering Methods
2.2.3.1 Cluster Affinity Search Technique
2.2.4 Grid-Based Clustering Methods
2.2.5 Model-Based Clustering Methods
2.2.6 Outlier Analysis
2.3 Validation Techniques
2.3.1 Hubert's Γ Statistic
Chapter 3 Evaluation of Existing Methods
3.1 Evaluation of Similarity Measures
3.2 Evaluation of Clustering Methods
3.2.1 Evaluation of Hierarchical Clustering Methods
3.2.2 Evaluation of Partitioning Clustering Methods
3.2.3 Evaluation of Density-Based Clustering Methods
3.2.4 Evaluation of Grid-Based Clustering Methods
3.2.5 Clustering Methods Suitable for Gene Expression Analysis
3.3 Evaluation of Validation Techniques
Chapter 4 Efficient Clustering Methods
4.1 Improving the Efficiency of CAST
4.1.1 Omitting POST-CAST
4.1.2 New Heuristic
4.2 Reducing the Number of Executions
4.2.1 Basic Method
4.2.2 Methods for Reducing Computation
4.2.2.1 Method 1: Reducing the Number of Executions per Round
4.2.2.2 Method 2: Reducing the Number of Rounds
Chapter 5 Performance Evaluation and Analysis
5.1 Evaluation of the New Heuristic
5.2 Evaluation of Omitting POST-CAST
5.3 Evaluation of K-Means, K-Medoids, and CAST
5.4 Evaluation of Progressive Threshold Range Reduction (Low-Similarity Dataset)
5.5 Evaluation of Progressive Threshold Range Reduction (Medium-Similarity Dataset)
5.6 Evaluation of Progressive Threshold Range Reduction (High-Similarity Dataset)
5.7 Evaluation of Progressive Threshold Range Reduction (Large Dataset)
Chapter 6 Conclusions and Future Work
6.1 Conclusions
6.2 Applications
6.3 Future Work
References

Shin-Mu Tseng 曾新穆 2001 thesis 72 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
author2 Shin-Mu Tseng
author_facet Shin-Mu Tseng
Ching-Pin Kao
高慶斌
author Ching-Pin Kao
高慶斌
author_sort Ching-Pin Kao
title Efficient Clustering Methods for Gene Expression Mining : a Performance Evaluation
publishDate 2001
url http://ndltd.ncl.edu.tw/handle/75711478633378758641