Automatic model selection on local Gaussian structures with priors: comparative investigations and applications.
Format: Others
Language: English, Chinese
Published: 2012
Online Access: http://library.cuhk.edu.hk/record=b5549417
http://repository.lib.cuhk.edu.hk/en/item/cuhk-327970
Summary:

Model selection, an important topic in machine learning, aims to determine an appropriate model scale given a limited number of samples.
As one type of efficient solution, automatic model selection starts from a sufficiently large model scale and has an intrinsic mechanism that drives redundant structures to become ineffective, so that they are discarded automatically during learning. Priors are usually imposed on the parameters to facilitate automatic model selection. Systematic comparisons of automatic model selection approaches with priors are still lacking, and this thesis conducts such a study on models with local Gaussian structures.

Specifically, we compare the relative strengths and weaknesses of three typical automatic model selection approaches, namely Variational Bayesian (VB), Minimum Message Length (MML), and Bayesian Ying-Yang (BYY) harmony learning, on models with local Gaussian structures. First, we consider the Gaussian Mixture Model (GMM), for which the number of Gaussian components is to be determined. Further assuming that each Gaussian component has a subspace structure, we extend the study to two models, namely Mixture of Factor Analyzers (MFA) and Local Factor Analysis (LFA), for both of which the component number and the local subspace dimensionalities are to be determined.

Two types of priors are imposed on the parameters: a conjugate-form prior and a Jeffreys prior. The conjugate-form prior is chosen as a Dirichlet-Normal-Wishart (DNW) prior for GMM, and as a Dirichlet-Normal-Gamma (DNG) prior for both MFA and LFA. The Jeffreys prior and the MML approach are not considered on MFA/LFA due to the difficulty of deriving the corresponding Fisher information matrix.

Via extensive simulations and applications, we compare the automatic model selection algorithms (six for GMM and four for MFA/LFA) and obtain the following main findings:

1. For each approach, imposing priors on all parameters performs better than imposing priors merely on the mixing weights.
2. For all three approaches on GMM, performance with the DNW prior is better than with the Jeffreys prior.
Moreover, the Jeffreys prior makes MML slightly better than VB, while the DNW prior makes VB better than MML.
3. As the DNW prior hyper-parameters on GMM are changed from being fixed to being freely optimized under each approach's own learning principle, BYY improves its performance, while VB and MML deteriorate. This observation remains the same when we compare BYY and VB on either MFA or LFA with the DNG prior. In fact, VB and MML lack a good guide for optimizing the prior hyper-parameters.
4. For both GMM and MFA/LFA, BYY considerably outperforms both VB and MML, for any type of prior and whether or not the hyper-parameters are optimized. Unlike VB and MML, which rely on appropriate priors, BYY does not depend highly on the type of prior: it already performs well without priors and improves further when a Jeffreys or conjugate-form prior is imposed.
5. Despite their equivalence in maximum likelihood parameter learning, MFA and LFA lead to different automatic model selection performances under VB and BYY. In particular, both BYY and VB perform better on LFA than on MFA, and the superiority of LFA is reliable and robust.

In addition to adopting existing algorithms either directly or with minor modifications, this thesis develops five new algorithms to fill the gaps. On GMM, a VB algorithm with the Jeffreys prior and a BYY algorithm with the DNW prior are developed; in the latter, a multivariate Student's T-distribution is obtained as the posterior via marginalization. On MFA and LFA, BYY algorithms with DNG priors are developed, where products of multiple Student's T-distributions are obtained as posteriors via approximate marginalization. Moreover, a VB algorithm on LFA is developed as an alternative to the existing VB algorithm on MFA.
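The pruning mechanism described in the summary (start with too many Gaussian components and let a prior on the mixing weights drive redundant ones towards zero) can be illustrated with scikit-learn's `BayesianGaussianMixture`, a Variational Bayesian GMM with a Dirichlet prior on the mixing weights. This is a minimal sketch for illustration only, not an implementation of the thesis's own BYY/VB/MML algorithms; the data and the 0.01 weight threshold are arbitrary choices.

```python
# Minimal sketch of automatic model selection on a GMM: fit with a
# deliberately large number of components; the Dirichlet prior on the
# mixing weights shrinks redundant components towards zero weight.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data: three well-separated Gaussian clusters in 2-D.
X = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(200, 2)),
    rng.normal(loc=[5.0, 0.0], scale=0.3, size=(200, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.3, size=(200, 2)),
])

# Start from 10 components; a small weight_concentration_prior
# encourages the posterior weights of redundant components to vanish.
vb = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_distribution",
    weight_concentration_prior=1e-3,
    max_iter=500,
    random_state=0,
).fit(X)

# Components whose posterior mixing weight is negligible are effectively
# discarded; only the effective components remain.
effective = int(np.sum(vb.weights_ > 0.01))
print("effective components:", effective)
```

On such well-separated clusters, the fit typically retains close to the three generating components and assigns near-zero weight to the rest, mirroring the "redundant structures become ineffective and are discarded" behaviour that the compared approaches realize with different learning principles.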
Shi, Lei.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 153-166).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012]. System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.

Contents:
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Background --- p.3
Chapter 1.2 --- Main Contributions of the Thesis --- p.11
Chapter 1.3 --- Outline of the Thesis --- p.14
Chapter 2 --- Automatic Model Selection on GMM --- p.16
Chapter 2.1 --- Introduction --- p.17
Chapter 2.2 --- Gaussian Mixture, Model Selection, and Priors --- p.21
Chapter 2.2.1 --- Gaussian Mixture Model and EM algorithm --- p.21
Chapter 2.2.2 --- Three automatic model selection approaches --- p.22
Chapter 2.2.3 --- Jeffreys prior and Dirichlet-Normal-Wishart prior --- p.24
Chapter 2.3 --- Algorithms with Jeffreys Priors --- p.25
Chapter 2.3.1 --- Bayesian Ying-Yang learning and BYY-Jef algorithms --- p.25
Chapter 2.3.2 --- Variational Bayesian and VB-Jef algorithms --- p.29
Chapter 2.3.3 --- Minimum Message Length and MML-Jef algorithms --- p.33
Chapter 2.4 --- Algorithms with Dirichlet and DNW Priors --- p.35
Chapter 2.4.1 --- Algorithms BYY-Dir(α), VB-Dir(α) and MML-Dir(α) --- p.35
Chapter 2.4.2 --- Algorithms with DNW priors --- p.40
Chapter 2.5 --- Empirical Analysis on Simulated Data --- p.44
Chapter 2.5.1 --- With priors on mixing weights: a quick look --- p.44
Chapter 2.5.2 --- With full priors: extensive comparisons --- p.51
Chapter 2.6 --- Concluding Remarks --- p.55
Chapter 3 --- Applications of GMM Algorithms --- p.57
Chapter 3.1 --- Face and Handwritten Digit Images Clustering --- p.58
Chapter 3.2 --- Unsupervised Image Segmentation --- p.59
Chapter 3.3 --- Image Foreground Extraction --- p.62
Chapter 3.4 --- Texture Classification --- p.68
Chapter 3.5 --- Concluding Remarks --- p.71
Chapter 4 --- Automatic Model Selection on MFA/LFA --- p.73
Chapter 4.1 --- Introduction --- p.74
Chapter 4.2 --- MFA/LFA Models and the Priors --- p.78
Chapter 4.2.1 --- MFA and LFA models --- p.78
Chapter 4.2.2 --- The Dirichlet-Normal-Gamma priors --- p.79
Chapter 4.3 --- Algorithms on MFA/LFA with DNG Priors --- p.82
Chapter 4.3.1 --- BYY algorithm on MFA with DNG prior --- p.83
Chapter 4.3.2 --- BYY algorithm on LFA with DNG prior --- p.86
Chapter 4.3.3 --- VB algorithm on MFA with DNG prior --- p.89
Chapter 4.3.4 --- VB algorithm on LFA with DNG prior --- p.91
Chapter 4.4 --- Empirical Analysis on Simulated Data --- p.93
Chapter 4.4.1 --- On the "chair" data: a quick look --- p.94
Chapter 4.4.2 --- Extensive comparisons on four series of simulations --- p.97
Chapter 4.5 --- Concluding Remarks --- p.101
Chapter 5 --- Applications of MFA/LFA Algorithms --- p.102
Chapter 5.1 --- Face and Handwritten Digit Images Clustering --- p.103
Chapter 5.2 --- Unsupervised Image Segmentation --- p.105
Chapter 5.3 --- Radar HRRP based Airplane Recognition --- p.106
Chapter 5.3.1 --- Background of HRRP radar target recognition --- p.106
Chapter 5.3.2 --- Data description --- p.109
Chapter 5.3.3 --- Experimental results --- p.111
Chapter 5.4 --- Concluding Remarks --- p.113
Chapter 6 --- Conclusions and Future Works --- p.114
Chapter A --- Referred Parametric Distributions --- p.117
Chapter B --- Derivations of GMM Algorithms --- p.119
Chapter B.1 --- The BYY-DNW Algorithm --- p.119
Chapter B.2 --- The MML-DNW Algorithm --- p.124
Chapter B.3 --- The VB-DNW Algorithm --- p.127
Chapter C --- Derivations of MFA/LFA Algorithms --- p.130
Chapter C.1 --- The BYY Algorithms with DNG Priors --- p.130
Chapter C.1.1 --- The BYY-DNG-MFA algorithm --- p.130
Chapter C.1.2 --- The BYY-DNG-LFA algorithm --- p.137
Chapter C.2 --- The VB Algorithms with DNG Priors --- p.145
Chapter C.2.1 --- The VB-DNG-MFA algorithm --- p.145
Chapter C.2.2 --- The VB-DNG-LFA algorithm --- p.149
Bibliography --- p.152