Summary: Ph.D. === National Tsing Hua University === Institute of Communications Engineering === 104 === Non-negative blind source separation (nBSS), the focus of this dissertation, has found many successful applications in science and engineering, such as biomedical imaging, gene expression data analysis, and hyperspectral imaging in remote sensing. In contrast to conventional nBSS methods, including non-negative independent component analysis (nICA) and non-negative matrix factorization (NMF), we consider the nBSS problem from the perspective of simplex geometry, without requiring statistical independence of the sources or the existence of pure pixels (pixels fully contributed by a single source).
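As a point of reference for the discussion below, the following minimal sketch (not from the dissertation; all variable names and sizes are illustrative) generates synthetic nBSS data under the standard linear mixing model, with abundance vectors drawn from a Dirichlet distribution so that every pixel lies in a simplex:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, L = 224, 3, 1000                   # spectral bands, sources, pixels (toy sizes)
A = rng.uniform(0.0, 1.0, size=(M, N))   # non-negative mixing matrix; columns = endmembers
S = rng.dirichlet(np.ones(N), size=L).T  # abundances: each column lies on the unit simplex
X = A @ S                                # noiseless observations, one mixed pixel per column
```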
The columns of the mixing matrix, which describe how the non-negative sources are mixed, can be estimated as the vertices (also referred to as endmembers) of the minimum-volume simplex that encloses all pixel vectors, which is Craig's well-known nBSS criterion. Empirical experience has suggested that Craig's criterion is capable of unmixing heavily mixed sources, but why this is true was unclear from a theoretical viewpoint.
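To make the criterion concrete, here is a hedged sketch (function names are mine, not the dissertation's) of its two ingredients, assuming the data have been reduced to (N-1)-dimensional space: the volume of a candidate simplex, and a check that the simplex encloses all pixels. Craig's criterion seeks the vertex set minimizing the former subject to the latter.

```python
import numpy as np
from math import factorial

def simplex_volume(V):
    """Volume of the (N-1)-simplex whose N vertices are the columns of V.
    V has shape (N-1, N): vertices live in (N-1)-dimensional space."""
    N = V.shape[1]
    edges = V[:, 1:] - V[:, :1]              # edge vectors from the first vertex
    return abs(np.linalg.det(edges)) / factorial(N - 1)

def encloses(V, X, tol=1e-9):
    """True if every column of X lies inside the simplex with vertex matrix V."""
    N = V.shape[1]
    aug = np.vstack([V, np.ones((1, N))])    # enforce barycentric coords summing to 1
    for x in X.T:
        theta = np.linalg.solve(aug, np.append(x, 1.0))
        if theta.min() < -tol:               # a negative coordinate means x is outside
            return False
    return True
```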
Before adopting this powerful criterion to devise a highly efficient and effective nBSS algorithm, we develop an analysis framework in which the source mixing level (or data purity level) is quantitatively defined, and we prove that Craig's criterion indeed yields perfect endmember identifiability (in the noiseless scenario) as long as this quantity exceeds a certain small threshold. Our theoretical results are substantiated by numerical simulations.
Considering that existing Craig-simplex-identification (CSI) algorithms suffer from high computational complexity due to heavy simplex volume computations, our identifiability analysis motivated us to devise a super-fast CSI algorithm for nBSS that involves no simplex volume computation at all. Specifically, by exploiting the convex-geometry fact that a simplex of N vertices can be equivalently characterized by its N associated hyperplanes, we reconstruct Craig's simplex from N hyperplane estimates, where each hyperplane is estimated from N-1 affinely independent data pixels. Without resorting to numerical optimization, the proposed algorithm searches for these N(N-1) data pixels via simple linear-algebraic computations, which accounts for its computational efficiency. Besides an endmember identifiability analysis that supports its performance, experiments on synthetic and real hyperspectral remote sensing (HRS) imaging data demonstrate its superiority over state-of-the-art CSI algorithms in both computational efficiency and estimation accuracy.
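The hyperplane-to-vertex step can be illustrated as follows. This is a minimal sketch under the assumption that the data live in (N-1)-dimensional space; the hyperplane estimation shown here is simplified relative to the dissertation's pixel-search procedure, and all names are illustrative.

```python
import numpy as np

def hyperplane_through(P):
    """Hyperplane through the N-1 affinely independent columns of P, shape (N-1, N-1).
    Returns (b, h) such that b @ p == h for every column p of P."""
    D = (P[:, 1:] - P[:, :1]).T              # (N-2, N-1) matrix of difference vectors
    b = np.linalg.svd(D)[2][-1]              # null-space vector of D = hyperplane normal
    return b, b @ P[:, 0]

def vertices_from_hyperplanes(B, h):
    """Given the N bounding hyperplanes {x : B[i] @ x = h[i]} of a simplex in
    (N-1)-dimensional space (normals in the rows of B, offsets in h), recover
    each vertex as the intersection of the other N-1 hyperplanes."""
    N = B.shape[0]
    V = np.empty((N - 1, N))
    for j in range(N):
        idx = [i for i in range(N) if i != j]      # drop the j-th hyperplane
        V[:, j] = np.linalg.solve(B[idx], h[idx])  # small linear system, no volumes
    return V
```

Each vertex thus costs one (N-1)-by-(N-1) linear solve, which is why the overall procedure avoids the expensive volume evaluations of earlier CSI algorithms.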
Finally, model-order selection (MOS), i.e., determining the number of sources N, is performed using an information-theoretic minimum description length (MDL) criterion that avoids data-dependent parameter tuning (e.g., eigenvalue thresholding). Instead of describing nBSS data via Gaussian competing models (which may be too simplistic to adequately describe nBSS data), as in existing MDL-based frameworks, we consider more comprehensive modeling based on the fact that (standardized) nBSS data can often be configured as a simplex. Specifically, we employ a (linearly transformed) Dirichlet distribution to capture the simplex structure embedded in the noiseless counterpart of the data, which, together with Gaussian noise modeling, gives rise to Gaussian-Dirichlet convolution competing models. Maximum-likelihood (ML) estimates of the Gaussian-Dirichlet density are then derived by establishing a link between the stochastic ML estimator and simplex geometry, and the corresponding description lengths are efficiently calculated by Monte Carlo integration. We validate the proposed nBSS-MDL criterion through extensive simulations and experiments on real-world biomedical and HRS imaging datasets; it consistently detects the true number of sources in all four case studies, demonstrating its performance and applicability.
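A hedged sketch of the Monte Carlo step follows: the Gaussian-Dirichlet convolution density p(x) = E_{s ~ Dir(alpha)}[ N(x; C s + d, sigma^2 I) ] has no closed form, so it is approximated by averaging the Gaussian density over Dirichlet draws. Here C, d, alpha, and sigma stand for fitted model parameters; the names are mine, not the dissertation's.

```python
import numpy as np

def mc_log_density(x, C, d, alpha, sigma, n_draws=10000, rng=None):
    """Monte Carlo estimate of log p(x) under the Gaussian-Dirichlet convolution
    p(x) = E_{s ~ Dir(alpha)}[ N(x; C @ s + d, sigma^2 I) ]."""
    rng = rng or np.random.default_rng()
    S = rng.dirichlet(alpha, size=n_draws)    # (n_draws, N) samples on the simplex
    mu = S @ C.T + d                          # linearly transformed samples, (n_draws, M)
    M = x.size
    log_q = (-0.5 * np.sum((x - mu) ** 2, axis=1) / sigma ** 2
             - 0.5 * M * np.log(2.0 * np.pi * sigma ** 2))
    m = log_q.max()                           # log-mean-exp for numerical stability
    return m + np.log(np.mean(np.exp(log_q - m)))
```

Up to the standard MDL parameter-count penalty, the description length of a candidate order N is then the negative sum of such log-densities over all pixels, and the order minimizing it is selected.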