Sparse Generalized PCA and Dependency Learning for Large-Scale Applications Beyond Gaussianity
Other Authors: | |
---|---|
Format: | Others |
Language: | English |
Published: | Florida State University |
Subjects: | |
Online Access: | http://purl.flvc.org/fsu/fd/FSU_2016SP_Zhang_fsu_0071E_13087 |
Summary: | The age of big data has renewed interest in dimension reduction. How to cope with high-dimensional data remains a
difficult problem in statistical learning. In this study, we consider the task of dimension reduction: projecting data into a lower-rank
subspace while preserving maximal information. We investigate the pitfalls of classical PCA and propose a set of algorithms that function
in high dimensions, extend to all exponential family distributions, perform feature selection at the same time, and take missing
values into consideration. Based upon the best-performing one, we develop the SG-PCA algorithm. With acceleration techniques and a
progressive screening scheme, it demonstrates superior scalability and accuracy compared to existing methods. Concerned with the
independence assumption of dimension reduction techniques, we propose a novel framework, the Generalized Indirect Dependency Learning
(GIDL), to learn and incorporate association structure in multivariate statistical analysis. Without constraints on the particular
distribution of the data, GIDL takes any pre-specified smooth loss function and is able to both extract the association structure and infuse it into
the regression, classification or dimension reduction problem. Experiments at the end serve to demonstrate its efficacy. === A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements
for the degree of Doctor of Philosophy. === Spring Semester 2016. === March 29, 2016. === Includes bibliographical references. === Yiyuan She, Professor Directing Dissertation; Teng Ma, University Representative; Xufeng Niu,
Committee Member; Debajyoti Sinha, Committee Member; Elizabeth Slate, Committee Member. |
---|