Sparse Generalized PCA and Dependency Learning for Large-Scale Applications Beyond Gaussianity

Author: Zhang, Qiaoya
Other Authors: She, Yiyuan (professor directing dissertation); Ma, Teng (university representative); Niu, Xufeng (committee member); Sinha, Debajyoti (committee member); Slate, Elizabeth H. (committee member)
Format: Text; 1 online resource (99 pages); application/pdf
Language: English
Published: Florida State University, 2016
Subjects: Statistics
Online Access: http://purl.flvc.org/fsu/fd/FSU_2016SP_Zhang_fsu_0071E_13087
Description:
The age of big data has renewed interest in dimension reduction, yet coping with high-dimensional data remains a difficult problem in statistical learning. In this study, we consider the task of dimension reduction: projecting data onto a lower-rank subspace while preserving maximal information. We investigate the pitfalls of classical PCA and propose a set of algorithms that function in high dimensions, extend to all exponential-family distributions, perform feature selection at the same time, and take missing values into consideration. Building on the best-performing variant, we develop the SG-PCA algorithm. With acceleration techniques and a progressive screening scheme, it demonstrates superior scalability and accuracy compared to existing methods.
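
To make the generalized-PCA idea concrete, here is a minimal sketch of one such algorithm under a Bernoulli (logistic) loss: alternating proximal-gradient steps on scores and loadings, with soft-thresholding on the loadings for sparsity and a mask for missing entries. The Bernoulli choice, step size, and function names are illustrative assumptions, not the dissertation's actual SG-PCA procedure.

```python
# Hypothetical sketch: sparse exponential-family PCA with a Bernoulli loss.
import numpy as np

def soft_threshold(A, lam):
    """Elementwise soft-thresholding: the prox operator of the l1 penalty."""
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def sparse_logistic_pca(X, rank=2, lam=0.1, step=0.05, n_iter=300, seed=0):
    """X: n x p binary matrix; np.nan marks missing entries."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mask = ~np.isnan(X)                         # observed-entry indicator
    Xf = np.where(mask, X, 0.0)                 # placeholder for missing cells
    U = 0.01 * rng.standard_normal((n, rank))   # scores
    V = 0.01 * rng.standard_normal((p, rank))   # loadings
    for _ in range(n_iter):
        Theta = U @ V.T                         # natural-parameter matrix
        # Bernoulli NLL gradient, restricted to observed entries
        G = mask * (1.0 / (1.0 + np.exp(-Theta)) - Xf)
        U_new = U - step * (G @ V)              # gradient step on scores
        V = soft_threshold(V - step * (G.T @ U), step * lam)  # prox step: sparse loadings
        U = U_new
    return U, V

# Toy usage: low-rank binary data with 10% of entries missing.
rng = np.random.default_rng(1)
Theta = rng.standard_normal((100, 3)) @ rng.standard_normal((3, 20))
X = (rng.random((100, 20)) < 1.0 / (1.0 + np.exp(-Theta))).astype(float)
X[rng.random(X.shape) < 0.1] = np.nan
U, V = sparse_logistic_pca(X, rank=3)
print("nonzero loadings:", np.count_nonzero(V), "of", V.size)
```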
Concerned with the independence assumption built into many dimension-reduction techniques, we propose a novel framework, Generalized Indirect Dependency Learning (GIDL), to learn and incorporate association structure in multivariate statistical analysis. Without constraints on the particular distribution of the data, GIDL accepts any pre-specified smooth loss function and is able both to extract the association structure and to infuse it into regression, classification, or dimension-reduction problems. Experiments demonstrate its efficacy.
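
As a loose illustration of the extract-and-infuse idea (not GIDL's actual estimator), the sketch below fits a sparse multivariate regression in two passes: a working-independence fit to estimate the residual association structure, then a refit whose squared loss is whitened by the estimated precision matrix, which couples the response columns. All names and the squared-loss choice are assumptions for illustration.

```python
# Hypothetical sketch: "extract, then infuse" association in multivariate regression.
import numpy as np

def soft_threshold(A, lam):
    return np.sign(A) * np.maximum(np.abs(A) - lam, 0.0)

def dependency_infused_regression(X, Y, lam=0.1, n_iter=500):
    """X: n x d predictors, Y: n x q responses; returns a sparse d x q coefficient matrix."""
    n, d = X.shape
    q = Y.shape[1]
    # Extract: residual covariance from a working-independence least-squares fit.
    B0, *_ = np.linalg.lstsq(X, Y, rcond=None)
    R = Y - X @ B0
    Sigma = R.T @ R / n + 1e-3 * np.eye(q)   # regularized residual covariance
    P = np.linalg.inv(Sigma)                 # precision: the association structure
    # Infuse: minimize tr((Y - XB) P (Y - XB)^T) / (2n) + lam * ||B||_1
    # by proximal gradient; the cross-response weights in P couple the columns of B.
    step = n / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(P, 2))
    B = np.zeros((d, q))
    for _ in range(n_iter):
        G = -(X.T @ (Y - X @ B) @ P) / n     # gradient of the whitened loss
        B = soft_threshold(B - step * G, step * lam)
    return B

# Toy usage: correlated responses sharing a sparse coefficient matrix.
rng = np.random.default_rng(2)
X = rng.standard_normal((200, 15))
B_true = np.zeros((15, 4)); B_true[:3, :] = 1.0
mix = np.eye(4) + 0.8                        # dense mixing -> correlated noise
Y = X @ B_true + rng.standard_normal((200, 4)) @ mix
B_hat = dependency_infused_regression(X, Y)
print("nonzero rows recovered:", np.count_nonzero(np.abs(B_hat).sum(axis=1) > 0))
```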
A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Spring Semester 2016. March 29, 2016. Includes bibliographical references. Committee: Yiyuan She, Professor Directing Dissertation; Teng Ma, University Representative; Xufeng Niu, Committee Member; Debajyoti Sinha, Committee Member; Elizabeth Slate, Committee Member.