Sparse Generalized PCA and Dependency Learning for Large-Scale Applications Beyond Gaussianity

Bibliographic Details
Author: Zhang, Qiaoya
Format: Others
Language: English
Published: Florida State University
Subjects: Statistics
Online Access: http://purl.flvc.org/fsu/fd/FSU_2016SP_Zhang_fsu_0071E_13087
Advisor: She, Yiyuan (professor directing dissertation)
Committee: Ma, Teng (university representative); Niu, Xufeng (committee member); Sinha, Debajyoti (committee member); Slate, Elizabeth H. (committee member)
Degree: Doctor of Philosophy, Department of Statistics, College of Arts and Sciences, Florida State University
Date: Spring Semester 2016 (March 29, 2016)
Physical Description: 1 online resource (99 pages), application/pdf
Notes: A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. Includes bibliographical references.
Identifier: FSU_2016SP_Zhang_fsu_0071E_13087
Rights: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s). The copyright in theses and dissertations completed at Florida State University is held by the students who author them.

Description

The age of big data has renewed interest in dimension reduction. How to cope with high-dimensional data remains a difficult problem in statistical learning. In this study, we consider the task of dimension reduction: projecting data onto a lower-dimensional subspace while preserving maximal information. We investigate the pitfalls of classical PCA and propose a set of algorithms that function in high dimensions, extend to all exponential family distributions, perform feature selection at the same time, and take missing values into account. Based on the best-performing one, we develop the SG-PCA algorithm. With acceleration techniques and a progressive screening scheme, SG-PCA demonstrates superior scalability and accuracy compared to existing methods. Motivated by the independence assumption underlying dimension reduction techniques, we propose a novel framework, Generalized Indirect Dependency Learning (GIDL), to learn and incorporate association structure in multivariate statistical analysis. Without constraints on the particular distribution of the data, GIDL accepts any pre-specified smooth loss function and can both extract the association structure and infuse it into regression, classification, or dimension reduction problems. Experiments at the end of the dissertation demonstrate its efficacy.