Adaptive Mixture Estimation and Subsampling PCA

Bibliographic Details
Main Author:	Liu, Peng
Language:	English
Published:	Case Western Reserve University School of Graduate Studies / OhioLINK 2009
Subjects:	Statistics; large data; data mining; mixture models; Gaussian mixtures; parameter estimation; adaptive procedure; partial EM; high-dimensional data; large p small n; dimension reduction; feature selection; subsampling
Online Access:	http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686

id	ndltd-OhioLink-oai-etd.ohiolink.edu-case1220644686
record_format	oai_dc
datestamp	2021-08-03T05:32:55Z
description	Data mining is important in scientific research, knowledge discovery and decision making. A typical challenge in data mining is that a data set may be too large to be loaded into computer memory all at once for analysis. Even when it can be loaded all at once, too many nuisance features may mask important information in the data. In this dissertation, two new methodologies for analyzing large data are studied. The first methodology is concerned with adaptive estimation of mixture parameters in heterogeneous populations of large-n data. Our adaptive estimation procedures, the partial EM (PEM) and its Bayesian variants (BMAP and BPEM), work well for large or streaming data. They can also handle the situation in which later-stage data may contain extra components (a.k.a. "contaminations" or "intrusions") and hence have applications in network traffic analysis and intrusion detection. Furthermore, the partial EM estimate is consistent and efficient, and it compares well with a full EM estimate when a full EM procedure is feasible. The second methodology concerns subsampling large-p data to select important features under the principal component analysis (PCA) framework. Our new method is called subsampling PCA (SPCA). Diagnostic tools for choosing parameter values in the SPCA procedure, such as subsample size and iteration number, are developed. It is shown through analysis and simulation that SPCA can overcome the masking effect of nuisance features and pick up the important variables and major components. Its application to gene expression data analysis is also demonstrated.
type	text
rights	Unrestricted. This thesis or dissertation is protected by copyright: all rights reserved. It may not be copied or redistributed beyond the terms of applicable copyright laws.
collection	NDLTD
language	English
sources	NDLTD
topic	Statistics; large data; data mining; mixture models; Gaussian mixtures; parameter estimation; adaptive procedure; partial EM; high-dimensional data; large p small n; dimension reduction; feature selection; subsampling
author	Liu, Peng
author_facet	Liu, Peng
author_sort	Liu, Peng
title	Adaptive Mixture Estimation and Subsampling PCA
title_short	Adaptive Mixture Estimation and Subsampling PCA
title_full	Adaptive Mixture Estimation and Subsampling PCA
title_fullStr	Adaptive Mixture Estimation and Subsampling PCA
title_full_unstemmed	Adaptive Mixture Estimation and Subsampling PCA
title_sort	adaptive mixture estimation and subsampling pca
publisher	Case Western Reserve University School of Graduate Studies / OhioLINK
publishDate	2009
url	http://rave.ohiolink.edu/etdc/view?acc_num=case1220644686
work_keys_str_mv	AT liupeng adaptivemixtureestimationandsubsamplingpca
_version_	1719421566248288256
