Master's thesis, National Chung Cheng University, Department of Mathematics, Graduate Institute of Statistical Science, academic year 103.

Summary: Principal component analysis (PCA) is a well-known statistical procedure for dimension reduction of classical data. As the age of big data advances, classical data may be aggregated into symbolic data, which was introduced by Billard and Diday (1987). In the literature, Cazes et al. (1997) and Chouakria et al. (1998) proposed vertices PCA and centers PCA, Le-Rademacher and Billard (2012) proposed symbolic covariance PCA, and Ichino (2011) proposed quantile PCA for symbolic interval-valued data. In this thesis, we first investigate the performance of these four PCA approaches. However, suspicious observations can greatly influence the results of PCA, so detecting such influential intervals is an indispensable task. To our knowledge, influence analysis of PCA for symbolic interval-valued data has not been explored in the literature, and it is therefore the emphasis of this thesis. Hampel (1974) proposed the influence function, which provides a useful tool for diagnosing influential points. We adopt Hampel's technique and develop three types of influence functions of the eigenvalues and eigenvectors for symbolic interval-valued data: the empirical influence function, the deleted empirical influence function, and the sample influence function. We illustrate the proposed methods with simulation studies and real data examples.
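The abstract does not give the thesis's formulas, so the following is only a rough sketch of what the three influence functions measure, under stated assumptions. It assumes the simplest of the four approaches, treating each interval by its midpoint (as in centers PCA), and then applies the classical influence formulas for eigenvalues of a sample covariance matrix (divisor n) to those midpoints. It is not the interval-valued influence functions developed in the thesis, and the helper names (`interval_centers`, `pca_eig`, `empirical_influence`, `deleted_empirical_influence`, `sample_influence`) are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating influence diagnostics for centers PCA.
# Assumption: each interval [a, b] is summarized by its midpoint (a + b) / 2,
# and the classical eigenvalue influence formulas are applied to those midpoints.

def interval_centers(lower, upper):
    """Midpoints of interval-valued data (rows = observations, columns = variables)."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    if lower.shape != upper.shape or np.any(lower > upper):
        raise ValueError("lower and upper must match in shape, with lower <= upper")
    return (lower + upper) / 2.0

def pca_eig(x):
    """Eigenvalues (descending) and eigenvectors (columns) of the covariance with divisor n."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    vals, vecs = np.linalg.eigh(cov)  # eigh returns eigenvalues in ascending order
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

def empirical_influence(x):
    """EIF for observation i and eigenvalue k: (v_k'(x_i - xbar))^2 - lambda_k."""
    vals, vecs = pca_eig(x)
    scores = (x - x.mean(axis=0)) @ vecs  # principal-component scores
    return scores ** 2 - vals

def deleted_empirical_influence(x):
    """Same formula, but lambda_k, v_k and xbar come from the fit without observation i."""
    n, p = x.shape
    out = np.empty((n, p))
    for i in range(n):
        rest = np.delete(x, i, axis=0)
        vals, vecs = pca_eig(rest)
        score = (x[i] - rest.mean(axis=0)) @ vecs
        out[i] = score ** 2 - vals
    return out

def sample_influence(x):
    """SIF for observation i: (n - 1) * (lambda_k - lambda_k refitted without observation i)."""
    n, p = x.shape
    full, _ = pca_eig(x)
    out = np.empty((n, p))
    for i in range(n):
        deleted, _ = pca_eig(np.delete(x, i, axis=0))
        out[i] = (n - 1) * (full - deleted)
    return out

# Toy usage: five nearby unit intervals and one far-away interval (index 5).
lower = np.array([[1, 2], [2, 3], [3, 3], [2, 1], [1, 1], [20, 25]], dtype=float)
centers = interval_centers(lower, lower + 1.0)
print(np.argmax(sample_influence(centers)[:, 0]))  # prints 5
```

The empirical influence function needs only one fit, while the deleted and sample versions refit once per observation, which is cheap for small n but grows linearly with the number of intervals.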