Estimation of discriminant analysis error rate for high dimensional data


Bibliographic Details
Main Author: Lebow, Patricia K.
Other Authors: Butler, David A.
Language: English
Published: 2012
Online Access:http://hdl.handle.net/1957/35903
Description
Summary: Methodologies for data reduction, modeling, and classification of grouped response curves are explored. In particular, the thesis focuses on the analysis of a collection of highly correlated, high-dimensional response-curve data: spectral reflectance curves of wood surface features. In the analysis, questions arise about applying cross-validation to estimate discriminant function error rates for data that have previously been transformed by principal component analysis. Performing cross-validation requires recalculating the principal component transformation and the discriminant functions for each training set, a very lengthy process. To address these questions, the thesis studies a more efficient way of carrying out the cross-validation calculations, as well as the alternative of estimating error rates without recomputing the principal component decomposition. If the populations are assumed to have a common covariance structure, the pooled covariance matrix can be decomposed for the principal component transformation. The leave-one-out cross-validation procedure then amounts to a rank-one update of the pooled covariance matrix for each observation left out. Algorithms have been developed for computing the updated eigenstructure under rank-one updates, and they can be applied to the orthogonal decomposition of the pooled covariance matrix. Using these algorithms yields much faster computation of error rates, especially when the number of variables is large. The bias and variance of an estimator that performs leave-one-out cross-validation directly on the principal component scores (without recomputing the principal component transformation for each observation) are also investigated.
Graduation date: 1993
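
To make the abstract's rank-one claim concrete, the following display sketches the update under notation introduced here for illustration (it is not taken from the thesis): g groups, n_i observations in group i, N = n_1 + ... + n_g, group means \bar{x}_i, and pooled covariance matrix S computed with divisor N - g. Deleting observation x_{ik} from group i changes the pooled covariance only by a rank-one term:

(N - g - 1)\, S^{(-ik)} \;=\; (N - g)\, S \;-\; \frac{n_i}{n_i - 1}\,\bigl(x_{ik} - \bar{x}_i\bigr)\bigl(x_{ik} - \bar{x}_i\bigr)^{\top}

Because the held-out observation enters only through this single outer product, the eigendecomposition used for the principal component transformation can, in principle, be revised by a rank-one modification for each deletion rather than recomputed from scratch, which is the source of the computational savings described above.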