The identification and application of common principal components
Thesis (PhD)--Stellenbosch University, 2014. === ENGLISH ABSTRACT: When estimating the covariance matrices of two or more populations, the covariance matrices are often assumed to be either equal or completely unrelated. The common principal components (CPC) model provides an alternative which is...
Main Author: | Pepler, Pieter Theo |
---|---|
Other Authors: | Uys, D. W. |
Format: | Others |
Language: | en_ZA |
Published: | Stellenbosch : Stellenbosch University, 2015 |
Subjects: | |
Online Access: | http://hdl.handle.net/10019.1/96101 |
id |
ndltd-netd.ac.za-oai-union.ndltd.org-sun-oai-scholar.sun.ac.za-10019.1-96101 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
en_ZA |
format |
Others |
sources |
NDLTD |
topic |
UCTD; Dissertations -- Statistics and actuarial science; Theses -- Statistics and actuarial science; Analysis of covariance; Discriminant analysis; Monte Carlo method; Multivariate analysis; Principal components analysis; Newborn infants -- Mortality -- Mathematical models; Hospital utilization -- Length of stay -- Mathematical models |
description |
Thesis (PhD)--Stellenbosch University, 2014. === ENGLISH ABSTRACT: When estimating the covariance matrices of two or more populations,
the covariance matrices are often assumed to be either equal or completely
unrelated. The common principal components (CPC) model provides an
alternative which is situated between these two extreme assumptions: The
assumption is made that the population covariance matrices share the same
set of eigenvectors, but have different sets of eigenvalues.
An important question in the application of the CPC model is to determine
whether it is appropriate for the data under consideration. Flury (1988)
proposed two methods, based on likelihood estimation, to address this question.
However, the assumption of multivariate normality is untenable for
many real data sets, making the application of these parametric methods
questionable. A number of non-parametric methods, based on bootstrap
replications of eigenvectors, are proposed to select an appropriate common
eigenvector model for two population covariance matrices. Using simulation
experiments, it is shown that the proposed selection methods outperform the
existing parametric selection methods.
If appropriate, the CPC model can provide covariance matrix estimators
that are less biased than when assuming equality of the covariance matrices,
and of which the elements have smaller standard errors than the elements of
the ordinary unbiased covariance matrix estimators. A regularised covariance
matrix estimator under the CPC model is proposed, and Monte Carlo simulation
results show that it provides more accurate estimates of the population
covariance matrices than the competing covariance matrix estimators.
Covariance matrix estimation forms an integral part of many multivariate
statistical methods. Applications of the CPC model in discriminant analysis,
biplots and regression analysis are investigated. It is shown that, in cases
where the CPC model is appropriate, CPC discriminant analysis provides significantly
smaller misclassification error rates than both ordinary quadratic
discriminant analysis and linear discriminant analysis. A framework for the
comparison of different types of biplots for data with distinct groups is developed,
and CPC biplots constructed from common eigenvectors are compared
to other types of principal component biplots using this framework.
A subset of data from the Vermont Oxford Network (VON), of infants admitted to participating neonatal intensive care units in South Africa and
Namibia during 2009, is analysed using the CPC model. It is shown that
the proposed non-parametric methodology offers an improvement over the
known parametric methods in the analysis of this data set which originated
from a non-normally distributed multivariate population.
CPC regression is compared to principal component regression and partial least squares regression in the fitting of models to predict neonatal mortality
and length of stay for infants in the VON data set. The fitted regression
models, using readily available day-of-admission data, can be used by medical
staff and hospital administrators to counsel parents and improve the
allocation of medical care resources. Predicted values from these models can
also be used in benchmarking exercises to assess the performance of neonatal
intensive care units in the Southern African context, as part of larger quality
improvement programmes. === AFRIKAANSE OPSOMMING: Wanneer die kovariansiematrikse van twee of meer populasies beraam
word, word dikwels aanvaar dat die kovariansiematrikse of gelyk, of heeltemal
onverwant is. Die gemeenskaplike hoofkomponente (GHK) model verskaf
'n alternatief wat tussen hierdie twee ekstreme aannames geleë is: Die
aanname word gemaak dat die populasie kovariansiematrikse dieselfde versameling
eievektore deel, maar verskillende versamelings eiewaardes het.
'n Belangrike vraag in die toepassing van die GHK model is om te bepaal
of dit geskik is vir die data wat beskou word. Flury (1988) het twee metodes,
gebaseer op aanneemlikheidsberaming, voorgestel om hierdie vraag aan te
spreek. Die aanname van meerveranderlike normaliteit is egter ongeldig vir
baie werklike datastelle, wat die toepassing van hierdie metodes bevraagteken.
'n Aantal nie-parametriese metodes, gebaseer op skoenlus-herhalings van
eievektore, word voorgestel om 'n geskikte gemeenskaplike eievektor model
te kies vir twee populasie kovariansiematrikse. Met die gebruik van simulasie
eksperimente word aangetoon dat die voorgestelde seleksiemetodes beter vaar
as die bestaande parametriese seleksiemetodes.
Indien toepaslik, kan die GHK model kovariansiematriks beramers verskaf
wat minder sydig is as wanneer aanvaar word dat die kovariansiematrikse
gelyk is, en waarvan die elemente kleiner standaardfoute het as die elemente
van die gewone onsydige kovariansiematriks beramers. 'n Geregulariseerde
kovariansiematriks beramer onder die GHK model word voorgestel, en Monte
Carlo simulasie resultate toon dat dit meer akkurate beramings van die populasie
kovariansiematrikse verskaf as ander mededingende kovariansiematriks
beramers.
Kovariansiematriks beraming vorm 'n integrale deel van baie meerveranderlike
statistiese metodes. Toepassings van die GHK model in diskriminantanalise,
bi-stippings en regressie-analise word ondersoek. Daar word
aangetoon dat, in gevalle waar die GHK model toepaslik is, GHK diskriminantanalise
betekenisvol kleiner misklassifikasie foutkoerse lewer as beide
gewone kwadratiese diskriminantanalise en lineêre diskriminantanalise. 'n
Raamwerk vir die vergelyking van verskillende tipes bi-stippings vir data
met verskeie groepe word ontwikkel, en word gebruik om GHK bi-stippings
gekonstrueer vanaf gemeenskaplike eievektore met ander tipe hoofkomponent
bi-stippings te vergelyk. 'n Deelversameling van data vanaf die Vermont Oxford Network (VON),
van babas opgeneem in deelnemende neonatale intensiewe sorg eenhede in
Suid-Afrika en Namibië gedurende 2009, word met behulp van die GHK
model ontleed. Daar word getoon dat die voorgestelde nie-parametriese
metodiek 'n verbetering op die bekende parametriese metodes bied in die ontleding van hierdie datastel wat afkomstig is uit 'n nie-normaal verdeelde
meerveranderlike populasie.
GHK regressie word vergelyk met hoofkomponent regressie en parsiële
kleinste kwadrate regressie in die passing van modelle om neonatale mortaliteit
en lengte van verblyf te voorspel vir babas in die VON datastel. Die
gepasde regressiemodelle, wat maklik bekombare dag-van-toelating data gebruik,
kan deur mediese personeel en hospitaaladministrateurs gebruik word
om ouers te adviseer en die toewysing van mediese sorg hulpbronne te verbeter.
Voorspelde waardes vanaf hierdie modelle kan ook gebruik word in
normwaarde oefeninge om die prestasie van neonatale intensiewe sorg eenhede
in die Suider-Afrikaanse konteks, as deel van groter gehalteverbeteringprogramme,
te evalueer. |
author2 |
Uys, D. W. |
author_facet |
Uys, D. W. Pepler, Pieter Theo |
author |
Pepler, Pieter Theo |
author_sort |
Pepler, Pieter Theo |
title |
The identification and application of common principal components |
title_sort |
identification and application of common principal components |
publisher |
Stellenbosch : Stellenbosch University |
publishDate |
2015 |
url |
http://hdl.handle.net/10019.1/96101 |
work_keys_str_mv |
AT peplerpietertheo theidentificationandapplicationofcommonprincipalcomponents AT peplerpietertheo identificationandapplicationofcommonprincipalcomponents |
_version_ |
1718162562285043712 |
spelling |
ndltd-netd.ac.za-oai-union.ndltd.org-sun-oai-scholar.sun.ac.za-10019.1-96101 2016-01-29T04:02:08Z The identification and application of common principal components Pepler, Pieter Theo Uys, D. W. Nel, D. G. Stellenbosch University. Faculty of Economic and Management Sciences. Dept. of Statistics and Actuarial Science. UCTD Dissertations -- Statistics and actuarial science Theses -- Statistics and actuarial science Analysis of covariance Discriminant analysis Monte Carlo method Multivariate analysis Principal components analysis Newborn infants -- Mortality -- Mathematical models Hospital utilization -- Length of stay -- Mathematical models Thesis (PhD)--Stellenbosch University, 2014. 2015-01-13T11:50:24Z 2015-01-13T11:50:24Z 2014-12 Thesis http://hdl.handle.net/10019.1/96101 en_ZA Stellenbosch University 382 p. : ill. Stellenbosch : Stellenbosch University |
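
Note on the CPC model: the abstract states that, under the common principal components model, the population covariance matrices share one set of eigenvectors but have different eigenvalues. The sketch below is a minimal illustration of that statement only; it is not code from the thesis and does not implement Flury's likelihood methods or the proposed bootstrap selection methods. The names `B`, `lam1`, `lam2` and `off_diagonal_norm`, and all numeric values, are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 4  # number of variables (assumed for illustration)

# Hypothetical common eigenvector matrix B: an orthogonal matrix
# obtained from the QR decomposition of a random matrix.
B, _ = np.linalg.qr(rng.normal(size=(p, p)))

# Group-specific eigenvalues (assumed values, distinct within each group).
lam1 = np.array([10.0, 5.0, 2.0, 1.0])
lam2 = np.array([3.0, 8.0, 0.5, 4.0])

# Two covariance matrices that satisfy the CPC model by construction:
# same eigenvectors, different eigenvalues.
sigma1 = B @ np.diag(lam1) @ B.T
sigma2 = B @ np.diag(lam2) @ B.T


def off_diagonal_norm(sigma, basis):
    """Frobenius norm of the off-diagonal part of basis' * sigma * basis."""
    rotated = basis.T @ sigma @ basis
    return np.linalg.norm(rotated - np.diag(np.diag(rotated)))


# The common eigenvectors diagonalise both matrices (norms near zero).
print(off_diagonal_norm(sigma1, B), off_diagonal_norm(sigma2, B))

# An unrelated orthogonal basis generally does not.
Q, _ = np.linalg.qr(rng.normal(size=(p, p)))
print(off_diagonal_norm(sigma1, Q))
```

With sample covariance matrices the off-diagonal terms would not vanish exactly, which is why a model selection step, such as the methods the abstract describes, is needed to judge whether common eigenvectors are plausible.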