Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.

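Discriminant analysis and random forests are used to create models for classification. The number of variables to be tested for inclusion in a model can be large. The goal of this work was to create an efficient and effective selection program. The first method used was based on the work of others. The resulting models were underperforming, so another approach was adopted. Models were built by adding the variable that maximized new-model accuracy. The two programs were used to generate discriminant-analysis and random forest models for three data sets. An existing software package was also used. The second program outperformed the alternatives. For the small number of runs produced in this study, it outperformed the method that inspired this work. The data sets were studied to identify determinants of performance. No definite conclusions were reached, but the results suggest topics for future study.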
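
The abstract describes a second approach in which models were built by adding the variable that maximized new-model accuracy. A minimal Python sketch of that kind of greedy forward selection follows. It is an illustration under stated assumptions, not the thesis program: the function names, the leave-one-out nearest-centroid scorer (a simple stand-in for discriminant analysis or a random forest), the stopping rule, and the toy data are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Row = Sequence[float]


def loo_centroid_accuracy(X: Sequence[Row], y: Sequence[str], cols: List[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on columns `cols`.

    Assumption: this scorer only stands in for the classifiers named in the abstract.
    """
    correct = 0
    for i in range(len(X)):
        # Per-class sums and counts, computed without observation i.
        sums, counts = {}, {}
        for j, (row, label) in enumerate(zip(X, y)):
            if j == i:
                continue
            acc = sums.setdefault(label, [0.0] * len(cols))
            for k, c in enumerate(cols):
                acc[k] += row[c]
            counts[label] = counts.get(label, 0) + 1

        def dist(label: str) -> float:
            # Squared Euclidean distance from observation i to the class centroid.
            return sum(
                (X[i][c] - sums[label][k] / counts[label]) ** 2
                for k, c in enumerate(cols)
            )

        predicted = min(sorted(sums), key=dist)
        correct += predicted == y[i]
    return correct / len(X)


def forward_select(
    X: Sequence[Row],
    y: Sequence[str],
    score: Callable[[Sequence[Row], Sequence[str], List[int]], float] = loo_centroid_accuracy,
    min_gain: float = 1e-9,
) -> Tuple[List[int], float]:
    """Greedily add the column that most improves the score.

    Assumption: stop when no candidate improves accuracy by at least `min_gain`;
    the abstract does not state a stopping rule.
    """
    selected: List[int] = []
    best = 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(score(X, y, selected + [c]), c) for c in remaining]
        top_score = max(s for s, _ in scored)
        top_col = min(c for s, c in scored if s == top_score)  # ties: lowest index
        if top_score - best < min_gain:
            break
        selected.append(top_col)
        remaining.remove(top_col)
        best = top_score
    return selected, best


# Toy data (hypothetical): only column 1 separates the two classes.
X = [
    [0.0, 1.0, 5.0], [1.0, 1.2, 3.0], [0.0, 0.9, 4.0], [1.0, 1.1, 6.0],
    [0.0, 3.0, 5.0], [1.0, 3.2, 3.0], [0.0, 2.9, 4.0], [1.0, 3.1, 6.0],
]
y = ["a", "a", "a", "a", "b", "b", "b", "b"]
selected, best = forward_select(X, y)
print(selected, best)  # [1] 1.0
```

Passing the scorer as a parameter keeps the selection loop independent of the classifier, so a cross-validated discriminant-analysis or random forest accuracy could be substituted without changing the loop.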


Bibliographic Details
Main Author: Petrcich, William
Other Authors: McNicholas, Dr. Paul
Language: en
Published: 2011
Subjects:
BIC
Online Access: http://hdl.handle.net/10214/2933
id ndltd-LACETR-oai-collectionscanada.gc.ca-OGU.10214-2933
record_format oai_dc
spelling ndltd-LACETR-oai-collectionscanada.gc.ca-OGU.10214-2933; 2013-10-04T04:13:57Z; 2011-08-22; 2011-08-31T13:25:46Z; 2011-09-10T05:00:05Z; 2011-08-31; Thesis; http://hdl.handle.net/10214/2933; en
collection NDLTD
language en
sources NDLTD
topic food authentication
classification
variable selection
BIC
mclust
random forests
clustvarsel
description Discriminant analysis and random forests are used to create models for classification. The number of variables to be tested for inclusion in a model can be large. The goal of this work was to create an efficient and effective selection program. The first method used was based on the work of others. The resulting models were underperforming, so another approach was adopted. Models were built by adding the variable that maximized new-model accuracy. The two programs were used to generate discriminant-analysis and random forest models for three data sets. An existing software package was also used. The second program outperformed the alternatives. For the small number of runs produced in this study, it outperformed the method that inspired this work. The data sets were studied to identify determinants of performance. No definite conclusions were reached, but the results suggest topics for future study.
author2 McNicholas, Dr. Paul
author Petrcich, William
title Dimensionality Reduction in the Creation of Classifiers and the Effects of Correlation, Cluster Overlap, and Modelling Assumptions.
publishDate 2011
url http://hdl.handle.net/10214/2933