Data measures that characterise classification problems

We have a wide range of classifiers today that are employed in numerous applications, from credit scoring to speech processing, with great technical and commercial success. No classifier, however, exists that will outperform all other classifiers on all classification tasks, and the process of class...

Bibliographic Details
Main Author:	Van der Walt, Christiaan Maarten
Other Authors:	Prof E Barnard
Published:	2013
Subjects:	Classifier selection Data measures Data characteristics Artificial data Data analysis Classification Supervised learning Pattern recognition Meta-classification Classification prediction UCTD
Online Access:	http://hdl.handle.net/2263/27624 http://upetd.up.ac.za/thesis/available/etd-08292008-162648/

id	ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-27624
record_format	oai_dc
spelling	ndltd-netd.ac.za-oai-union.ndltd.org-up-oai-repository.up.ac.za-2263-27624 2017-07-20T04:11:20Z Data measures that characterise classification problems Van der Walt, Christiaan Maarten Prof E Barnard cmvdwalt@gmail.com Classifier selection Data measures Data characteristics Artificial data Data analysis Classification Supervised learning Pattern recognition Meta-classification Classification prediction UCTD We have a wide range of classifiers today that are employed in numerous applications, from credit scoring to speech processing, with great technical and commercial success. No classifier, however, exists that will outperform all other classifiers on all classification tasks, and the process of classifier selection is still mainly one of trial and error. The optimal classifier for a classification task is determined by the characteristics of the data set employed; understanding the relationship between data characteristics and the performance of classifiers is therefore crucial to the process of classifier selection. Empirical and theoretical approaches have been employed in the literature to define this relationship. None of these approaches have, however, been very successful in accurately predicting or explaining classifier performance on real-world data. We use theoretical properties of classifiers to identify data characteristics that influence classifier performance; these data properties guide us in the development of measures that describe the relationship between data characteristics and classifier performance. We employ these data measures on real-world and artificial data to construct a meta-classification system. The purpose of this meta-classifier is two-fold: (1) to predict the classification performance of real-world classification tasks, and (2) to explain these predictions in order to gain insight into the properties of real-world data. We show that these data measures can be employed successfully to predict the classification performance of real-world data sets; these predictions are accurate in some instances but there is still unpredictable behaviour in other instances. We illustrate that these data measures can give valuable insight into the properties and data structures of real-world data; these insights are extremely valuable for high-dimensional classification problems. Dissertation (MEng)--University of Pretoria, 2008. Electrical, Electronic and Computer Engineering unrestricted 2013-09-07T11:52:19Z 2008-09-09 2013-09-07T11:52:19Z 2008-04-09 2008-09-09 2008-08-29 Dissertation http://hdl.handle.net/2263/27624 a 2008 E1080/gm http://upetd.up.ac.za/thesis/available/etd-08292008-162648/ © University of Pretoria 2008 E1080/
collection	NDLTD
sources	NDLTD
topic	Classifier selection Data measures Data characteristics Artificial data Data analysis Classification Supervised learning Pattern recognition Meta-classification Classification prediction UCTD
spellingShingle	Classifier selection Data measures Data characteristics Artificial data Data analysis Classification Supervised learning Pattern recognition Meta-classification Classification prediction UCTD Van der Walt, Christiaan Maarten Data measures that characterise classification problems
description	We have a wide range of classifiers today that are employed in numerous applications, from credit scoring to speech processing, with great technical and commercial success. No classifier, however, exists that will outperform all other classifiers on all classification tasks, and the process of classifier selection is still mainly one of trial and error. The optimal classifier for a classification task is determined by the characteristics of the data set employed; understanding the relationship between data characteristics and the performance of classifiers is therefore crucial to the process of classifier selection. Empirical and theoretical approaches have been employed in the literature to define this relationship. None of these approaches have, however, been very successful in accurately predicting or explaining classifier performance on real-world data. We use theoretical properties of classifiers to identify data characteristics that influence classifier performance; these data properties guide us in the development of measures that describe the relationship between data characteristics and classifier performance. We employ these data measures on real-world and artificial data to construct a meta-classification system. The purpose of this meta-classifier is two-fold: (1) to predict the classification performance of real-world classification tasks, and (2) to explain these predictions in order to gain insight into the properties of real-world data. We show that these data measures can be employed successfully to predict the classification performance of real-world data sets; these predictions are accurate in some instances but there is still unpredictable behaviour in other instances. We illustrate that these data measures can give valuable insight into the properties and data structures of real-world data; these insights are extremely valuable for high-dimensional classification problems. === Dissertation (MEng)--University of Pretoria, 2008. === Electrical, Electronic and Computer Engineering === unrestricted
author2	Prof E Barnard
author_facet	Prof E Barnard Van der Walt, Christiaan Maarten
author	Van der Walt, Christiaan Maarten
author_sort	Van der Walt, Christiaan Maarten
title	Data measures that characterise classification problems
title_short	Data measures that characterise classification problems
title_full	Data measures that characterise classification problems
title_fullStr	Data measures that characterise classification problems
title_full_unstemmed	Data measures that characterise classification problems
title_sort	data measures that characterise classification problems
publishDate	2013
url	http://hdl.handle.net/2263/27624 http://upetd.up.ac.za/thesis/available/etd-08292008-162648/
work_keys_str_mv	AT vanderwaltchristiaanmaarten datameasuresthatcharacteriseclassificationproblems
_version_	1718498651374878720

Similar Items