Robust methods in data mining

The thesis focuses on two problems in Data Mining, namely clustering, an exploratory technique to group observations in similar groups, and classification, a technique used to assign new observations to one of the known groups. A thorough study of the two problems, which are also known in the Machin...


Bibliographic Details
Main Author:	Mwitondi, K. S.
Other Authors:	Taylor, C. C. ; Kent, J. T.
Published:	University of Leeds 2003
Subjects:	006.312
Online Access:	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.400882

id	ndltd-bl.uk-oai-ethos.bl.uk-400882
record_format	oai_dc
spelling	ndltd-bl.uk-oai-ethos.bl.uk-400882 2017-10-04T03:32:58Z Robust methods in data mining Mwitondi, K. S. Taylor, C. C. ; Kent, J. T. 2003 006.312 University of Leeds http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.400882 http://etheses.whiterose.ac.uk/807/ Electronic Thesis or Dissertation
collection	NDLTD
sources	NDLTD
topic	006.312
spellingShingle	006.312 Mwitondi, K. S. Robust methods in data mining
description	The thesis focuses on two problems in Data Mining, namely clustering, an exploratory technique to group observations into similar groups, and classification, a technique used to assign new observations to one of the known groups. A thorough study of the two problems, which are also known in the Machine Learning literature as unsupervised and supervised classification respectively, is central to decision making in different fields - the thesis seeks to contribute towards that end. In the first part of the thesis we consider whether robust methods can be applied to clustering - in particular, we perform clustering on fuzzy data using two methods originally developed for outlier detection. The fuzzy data clusters are characterised by two intersecting lines such that points belonging to the same cluster lie close to the same line. This part of the thesis also investigates a new application of finite mixtures of normals to the fuzzy data problem. The second part of the thesis addresses issues relating to classification - in particular, classification trees and boosting. The boosting algorithm is a relative newcomer to the classification portfolio that seeks to enhance the performance of classifiers by iteratively re-weighting the data according to their previous classification status. We explore the performance of "boosted" trees (mainly stumps) based on three different models, all characterised by a sine-wave boundary. We also carry out a thorough study of the factors that affect the boosting algorithm. Other results include a new look at the concept of randomness in the classification context, particularly because the form of randomness in both training and testing data directly affects the accuracy and reliability of domain-partitioning rules. Further, we provide statistical interpretations of some of the classification-related concepts originally used in Computer Science, Machine Learning and Artificial Intelligence. This is important since there exists a need for a unified interpretation of some of the "landmark" concepts in various disciplines, as a step forward towards seeking the principles that can guide and strengthen practical applications.
author2	Taylor, C. C. ; Kent, J. T.
author_facet	Taylor, C. C. ; Kent, J. T. Mwitondi, K. S.
author	Mwitondi, K. S.
author_sort	Mwitondi, K. S.
title	Robust methods in data mining
title_short	Robust methods in data mining
title_full	Robust methods in data mining
title_fullStr	Robust methods in data mining
title_full_unstemmed	Robust methods in data mining
title_sort	robust methods in data mining
publisher	University of Leeds
publishDate	2003
url	http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.400882
work_keys_str_mv	AT mwitondiks robustmethodsindatamining
_version_	1718544524644450304


Similar Items