A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning

A prime objective in constructing data stream mining models is to achieve good accuracy, fast learning, and robustness to noise. Although many techniques have been proposed in the past, efforts to improve the accuracy of classification models have been somewhat disparate. These techniques include...


Bibliographic Details
Main Authors:	Simon Fong, Robert P. Biuk-Aghai, Yain-whar Si, Bee Wah Yap
Format:	Article
Language:	English
Published:	Hindawi Limited 2015-01-01
Series:	Mathematical Problems in Engineering
Online Access:	http://dx.doi.org/10.1155/2015/125781

id	doaj-66d9dcbdbde6437aa0c5ac8b467d9479
record_format	Article
spelling	doaj-66d9dcbdbde6437aa0c5ac8b467d9479; 2020-11-24T22:40:39Z; eng; Hindawi Limited; Mathematical Problems in Engineering; 1024-123X; 1563-5147; 2015-01-01; 2015; 10.1155/2015/125781; 125781; A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning; Simon Fong, Robert P. Biuk-Aghai, Yain-whar Si (Department of Computer and Information Science, University of Macau, Macau); Bee Wah Yap (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, 40450 Shah Alam, Selangor, Malaysia); http://dx.doi.org/10.1155/2015/125781
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Simon Fong, Robert P. Biuk-Aghai, Yain-whar Si, Bee Wah Yap
title	A Lightweight Data Preprocessing Strategy with Fast Contradiction Analysis for Incremental Classifier Learning
publisher	Hindawi Limited
series	Mathematical Problems in Engineering
issn	1024-123X, 1563-5147
publishDate	2015-01-01
description	A prime objective in constructing data stream mining models is to achieve good accuracy, fast learning, and robustness to noise. Although many techniques have been proposed in the past, efforts to improve the accuracy of classification models have been somewhat disparate. These techniques include, but are not limited to, feature selection, dimensionality reduction, and the removal of noise from training data. One limitation common to all of these techniques is the assumption that the full training dataset must be applied. Although this has been effective for traditional batch training, it may not be practical for incremental classifier learning, also known as data stream mining, where the data stream is seen in a single pass, one portion at a time. Because data streams can be potentially infinite, as in the so-called big data phenomenon, data preprocessing time must be kept to a minimum. This paper introduces a new data preprocessing strategy suitable for the progressive purging of noisy data from the training dataset without the need to process the whole dataset at one time. This strategy is shown via a computer simulation to provide the significant benefit of allowing for the dynamic removal of bad records from the incremental classifier learning process.
url	http://dx.doi.org/10.1155/2015/125781
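
The description field outlines progressive purging of noisy records without processing the whole dataset at one time, and the title names fast contradiction analysis. The following is a minimal illustrative sketch, not the authors' method. It assumes that a "contradiction" means two records with identical attribute values but different class labels, and that only a fixed-size window of recent records (the hypothetical window_size parameter) is held in memory. ContradictionFilter and its methods are hypothetical names introduced here.

```python
from collections import Counter, deque
from typing import Deque, Dict, Hashable, Iterable, Iterator, Tuple

# Hypothetical record shape: (tuple of attribute values, class label).
Record = Tuple[Tuple[Hashable, ...], Hashable]


class ContradictionFilter:
    """Hypothetical single-pass filter (an assumption, not the paper's algorithm):
    drops an arriving record when the recent window already holds a record with
    identical attribute values but a different class label."""

    def __init__(self, window_size: int = 1000) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.window_size = window_size
        self._window: Deque[Record] = deque()
        # attribute tuple -> label counts for records currently inside the window
        self._labels: Dict[Tuple[Hashable, ...], Counter] = {}

    def _add(self, record: Record) -> None:
        features, label = record
        self._window.append(record)
        self._labels.setdefault(features, Counter())[label] += 1
        # Evict the oldest record so memory stays bounded by window_size.
        if len(self._window) > self.window_size:
            old_features, old_label = self._window.popleft()
            counts = self._labels[old_features]
            counts[old_label] -= 1
            if counts[old_label] == 0:
                del counts[old_label]
            if not counts:
                del self._labels[old_features]

    def is_contradictory(self, record: Record) -> bool:
        # One dictionary lookup: same attributes seen with any other label?
        features, label = record
        counts = self._labels.get(features)
        return counts is not None and any(other != label for other in counts)

    def filter(self, stream: Iterable[Record]) -> Iterator[Record]:
        for record in stream:
            contradictory = self.is_contradictory(record)
            self._add(record)  # every arrival stays as evidence for later checks
            if not contradictory:
                yield record  # only clean records reach the incremental learner


if __name__ == "__main__":
    stream = [((1, "a"), "yes"), ((1, "a"), "no"), ((2, "b"), "yes"), ((1, "a"), "yes")]
    clean = list(ContradictionFilter(window_size=10).filter(stream))
    print(clean)  # [((1, 'a'), 'yes'), ((2, 'b'), 'yes')]
```

Each arrival costs one dictionary lookup plus constant-time window maintenance, so preprocessing time per record does not grow with the length of the stream. A limitation of any single-pass check of this kind is that the first record of a conflicting pair has already been passed on before the conflict is observed.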


Similar Items