A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance

Textual stream classification has become a realistic and challenging problem because large-scale, high-dimensional, non-stationary streams with class imbalance are now widely used in many real-life applications. Given the characteristics of textual streams, it is technically difficult to deal...

Bibliographic Details
Main Authors: Ge Song, Yunming Ye
Format: Article
Language: English
Published: Hindawi Limited 2014-01-01
Series: The Scientific World Journal
Online Access: http://dx.doi.org/10.1155/2014/497354
id doaj-47303f5d4d2945e0a4463cc83179633f
record_format Article
spelling doaj-47303f5d4d2945e0a4463cc83179633f; 2020-11-25T00:50:37Z; eng; Hindawi Limited; The Scientific World Journal; 2356-6140; 1537-744X; 2014-01-01; 2014; 10.1155/2014/497354; 497354; A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance; Ge Song 0; Yunming Ye 1; Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China; Shenzhen Key Laboratory of Internet Information Collaboration, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China; http://dx.doi.org/10.1155/2014/497354
collection DOAJ
language English
format Article
sources DOAJ
author Ge Song
Yunming Ye
title A Dynamic Ensemble Framework for Mining Textual Streams with Class Imbalance
publisher Hindawi Limited
series The Scientific World Journal
issn 2356-6140
1537-744X
publishDate 2014-01-01
description Textual stream classification has become a realistic and challenging problem because large-scale, high-dimensional, non-stationary streams with class imbalance are now widely used in many real-life applications. Given the characteristics of textual streams, it is technically difficult to deal with the classification of textual streams, especially in an imbalanced environment. In this paper, we propose a new ensemble framework, clustering forest, for learning from imbalanced textual streams with concept drift (CFIM). CFIM is based on ensemble learning and integrates a set of clustering trees (CTs). CFIM includes an adaptive selection method that flexibly chooses the useful CTs according to the properties of the stream. In particular, to deal with the problem of class imbalance, we collect and reuse both rare-class instances and misclassified instances from the historical chunks. Unlike most existing approaches, our approach assumes that both the majority class and the rare class may suffer from concept drift. Thus the distribution of the resampled instances is similar to the current concept. The effectiveness of CFIM is examined on five real-world textual streams in an imbalanced, non-stationary environment. Experimental results demonstrate that CFIM achieves better performance than four state-of-the-art ensemble models.
url http://dx.doi.org/10.1155/2014/497354