Overfitting Reduction of Text Classification Based on AdaBELM

Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. A related issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model achieves high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM reduces overfitting and outperforms classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it achieves 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.
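The abstract describes AdaBELM as a boosting ensemble built on extreme learning machine (ELM) base learners, but this record does not include the algorithm itself or the definition of the RO measure. Below is a minimal sketch, assuming a SAMME-style AdaBoost loop over single-hidden-layer ELMs with weighted resampling; the class names, hidden-layer size, and weighting details are illustrative assumptions rather than the authors' implementation.

```python
# Sketch only: a generic AdaBoost-of-ELMs, not the paper's exact AdaBELM.
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine: random hidden weights,
    output weights solved analytically by least squares (pseudo-inverse)."""
    def __init__(self, n_hidden=200, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # hidden-layer activations
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # analytic output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

def adaboost_elm(X, y, n_estimators=20, n_hidden=200, seed=0):
    """SAMME-style AdaBoost with ELM base learners; sample weights are applied
    via weighted resampling because the basic ELM fit is unweighted."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, p=w)      # weighted bootstrap sample
        elm = ELM(n_hidden, rng).fit(X[idx], y[idx], n_classes)
        miss = elm.predict(X) != y
        err = np.dot(w, miss)
        if err >= 1 - 1.0 / n_classes:        # no better than chance: skip round
            continue
        alpha = np.log((1 - err) / max(err, 1e-10)) + np.log(n_classes - 1)
        w *= np.exp(alpha * miss)             # up-weight misclassified samples
        w /= w.sum()
        learners.append(elm)
        alphas.append(alpha)

    def predict(X_new):
        X_new = np.asarray(X_new)
        votes = np.zeros((len(X_new), n_classes))
        for elm, a in zip(learners, alphas):
            votes[np.arange(len(X_new)), elm.predict(X_new)] += a
        return np.argmax(votes, axis=1)
    return predict
```

In the paper's setting the inputs would be high-dimensional sparse document vectors (e.g., TF-IDF features) with integer class labels; usage would look like `predict = adaboost_elm(X_train, y_train)` followed by `y_pred = predict(X_test)`.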


Bibliographic Details
Main Authors: Xiaoyue Feng, Yanchun Liang, Xiaohu Shi, Dong Xu, Xu Wang, Renchu Guan
Format: Article
Language: English
Published: MDPI AG, 2017-07-01
Series: Entropy
Subjects: machine learning; overfitting; AdaBoost; feedforward neural network; extreme learning machine
Online Access: https://www.mdpi.com/1099-4300/19/7/330
Record ID: doaj-34ba0cfca0a54765afaaf98a7e060333
DOI: 10.3390/e19070330
ISSN: 1099-4300
Citation: Entropy, Vol. 19, Issue 7, Article 330 (2017-07-01)
Author Affiliations: all six authors are with the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China