Overfitting Reduction of Text Classification Based on AdaBELM

Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. A related issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model achieves high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM reduces overfitting and outperforms classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it achieves 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.
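The abstract describes AdaBELM as a boosting ensemble built on extreme learning machine (ELM) base learners, but this record does not include the algorithm itself or the definition of the RO measure. Below is a minimal sketch, assuming a SAMME-style AdaBoost loop over single-hidden-layer ELMs with weighted resampling; the class names, hidden-layer size, and weighting details are illustrative assumptions rather than the authors' implementation.

```python
# Sketch only: a generic AdaBoost-of-ELMs, not the paper's exact AdaBELM.
import numpy as np

class ELM:
    """Single-hidden-layer extreme learning machine: random hidden weights,
    output weights solved analytically by least squares (pseudo-inverse)."""
    def __init__(self, n_hidden=200, rng=None):
        self.n_hidden = n_hidden
        self.rng = rng if rng is not None else np.random.default_rng()

    def fit(self, X, y, n_classes):
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)      # hidden-layer activations
        T = np.eye(n_classes)[y]              # one-hot targets
        self.beta = np.linalg.pinv(H) @ T     # analytic output weights
        return self

    def predict(self, X):
        H = np.tanh(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)

def adaboost_elm(X, y, n_estimators=20, n_hidden=200, seed=0):
    """SAMME-style AdaBoost with ELM base learners; sample weights are applied
    via weighted resampling because the basic ELM fit is unweighted."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    n, n_classes = len(y), len(np.unique(y))
    w = np.full(n, 1.0 / n)
    learners, alphas = [], []
    for _ in range(n_estimators):
        idx = rng.choice(n, size=n, p=w)      # weighted bootstrap sample
        elm = ELM(n_hidden, rng).fit(X[idx], y[idx], n_classes)
        miss = elm.predict(X) != y
        err = np.dot(w, miss)
        if err >= 1 - 1.0 / n_classes:        # no better than chance: skip round
            continue
        alpha = np.log((1 - err) / max(err, 1e-10)) + np.log(n_classes - 1)
        w *= np.exp(alpha * miss)             # up-weight misclassified samples
        w /= w.sum()
        learners.append(elm)
        alphas.append(alpha)

    def predict(X_new):
        X_new = np.asarray(X_new)
        votes = np.zeros((len(X_new), n_classes))
        for elm, a in zip(learners, alphas):
            votes[np.arange(len(X_new)), elm.predict(X_new)] += a
        return np.argmax(votes, axis=1)
    return predict
```

In the paper's setting the inputs would be high-dimensional sparse document vectors (e.g., TF-IDF features) with integer class labels; usage would look like `predict = adaboost_elm(X_train, y_train)` followed by `y_pred = predict(X_test)`.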


Bibliographic Details
Main Authors: Xiaoyue Feng, Yanchun Liang, Xiaohu Shi, Dong Xu, Xu Wang, Renchu Guan
Format: Article
Language: English
Published: MDPI AG, 2017-07-01
Series: Entropy
Subjects: machine learning; overfitting; AdaBoost; feedforward neural network; extreme learning machine
Online Access: https://www.mdpi.com/1099-4300/19/7/330
Record ID: doaj-34ba0cfca0a54765afaaf98a7e060333
DOI: 10.3390/e19070330
ISSN: 1099-4300
Citation: Entropy, Vol. 19, Issue 7, Article 330 (2017-07-01)
Author Affiliations: all six authors are with the Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China