Overfitting Reduction of Text Classification Based on AdaBELM
Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this p...
Main Authors: | Xiaoyue Feng, Yanchun Liang, Xiaohu Shi, Dong Xu, Xu Wang, Renchu Guan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2017-07-01 |
Series: | Entropy |
Subjects: | machine learning; overfitting; AdaBoost; feedforward neural network; extreme learning machine |
Online Access: | https://www.mdpi.com/1099-4300/19/7/330 |
id |
doaj-34ba0cfca0a54765afaaf98a7e060333 |
---|---|
record_format |
Article |
spelling |
doaj-34ba0cfca0a54765afaaf98a7e060333; 2020-11-25T00:57:51Z; eng; MDPI AG; Entropy; 1099-4300; 2017-07-01; 19(7):330; 10.3390/e19070330; e19070330; Overfitting Reduction of Text Classification Based on AdaBELM; Xiaoyue Feng, Yanchun Liang, Xiaohu Shi, Dong Xu, Xu Wang, Renchu Guan; (all six authors) Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China; Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting referred to as the rate of overfitting (RO) and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.; https://www.mdpi.com/1099-4300/19/7/330; machine learning; overfitting; AdaBoost; feedforward neural network; extreme learning machine |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiaoyue Feng Yanchun Liang Xiaohu Shi Dong Xu Xu Wang Renchu Guan |
author_sort |
Xiaoyue Feng |
title |
Overfitting Reduction of Text Classification Based on AdaBELM |
title_sort |
overfitting reduction of text classification based on adabelm |
publisher |
MDPI AG |
series |
Entropy |
issn |
1099-4300 |
publishDate |
2017-07-01 |
description |
Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting referred to as the rate of overfitting (RO) and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability. |
topic |
machine learning overfitting AdaBoost feedforward neural network extreme learning machine |
url |
https://www.mdpi.com/1099-4300/19/7/330 |
work_keys_str_mv |
AT xiaoyuefeng overfittingreductionoftextclassificationbasedonadabelm AT yanchunliang overfittingreductionoftextclassificationbasedonadabelm AT xiaohushi overfittingreductionoftextclassificationbasedonadabelm AT dongxu overfittingreductionoftextclassificationbasedonadabelm AT xuwang overfittingreductionoftextclassificationbasedonadabelm AT renchuguan overfittingreductionoftextclassificationbasedonadabelm |
_version_ |
1725222617203343360 |