A Study of Automatic Text Categorization based on Directional Term Structure

Master's === National Chung Hsing University === Department of Information Management === 98 === In mainstream research on automatic text categorization, rule-based classifiers offer an attractive advantage: interpretability. Because the rules in the classifier are composed of terms, they can be easily understood, modified and maintained by h...

Bibliographic Details
Main Authors:	Chia-Chuan Wu, 吳家銓
Other Authors:	沈肇基
Format:	Others
Language:	en_US
Published:	2010
Online Access:	http://ndltd.ncl.edu.tw/handle/25870222133144392967

id	ndltd-TW-098NCHU5396002
record_format	oai_dc
spelling	ndltd-TW-098NCHU5396002 2015-10-30T04:05:02Z http://ndltd.ncl.edu.tw/handle/25870222133144392967 A Study of Automatic Text Categorization based on Directional Term Structure 運用有向詞彙結構於文件自動分類之研究 Chia-Chuan Wu 吳家銓 Master's, National Chung Hsing University, Department of Information Management, 98 沈肇基 2010 Degree thesis ; thesis 87 en_US
collection	NDLTD
language	en_US
format	Others
sources	NDLTD
description	Master's === National Chung Hsing University === Department of Information Management === 98 === In mainstream research on automatic text categorization, rule-based classifiers offer an attractive advantage: interpretability. Because the rules in the classifier are composed of terms, humans can easily understand, modify and maintain them. Their accuracy is also competitive with that of other accurate classifiers such as SVM and Bayes nets, so rule-based classification techniques are very popular. Today's rule-based techniques identify valuable patterns in training documents to construct classification rules, but they do not consider the relationship between terms and paragraphs within a document. Because a dominant topic runs throughout a document's content, it generates certain term structures across paragraphs. This study therefore presents a new concept, the Meaningful Inner Link Object (MILO), which captures the underlying directional term links across the paragraphs of a document for text categorization. The MILO process for text categorization consists of four main procedures. First, feature selection: training documents contain a large number of noise terms, so the purpose is to find representative terms from which to compose MILOs. Second, MILO filtering: because the number of MILOs can exceed ten thousand, measuring MILO quality is an important issue, and filtering out useless MILOs can improve accuracy. Third, the design of a scoring model: an effective model is needed to accurately assign a category to an unlabeled document. Finally, the classification structure: traditional techniques use only one classifier, whereas this study presents a hierarchical classification structure to improve accuracy. In summary, the study first presents a novel method that observes the distribution of terms across document paragraphs to extract MILOs for text categorization. It then presents an improved method that eliminates noisy MILOs and uses a hierarchical classification structure. The experimental results of the two methods show competitive performance on well-known benchmarks such as Reuters, WebKB and Ohsumed.
author2	沈肇基
author_facet	沈肇基 Chia-Chuan Wu 吳家銓
author	Chia-Chuan Wu 吳家銓
spellingShingle	Chia-Chuan Wu 吳家銓 A Study of Automatic Text Categorization based on Directional Term Structure
author_sort	Chia-Chuan Wu
title	A Study of Automatic Text Categorization based on Directional Term Structure
title_short	A Study of Automatic Text Categorization based on Directional Term Structure
title_full	A Study of Automatic Text Categorization based on Directional Term Structure
title_fullStr	A Study of Automatic Text Categorization based on Directional Term Structure
title_full_unstemmed	A Study of Automatic Text Categorization based on Directional Term Structure
title_sort	study of automatic text categorization based on directional term structure
publishDate	2010
url	http://ndltd.ncl.edu.tw/handle/25870222133144392967
work_keys_str_mv	AT chiachuanwu astudyofautomatictextcategorizationbasedondirectionaltermstructure AT wújiāquán astudyofautomatictextcategorizationbasedondirectionaltermstructure AT chiachuanwu yùnyòngyǒuxiàngcíhuìjiégòuyúwénjiànzìdòngfēnlèizhīyánjiū AT wújiāquán yùnyòngyǒuxiàngcíhuìjiégòuyúwénjiànzìdòngfēnlèizhīyánjiū AT chiachuanwu studyofautomatictextcategorizationbasedondirectionaltermstructure AT wújiāquán studyofautomatictextcategorizationbasedondirectionaltermstructure
_version_	1718115701692039168

Similar Items