A Study on the Techniques and the Evaluation of Automatic Text Classification

Master's === National Taiwan University === Graduate Institute of Information Management === 93 === Knowledge bases in a corporation have to process thousands of text-based documents every day. Those include competitors’ information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and p...


Bibliographic Details
Main Authors:	Feng-Yueh Lu, 陸鳳玥
Other Authors:	Ling-Ling Wu
Format:	Others
Language:	zh-TW
Published:	2005
Online Access:	http://ndltd.ncl.edu.tw/handle/83pa9c

id	ndltd-TW-093NTU05396007
record_format	oai_dc
spelling	ndltd-TW-093NTU05396007 2019-05-15T19:37:50Z http://ndltd.ncl.edu.tw/handle/83pa9c A Study on the Techniques and the Evaluation of Automatic Text Classification 文件自動分類技術與成效評估之探討 Feng-Yueh Lu 陸鳳玥 Master's National Taiwan University Graduate Institute of Information Management 93 Knowledge bases in a corporation have to process thousands of text-based documents every day. Those include competitors’ information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and patterns from inside the corporation, all of which are considered crucial for business operation. However, collecting, filtering, and filing these documents are time- and labor-consuming tasks, so automatic text classification is needed. How to employ automatic techniques to improve manual classification performance and to handle large volumes of classification tasks has become an issue in information services and knowledge management. The appropriateness of the knowledge base's hierarchy, the representativeness of the texts in each class, and the consistency of data collection all affect the performance of text classification. In addition, the method of selecting key terms, the level of understanding of unknown texts, and the balance between speed and accuracy should be taken into consideration when building an automatic text classification system. In this research, an automatic text classification system is implemented, with texts gathered from the Sinica Corpus. Several machine learning and non-machine learning methods are compared. The effect of varying the level of understanding of the texts is also measured. Furthermore, measures of corpus similarity and homogeneity are applied to the classes in order to assess the appropriateness of the predefined classes and of the texts assigned to them. Ling-Ling Wu 吳玲玲 2005 thesis 62 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's === National Taiwan University === Graduate Institute of Information Management === 93 === Knowledge bases in a corporation have to process thousands of text-based documents every day. Those include competitors’ information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and patterns from inside the corporation, all of which are considered crucial for business operation. However, collecting, filtering, and filing these documents are time- and labor-consuming tasks, so automatic text classification is needed. How to employ automatic techniques to improve manual classification performance and to handle large volumes of classification tasks has become an issue in information services and knowledge management. The appropriateness of the knowledge base's hierarchy, the representativeness of the texts in each class, and the consistency of data collection all affect the performance of text classification. In addition, the method of selecting key terms, the level of understanding of unknown texts, and the balance between speed and accuracy should be taken into consideration when building an automatic text classification system. In this research, an automatic text classification system is implemented, with texts gathered from the Sinica Corpus. Several machine learning and non-machine learning methods are compared. The effect of varying the level of understanding of the texts is also measured. Furthermore, measures of corpus similarity and homogeneity are applied to the classes in order to assess the appropriateness of the predefined classes and of the texts assigned to them.
author2	Ling-Ling Wu
author_facet	Ling-Ling Wu Feng-Yueh Lu 陸鳳玥
author	Feng-Yueh Lu 陸鳳玥
spellingShingle	Feng-Yueh Lu 陸鳳玥 A Study on the Techniques and the Evaluation of Automatic Text Classification
author_sort	Feng-Yueh Lu
title	A Study on the Techniques and the Evaluation of Automatic Text Classification
publishDate	2005
url	http://ndltd.ncl.edu.tw/handle/83pa9c


Similar Items