A Study on the Techniques and the Evaluation of Automatic Text Classification

Master's === National Taiwan University === Graduate Institute of Information Management === 93 === Knowledge bases in a corporation have to process thousands of text-based documents every day. These include competitors’ information, industry analysis reports, and customer requirements from outside the corporation; financial statements, technical reports, and p...


Bibliographic Details
Main Authors: Feng-Yueh Lu, 陸鳳玥
Other Authors: Ling-Ling Wu
Format: Others
Language: zh-TW
Published: 2005
Online Access: http://ndltd.ncl.edu.tw/handle/83pa9c
id ndltd-TW-093NTU05396007
record_format oai_dc
spelling ndltd-TW-093NTU05396007 2019-05-15T19:37:50Z http://ndltd.ncl.edu.tw/handle/83pa9c A Study on the Techniques and the Evaluation of Automatic Text Classification 文件自動分類技術與成效評估之探討 Feng-Yueh Lu 陸鳳玥 Master's National Taiwan University Graduate Institute of Information Management 93 Ling-Ling Wu 吳玲玲 2005 thesis 62 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Information Management === 93 === Knowledge bases in a corporation have to process thousands of text-based documents every day. These include competitors’ information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and patterns from inside the corporation, all of which are considered crucial for business operation. However, collecting, filtering, and filing these documents are time-consuming and labor-intensive tasks, so automatic text classification is needed. How to employ automatic techniques to improve on manual classification and to handle large volumes of classification tasks has become an issue in information services and knowledge management. The appropriateness of the company's knowledge base hierarchy, the representativeness of the texts in each class, and the consistency of data collection all affect text classification performance. In addition, the method of selecting key terms, the level of understanding of unknown texts, and the balance between speed and accuracy should be taken into consideration when constructing an automatic text classification system. In this research, an automatic text classification system is implemented, with texts gathered from the Sinica Corpus. Several machine learning and non-machine-learning methods are compared. The effect of varying the level of text understanding is also measured. Furthermore, a method for measuring corpus similarity and homogeneity is applied to the classes in order to assess the appropriateness of the predefined classes and of the texts assigned to them.
author2 Ling-Ling Wu
author_facet Ling-Ling Wu
Feng-Yueh Lu
陸鳳玥
author Feng-Yueh Lu
陸鳳玥
title A Study on the Techniques and the Evaluation of Automatic Text Classification
publishDate 2005
url http://ndltd.ncl.edu.tw/handle/83pa9c