A Study on the Techniques and the Evaluation of Automatic Text Classification
Master's === National Taiwan University === Graduate Institute of Information Management === 93 (ROC academic year) === Corporate knowledge bases must process thousands of text documents every day. These include competitor information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and p...
Main Authors: | Feng-Yueh Lu, 陸鳳玥 |
---|---|
Other Authors: | Ling-Ling Wu |
Format: | Others |
Language: | zh-TW |
Published: | 2005 |
Online Access: | http://ndltd.ncl.edu.tw/handle/83pa9c |
id | ndltd-TW-093NTU05396007 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-093NTU05396007 2019-05-15T19:37:50Z http://ndltd.ncl.edu.tw/handle/83pa9c A Study on the Techniques and the Evaluation of Automatic Text Classification 文件自動分類技術與成效評估之探討 Feng-Yueh Lu 陸鳳玥 Master's National Taiwan University Graduate Institute of Information Management 93 Ling-Ling Wu 吳玲玲 2005 thesis 62 zh-TW |
collection | NDLTD |
language | zh-TW |
format | Others |
sources | NDLTD |
description |
Master's === National Taiwan University === Graduate Institute of Information Management === 93 (ROC academic year) === Corporate knowledge bases must process thousands of text documents every day. These include competitor information, industry analysis reports, and customer requirements from outside the corporation, as well as financial statements, technical reports, and patents from inside it, all of which are considered crucial to business operations. However, collecting, filtering, and filing these documents is time- and labor-consuming, so automatic text classification is needed. How automatic techniques can improve on manual classification and cope with large volumes of classification tasks has been raised as an issue in information services and knowledge management.
The appropriateness of the company knowledge base's class hierarchy, the representativeness of the texts in each class, and the consistency of data collection all affect text classification performance. In addition, building an automatic text classification system requires deciding how key terms are selected, what level of understanding is applied to unseen texts, and how to balance speed against accuracy.
In this research, an automatic text classification system is implemented using texts gathered from the Sinica Corpus. The thesis compares several machine learning and non-machine-learning methods and measures the effect of varying the level of understanding of the texts. Furthermore, corpus similarity and homogeneity measures are applied to the classes to assess whether the predefined classes, and the texts assigned to them, are appropriate.
|
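The abstract does not say which similarity and homogeneity measures the thesis applies to its classes. As a minimal sketch under that caveat, the Python below assumes the frequency-based chi-square comparison used in corpus linguistics (in the spirit of Kilgarriff's corpus similarity and homogeneity measures): two sets of texts are compared on the most frequent terms of their union, and a class's homogeneity is the average distance between random halves of its own documents. The function names, the `top_n`, `trials` and `seed` parameters, and the assumption that texts arrive already segmented into token lists are all illustrative, not taken from the thesis.

```python
import random
from collections import Counter


def chi_square_distance(tokens_a, tokens_b, top_n=500):
    """Chi-square over the top_n most frequent terms of both corpora combined,
    divided by degrees of freedom. 0.0 means identical relative frequencies;
    larger values mean less similar corpora. (Illustrative sketch.)"""
    if not tokens_a or not tokens_b:
        raise ValueError("both corpora must contain at least one token")
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    size_a, size_b = len(tokens_a), len(tokens_b)
    total = size_a + size_b
    # Compare only the most frequent terms of the combined corpus.
    terms = [term for term, _ in (freq_a + freq_b).most_common(top_n)]
    chi2 = 0.0
    for term in terms:
        observed = freq_a[term] + freq_b[term]
        # Expected counts if the term were spread in proportion to corpus size.
        expected_a = observed * size_a / total
        expected_b = observed * size_b / total
        chi2 += (freq_a[term] - expected_a) ** 2 / expected_a
        chi2 += (freq_b[term] - expected_b) ** 2 / expected_b
    return chi2 / max(len(terms) - 1, 1)


def homogeneity(documents, top_n=500, trials=10, seed=0):
    """Average chi-square distance between random halves of one class.
    documents: list of token lists, one per document (assumed input format).
    Lower values mean a more homogeneous class."""
    if len(documents) < 2:
        raise ValueError("need at least two documents to split a class")
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        docs = list(documents)
        rng.shuffle(docs)
        half = len(docs) // 2
        # Pool the tokens of each half into one pseudo-corpus.
        left = [tok for doc in docs[:half] for tok in doc]
        right = [tok for doc in docs[half:] for tok in doc]
        scores.append(chi_square_distance(left, right, top_n))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical toy classes, already segmented into tokens.
    finance = [["stock", "market", "price"], ["stock", "price", "rate"]]
    sports = [["game", "team", "score"], ["team", "coach", "game"]]
    print("finance vs sports:", chi_square_distance(
        [t for d in finance for t in d], [t for d in sports for t in d]))
    print("homogeneity(finance):", homogeneity(finance))
    print("homogeneity(mixed):", homogeneity(finance + sports))
```

Under this sketch, lower scores mean more similar (or more homogeneous) text. One way to read the numbers for a predefined class is to compare its homogeneity score with its distance from other classes: a class whose internal distance approaches its distance from neighbouring classes may not be well separated.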
author2 | Ling-Ling Wu |
author | Feng-Yueh Lu 陸鳳玥 |
title | A Study on the Techniques and the Evaluation of Automatic Text Classification |
publishDate | 2005 |
url | http://ndltd.ncl.edu.tw/handle/83pa9c |