A Study on Text Categorization Using Combination of Interest Measure and Term Weight

Master's === National Taiwan University of Science and Technology === Department of Information Management === 95 === Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, many researchers have studied the text categorization problem. In...


Bibliographic Details
Main Authors: Li-Fei Wei, 魏莉斐
Other Authors: Chiun-Chieh Hsu
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/9ed62f
id ndltd-TW-095NTUS5396033
record_format oai_dc
spelling ndltd-TW-095NTUS5396033 2019-05-15T19:48:43Z http://ndltd.ncl.edu.tw/handle/9ed62f A Study on Text Categorization Using Combination of Interest Measure and Term Weight 有趣性度量結合詞彙權重之文件分類研究 Li-Fei Wei 魏莉斐 Master's National Taiwan University of Science and Technology Department of Information Management 95 Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, many researchers have studied the text categorization problem. In data mining, the discovery of association rules is an important research issue. Most association rule research focuses on finding positive association rules. However, many studies point out that negative association rules are as important as positive ones. Therefore, in this thesis, we mine both positive and negative association rules. Although the interest measure is commonly used for text categorization, we find that the interest measure alone is not sufficient. Some studies use the correlation coefficient to judge the strength of a rule, but the correlation coefficient considers only the presence or absence of terms, not their weights. Moreover, it is important to consider term frequencies in categorization. Hence, we combine the interest measure with term weight to enhance the discriminative power of positive and negative association rules. The combined measure is used to filter association rules so that the remaining rules are more meaningful and more representative of a category's classification criteria. As a result, categorization results can be improved and new documents can be classified correctly. Chiun-Chieh Hsu 徐俊傑 2007 degree thesis ; thesis 61 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Information Management === 95 === Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, many researchers have studied the text categorization problem. In data mining, the discovery of association rules is an important research issue. Most association rule research focuses on finding positive association rules. However, many studies point out that negative association rules are as important as positive ones. Therefore, in this thesis, we mine both positive and negative association rules. Although the interest measure is commonly used for text categorization, we find that the interest measure alone is not sufficient. Some studies use the correlation coefficient to judge the strength of a rule, but the correlation coefficient considers only the presence or absence of terms, not their weights. Moreover, it is important to consider term frequencies in categorization. Hence, we combine the interest measure with term weight to enhance the discriminative power of positive and negative association rules. The combined measure is used to filter association rules so that the remaining rules are more meaningful and more representative of a category's classification criteria. As a result, categorization results can be improved and new documents can be classified correctly.
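The abstract does not give the formulas it uses, so the following is only a minimal sketch of the general idea, not the thesis's method. It assumes the interest measure is the common lift ratio P(term, class) / (P(term) P(class)), treats a negative rule as "term implies NOT class", and uses a hypothetical combination that multiplies interest by the term's mean relative frequency. The function names and the toy corpus are illustrative assumptions.

```python
from collections import Counter


def interest(docs, term, category, negative=False):
    """Interest (lift) of the rule term -> category.

    With negative=True the rule is term -> NOT category.
    `docs` is a list of (tokens, label) pairs.
    """
    n = len(docs)
    has_term = [term in tokens for tokens, _ in docs]
    # For a negative rule, the consequent is "document is not in category".
    in_class = [(label == category) != negative for _, label in docs]
    p_term = sum(has_term) / n
    p_class = sum(in_class) / n
    p_both = sum(t and c for t, c in zip(has_term, in_class)) / n
    if p_term == 0 or p_class == 0:
        return 0.0
    return p_both / (p_term * p_class)


def mean_term_frequency(docs, term):
    """Average relative frequency of `term` over documents that contain it."""
    freqs = [Counter(tokens)[term] / len(tokens)
             for tokens, _ in docs if term in tokens]
    return sum(freqs) / len(freqs) if freqs else 0.0


def weighted_interest(docs, term, category, negative=False):
    # Hypothetical combination (an assumption, not taken from the thesis):
    # scale the presence/absence-based interest by a frequency-based weight.
    return interest(docs, term, category, negative) * mean_term_frequency(docs, term)


# Toy corpus, invented for illustration only.
docs = [
    (["goal", "match", "goal"], "sports"),
    (["match", "team"], "sports"),
    (["stock", "market"], "finance"),
    (["market", "match"], "finance"),
]

positive = interest(docs, "goal", "sports")                   # goal -> sports
negative = interest(docs, "stock", "sports", negative=True)   # stock -> NOT sports
combined = weighted_interest(docs, "goal", "sports")
```

An interest value above 1 suggests the term and the consequent co-occur more often than independence would predict, so rules could be filtered by a threshold on the combined score; the threshold itself would have to be chosen empirically.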
author2 Chiun-Chieh Hsu
author_facet Chiun-Chieh Hsu
Li-Fei Wei
魏莉斐
author Li-Fei Wei
魏莉斐
spellingShingle Li-Fei Wei
魏莉斐
A Study on Text Categorization Using Combination of Interest Measure and Term Weight
author_sort Li-Fei Wei
title A Study on Text Categorization Using Combination of Interest Measure and Term Weight
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/9ed62f