A Study on Text Categorization Using Combination of Interest Measure and Term Weight
Master's === National Taiwan University of Science and Technology === Department of Information Management === 95 === Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, the problem has attracted many researchers. In...
Main Authors: | Li-Fei Wei, 魏莉斐 |
---|---|
Other Authors: | Chiun-Chieh Hsu |
Format: | Others |
Language: | zh-TW |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/9ed62f |
id |
ndltd-TW-095NTUS5396033 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-095NTUS53960332019-05-15T19:48:43Z http://ndltd.ncl.edu.tw/handle/9ed62f A Study on Text Categorization Using Combination of Interest Measure and Term Weight 有趣性度量結合詞彙權重之文件分類研究 Li-Fei Wei 魏莉斐 Master's National Taiwan University of Science and Technology Department of Information Management 95 Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, the problem has attracted many researchers. In data mining, the discovery of association rules is an important research issue. Most association rule research focuses on finding positive association rules. However, many studies point out that negative association rules are as important as positive ones. Therefore, this thesis mines both positive and negative association rules. Although the interest measure is commonly used for text categorization, we find that it is not sufficient on its own. Some studies use the correlation coefficient to judge the strength of a rule, but the correlation coefficient considers only the presence or absence of terms, not their weights. Term frequencies also matter in categorization. Hence, we combine the interest measure with term weight to enhance the discriminative power of positive and negative association rules. The combined measure is used to filter association rules so that the remaining rules are more meaningful and more representative of the classification criteria of a category. As a result, categorization results can be improved and new documents can be classified correctly. Chiun-Chieh Hsu 徐俊傑 2007 thesis 61 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Information Management === 95 === Due to the popularity of the World Wide Web, a large number of digital documents exist on the Internet. Because text categorization makes these documents easier to handle, the problem has attracted many researchers.
In data mining, the discovery of association rules is an important research issue. Most association rule research focuses on finding positive association rules. However, many studies point out that negative association rules are as important as positive ones.
Therefore, this thesis mines both positive and negative association rules. Although the interest measure is commonly used for text categorization, we find that it is not sufficient on its own. Some studies use the correlation coefficient to judge the strength of a rule, but the correlation coefficient considers only the presence or absence of terms, not their weights. Term frequencies also matter in categorization. Hence, we combine the interest measure with term weight to enhance the discriminative power of positive and negative association rules. The combined measure is used to filter association rules so that the remaining rules are more meaningful and more representative of the classification criteria of a category. As a result, categorization results can be improved and new documents can be classified correctly.
|
author2 |
Chiun-Chieh Hsu |
author_facet |
Chiun-Chieh Hsu Li-Fei Wei 魏莉斐 |
author |
Li-Fei Wei 魏莉斐 |
spellingShingle |
Li-Fei Wei 魏莉斐 A Study on Text Categorization Using Combination of Interest Measure and Term Weight |
author_sort |
Li-Fei Wei |
title |
A Study on Text Categorization Using Combination of Interest Measure and Term Weight |
title_sort |
study on text categorization using combination of interest measure and term weight |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/9ed62f |
work_keys_str_mv |
AT lifeiwei astudyontextcategorizationusingcombinationofinterestmeasureandtermweight AT wèilìfěi astudyontextcategorizationusingcombinationofinterestmeasureandtermweight AT lifeiwei yǒuqùxìngdùliàngjiéhécíhuìquánzhòngzhīwénjiànfēnlèiyánjiū AT wèilìfěi yǒuqùxìngdùliàngjiéhécíhuìquánzhòngzhīwénjiànfēnlèiyánjiū AT lifeiwei studyontextcategorizationusingcombinationofinterestmeasureandtermweight AT wèilìfěi studyontextcategorizationusingcombinationofinterestmeasureandtermweight |
_version_ |
1719095477233778688 |
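
The abstract contrasts a presence/absence measure with one that also accounts for term weight. The record does not give the thesis's formulas, so the following is only a minimal sketch under stated assumptions: `interest` uses the common lift definition P(term, category) / (P(term) P(category)), where a value above 1 suggests a positive rule and a value below 1 a negative rule; `mean_term_weight` and the product used in `combined_score` are hypothetical choices made for illustration, not the author's method. The toy corpus `DOCS` is invented.

```python
from collections import Counter

# Toy corpus of (tokens, category) pairs. Invented purely for illustration.
DOCS = [
    (["goal", "goal", "match", "team"], "sports"),
    (["goal", "team", "coach"], "sports"),
    (["stock", "market", "goal"], "finance"),
    (["stock", "bank", "market"], "finance"),
]


def interest(term, category, docs):
    """Interest (lift) of the rule term -> category, using presence/absence only."""
    n = len(docs)
    has_term = sum(1 for tokens, _ in docs if term in tokens)
    in_cat = sum(1 for _, label in docs if label == category)
    both = sum(1 for tokens, label in docs
               if term in tokens and label == category)
    if has_term == 0 or in_cat == 0:
        return 0.0
    return (both / n) / ((has_term / n) * (in_cat / n))


def mean_term_weight(term, category, docs):
    """Assumed weight: mean relative frequency of the term in the category's documents."""
    cat_docs = [tokens for tokens, label in docs if label == category]
    if not cat_docs:
        return 0.0
    total = sum(Counter(tokens)[term] / len(tokens) for tokens in cat_docs)
    return total / len(cat_docs)


def combined_score(term, category, docs):
    """Hypothetical combination (a plain product); the thesis may combine them differently."""
    return interest(term, category, docs) * mean_term_weight(term, category, docs)


if __name__ == "__main__":
    for term in ("goal", "team", "stock"):
        print(term,
              round(interest(term, "sports", DOCS), 3),
              round(combined_score(term, "sports", DOCS), 3))
```

In this toy data, "goal" and "team" both have interest above 1 for the sports category, while "stock" never co-occurs with it and scores 0, which is the kind of signal a negative rule would capture. The weight term lets repeated occurrences of a word within a document influence the score, which presence/absence alone cannot do.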