Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus
Master's === National Sun Yat-sen University === Graduate Institute of Information Management === 94 === According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Thus, effective document-clusteri...
Main Authors: | Hao-hsiang Lin, 林浩翔 |
---|---|
Other Authors: | Chih-Ping Wei |
Format: | Others |
Language: | en_US |
Published: | 2006 |
Online Access: | http://ndltd.ncl.edu.tw/handle/83257267183926682714 |
id |
ndltd-TW-094NSYS5396083 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-094NSYS5396083 2016-05-27T04:18:11Z http://ndltd.ncl.edu.tw/handle/83257267183926682714 Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus 偏好引導的情境式文件分群技術:字詞關係及統計式字典之影響 Hao-hsiang Lin 林浩翔 Master's National Sun Yat-sen University Graduate Institute of Information Management 94 According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Effective document-clustering techniques therefore need to take users’ categorization preferences into account so that they can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user’s categorization preference into consideration and generates a set of document clusters from that preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is whether incorporating broader-term expansion (i.e., the PAC2 technique proposed in this study) improves the effectiveness of preference-based document clustering; the second is whether using a statistical-based thesaurus constructed from a larger document corpus improves the effectiveness of preference-based document clustering. Compared with the effectiveness achieved by PAC, our empirical results show that the proposed PAC2 technique neither improves nor degrades the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 fails to improve, and even degrades, the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that the use of a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC and PAC2 for preference-based document clustering. Chih-Ping Wei 魏志平 2006 thesis 50 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Sun Yat-sen University === Graduate Institute of Information Management === 94 === According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Effective document-clustering techniques therefore need to take users’ categorization preferences into account so that they can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user’s categorization preference into consideration and generates a set of document clusters from that preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is whether incorporating broader-term expansion (i.e., the PAC2 technique proposed in this study) improves the effectiveness of preference-based document clustering; the second is whether using a statistical-based thesaurus constructed from a larger document corpus improves the effectiveness of preference-based document clustering. Compared with the effectiveness achieved by PAC, our empirical results show that the proposed PAC2 technique neither improves nor degrades the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 fails to improve, and even degrades, the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that the use of a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus consisting of 14,729 documents) does not improve the effectiveness of PAC and PAC2 for preference-based document clustering. |
author2 |
Chih-Ping Wei |
author_facet |
Chih-Ping Wei Hao-hsiang Lin 林浩翔 |
author |
Hao-hsiang Lin 林浩翔 |
spellingShingle |
Hao-hsiang Lin 林浩翔 Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
author_sort |
Hao-hsiang Lin |
title |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
title_short |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
title_full |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
title_fullStr |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
title_full_unstemmed |
Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus |
title_sort |
preference-anchored document clustering technique: effects of term relationships and thesaurus |
publishDate |
2006 |
url |
http://ndltd.ncl.edu.tw/handle/83257267183926682714 |
work_keys_str_mv |
AT haohsianglin preferenceanchoreddocumentclusteringtechniqueeffectsoftermrelationshipsandthesaurus AT línhàoxiáng preferenceanchoreddocumentclusteringtechniqueeffectsoftermrelationshipsandthesaurus AT haohsianglin piānhǎoyǐndǎodeqíngjìngshìwénjiànfēnqúnjìshùzìcíguānxìjítǒngjìshìzìdiǎnzhīyǐngxiǎng AT línhàoxiáng piānhǎoyǐndǎodeqíngjìngshìwénjiànfēnqúnjìshùzìcíguānxìjítǒngjìshìzìdiǎnzhīyǐngxiǎng |
_version_ |
1718282141743185920 |