Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus

Master's === National Sun Yat-sen University === Department of Information Management === 94 === According to the context theory of classification, the document-clustering behaviors of individuals not only involve the attributes (including contents) of documents but also depend on who is doing the task and in what context. Thus, effective document-clusteri...

Bibliographic Details
Main Authors: Hao-hsiang Lin, 林浩翔
Other Authors: Chih-Ping Wei
Format: Others
Language: en_US
Published: 2006
Online Access: http://ndltd.ncl.edu.tw/handle/83257267183926682714
id ndltd-TW-094NSYS5396083
record_format oai_dc
spelling ndltd-TW-094NSYS5396083 2016-05-27T04:18:11Z http://ndltd.ncl.edu.tw/handle/83257267183926682714 Preference-Anchored Document Clustering Technique: Effects of Term Relationships and Thesaurus 偏好引導的情境式文件分群技術:字詞關係及統計式字典之影響 Hao-hsiang Lin 林浩翔 Master's National Sun Yat-sen University Department of Information Management 94 Chih-Ping Wei 魏志平 2006 thesis 50 en_US
collection NDLTD
sources NDLTD
description Master's === National Sun Yat-sen University === Department of Information Management === 94 === According to the context theory of classification, how individuals cluster documents depends not only on the attributes (including contents) of the documents but also on who is performing the task and in what context. Effective document-clustering techniques therefore need to take users’ categorization preferences into account so that they can generate document clusters from different preferential perspectives. The Preference-Anchored Document Clustering (PAC) technique was proposed to support preference-based document clustering. Specifically, PAC takes a user’s categorization preference into consideration and generates a set of document clusters from that preferential perspective. In this study, we investigate two research questions concerning the PAC technique. The first is “whether the incorporation of broader-term expansion (i.e., the PAC2 technique proposed in this study) will improve the effectiveness of preference-based document clustering,” and the second is “whether the use of a statistical-based thesaurus constructed from a larger document corpus will improve the effectiveness of preference-based document clustering.” Our empirical results show that, compared with PAC, the proposed PAC2 technique neither improves nor degrades the effectiveness of preference-based document clustering when the complete set of anchoring terms is used. However, when only a partial set of anchoring terms is provided, PAC2 fails to improve, and even degrades, the effectiveness of preference-based document clustering. As to the second research question, our empirical results suggest that using a statistical-based thesaurus constructed from a larger document corpus (i.e., the ACM corpus of 14,729 documents) does not improve the effectiveness of PAC or PAC2 for preference-based document clustering.