Clustering Articles in a Literature Digital Library Based on Content and Usage

Master's === National Sun Yat-sen University === Department of Information Management === 92 === A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsi...


Bibliographic Details
Main Authors: Kang-Di Ting, 丁康迪
Other Authors: San-Yih Hwang
Format: Others
Language: en_US
Published: 2004
Online Access: http://ndltd.ncl.edu.tw/handle/21635162477974617535
id ndltd-TW-092NSYS5396069
record_format oai_dc
spelling ndltd-TW-092NSYS53960692015-10-13T13:05:08Z http://ndltd.ncl.edu.tw/handle/21635162477974617535 Clustering Articles in a Literature Digital Library Based on Content and Usage 結合文件內容和使用紀錄的文獻數位圖書館文件分群技術 Kang-Di Ting 丁康迪 Master's National Sun Yat-sen University Department of Information Management 92 A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsing interface is associated with a subject directory, which guides users to articles that meet their information needs. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis. Previous work used either document classification or document clustering approaches to dispatch articles into a set of article clusters based on their content. We observed that articles that meet a single user’s information need do not necessarily fall in a single cluster. In this thesis, we propose to make use of both the Web log and article content in clustering articles. We propose two hybrid approaches, namely a document-categorization-based method and a document-clustering-based method. These alternatives were compared with other content-based methods. We found that the document-categorization-based method effectively reduces the number of required click-throughs at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster. San-Yih Hwang 黃三益 2004 學位論文 ; thesis 52 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Sun Yat-sen University === Department of Information Management === 92 === A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the article searching task. A browsing interface is associated with a subject directory, which guides users to articles that meet their information needs. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group articles in a literature digital library is the theme of this thesis. Previous work used either document classification or document clustering approaches to dispatch articles into a set of article clusters based on their content. We observed that articles that meet a single user’s information need do not necessarily fall in a single cluster. In this thesis, we propose to make use of both the Web log and article content in clustering articles. We propose two hybrid approaches, namely a document-categorization-based method and a document-clustering-based method. These alternatives were compared with other content-based methods. We found that the document-categorization-based method effectively reduces the number of required click-throughs at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster.
author2 San-Yih Hwang
author_facet San-Yih Hwang
Kang-Di Ting
丁康迪
author Kang-Di Ting
丁康迪
spellingShingle Kang-Di Ting
丁康迪
Clustering Articles in a Literature Digital Library Based on Content and Usage
author_sort Kang-Di Ting
title Clustering Articles in a Literature Digital Library Based on Content and Usage
title_short Clustering Articles in a Literature Digital Library Based on Content and Usage
title_full Clustering Articles in a Literature Digital Library Based on Content and Usage
title_fullStr Clustering Articles in a Literature Digital Library Based on Content and Usage
title_full_unstemmed Clustering Articles in a Literature Digital Library Based on Content and Usage
title_sort clustering articles in a literature digital library based on content and usage
publishDate 2004
url http://ndltd.ncl.edu.tw/handle/21635162477974617535
work_keys_str_mv AT kangditing clusteringarticlesinaliteraturedigitallibrarybasedoncontentandusage
AT dīngkāngdí clusteringarticlesinaliteraturedigitallibrarybasedoncontentandusage
AT kangditing jiéhéwénjiànnèirónghéshǐyòngjìlùdewénxiànshùwèitúshūguǎnwénjiànfēnqúnjìshù
AT dīngkāngdí jiéhéwénjiànnèirónghéshǐyòngjìlùdewénxiànshùwèitúshūguǎnwénjiànfēnqúnjìshù
_version_ 1717731626743496704