Clustering Articles in a Literature Digital Library Based on Content and Usage
Master's thesis === National Sun Yat-sen University === Department of Information Management === 92 === A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the task of searching for articles. A browsi...
Main Authors: | Kang-Di Ting, 丁康迪 |
---|---|
Other Authors: | San-Yih Hwang, 黃三益 |
Format: | Others |
Language: | en_US |
Published: | 2004 |
Online Access: | http://ndltd.ncl.edu.tw/handle/21635162477974617535 |
id | ndltd-TW-092NSYS5396069 |
---|---|
record_format | oai_dc |
spelling | ndltd-TW-092NSYS5396069 2015-10-13T13:05:08Z http://ndltd.ncl.edu.tw/handle/21635162477974617535 Clustering Articles in a Literature Digital Library Based on Content and Usage 結合文件內容和使用紀錄的文獻數位圖書館文件分群技術 Kang-Di Ting 丁康迪 Master's thesis, National Sun Yat-sen University, Department of Information Management, 92 San-Yih Hwang 黃三益 2004 thesis 52 en_US |
collection | NDLTD |
language | en_US |
format | Others |
sources | NDLTD |
description |
Master's thesis === National Sun Yat-sen University === Department of Information Management === 92 === A literature digital library is one of the most important resources for preserving the assets of civilization. To provide more effective and efficient information search, many systems are equipped with a browsing interface that aims to ease the task of searching for articles. A browsing interface is associated with a subject directory, which guides users to articles that meet their information needs. A subject directory contains a set (or a hierarchy) of subject categories, each containing a number of similar articles. How to group the articles of a literature digital library into such categories is the theme of this thesis.
Previous work used either document classification or document clustering to dispatch articles into a set of clusters based on their content. We observed, however, that the articles meeting a single user's information need do not necessarily fall into a single cluster. In this thesis, we therefore propose to use both Web logs and article content when clustering articles. We propose two hybrid approaches, namely a document-categorization-based method and a document-clustering-based method, and compare them with other content-based methods. We found that the document-categorization-based method effectively reduces the number of required click-throughs, at the expense of a slight increase in entropy, which measures the content heterogeneity of each generated cluster.
|
author2 | San-Yih Hwang |
author | Kang-Di Ting 丁康迪 |
title | Clustering Articles in a Literature Digital Library Based on Content and Usage |
publishDate | 2004 |
url | http://ndltd.ncl.edu.tw/handle/21635162477974617535 |
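The abstract in this record reports that the document-categorization-based method trades fewer click-throughs for a slight increase in entropy, a measure of each cluster's content heterogeneity, but the record does not state the exact formula used in the thesis. The Python sketch below assumes the common textbook definition: each article carries one known subject-category label, a cluster's entropy is the Shannon entropy (in bits) of its label distribution, and a whole clustering is scored by the size-weighted average over clusters. The function names `cluster_entropy` and `weighted_entropy` and the toy labels are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def cluster_entropy(labels):
    """Shannon entropy (in bits) of the subject labels inside one cluster.

    Assumption: every article has a single known subject label; the thesis
    may define content heterogeneity differently.
    """
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # Sum of p * log2(1/p) over labels; written this way to avoid -0.0.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def weighted_entropy(clusters):
    """Size-weighted mean of per-cluster entropy over a whole clustering.

    `clusters` maps a (hypothetical) cluster id to the list of subject labels
    of the articles assigned to that cluster.
    """
    total = sum(len(members) for members in clusters.values())
    if total == 0:
        return 0.0
    return sum(
        len(members) / total * cluster_entropy(members)
        for members in clusters.values()
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the thesis.
    pure = {"c1": ["poetry", "poetry"], "c2": ["history", "history"]}
    mixed = {"c1": ["poetry", "history"], "c2": ["poetry", "history"]}
    print(weighted_entropy(pure))   # 0.0
    print(weighted_entropy(mixed))  # 1.0
```

Lower values mean more homogeneous clusters: a clustering in which every cluster holds a single subject scores 0, while two evenly mixed two-subject clusters score 1 bit. Under this assumed definition, a hybrid method that pulls co-used but topically different articles into the same cluster would show exactly the kind of small entropy increase the abstract describes.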