A Cluster-Based Information Presentation

Master's === National Taiwan University of Science and Technology === Department of Information Management === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant doc...

Bibliographic Details
Main Authors: Chi-Chou Chiang, 江季洲
Other Authors: Chiun-Chien Hsu
Format: Others
Language: zh-TW
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/43959373901816772375
id ndltd-TW-090NTUST396006
record_format oai_dc
spelling ndltd-TW-090NTUST396006 2015-10-13T14:41:23Z http://ndltd.ncl.edu.tw/handle/43959373901816772375 A Cluster-Based Information Presentation 以分群為基礎的資訊呈現 Chi-Chou Chiang 江季洲 Master's National Taiwan University of Science and Technology Department of Information Management 90 Chiun-Chien Hsu 徐俊傑 2002 學位論文 ; thesis 61 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Information Management === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant documents is difficult because users have to wade through a large set of returned documents before finding them. One approach clusters and reorganizes the returned documents and presents the clustered results to users, which can lighten their search load. This approach can be thought of as cluster-based information presentation. The thesis begins by investigating document clustering methods and finds that one method, Lightweight Document Clustering (LDC), published by Weiss, has many interesting properties. After studying LDC, we propose Improved Lightweight Document Clustering (ILDC) to prevent some clustering misses that can occur in LDC. Because LDC clusters documents based on their nearest neighbors, we also present two further nearest-neighbor-based methods, Nearest Neighbor Hit (NNH) and Common Nearest Neighbor (CNN). NNH counts how many times each document appears as a nearest neighbor of other documents and clusters documents according to these hit counts. CNN uses a data structure called a suffix tree to find documents that share common nearest neighbors and puts those documents into the same cluster. We evaluate and analyze the three proposed methods through a large number of experiments. Finally, we choose the best-performing method, CNN, to implement a cluster-based document query system for users.
author2 Chiun-Chien Hsu
author Chi-Chou Chiang
江季洲
title A Cluster-Based Information Presentation
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/43959373901816772375
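
The abstract describes NNH and CNN only in outline, so the following is a minimal sketch of the two ideas and not the thesis's implementation. It assumes bag-of-words vectors with cosine similarity, a fixed neighbor count k, and a simplified reading of CNN in which documents with identical nearest-neighbor sets are grouped. A plain dictionary stands in for the suffix tree that the thesis uses. All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two term-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(docs, k=1):
    """For each document, return a sorted tuple of its k most similar documents."""
    vecs = [Counter(d.lower().split()) for d in docs]
    result = []
    for i, v in enumerate(vecs):
        others = [j for j in range(len(vecs)) if j != i]
        # Most similar first; ties are broken by the lower document index.
        others.sort(key=lambda j: (-cosine(v, vecs[j]), j))
        result.append(tuple(sorted(others[:k])))
    return result


def hit_counts(neighbors):
    """NNH idea: count how often each document is a nearest neighbor of others."""
    hits = Counter()
    for nn in neighbors:
        hits.update(nn)
    return hits


def group_by_common_neighbors(neighbors):
    """CNN idea (simplified): group documents whose neighbor sets are identical.

    A dict keyed on the neighbor tuple replaces the suffix tree named in
    the abstract; this is an assumption made to keep the sketch short.
    """
    groups = defaultdict(list)
    for doc_id, nn in enumerate(neighbors):
        groups[nn].append(doc_id)
    return list(groups.values())


if __name__ == "__main__":
    docs = [
        "cat dog pet", "cat dog", "dog pet",
        "sun moon star", "sun moon", "moon star",
    ]
    nn = nearest_neighbors(docs, k=1)
    print(hit_counts(nn))
    print(group_by_common_neighbors(nn))
```

In this toy input, documents 0 and 3 act as hubs: each is the nearest neighbor of two other documents, so it collects the highest hit count, and the documents that point at the same hub end up in the same group.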