A Cluster-Based Information Presentation
Master's thesis === National Taiwan University of Science and Technology === Department of Information Management === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant doc...
Main Authors: | Chi-Chou Chiang 江季洲 |
---|---|
Other Authors: | Chiun-Chien Hsu |
Format: | Others |
Language: | zh-TW |
Published: | 2002 |
Online Access: | http://ndltd.ncl.edu.tw/handle/43959373901816772375 |
id |
ndltd-TW-090NTUST396006 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-090NTUST396006 2015-10-13T14:41:23Z http://ndltd.ncl.edu.tw/handle/43959373901816772375 A Cluster-Based Information Presentation 以分群為基礎的資訊呈現 Chi-Chou Chiang 江季洲 Master's thesis National Taiwan University of Science and Technology Department of Information Management 90 The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant documents is difficult because users have to wade through a large set of returned documents before reaching them. One approach clusters and reorganizes the returned documents and presents the clustered results to users, which could reduce their search effort. This approach can be thought of as cluster-based information presentation. The thesis begins by investigating document clustering methods and finds that one method, Lightweight Document Clustering (LDC), published by Weiss, has many interesting properties. After studying LDC, we propose Improved Lightweight Document Clustering (ILDC) to prevent some clustering misses that can occur in LDC. Because LDC is a clustering method based on documents' nearest neighbors, we also present two further methods based on nearest neighbors, named Nearest Neighbor Hit (NNH) and Common Nearest Neighbor (CNN). The idea of NNH is to count the number of times each document appears as a nearest neighbor of other documents, and to cluster documents according to these hit counts. The idea of CNN is to use a data structure called a suffix tree to find documents that have common nearest neighbors; these documents are then put into the same cluster. We evaluate and analyze the three proposed methods through extensive experiments. Finally, we choose the best-performing method, CNN, to implement a cluster-based document query system for users. Chiun-Chien Hsu 徐俊傑 2002 thesis 61 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taiwan University of Science and Technology === Department of Information Management === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant documents is difficult because users have to wade through a large set of returned documents before reaching them. One approach clusters and reorganizes the returned documents and presents the clustered results to users, which could reduce their search effort. This approach can be thought of as cluster-based information presentation.
The thesis begins by investigating document clustering methods and finds that one method, Lightweight Document Clustering (LDC), published by Weiss, has many interesting properties. After studying LDC, we propose Improved Lightweight Document Clustering (ILDC) to prevent some clustering misses that can occur in LDC. Because LDC is a clustering method based on documents' nearest neighbors, we also present two further methods based on nearest neighbors, named Nearest Neighbor Hit (NNH) and Common Nearest Neighbor (CNN).
The idea of NNH is to count the number of times each document appears as a nearest neighbor of other documents, and to cluster documents according to these hit counts. The idea of CNN is to use a data structure called a suffix tree to find documents that have common nearest neighbors; these documents are then put into the same cluster.
We evaluate and analyze the three proposed methods through extensive experiments. Finally, we choose the best-performing method, CNN, to implement a cluster-based document query system for users.
|
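The two nearest-neighbor ideas described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state the similarity measure, the neighbor count, or the exact grouping rule, so the sketch assumes cosine similarity over raw word counts and a hypothetical parameter `k`. The function names (`nearest_neighbors`, `hit_counts`, `group_by_common_neighbors`) are invented for illustration, and a plain dictionary keyed by the neighbor set stands in for the suffix tree that the thesis uses for CNN.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two word-count vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_neighbors(docs, k):
    """Return, for each document, the indices of its k most similar documents."""
    vecs = [Counter(d.lower().split()) for d in docs]
    result = []
    for i, v in enumerate(vecs):
        # Sort by similarity (descending); ties go to the lower index.
        scored = [(cosine(v, w), -j) for j, w in enumerate(vecs) if j != i]
        scored.sort(reverse=True)
        result.append([-neg_j for _, neg_j in scored[:k]])
    return result


def hit_counts(neighbors):
    """NNH-style count: how often each document is someone's nearest neighbor."""
    hits = Counter()
    for nbrs in neighbors:
        hits.update(nbrs)
    return hits


def group_by_common_neighbors(neighbors):
    """CNN-style grouping: documents with the same neighbor set share a cluster.

    A dict keyed by the sorted neighbor tuple replaces the suffix tree here.
    """
    groups = defaultdict(list)
    for doc_id, nbrs in enumerate(neighbors):
        groups[tuple(sorted(nbrs))].append(doc_id)
    return list(groups.values())


if __name__ == "__main__":
    sample = [
        "cat dog pet",
        "cat dog animal",
        "cat dog",
        "stock market price",
        "stock market trade",
        "stock market",
    ]
    nbrs = nearest_neighbors(sample, k=1)
    print(hit_counts(nbrs))
    print(group_by_common_neighbors(nbrs))
```

In this sketch, documents with high hit counts act as natural cluster centers for an NNH-style method, while the grouping function puts documents that point to the same neighbors into one cluster, which is the intuition behind CNN. How the thesis turns hit counts into final clusters is not given in the abstract and is left out here.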
author2 |
Chiun-Chien Hsu |
author_facet |
Chiun-Chien Hsu Chi-Chou Chiang 江季洲 |
author |
Chi-Chou Chiang 江季洲 |
spellingShingle |
Chi-Chou Chiang 江季洲 A Cluster-Based Information Presentation |
author_sort |
Chi-Chou Chiang |
title |
A Cluster-Based Information Presentation |
title_short |
A Cluster-Based Information Presentation |
title_full |
A Cluster-Based Information Presentation |
title_fullStr |
A Cluster-Based Information Presentation |
title_full_unstemmed |
A Cluster-Based Information Presentation |
title_sort |
cluster-based information presentation |
publishDate |
2002 |
url |
http://ndltd.ncl.edu.tw/handle/43959373901816772375 |
work_keys_str_mv |
AT chichouchiang aclusterbasedinformationpresentation AT jiāngjìzhōu aclusterbasedinformationpresentation AT chichouchiang yǐfēnqúnwèijīchǔdezīxùnchéngxiàn AT jiāngjìzhōu yǐfēnqúnwèijīchǔdezīxùnchéngxiàn AT chichouchiang clusterbasedinformationpresentation AT jiāngjìzhōu clusterbasedinformationpresentation |
_version_ |
1717756283862384640 |