A Cluster-Based Information Presentation

Master's thesis === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Information Management (資訊管理系) === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant doc...

Bibliographic Details
Main Authors:	Chi-Chou Chiang, 江季洲
Other Authors:	Chiun-Chien Hsu
Format:	Others
Language:	zh-TW
Published:	2002
Online Access:	http://ndltd.ncl.edu.tw/handle/43959373901816772375

id	ndltd-TW-090NTUST396006
record_format	oai_dc
spelling	ndltd-TW-090NTUST396006 2015-10-13T14:41:23Z http://ndltd.ncl.edu.tw/handle/43959373901816772375 A Cluster-Based Information Presentation 以分群為基礎的資訊呈現 Chi-Chou Chiang 江季洲 Master's thesis National Taiwan University of Science and Technology (國立臺灣科技大學) Department of Information Management (資訊管理系) 90 The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant documents is difficult because users have to wade through a large set of returned documents before finding them. One approach clusters and reorganizes the returned documents and presents the clustered results to users, which can reduce their search effort. This approach can be thought of as cluster-based information presentation. The thesis begins by investigating document clustering methods and finds that one method, Lightweight Document Clustering (LDC), published by Weiss, has many interesting properties. After studying LDC, we propose Improved Lightweight Document Clustering (ILDC) to prevent some clustering misses that can occur in LDC. Because LDC is a clustering method based on documents' nearest neighbors, we also present two further methods based on documents' nearest neighbors, named Nearest Neighbor Hit (NNH) and Common Nearest Neighbor (CNN). The idea of NNH is to count how many times each document appears as a nearest neighbor of other documents and to cluster documents according to these hit counts. The idea of CNN is to use a data structure called a suffix tree to find documents that share common nearest neighbors; these documents are then put into the same cluster. We evaluate and analyze our three proposed methods through many experiments. Finally, we choose the best-performing method, CNN, to implement a cluster-based document query system for users. Chiun-Chien Hsu 徐俊傑 2002 學位論文 ; thesis 61 zh-TW
collection	NDLTD
language	zh-TW
format	Others
sources	NDLTD
description	Master's thesis === National Taiwan University of Science and Technology (國立臺灣科技大學) === Department of Information Management (資訊管理系) === 90 === The World Wide Web (WWW) is the most popular platform in the world, and information retrieval is one of the most important services it offers. Web users often rely on search engines to obtain relevant documents and complete their daily work. Finding relevant documents is difficult because users have to wade through a large set of returned documents before finding them. One approach clusters and reorganizes the returned documents and presents the clustered results to users, which can reduce their search effort. This approach can be thought of as cluster-based information presentation. The thesis begins by investigating document clustering methods and finds that one method, Lightweight Document Clustering (LDC), published by Weiss, has many interesting properties. After studying LDC, we propose Improved Lightweight Document Clustering (ILDC) to prevent some clustering misses that can occur in LDC. Because LDC is a clustering method based on documents' nearest neighbors, we also present two further methods based on documents' nearest neighbors, named Nearest Neighbor Hit (NNH) and Common Nearest Neighbor (CNN). The idea of NNH is to count how many times each document appears as a nearest neighbor of other documents and to cluster documents according to these hit counts. The idea of CNN is to use a data structure called a suffix tree to find documents that share common nearest neighbors; these documents are then put into the same cluster. We evaluate and analyze our three proposed methods through many experiments. Finally, we choose the best-performing method, CNN, to implement a cluster-based document query system for users.
author2	Chiun-Chien Hsu
author_facet	Chiun-Chien Hsu Chi-Chou Chiang 江季洲
author	Chi-Chou Chiang 江季洲
spellingShingle	Chi-Chou Chiang 江季洲 A Cluster-Based Information Presentation
author_sort	Chi-Chou Chiang
title	A Cluster-Based Information Presentation
title_short	A Cluster-Based Information Presentation
title_full	A Cluster-Based Information Presentation
title_fullStr	A Cluster-Based Information Presentation
title_full_unstemmed	A Cluster-Based Information Presentation
title_sort	cluster-based information presentation
publishDate	2002
url	http://ndltd.ncl.edu.tw/handle/43959373901816772375
work_keys_str_mv	AT chichouchiang aclusterbasedinformationpresentation AT jiāngjìzhōu aclusterbasedinformationpresentation AT chichouchiang yǐfēnqúnwèijīchǔdezīxùnchéngxiàn AT jiāngjìzhōu yǐfēnqúnwèijīchǔdezīxùnchéngxiàn AT chichouchiang clusterbasedinformationpresentation AT jiāngjìzhōu clusterbasedinformationpresentation
_version_	1717756283862384640
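
The description field in this record outlines two nearest-neighbor ideas, NNH (counting how often a document is a nearest neighbor of others) and CNN (grouping documents that share nearest neighbors). The following is only a rough sketch of those two ideas, not the thesis's implementation. The function names, the toy `neighbors` table, and the rule that documents are grouped only when their neighbor sets are identical are all assumptions made for illustration. The thesis uses a suffix tree to find common neighbors; a plain dictionary keyed by the neighbor set is swapped in here for brevity.

```python
from collections import Counter, defaultdict

def nearest_neighbor_hits(neighbors):
    """Count how often each document appears in other documents' neighbor lists."""
    hits = Counter()
    for doc, nns in neighbors.items():
        for nn in nns:
            if nn != doc:  # a document does not count as its own neighbor
                hits[nn] += 1
    return hits

def group_by_common_neighbors(neighbors):
    """Group documents whose nearest-neighbor sets are identical.

    Assumption: "common nearest neighbors" is simplified to "exactly the
    same neighbor set"; a dict lookup replaces the suffix tree.
    """
    groups = defaultdict(list)
    for doc, nns in neighbors.items():
        groups[frozenset(nns)].append(doc)
    return [sorted(docs) for docs in groups.values()]

# Hypothetical toy input: each document mapped to its two nearest neighbors.
neighbors = {
    "a": ["c", "d"],
    "b": ["c", "d"],
    "c": ["a", "b"],
    "d": ["a", "b"],
    "e": ["a", "c"],
}

hits = nearest_neighbor_hits(neighbors)
groups = group_by_common_neighbors(neighbors)
```

In this toy table, documents "a" and "b" land in one group because both list "c" and "d" as neighbors, while "e" stays alone. How the thesis turns hit counts into clusters, and how loosely it matches neighbor sets, is not stated in this record.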

Similar Items