The Research on Improving the Performance of Information Retrieval with the AGglomerative NESting (AGNES) Algorithm — Using a Chinese News Dataset

Master's === Tamkang University === Master's Program, Department of Information Management === 95 === The document ranking returned by the traditional vector space model of an information retrieval system is usually unorganized: related documents often do not receive adjacent ranks. To avoid missing needed information, the user still has to...
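
The traditional vector space model baseline that the abstract refers to can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes documents are already word-segmented into token lists, uses raw term frequency as the weighting (the record does not state the weighting scheme), and the names `tf_vector`, `cosine`, and `rank_by_query` as well as the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Sparse raw term-frequency vector (assumed weighting, not from the thesis)."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_query(docs, query_tokens):
    """Return (doc_id, score) pairs sorted by descending similarity to the query."""
    q = tf_vector(query_tokens)
    scored = [(doc_id, cosine(tf_vector(tokens), q)) for doc_id, tokens in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical, already-segmented sample documents.
docs = {
    "d1": ["stock", "market", "falls"],
    "d2": ["typhoon", "hits", "coast"],
    "d3": ["stock", "prices", "rise", "market"],
}
ranking = rank_by_query(docs, ["stock", "market"])
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

Each document is scored against the query independently, so nothing in this ranking keeps related documents next to each other; that is the gap the thesis sets out to address.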

Bibliographic Details
Main Authors: Yung-Chieh Sung, 宋永杰
Other Authors: 魏世杰
Format: Others
Language: zh-TW
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/32516765200970344973
id ndltd-TW-095TKU05396026
record_format oai_dc
spelling ndltd-TW-095TKU05396026 2015-10-13T14:08:18Z http://ndltd.ncl.edu.tw/handle/32516765200970344973 The Research on Improving the Performance of Information Retrieval with the AGglomerative NESting (AGNES) Algorithm — Using a Chinese News Dataset 以聚合法(AGNES)提升檢索效果之研究—以中文新聞為例 Yung-Chieh Sung 宋永杰 Master's Tamkang University Master's Program, Department of Information Management 95 The document ranking returned by the traditional vector space model of an information retrieval system is usually unorganized: related documents often do not receive adjacent ranks. To avoid missing needed information, the user still has to read several unrelated documents before finding another related one. In this research, we cluster the documents returned by the traditional vector space model, based on the binary tree hierarchy constructed by the AGglomerative NESting (AGNES) algorithm. The clusters are ranked by the average of the coupling and cohesion measures proposed in this thesis, and the documents within each cluster are ranked by their similarity to the query. This ranking adjustment is intended to improve precision. Using a Chinese news dataset, we carried out word segmentation, vector representation, AGNES clustering, query-based document retrieval, and the final ranking adjustment for evaluation. As a result, our system improves precision by 20.9% to 24.0% compared with the traditional vector space model. We also tested the results with the Wilcoxon signed-rank test, which shows that our system is significantly better than the traditional vector space model for queries of one or two keywords. 魏世杰 2007 degree thesis ; thesis 54 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === Tamkang University === Master's Program, Department of Information Management === 95 === The document ranking returned by the traditional vector space model of an information retrieval system is usually unorganized: related documents often do not receive adjacent ranks. To avoid missing needed information, the user still has to read several unrelated documents before finding another related one. In this research, we cluster the documents returned by the traditional vector space model, based on the binary tree hierarchy constructed by the AGglomerative NESting (AGNES) algorithm. The clusters are ranked by the average of the coupling and cohesion measures proposed in this thesis, and the documents within each cluster are ranked by their similarity to the query. This ranking adjustment is intended to improve precision. Using a Chinese news dataset, we carried out word segmentation, vector representation, AGNES clustering, query-based document retrieval, and the final ranking adjustment for evaluation. As a result, our system improves precision by 20.9% to 24.0% compared with the traditional vector space model. We also tested the results with the Wilcoxon signed-rank test, which shows that our system is significantly better than the traditional vector space model for queries of one or two keywords.
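
The clustering and re-ranking steps named in the description can be sketched roughly as below. This is an illustration under stated assumptions, not the method of the thesis. The record does not define the coupling and cohesion measures, the linkage criterion, or how the binary tree is cut into clusters. Here cohesion is assumed to be the mean pairwise cosine similarity inside a cluster, coupling is assumed to be the mean similarity of a cluster's documents to the query, merging uses average linkage, and merging stops at a fixed number of clusters `k`. All function names and the sample vectors are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def agnes(vectors, k):
    """Bottom-up merging with average linkage until k clusters remain (assumed cut rule)."""
    clusters = [[doc_id] for doc_id in vectors]

    def linkage(a, b):
        total = sum(cosine(vectors[x], vectors[y]) for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > k:
        # Merge the most similar pair of clusters; i < j, so deleting j is safe.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cohesion(cluster, vectors):
    """Assumed stand-in: mean pairwise similarity within the cluster."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0  # assumption: a singleton cluster gets no cohesion credit
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def coupling(cluster, vectors, query):
    """Assumed stand-in: mean similarity of the cluster's documents to the query."""
    return sum(cosine(vectors[d], query) for d in cluster) / len(cluster)


def rerank(vectors, query, k):
    """Rank clusters by the average of coupling and cohesion, then documents by query similarity."""
    clusters = agnes(vectors, k)
    clusters.sort(key=lambda c: (coupling(c, vectors, query) + cohesion(c, vectors)) / 2,
                  reverse=True)
    return [d for c in clusters
            for d in sorted(c, key=lambda d: cosine(vectors[d], query), reverse=True)]


# Hypothetical term-frequency vectors for four retrieved documents.
vectors = {
    "d1": {"stock": 1, "market": 1},
    "d2": {"typhoon": 1, "coast": 1},
    "d3": {"stock": 1, "market": 1, "prices": 1},
    "d4": {"typhoon": 1, "rain": 1},
}
order = rerank(vectors, {"stock": 1}, k=2)
print(order)
```

With these sample vectors the two stock-related documents form one cluster and are returned together ahead of the typhoon cluster, which is the adjacency effect the description aims for. The quadratic pair search in `agnes` is kept for readability and would need a proper distance matrix on a real collection.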
author2 魏世杰
author_facet 魏世杰
Yung-Chieh Sung
宋永杰
author Yung-Chieh Sung
宋永杰
author_sort Yung-Chieh Sung
title The Research on Improving the Performance of Information Retrieval with the AGglomerative NESting (AGNES) Algorithm — Using a Chinese News Dataset
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/32516765200970344973