A Preliminary Study of Text Retrieval Techniques Utilizing Character/Word Positions

Bibliographic Details
Main Authors: Lung-Chi Lin, 林隆祺
Other Authors: Yih-Kuen Tsay
Format: Others
Language: zh-TW
Published: 2000
Online Access: http://ndltd.ncl.edu.tw/handle/45115638176919765401
Description
Summary: Master's thesis === National Taiwan University === Graduate Institute of Information Management === 88 === Text retrieval systems can be roughly divided into two types: (a) systems that use characters or words as index terms and (b) systems that use phrases as index terms. Systems of the first type can be built automatically, but their retrieval results usually have high recall and low precision. Systems of the second type can achieve high precision, but they require human intervention to select the key phrases of a document, and they must also solve the phrase segmentation problem when handling a query. We seek a retrieval method that achieves both high recall and high precision and can be implemented automatically. This thesis proposes such a method, which utilizes character/word positions. Although the method is suitable for both Chinese and English text retrieval, we focus on its use in Chinese text retrieval. The main idea is to record the position of each character/word occurrence in the index. This extra information is then used to compute the similarity between a query and a stored document. We conduct a preliminary but systematic study of similarity algorithms that use a character/word index, and we show experimentally that such algorithms produce good retrieval results.
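
The summary does not specify the similarity algorithms studied in the thesis. As a rough illustration of the general idea only, the Python sketch below builds a character-level index that records positions and scores a document by how many adjacent character pairs of the query also occur adjacently, in the same order, in the document. The function names (`build_positional_index`, `adjacency_score`), the adjacency-based scoring rule, and the sample documents are all assumptions made for this example; they are not taken from the thesis.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map each character to {doc_id: [positions where it occurs]}.

    Hypothetical helper: whitespace is skipped, every other character
    is indexed together with its offset in the document text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            if not ch.isspace():
                index[ch][doc_id].append(pos)
    return index


def adjacency_score(index, query, doc_id):
    """Fraction of adjacent query-character pairs that are also adjacent,
    in the same order, somewhere in the document.

    This scoring rule is an assumption chosen for illustration, not the
    similarity measure proposed in the thesis.
    """
    chars = [c for c in query if not c.isspace()]
    if not chars:
        return 0.0
    if len(chars) == 1:
        # A single character has no pairs: score by presence alone.
        return 1.0 if doc_id in index.get(chars[0], {}) else 0.0
    hits = 0
    for a, b in zip(chars, chars[1:]):
        pos_a = index.get(a, {}).get(doc_id, [])
        pos_b = set(index.get(b, {}).get(doc_id, []))
        # The pair matches if b occurs immediately after some occurrence of a.
        if any(p + 1 in pos_b for p in pos_a):
            hits += 1
    return hits / (len(chars) - 1)


# Made-up sample documents; both contain every character of the query.
docs = {
    "d1": "資訊檢索系統",
    "d2": "系統資訊與檢索",
}
index = build_positional_index(docs)
query = "資訊檢索"
ranked = sorted(docs, key=lambda d: adjacency_score(index, query, d), reverse=True)
print(ranked)
```

Both sample documents contain all four query characters, so an index without positions could not tell them apart. With positions, `d1` scores 1.0 (all three adjacent pairs match) and `d2` scores 2/3 (the pair 訊檢 is broken by 與), so `d1` ranks first without any phrase segmentation of the query.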