Sequence-Based Text Retrieval: Design and Implementation

Bibliographic Details
Main Authors: Ching-Lin Yu, 游景麟
Other Authors: Tsay, Yih-Kuen
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/86301821483889612691
Description
Summary: Master's === National Taiwan University === Graduate Institute of Information Management === 90 === Information retrieval (IR) concerns how computers can help people effectively and efficiently find information that meets their needs. Most approaches to text information retrieval view documents and queries as sets of terms, with or without weights, and base their relevance judgement on term appearances. These approaches ignore information that can be derived from term positions. Such information can associate terms with one another and may be used to improve retrieval effectiveness. However, no text retrieval system seems to exploit positional information to its full potential. Motivated by this observation, we develop a sequence-based IR approach in this thesis. The sequence-based approach views documents and queries as sequences of terms and bases relevance judgement on sequence similarity, which is a generalization of string similarity. We focus on applying the approach to Chinese text retrieval. We implement a text retrieval system based on the approach and conduct a number of experiments to test its effectiveness. The experimental results favor the sequence-based approach over appearance-based approaches. We also give a preliminary discussion of how incorporating positional information affects indexing. Finally, we demonstrate the potential of the approach through extensions to it and through integration with other IR models and systems.
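
The contrast the summary draws between appearance-based and sequence-based relevance can be illustrated with a minimal sketch. The record does not state which similarity measure the thesis uses, so the code below is an assumption for illustration only: it uses the longest common subsequence (LCS) of terms as a stand-in for sequence similarity and Jaccard overlap as a stand-in for a set-of-terms model. The function names and sample documents are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two term sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def sequence_similarity(query, doc):
    """Order-sensitive score in [0, 1] (illustrative, not the thesis's measure)."""
    if not query or not doc:
        return 0.0
    return 2 * lcs_length(query, doc) / (len(query) + len(doc))


def set_similarity(query, doc):
    """Order-insensitive Jaccard overlap, standing in for a set-of-terms model."""
    q, d = set(query), set(doc)
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)


# Hypothetical data: both documents contain the same terms, in different order.
query = ["information", "retrieval", "system"]
doc_a = ["information", "retrieval", "system", "design"]
doc_b = ["system", "design", "retrieval", "information"]

for name, doc in (("doc_a", doc_a), ("doc_b", doc_b)):
    print(name, set_similarity(query, doc), sequence_similarity(query, doc))
```

In this sketch the set-based score cannot tell the two documents apart because they share the same terms, while the order-sensitive score ranks doc_a higher because its terms appear in the same order as in the query. The same functions accept a list of characters instead of words, which is one possible (assumed) way to treat Chinese text.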