Interactive Information Retrieval Evaluations by Implementation of Terrier IR Platform: TREC 6 Dataset

Bibliographic Details
Main Authors: CHIOU, SHAO-SYUAN, 邱紹軒
Other Authors: Wu, I-Chin
Format: Others
Language: zh-TW
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/hypstr
Description
Summary: Master's === Fu Jen Catholic University === Master's Program, Department of Information Management === 102 === In this study, we used the TREC-6 document and topic sets from the Text REtrieval Conference (TREC) as our data set. We ran three information retrieval models (TF_IDF, BM25, and Hiemstra_LM) with two topic-field settings (title only, “T”, and title plus description, “T+D”) in the command-line mode of Terrier IR to retrieve documents from TREC-6. Our aim was to determine which model achieves higher precision under each experimental setting. We then tested the query expansion function to investigate how topic length affects the different IR models. Furthermore, we built an interactive information retrieval (IR) interface using a modified version of the query expansion class provided by Terrier IR, and we selected two topics from the TREC-6 document and topic sets. Eighteen users were recruited as evaluation subjects. We investigated (1) the effectiveness of the three models in terms of the number of relevant documents retrieved by each subject and average precision, and (2) the effectiveness of interactive IR with and without term suggestions. We also recorded and analyzed the subjects’ search behaviors using Morae software. The results indicate that, in command-line mode, Hiemstra_LM performed best when topics used title and description (“T+D”) without query expansion. Interestingly, with query expansion enabled, Hiemstra_LM performed worst. This suggests that the model, as provided in Terrier IR, is not well suited to processing topics with more words. In our proposed interactive IR interface, subjects who used the Hiemstra_LM model achieved the best performance regardless of whether term suggestions were given. Moreover, subjects found more relevant documents with the term suggestion function than without it.
Finally, our video recordings show that, when subjects used the interface without the term suggestion function, the Hiemstra_LM model let them find relevant documents more quickly and accurately. In conclusion, the Hiemstra_LM model is capable of building good index terms and suggesting appropriate terms, which helps users achieve better interactive IR.
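
The summary reports effectiveness as precision and average precision but does not say how they were computed. The sketch below shows the standard textbook definitions for a single topic. It is an illustration only, not the thesis's evaluation code: the function names and document IDs are hypothetical, and the ranked list is assumed to contain no duplicate documents.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    if k <= 0:
        return 0.0
    hits = sum(1 for doc_id in ranked[:k] if doc_id in relevant)
    return hits / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at each rank where a relevant
    document appears, divided by the total number of relevant documents.

    Relevant documents that are never retrieved contribute zero.
    Assumes `ranked` has no duplicate document IDs.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


# Hypothetical ranking for one topic; the IDs are placeholders, not TREC-6 data.
ranking = ["d1", "d2", "d3", "d4"]
judged_relevant = {"d1", "d3"}
ap = average_precision(ranking, judged_relevant)
p_at_2 = precision_at_k(ranking, judged_relevant, 2)
```

Averaging `average_precision` over all topics gives mean average precision, a common way to compare retrieval models such as TF_IDF, BM25, and Hiemstra_LM across settings.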